This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: V4 [PATCH] x86: Add <sys/platform/x86.h>

From: "H.J. Lu" <hjl dot tools at gmail dot com>
To: Florian Weimer <fweimer at redhat dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>
Date: Sat, 22 Sep 2018 09:21:54 -0700
Subject: Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
References: <20180919131313.GA7503@intel.com> <87o9csflaf.fsf@oldenburg.str.redhat.com> <CAMe9rOoErqoq5b_csxptqyaiDdRxbKZWQUA+NcX9mLLeqyPjNQ@mail.gmail.com> <87o9cse3us.fsf@oldenburg.str.redhat.com> <CAMe9rOovshZ=MmTubNWf+PNOfu=qN74wDpv-V-VucE2meVzKVQ@mail.gmail.com> <87d0t8e2f6.fsf@oldenburg.str.redhat.com>

On Thu, Sep 20, 2018 at 6:14 AM, Florian Weimer <fweimer@redhat.com> wrote:
> * H. J. Lu:
>
>> On Thu, Sep 20, 2018 at 5:43 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> * H. J. Lu:
>>>
>>>> On Thu, Sep 20, 2018 at 4:41 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>>> * H. J. Lu:
>>>>>
>>>>>> Add <sys/platform/x86.h> to provide an API to access x86 specific platform
>>>>>> features.  It makes raw and derived CPUID information available to
>>>>>> applications so that programmers can avoid complicated steps to retrieve
>>>>>> the same information with CPUID instructions.
>>>>>
>>>>> Sorry, I still think that this patch semantically depends on delayed
>>>>> IFUNC processing.
>>
>> I don't believe this is an issue at all since new functions are in
>> ld.so.  Can you show this is a real issue with a testcase? I can
>> add it to my test.
>
> It's the relocation for x86_get_arch_feature in the IFUNC resolver I'm
> worried about.  Without delayed processing, the IFUNC resolver may be
> called at any time, and the relocation for x86_get_arch_feature may not
> have been processed yet.  So it's not about calling into an unrelocated
> or uninitialized ld.so (which I agree would not happen), but the call
> itself going wrong because it's not been properly relocated yet.
>
> Usually, dependency order as established by DT_NEEDED entries will
> ensure that IFUNC resolvers are only called in already-relocated
> objects, but LD_PRELOAD, symbol interposition, or missing DT_NEEDED
> entries can invalidate that assumption and make the dependency ordering
> insufficient.
>
> I hope this makes it clear what I think will break when this feature is
> used from IFUNC resolvers.

I don't think this is an issue.  I wrote a testcase to use <sys/platform/x86.h>
macros in IFUNC resolver in a LD_PRELOAD library:

https://github.com/hjl-tools/glibc/commit/94c992985919cfd5f5f7db431a6149376ab3d587

Is that what you have in mind?  If not, can you show me with a testcase?

>>>>> Furthermore, I strongly prefer a simplified interface, like:
>>>>>
>>>>>   enum {
>>>>>     X86_FEATURE_MMX,
>>>>>     X86_FEATURE_SSE,
>>>>>     X86_FEATURE_SSE2,
>>>>>     …
>>>>>     X86_FEATURE_AVX512F,
>>>>>     …
>>>>>   };
>>>>>
>>>>>   _Bool x86_feature_usable (int index) __attribute__ ((pure));
>>>>>
>>>>> Applications could then do something like this:
>>>>>
>>>>>   if (x86_feature_usable (X86_FEATURE_AVX512F))
>>>>>     return &__matmul_avx512f;
>>>>>
>>>>> I don't think there is value in exposing the CPUID data directly.  If
>>>>> programmers need it, they can use <cpuid.h> from GCC instead, without
>>>>> needing to worry how glibc populates its data structures.
>>
>> There are several advantages of my approach:
>>
>> 1. When a new feature is added to previously reserved bit, the
>> program using the new feature will work with the older libc.so
>> whose CPUID data has the reserved bit.
>
> It depends on the CPUID feature bit whether that's a good idea.  For the
> AVX512F, that would be wrong because it the CPUID data does not tell you
> whether the feature is actually usable.
>
> There is no ABI impact for future feature flags, so we can backport them
> easily enough.
>
>> 2. We only need to add 2 small functions:
>>
>> (gdb) disass x86_get_arch_feature
>> Dump of assembler code for function x86_get_arch_feature:
>>    0x0000000000017100 <+0>: xor    %eax,%eax
>>    0x0000000000017102 <+2>: cmp    $0x2,%edi
>>    0x0000000000017105 <+5>: ja     0x17117 <x86_get_arch_feature+23>
>>    0x0000000000017107 <+7>: mov    %edi,%edi
>>    0x0000000000017109 <+9>: lea    0x105f0(%rip),%rax        # 0x27700
>> <_rtld_local_ro>
>>    0x0000000000017110 <+16>: mov    0xb4(%rax,%rdi,4),%eax
>>    0x0000000000017117 <+23>: retq
>> End of assembler dump.
>> (gdb) disass x86_get_cpuid_registers
>> Dump of assembler code for function x86_get_cpuid_registers:
>>    0x00000000000170d0 <+0>: lea    0xc479(%rip),%rax        # 0x23550
>> <zero_cpuid_registers.8564>
>>    0x00000000000170d7 <+7>: cmp    $0x2,%edi
>>    0x00000000000170da <+10>: ja     0x170f0 <x86_get_cpuid_registers+32>
>>    0x00000000000170dc <+12>: mov    %edi,%edi
>>    0x00000000000170de <+14>: lea    0x1061b(%rip),%rax        #
>> 0x27700 <_rtld_local_ro>
>>    0x00000000000170e5 <+21>: add    $0x7,%rdi
>>    0x00000000000170e9 <+25>: shl    $0x4,%rdi
>>    0x00000000000170ed <+29>: add    %rdi,%rax
>>    0x00000000000170f0 <+32>: retq
>> End of assembler dump.
>> (gdb)
>
> The implementation of x86_feature_usable would be quite similar, but it
> would be necessary to readjust the way the usable bits are stored in
> ld.so.  That is true.
>
>> 3. Write CPUID program isn't easy.   You can try to write a program
>> to detect a CPUID feature to see what it takes.  I can't remember it
>> exactly without looking at glibc sources which I wrote.
>
> I completely agree, which is why I think we should *not* expose CPUID
> data, but higher-level information about which features are available.
> In some cases, it may be possible to use CPUID bits directly (for CMOV
> on i386, for example), but in many other cases, it will be necessary to
> take environmental factors into account, e.g. how the kernel has set up
> XCR0 when it created the process.
>
> As far as I know, it's the latter aspect that is not known in advance:
> Future CPUID bits might well have dependencies on XCR0 (or other CPU
> state) as well.  An application which tries to do detection with older
> glibcs without the corresponding *_Usable support would necessarily go
> wrong.  My API proposal avoids that because it just cannot happen with
> this interface, at the cost that you need to upgrade glibc first before
> you can use any previously-unknown bits.
>

I redesigned my interface:

/* HAS_CPU_FEATURE  evaluates to true if CPU supports the feature.  */
#define HAS_CPU_FEATURE(name) \
 ((x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
   & (bit_cpu_##name)) != 0)
/* CPU_FEATURE_USABLE evaluates to true if the feature is usable.  */
#define CPU_FEATURE_USABLE(name) \
  ((need_arch_feature_##name \
    && (x86_get_arch_feature (index_arch_##name##_Usable) \
& (bit_arch_##name##_Usable)) != 0) \
   || (!need_arch_feature_##name && HAS_CPU_FEATURE(name)))

 -- Macro: CPU_FEATURE_USABLE(name)

     Evaluate to true if the architecture feature ‘name’ is usable.  The
     supported features are:

     ‘ADX’             ‘AES’             ‘AVX’             ‘AVX2’
     ‘AVX512_4FMAPS’   ‘AVX512_4VNNIW’   ‘AVX512_BITALG’   ‘AVX512_IFMA’
     ‘AVX512_VBMI’     ‘AVX512_VBMI2’    ‘AVX512_VNNI’     ‘AVX512_VPOPCNTDQ’
     ‘AVX512BW’        ‘AVX512CD’        ‘AVX512ER’        ‘AVX512DQ’
     ‘AVX512F’         ‘AVX512PF’        ‘AVX512VL’        ‘BMI1’
     ‘BMI2’            ‘CLDEMOTE’        ‘CLFLUSHOPT’      ‘CLFSH’
     ‘CLWB’            ‘CMOV’            ‘CMPXCHG16B’      ‘CX8’
     ‘ERMS’            ‘F16C’            ‘FMA’             ‘FMA4’
     ‘FPU’             ‘FSGSBASE’        ‘FSRM’            ‘FXSR’
     ‘GFNI’            ‘HLE’             ‘LAHF64_SAHF64’   ‘LZCNT’
     ‘MOVBE’           ‘MOVDIRI’         ‘MOVDIR64B’       ‘MSR’
     ‘OSXSAVE’         ‘PCLMULQDQ’       ‘POPCNT’          ‘PREFETCHW’
     ‘PREFETCHWT1’     ‘RDPID’           ‘RDRAND’          ‘RDSEED’
     ‘RDTSCP’          ‘RTM’             ‘SEP’             ‘SHA’
     ‘SSE’             ‘SSE2’            ‘SSE3’            ‘SSE4_1’
     ‘SSE4_2’          ‘SSE4A’           ‘SSSE3’           ‘SYSCALL_SYSRET’
     ‘TBM’             ‘TSC’             ‘VAES’            ‘VPCLMULQDQ’
     ‘XOP’             ‘XSAVE’

If CPU_FEATURE_USABLE(XXX) is true, XXX can be used.

HAS_CPU_FEATURE is defined since it provides useful information.

-- 
H.J.

References:
- V4 [PATCH] x86: Add <sys/platform/x86.h>
  - From: H.J. Lu
- Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
  - From: Florian Weimer
- Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
  - From: H.J. Lu
- Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
  - From: Florian Weimer
- Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
  - From: H.J. Lu
- Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
  - From: Florian Weimer

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]