This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: V4 [PATCH] x86: Add <sys/platform/x86.h>
On Thu, Sep 20, 2018 at 6:14 AM, Florian Weimer <fweimer@redhat.com> wrote:
> * H. J. Lu:
>
>> On Thu, Sep 20, 2018 at 5:43 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> * H. J. Lu:
>>>
>>>> On Thu, Sep 20, 2018 at 4:41 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>>> * H. J. Lu:
>>>>>
>>>>>> Add <sys/platform/x86.h> to provide an API to access x86 specific platform
>>>>>> features. It makes raw and derived CPUID information available to
>>>>>> applications so that programmers can avoid complicated steps to retrieve
>>>>>> the same information with CPUID instructions.
>>>>>
>>>>> Sorry, I still think that this patch semantically depends on delayed
>>>>> IFUNC processing.
>>
>> I don't believe this is an issue at all since new functions are in
>> ld.so. Can you show this is a real issue with a testcase? I can
>> add it to my test.
>
> It's the relocation for x86_get_arch_feature in the IFUNC resolver I'm
> worried about. Without delayed processing, the IFUNC resolver may be
> called at any time, and the relocation for x86_get_arch_feature may not
> have been processed yet. So it's not about calling into an unrelocated
> or uninitialized ld.so (which I agree would not happen), but the call
> itself going wrong because it's not been properly relocated yet.
>
> Usually, dependency order as established by DT_NEEDED entries will
> ensure that IFUNC resolvers are only called in already-relocated
> objects, but LD_PRELOAD, symbol interposition, or missing DT_NEEDED
> entries can invalidate that assumption and make the dependency ordering
> insufficient.
>
> I hope this makes it clear what I think will break when this feature is
> used from IFUNC resolvers.
I don't think this is an issue. I wrote a testcase to use <sys/platform/x86.h>
macros in IFUNC resolver in a LD_PRELOAD library:
https://github.com/hjl-tools/glibc/commit/94c992985919cfd5f5f7db431a6149376ab3d587
Is that what you have in mind? If not, can you show me with a testcase?
>>>>> Furthermore, I strongly prefer a simplified interface, like:
>>>>>
>>>>> enum {
>>>>> X86_FEATURE_MMX,
>>>>> X86_FEATURE_SSE,
>>>>> X86_FEATURE_SSE2,
>>>>> …
>>>>> X86_FEATURE_AVX512F,
>>>>> …
>>>>> };
>>>>>
>>>>> _Bool x86_feature_usable (int index) __attribute__ ((pure));
>>>>>
>>>>> Applications could then do something like this:
>>>>>
>>>>> if (x86_feature_usable (X86_FEATURE_AVX512F))
>>>>> return &__matmul_avx512f;
>>>>>
>>>>> I don't think there is value in exposing the CPUID data directly. If
>>>>> programmers need it, they can use <cpuid.h> from GCC instead, without
>>>>> needing to worry how glibc populates its data structures.
>>
>> There are several advantages of my approach:
>>
>> 1. When a new feature is added to previously reserved bit, the
>> program using the new feature will work with the older libc.so
>> whose CPUID data has the reserved bit.
>
> It depends on the CPUID feature bit whether that's a good idea. For the
> AVX512F, that would be wrong because it the CPUID data does not tell you
> whether the feature is actually usable.
>
> There is no ABI impact for future feature flags, so we can backport them
> easily enough.
>
>> 2. We only need to add 2 small functions:
>>
>> (gdb) disass x86_get_arch_feature
>> Dump of assembler code for function x86_get_arch_feature:
>> 0x0000000000017100 <+0>: xor %eax,%eax
>> 0x0000000000017102 <+2>: cmp $0x2,%edi
>> 0x0000000000017105 <+5>: ja 0x17117 <x86_get_arch_feature+23>
>> 0x0000000000017107 <+7>: mov %edi,%edi
>> 0x0000000000017109 <+9>: lea 0x105f0(%rip),%rax # 0x27700
>> <_rtld_local_ro>
>> 0x0000000000017110 <+16>: mov 0xb4(%rax,%rdi,4),%eax
>> 0x0000000000017117 <+23>: retq
>> End of assembler dump.
>> (gdb) disass x86_get_cpuid_registers
>> Dump of assembler code for function x86_get_cpuid_registers:
>> 0x00000000000170d0 <+0>: lea 0xc479(%rip),%rax # 0x23550
>> <zero_cpuid_registers.8564>
>> 0x00000000000170d7 <+7>: cmp $0x2,%edi
>> 0x00000000000170da <+10>: ja 0x170f0 <x86_get_cpuid_registers+32>
>> 0x00000000000170dc <+12>: mov %edi,%edi
>> 0x00000000000170de <+14>: lea 0x1061b(%rip),%rax #
>> 0x27700 <_rtld_local_ro>
>> 0x00000000000170e5 <+21>: add $0x7,%rdi
>> 0x00000000000170e9 <+25>: shl $0x4,%rdi
>> 0x00000000000170ed <+29>: add %rdi,%rax
>> 0x00000000000170f0 <+32>: retq
>> End of assembler dump.
>> (gdb)
>
> The implementation of x86_feature_usable would be quite similar, but it
> would be necessary to readjust the way the usable bits are stored in
> ld.so. That is true.
>
>> 3. Write CPUID program isn't easy. You can try to write a program
>> to detect a CPUID feature to see what it takes. I can't remember it
>> exactly without looking at glibc sources which I wrote.
>
> I completely agree, which is why I think we should *not* expose CPUID
> data, but higher-level information about which features are available.
> In some cases, it may be possible to use CPUID bits directly (for CMOV
> on i386, for example), but in many other cases, it will be necessary to
> take environmental factors into account, e.g. how the kernel has set up
> XCR0 when it created the process.
>
> As far as I know, it's the latter aspect that is not known in advance:
> Future CPUID bits might well have dependencies on XCR0 (or other CPU
> state) as well. An application which tries to do detection with older
> glibcs without the corresponding *_Usable support would necessarily go
> wrong. My API proposal avoids that because it just cannot happen with
> this interface, at the cost that you need to upgrade glibc first before
> you can use any previously-unknown bits.
>
I redesigned my interface:
/* HAS_CPU_FEATURE evaluates to true if CPU supports the feature. */
#define HAS_CPU_FEATURE(name) \
((x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
& (bit_cpu_##name)) != 0)
/* CPU_FEATURE_USABLE evaluates to true if the feature is usable. */
#define CPU_FEATURE_USABLE(name) \
((need_arch_feature_##name \
&& (x86_get_arch_feature (index_arch_##name##_Usable) \
& (bit_arch_##name##_Usable)) != 0) \
|| (!need_arch_feature_##name && HAS_CPU_FEATURE(name)))
-- Macro: CPU_FEATURE_USABLE(name)
Evaluate to true if the architecture feature ‘name’ is usable. The
supported features are:
‘ADX’ ‘AES’ ‘AVX’ ‘AVX2’
‘AVX512_4FMAPS’ ‘AVX512_4VNNIW’ ‘AVX512_BITALG’ ‘AVX512_IFMA’
‘AVX512_VBMI’ ‘AVX512_VBMI2’ ‘AVX512_VNNI’ ‘AVX512_VPOPCNTDQ’
‘AVX512BW’ ‘AVX512CD’ ‘AVX512ER’ ‘AVX512DQ’
‘AVX512F’ ‘AVX512PF’ ‘AVX512VL’ ‘BMI1’
‘BMI2’ ‘CLDEMOTE’ ‘CLFLUSHOPT’ ‘CLFSH’
‘CLWB’ ‘CMOV’ ‘CMPXCHG16B’ ‘CX8’
‘ERMS’ ‘F16C’ ‘FMA’ ‘FMA4’
‘FPU’ ‘FSGSBASE’ ‘FSRM’ ‘FXSR’
‘GFNI’ ‘HLE’ ‘LAHF64_SAHF64’ ‘LZCNT’
‘MOVBE’ ‘MOVDIRI’ ‘MOVDIR64B’ ‘MSR’
‘OSXSAVE’ ‘PCLMULQDQ’ ‘POPCNT’ ‘PREFETCHW’
‘PREFETCHWT1’ ‘RDPID’ ‘RDRAND’ ‘RDSEED’
‘RDTSCP’ ‘RTM’ ‘SEP’ ‘SHA’
‘SSE’ ‘SSE2’ ‘SSE3’ ‘SSE4_1’
‘SSE4_2’ ‘SSE4A’ ‘SSSE3’ ‘SYSCALL_SYSRET’
‘TBM’ ‘TSC’ ‘VAES’ ‘VPCLMULQDQ’
‘XOP’ ‘XSAVE’
If CPU_FEATURE_USABLE(XXX) is true, XXX can be used.
HAS_CPU_FEATURE is defined since it provides useful information.
--
H.J.