This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: Add <sys/platform/x86.h> to glibc 2.29


On Tue, Jul 31, 2018 at 8:06 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 07/31/2018 10:48 AM, H.J. Lu wrote:
>> On Tue, Jul 31, 2018 at 6:32 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 07/29/2018 12:16 PM, H.J. Lu wrote:
>>>> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
>>>> an API to access x86 specific platform features:
>>>>
>>>> enum
>>>>   {
>>>>     /* The integer bit array index of architecture feature bits.  */
>>>>     FEATURE_INDEX_1 = 0,
>>>>     FEATURE_INDEX_2,
>>>>     /* The current maximum size of the feature integer bit array.  */
>>>>     FEATURE_INDEX_MAX
>>>>   };
>>>>
>>>> enum
>>>>   {
>>>>     COMMON_CPUID_INDEX_1 = 0,
>>>>     COMMON_CPUID_INDEX_7,
>>>>     COMMON_CPUID_INDEX_80000001,
>>>>     COMMON_CPUID_INDEX_D_ECX_1,
>>>>     COMMON_CPUID_INDEX_80000007,
>>>>     COMMON_CPUID_INDEX_80000008,
>>>>     /* Keep the following line at the end.  */
>>>>     COMMON_CPUID_INDEX_MAX
>>>>   };
>>>>
>>>> enum cpu_features_kind
>>>>   {
>>>>     arch_kind_unknown = 0,
>>>>     arch_kind_intel,
>>>>     arch_kind_amd,
>>>>     arch_kind_other
>>>>   };
>>>>
>>>> struct cpuid_registers
>>>>   {
>>>>     unsigned int eax;
>>>>     unsigned int ebx;
>>>>     unsigned int ecx;
>>>>     unsigned int edx;
>>>>   };
>>>>
>>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>>>      __attribute__ ((const));
>>>>
>>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>>>      (unsigned int) __attribute__ ((const));
>>>>
>>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>>>      __attribute__ ((const));
>>>>
>>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>>
>>>> /* HAS_* evaluates to true if we may use the feature at runtime.  */
>>>> #define HAS_CPU_FEATURE(name) \
>>>>    ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>>>>      & (bit_cpu_##name)) != 0)
>>>
>>> I don't like these as macros.
>>>
>>> I would rather an inline function that does all the work within the function
>>> and returns a result of the appropriate type e.g. boolean.
>>
>> CPUID instruction takes EAX and optionally ECX as input,  It writes to
>> EAX, EBX, ECD and EDX.  Each CPU feature is determined by CPUID
>> with specific input values and a bit in EAX, EBX, ECX or EDX.   Take
>> AVX2 as example:
>>
>> #define bit_cpu_AVX2 (1u << 5)
>> #define index_cpu_AVX2 COMMON_CPUID_INDEX_7
>> #define reg_AVX2 ebx
>>
>> It is determined by the bit 5 of EBX from CPUID with EAX == 7.
>> HAS_CPU_FEATURE (AVX2) is expanded to
>>
>>    ((__x86_get_cpuid_registers (index_cpu_AVX2)->reg_AVX2 \
>>      & (bit_cpu_AVX2)) != 0)
>>
>> which is equivalent to
>>
>>   ((__x86_get_cpuid_registers (COMMON_CPUID_INDEX_7)->ebx \
>>      & ((1u << 5))) != 0)
>>
>> It is perfect for macro, not so convenient for inline function.
>
> It's not perfect.
>
> It's difficult to debug.

sysdeps/x86/tst-get-cpu-features.c should have 100% coverage of all
features.  Glibc users can just use them.

> You are using the c pre-processor here to evaluate the input argument
> into another 2 or 3 constant values, this is the kind of propagation
> of values that the compiler can do too.

Not easily with inline function in a public header file.

> I'd like to see an attempt at a functional definition of this rather
> than a macro, and the best argument against a static inline function
> is performance. If the macro generates much better code then that's
> a real reason to use it.

In this case, macro generates very good code and very easy to
maintain.

>>>> #define HAS_ARCH_FEATURE(name) \
>>>>    ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
>>>>
>>>
>>> Likewise.
>>>
>>>> Users can use <sys/platform/x86.h> to detect if a CPU feature is
>>>> supported at run-time with
>>>>
>>>> HAS_CPU_FEATURE (AVX)
>>>
>>> Likewise.
>>>
>>>> which returns true if glibc detects that AVX is available.  Before AVX
>>>> can be used, he/she should check
>>>>
>>>> HAS_ARCH_FEATURE (AVX_Usable)
>>>
>>> Likewise.
>>>
>>> These macros like this are harder to debug than static inline functions.
>>>
>>>> which returns true if glibc detects that AVX is available and usable.
>>>>
>>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>>
>>>> can be used to convert between TSCs and nanoseconds if
>>>>
>>>> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
>>>>
>>>> returns true.
>>>>
>>>> Backward binary compatibility is provided by:
>>>>
>>>> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
>>>> to cpu_features_kind.
>>>
>>> This function takes void and returns the kind of CPU as an enum value. OK.
>>>
>>>> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
>>>> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
>>>
>>> Could you expand on this in more detail please.
>>
>> Internally, we have
>>
>> struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
>>
>> _x86_get_cpuid_registers is defined as
>>
>> const struct cpuid_registers *
>> __x86_get_cpuid_registers (unsigned int i)
>> {
>>   const static struct cpuid_registers zero_cpuid_registers = { 0 };
>>   if (i >= COMMON_CPUID_INDEX_MAX)
>>     return &zero_cpuid_registers;
>>   return &GLRO(dl_x86_cpu_features).cpuid[i];
>> }
>>
>
> I don't like returning all zero for an error condition, we should have
> a distinct error and return value..

It isn't an error.  It simply means that this glibc doesn't support
this CPUID input.  It is OK to return 0 in EAX, EBX, ECX and EDX
and HAS_CPU_FEATURE (XXX) evaluates to false.

> Are we certain that an all zero cpuid register return is an invalid
> result for cpuid in all cases or could it be confused with a valid
> cpuid result but with the options in question disabled?

HAS_CPU_FEATURE (XXX) evaluates to true if glibc detects that
XXX is supported.  It is OK for HAS_CPU_FEATURE (XXX) to be
false if glibc doesn't know XXX on processors which has XXX.

BTW, there are many reserved bits in EAX, EBX, ECX and EDX,
which may be changed to new CPU features in future processors.
It is easy to add these features to x86.h with macro without recompiling
glibc.

>>>> 3. __x86_get_arch_feature returns 0 if architecture index >=
>>>> FEATURE_INDEX_MAX.
>>>
>>> Likewise.
>>>
>>>> This means:
>>>>
>>>> 1. Applications linked against the new __x86_get_cpu_kind will only
>>>> run with the newer libc.so.
>>>
>>> How? Symbol versioning?
>>
>> Yes.
>
> OK.
>
>>>> 2. Applications linked against the new __x86_get_cpuid_registers and
>>>> __x86_get_arch_feature will run with the older libc.so.  But new
>>>> CPU and architecture features won't be detected by the older libc.so.
>>>
>>> The caller must attempt to copy or inspect all the data, but it's structure
>>> size may be larger, and the old glibc structure may be smaller, how do you
>>> prevent the newer program from reading out of bounds when run in an older
>>> glibc? This question is only relevant for __x86_get_cpuid_registers which
>>> returns a pointer to a fixed size structure. The structure can grow, but
>>> the caller needs to know exactly how big the returned structure is to avoid
>>> reading outside of bounds.
>>
>> There is no ABI issue here.   __x86_get_cpuid_registers returns a pointer to
>> an element of struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX],
>> whose size is fixed as
>>
>> struct cpuid_registers
>>   {
>>     unsigned int eax;
>>     unsigned int ebx;
>>     unsigned int ecx;
>>     unsigned int edx;
>>   };
>
> Ah! Right, it's the cached cpuid return result so it's following the ISA rules
> for the registers it touches. I expect this can't change at this point, and if
> we got a *new* cpuid instruction we'd just make a new function to implement that.

True.

>> which is the cached output of CPUID.  As shown above if the input index
>>> = COMMON_CPUID_INDEX_MAX. we return a pointer to
>>
>> const static struct cpuid_registers zero_cpuid_registers = { 0 };
>
>
> --
> Cheers,
> Carlos.



-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]