This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: RFC: Add <sys/platform/x86.h> to glibc 2.29

From: Carlos O'Donell <carlos at redhat dot com>
To: "H.J. Lu" <hjl dot tools at gmail dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 31 Jul 2018 11:06:39 -0400
Subject: Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
References: <20180729161623.GA17944@gmail.com> <f6206157-256a-060d-80cb-f184b40146ca@redhat.com> <CAMe9rOpuzrU1487ddFU139xfUhDFaJ6G+ofXLLMWVGk5RFdyRw@mail.gmail.com>

On 07/31/2018 10:48 AM, H.J. Lu wrote:
> On Tue, Jul 31, 2018 at 6:32 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 07/29/2018 12:16 PM, H.J. Lu wrote:
>>> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
>>> an API to access x86 specific platform features:
>>>
>>> enum
>>>   {
>>>     /* The integer bit array index of architecture feature bits.  */
>>>     FEATURE_INDEX_1 = 0,
>>>     FEATURE_INDEX_2,
>>>     /* The current maximum size of the feature integer bit array.  */
>>>     FEATURE_INDEX_MAX
>>>   };
>>>
>>> enum
>>>   {
>>>     COMMON_CPUID_INDEX_1 = 0,
>>>     COMMON_CPUID_INDEX_7,
>>>     COMMON_CPUID_INDEX_80000001,
>>>     COMMON_CPUID_INDEX_D_ECX_1,
>>>     COMMON_CPUID_INDEX_80000007,
>>>     COMMON_CPUID_INDEX_80000008,
>>>     /* Keep the following line at the end.  */
>>>     COMMON_CPUID_INDEX_MAX
>>>   };
>>>
>>> enum cpu_features_kind
>>>   {
>>>     arch_kind_unknown = 0,
>>>     arch_kind_intel,
>>>     arch_kind_amd,
>>>     arch_kind_other
>>>   };
>>>
>>> struct cpuid_registers
>>>   {
>>>     unsigned int eax;
>>>     unsigned int ebx;
>>>     unsigned int ecx;
>>>     unsigned int edx;
>>>   };
>>>
>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>>      __attribute__ ((const));
>>>
>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>>      (unsigned int) __attribute__ ((const));
>>>
>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>>      __attribute__ ((const));
>>>
>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>
>>> /* HAS_* evaluates to true if we may use the feature at runtime.  */
>>> #define HAS_CPU_FEATURE(name) \
>>>    ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>>>      & (bit_cpu_##name)) != 0)
>>
>> I don't like these as macros.
>>
>> I would rather an inline function that does all the work within the function
>> and returns a result of the appropriate type e.g. boolean.
> 
> CPUID instruction takes EAX and optionally ECX as input,  It writes to
> EAX, EBX, ECD and EDX.  Each CPU feature is determined by CPUID
> with specific input values and a bit in EAX, EBX, ECX or EDX.   Take
> AVX2 as example:
> 
> #define bit_cpu_AVX2 (1u << 5)
> #define index_cpu_AVX2 COMMON_CPUID_INDEX_7
> #define reg_AVX2 ebx
> 
> It is determined by the bit 5 of EBX from CPUID with EAX == 7.
> HAS_CPU_FEATURE (AVX2) is expanded to
> 
>    ((__x86_get_cpuid_registers (index_cpu_AVX2)->reg_AVX2 \
>      & (bit_cpu_AVX2)) != 0)
> 
> which is equivalent to
> 
>   ((__x86_get_cpuid_registers (COMMON_CPUID_INDEX_7)->ebx \
>      & ((1u << 5))) != 0)
> 
> It is perfect for macro, not so convenient for inline function.

It's not perfect.

It's difficult to debug.

You are using the c pre-processor here to evaluate the input argument
into another 2 or 3 constant values, this is the kind of propagation
of values that the compiler can do too.

I'd like to see an attempt at a functional definition of this rather
than a macro, and the best argument against a static inline function
is performance. If the macro generates much better code then that's
a real reason to use it.

>>> #define HAS_ARCH_FEATURE(name) \
>>>    ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
>>>
>>
>> Likewise.
>>
>>> Users can use <sys/platform/x86.h> to detect if a CPU feature is
>>> supported at run-time with
>>>
>>> HAS_CPU_FEATURE (AVX)
>>
>> Likewise.
>>
>>> which returns true if glibc detects that AVX is available.  Before AVX
>>> can be used, he/she should check
>>>
>>> HAS_ARCH_FEATURE (AVX_Usable)
>>
>> Likewise.
>>
>> These macros like this are harder to debug than static inline functions.
>>
>>> which returns true if glibc detects that AVX is available and usable.
>>>
>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>
>>> can be used to convert between TSCs and nanoseconds if
>>>
>>> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
>>>
>>> returns true.
>>>
>>> Backward binary compatibility is provided by:
>>>
>>> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
>>> to cpu_features_kind.
>>
>> This function takes void and returns the kind of CPU as an enum value. OK.
>>
>>> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
>>> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
>>
>> Could you expand on this in more detail please.
> 
> Internally, we have
> 
> struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
> 
> _x86_get_cpuid_registers is defined as
> 
> const struct cpuid_registers *
> __x86_get_cpuid_registers (unsigned int i)
> {
>   const static struct cpuid_registers zero_cpuid_registers = { 0 };
>   if (i >= COMMON_CPUID_INDEX_MAX)
>     return &zero_cpuid_registers;
>   return &GLRO(dl_x86_cpu_features).cpuid[i];
> }
> 

I don't like returning all zero for an error condition, we should have
a distinct error and return value..

Are we certain that an all zero cpuid register return is an invalid
result for cpuid in all cases or could it be confused with a valid
cpuid result but with the options in question disabled?

>>> 3. __x86_get_arch_feature returns 0 if architecture index >=
>>> FEATURE_INDEX_MAX.
>>
>> Likewise.
>>
>>> This means:
>>>
>>> 1. Applications linked against the new __x86_get_cpu_kind will only
>>> run with the newer libc.so.
>>
>> How? Symbol versioning?
> 
> Yes.

OK.

>>> 2. Applications linked against the new __x86_get_cpuid_registers and
>>> __x86_get_arch_feature will run with the older libc.so.  But new
>>> CPU and architecture features won't be detected by the older libc.so.
>>
>> The caller must attempt to copy or inspect all the data, but it's structure
>> size may be larger, and the old glibc structure may be smaller, how do you
>> prevent the newer program from reading out of bounds when run in an older
>> glibc? This question is only relevant for __x86_get_cpuid_registers which
>> returns a pointer to a fixed size structure. The structure can grow, but
>> the caller needs to know exactly how big the returned structure is to avoid
>> reading outside of bounds.
> 
> There is no ABI issue here.   __x86_get_cpuid_registers returns a pointer to
> an element of struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX],
> whose size is fixed as
> 
> struct cpuid_registers
>   {
>     unsigned int eax;
>     unsigned int ebx;
>     unsigned int ecx;
>     unsigned int edx;
>   };

Ah! Right, it's the cached cpuid return result so it's following the ISA rules
for the registers it touches. I expect this can't change at this point, and if
we got a *new* cpuid instruction we'd just make a new function to implement that.

> which is the cached output of CPUID.  As shown above if the input index
>> = COMMON_CPUID_INDEX_MAX. we return a pointer to
> 
> const static struct cpuid_registers zero_cpuid_registers = { 0 };


-- 
Cheers,
Carlos.

Follow-Ups:
- Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  - From: H.J. Lu

References:
- RFC: Add <sys/platform/x86.h> to glibc 2.29
  - From: H.J. Lu
- Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  - From: Carlos O'Donell
- Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  - From: H.J. Lu

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]