This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
On 07/31/2018 10:48 AM, H.J. Lu wrote:
> On Tue, Jul 31, 2018 at 6:32 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 07/29/2018 12:16 PM, H.J. Lu wrote:
>>> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
>>> an API to access x86 specific platform features:
>>>
>>> enum
>>> {
>>> /* The integer bit array index of architecture feature bits. */
>>> FEATURE_INDEX_1 = 0,
>>> FEATURE_INDEX_2,
>>> /* The current maximum size of the feature integer bit array. */
>>> FEATURE_INDEX_MAX
>>> };
>>>
>>> enum
>>> {
>>> COMMON_CPUID_INDEX_1 = 0,
>>> COMMON_CPUID_INDEX_7,
>>> COMMON_CPUID_INDEX_80000001,
>>> COMMON_CPUID_INDEX_D_ECX_1,
>>> COMMON_CPUID_INDEX_80000007,
>>> COMMON_CPUID_INDEX_80000008,
>>> /* Keep the following line at the end. */
>>> COMMON_CPUID_INDEX_MAX
>>> };
>>>
>>> enum cpu_features_kind
>>> {
>>> arch_kind_unknown = 0,
>>> arch_kind_intel,
>>> arch_kind_amd,
>>> arch_kind_other
>>> };
>>>
>>> struct cpuid_registers
>>> {
>>> unsigned int eax;
>>> unsigned int ebx;
>>> unsigned int ecx;
>>> unsigned int edx;
>>> };
>>>
>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>> __attribute__ ((const));
>>>
>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>> (unsigned int) __attribute__ ((const));
>>>
>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>> __attribute__ ((const));
>>>
>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>
>>> /* HAS_* evaluates to true if we may use the feature at runtime. */
>>> #define HAS_CPU_FEATURE(name) \
>>> ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>>> & (bit_cpu_##name)) != 0)
>>
>> I don't like these as macros.
>>
>> I would rather an inline function that does all the work within the function
>> and returns a result of the appropriate type e.g. boolean.
>
> CPUID instruction takes EAX and optionally ECX as input, It writes to
> EAX, EBX, ECD and EDX. Each CPU feature is determined by CPUID
> with specific input values and a bit in EAX, EBX, ECX or EDX. Take
> AVX2 as example:
>
> #define bit_cpu_AVX2 (1u << 5)
> #define index_cpu_AVX2 COMMON_CPUID_INDEX_7
> #define reg_AVX2 ebx
>
> It is determined by the bit 5 of EBX from CPUID with EAX == 7.
> HAS_CPU_FEATURE (AVX2) is expanded to
>
> ((__x86_get_cpuid_registers (index_cpu_AVX2)->reg_AVX2 \
> & (bit_cpu_AVX2)) != 0)
>
> which is equivalent to
>
> ((__x86_get_cpuid_registers (COMMON_CPUID_INDEX_7)->ebx \
> & ((1u << 5))) != 0)
>
> It is perfect for macro, not so convenient for inline function.
It's not perfect.
It's difficult to debug.
You are using the c pre-processor here to evaluate the input argument
into another 2 or 3 constant values, this is the kind of propagation
of values that the compiler can do too.
I'd like to see an attempt at a functional definition of this rather
than a macro, and the best argument against a static inline function
is performance. If the macro generates much better code then that's
a real reason to use it.
>>> #define HAS_ARCH_FEATURE(name) \
>>> ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
>>>
>>
>> Likewise.
>>
>>> Users can use <sys/platform/x86.h> to detect if a CPU feature is
>>> supported at run-time with
>>>
>>> HAS_CPU_FEATURE (AVX)
>>
>> Likewise.
>>
>>> which returns true if glibc detects that AVX is available. Before AVX
>>> can be used, he/she should check
>>>
>>> HAS_ARCH_FEATURE (AVX_Usable)
>>
>> Likewise.
>>
>> These macros like this are harder to debug than static inline functions.
>>
>>> which returns true if glibc detects that AVX is available and usable.
>>>
>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>
>>> can be used to convert between TSCs and nanoseconds if
>>>
>>> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
>>>
>>> returns true.
>>>
>>> Backward binary compatibility is provided by:
>>>
>>> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
>>> to cpu_features_kind.
>>
>> This function takes void and returns the kind of CPU as an enum value. OK.
>>
>>> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
>>> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
>>
>> Could you expand on this in more detail please.
>
> Internally, we have
>
> struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
>
> _x86_get_cpuid_registers is defined as
>
> const struct cpuid_registers *
> __x86_get_cpuid_registers (unsigned int i)
> {
> const static struct cpuid_registers zero_cpuid_registers = { 0 };
> if (i >= COMMON_CPUID_INDEX_MAX)
> return &zero_cpuid_registers;
> return &GLRO(dl_x86_cpu_features).cpuid[i];
> }
>
I don't like returning all zero for an error condition, we should have
a distinct error and return value..
Are we certain that an all zero cpuid register return is an invalid
result for cpuid in all cases or could it be confused with a valid
cpuid result but with the options in question disabled?
>>> 3. __x86_get_arch_feature returns 0 if architecture index >=
>>> FEATURE_INDEX_MAX.
>>
>> Likewise.
>>
>>> This means:
>>>
>>> 1. Applications linked against the new __x86_get_cpu_kind will only
>>> run with the newer libc.so.
>>
>> How? Symbol versioning?
>
> Yes.
OK.
>>> 2. Applications linked against the new __x86_get_cpuid_registers and
>>> __x86_get_arch_feature will run with the older libc.so. But new
>>> CPU and architecture features won't be detected by the older libc.so.
>>
>> The caller must attempt to copy or inspect all the data, but it's structure
>> size may be larger, and the old glibc structure may be smaller, how do you
>> prevent the newer program from reading out of bounds when run in an older
>> glibc? This question is only relevant for __x86_get_cpuid_registers which
>> returns a pointer to a fixed size structure. The structure can grow, but
>> the caller needs to know exactly how big the returned structure is to avoid
>> reading outside of bounds.
>
> There is no ABI issue here. __x86_get_cpuid_registers returns a pointer to
> an element of struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX],
> whose size is fixed as
>
> struct cpuid_registers
> {
> unsigned int eax;
> unsigned int ebx;
> unsigned int ecx;
> unsigned int edx;
> };
Ah! Right, it's the cached cpuid return result so it's following the ISA rules
for the registers it touches. I expect this can't change at this point, and if
we got a *new* cpuid instruction we'd just make a new function to implement that.
> which is the cached output of CPUID. As shown above if the input index
>> = COMMON_CPUID_INDEX_MAX. we return a pointer to
>
> const static struct cpuid_registers zero_cpuid_registers = { 0 };
--
Cheers,
Carlos.