
Re: [PATCH] x86: Fix Haswell string flags (BZ#23709)



On 23/10/2018 06:55, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
>> index f4e0f5a2ed..80b3054cf8 100644
>> --- a/sysdeps/x86/cpu-features.c
>> +++ b/sysdeps/x86/cpu-features.c
>> @@ -316,7 +316,13 @@ init_cpu_features (struct cpu_features *cpu_features)
>>  		    | bit_arch_Fast_Unaligned_Copy
>>  		    | bit_arch_Prefer_PMINUB_for_stringop);
>>  	      break;
>> +	    }
>>  
>> +	 /* Disable TSX on some Haswell processors to avoid TSX on kernels that
>> +	    weren't updated with the latest microcode package (which disables
>> +	    broken feature by default).  */
>> +	 switch (model)
>> +	    {
>>  	    case 0x3f:
>>  	      /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
>>  	      if (stepping >= 4)
> 
> I think the change is okay as posted.  It will need some testing in the
> field because the newly selected implementations could have unexpected
> performance drawbacks.
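
For context, the control flow after the change looks roughly like
this (condensed and paraphrased from sysdeps/x86/cpu-features.c,
not the verbatim source):

  switch (model)
    {
      /* ... other family 0x06 models elided ...  */
    default:
      /* Unknown family 0x06 processors with AVX, which after the
         patch includes the Haswell models, get the string flags.  */
      cpu_features->feature[index_arch_Fast_Rep_String]
        |= (bit_arch_Fast_Rep_String
            | bit_arch_Fast_Unaligned_Load
            | bit_arch_Fast_Unaligned_Copy
            | bit_arch_Prefer_PMINUB_for_stringop);
      break;
    }

  /* Second, independent switch: disabling TSX no longer keeps the
     default case above from running for the Haswell models.  */
  switch (model)
    {
    case 0x3f:
      /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
      if (stepping >= 4)
        break;
      /* Fall through.  */
    case 0x3c:
    case 0x45:
    case 0x46:
      cpu_features->cpuid[index_cpu_RTM].reg_RTM &= ~bit_cpu_RTM;
      break;
    }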

This fix only changes the ifunc selection for Haswell chips [1]:

Microarchitecture   Variant      Ext. Family  Family  Ext. Model  Model  Decoded
Haswell (Client)    GT3E         0            0x6     0x4         0x6    Family 6 Model 70
                    ULT          0            0x6     0x4         0x5    Family 6 Model 69
                    S            0            0x6     0x3         0xC    Family 6 Model 60

Haswell (Server)    E, EP, EX    0            0x6     0x3         0xF    Family 6 Model 63
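
The decoded model numbers in the last column follow from the split
CPUID model fields: for family 0x6 the display model is
(extended_model << 4) | model.  A small self-contained check, with
display_model() being a hypothetical helper for this example only:

  #include <stdio.h>

  /* For family 0x6, CPUID leaf 1 splits the model number into an
     extended model and a base model field.  */
  static unsigned int
  display_model (unsigned int ext_model, unsigned int model)
  {
    return (ext_model << 4) | model;
  }

  int
  main (void)
  {
    printf ("GT3E: %u\n", display_model (0x4, 0x6));  /* 70 */
    printf ("S:    %u\n", display_model (0x3, 0xc));  /* 60 */
    printf ("EP:   %u\n", display_model (0x3, 0xf));  /* 63 */
    return 0;
  }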

On these chips, the internal glibc flags bit_arch_Fast_Rep_String,
bit_arch_Fast_Unaligned_Load, bit_arch_Fast_Unaligned_Copy, and
bit_arch_Prefer_PMINUB_for_stringop were not being set before this fix:

  * The bit_arch_Fast_Rep_String flag is only used in ifunc selection
    on i686 (32-bit), where it selects the *ssse3_rep* variants of
    memcpy, memmove, bcopy, and mempcpy.  If it is not set, the
    *ssse3* variants are used instead (see the sketch after this
    list).

    I am not sure what the performance difference between them is,
    but my expectation is that ssse3_rep should be faster than ssse3.

  * The bit_arch_Fast_Unaligned_Load flag influences both i686 and
    x86_64.  For x86_64 it influences the selection of the SSE2
    unaligned optimized variants of stpncpy, strcpy, strncpy, stpcpy,
    strncat, strcat, and strstr.  For all but strstr, an ssse3 or
    sse2 variant is used instead.

    I am not sure what the performance difference between them is,
    but my expectation is that the unaligned versions should be
    faster.

  * The bit_arch_Fast_Unaligned_Copy flag influences mempcpy,
    memmove, and memcpy.  If the chip does not have SSSE3, or if
    this bit is set, either an ERMS or an unaligned variant is
    selected.  For Haswell the *_avx_unaligned_erms variants will
    be selected anyway, so this bit won't interfere with the best
    selection.
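
To make the effect of the first two bits concrete (the sketch
referenced in the list above), here is a toy, self-contained model
of the selection logic.  The helper names are hypothetical; the real
selectors live in sysdeps/{i386,x86_64}/multiarch and handle more
variants:

  #include <stdio.h>

  /* Hypothetical stand-ins for glibc's HAS_CPU_FEATURE /
     HAS_ARCH_FEATURE probes; 0/0 models Haswell before this fix.  */
  static int cpu_ssse3 = 1;
  static int arch_fast_rep_string = 0;
  static int arch_fast_unaligned_load = 0;

  /* Condensed i686 memcpy-style selection: Fast_Rep_String picks
     the *_ssse3_rep variant over plain *_ssse3.  */
  static const char *
  select_memcpy (void)
  {
    if (cpu_ssse3)
      return arch_fast_rep_string ? "__memcpy_ssse3_rep"
                                  : "__memcpy_ssse3";
    return "__memcpy_i686";
  }

  /* Condensed x86_64 strcpy-style selection: Fast_Unaligned_Load
     picks the sse2_unaligned variant over the ssse3/sse2 ones.  */
  static const char *
  select_strcpy (void)
  {
    if (arch_fast_unaligned_load)
      return "__strcpy_sse2_unaligned";
    return cpu_ssse3 ? "__strcpy_ssse3" : "__strcpy_sse2";
  }

  int
  main (void)
  {
    printf ("memcpy -> %s\n", select_memcpy ());
    printf ("strcpy -> %s\n", select_strcpy ());
    return 0;
  }

Flipping the two arch flags to 1 shows the post-fix choices,
__memcpy_ssse3_rep and __strcpy_sse2_unaligned.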

The bit_arch_Prefer_PMINUB_for_stringop flag is not used in ifunc selection.

[1] https://en.wikichip.org/wiki/intel/cpuid

