This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]


On Tue, Mar 21, 2017 at 11:01 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 1:39 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
>> the first 8 vector registers.  The code layout is
>>
>>   if only %xmm0 - %xmm7 registers are used
>>      preserve %xmm0 - %xmm7 registers
>>   if only %ymm0 - %ymm7 registers are used
>>      preserve %ymm0 - %ymm7 registers
>>   preserve %zmm0 - %zmm7 registers
>>
>> Branch prediction always speculatively executes the fallthrough code
>> path, which preserves the %zmm0 - %zmm7 registers, even when only the
>> %xmm0 - %xmm7 registers are used.  This lowers the CPU frequency on
>> Skylake server.  This patch changes the fallthrough code path to
>> preserve the %xmm0 - %xmm7 registers instead:
>>
>>   if whole %zmm0 - %zmm7 registers are used
>>      preserve %zmm0 - %zmm7 registers
>>   if only %ymm0 - %ymm7 registers are used
>>      preserve %ymm0 - %ymm7 registers
>>   preserve %xmm0 - %xmm7 registers
>>
>> Tested on Skylake server.
>>
>> Any comments?
>
> I checked it in.
>

I am backporting it to the 2.25 and 2.24 branches.
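
For readers who want the reordering spelled out, here is a rough C sketch
of the two layouts quoted above.  It is only an illustration of the test
order, not the actual trampoline, which is written in assembly.  The enum
and function names are hypothetical, and a C compiler is free to lay out
branches differently from the source order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical encoding of which vector registers are in use. */
enum vec_state { USE_XMM = 0, USE_YMM = 1, USE_ZMM = 2 };

/* Old layout: narrow cases are tested first, so the fallthrough
   path is the one that preserves %zmm0 - %zmm7.  */
const char *preserve_original(enum vec_state s)
{
    if (s == USE_XMM)
        return "xmm0-xmm7";
    if (s == USE_YMM)
        return "ymm0-ymm7";
    return "zmm0-zmm7";
}

/* Patched layout: wide cases are tested first, so the fallthrough
   path is the one that preserves %xmm0 - %xmm7.  */
const char *preserve_patched(enum vec_state s)
{
    if (s == USE_ZMM)
        return "zmm0-zmm7";
    if (s == USE_YMM)
        return "ymm0-ymm7";
    return "xmm0-xmm7";
}

/* Both layouts must select the same registers for every state;
   only the order of the tests (and thus the fallthrough) differs.  */
void check_layouts(void)
{
    for (int s = USE_XMM; s <= USE_ZMM; s++)
        assert(strcmp(preserve_original((enum vec_state) s),
                      preserve_patched((enum vec_state) s)) == 0);
}
```

The result is the same for every input.  What changes is which path sits
at the fallthrough, and so which save sequence can be run speculatively
before the branches resolve.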

-- 
H.J.

