This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 7 Apr 2017 10:06:15 -0700
- Subject: Re: [PATCH] x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]
- References: <20170316203911.GA26261@intel.com> <CAMe9rOoRYkDq67hSdiA_LJsc2rZJ+afU3RhgyN97V8kcduYuKw@mail.gmail.com>
On Tue, Mar 21, 2017 at 11:01 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 1:39 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
>> the first 8 vector registers. The code layout is
>>
>>   if only %xmm0 - %xmm7 registers are used
>>     preserve %xmm0 - %xmm7 registers
>>   if only %ymm0 - %ymm7 registers are used
>>     preserve %ymm0 - %ymm7 registers
>>   preserve %zmm0 - %zmm7 registers
>>
>> Branch prediction always executes the fallthrough code path to preserve
>> %zmm0 - %zmm7 registers speculatively, even though only %xmm0 - %xmm7
>> registers are used. This leads to lower CPU frequency on Skylake
>> server. This patch changes the fallthrough code path to preserve
>> %xmm0 - %xmm7 registers instead:
>>
>>   if whole %zmm0 - %zmm7 registers are used
>>     preserve %zmm0 - %zmm7 registers
>>   if only %ymm0 - %ymm7 registers are used
>>     preserve %ymm0 - %ymm7 registers
>>   preserve %xmm0 - %xmm7 registers
>>
>> Tested on Skylake server.
>>
>> Any comments?
>
> I checked it in.
>
I am backporting it to the 2.25 and 2.24 branches.
--
H.J.
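
For readers following the thread, the two code layouts quoted above can be
modelled in C roughly as follows. This is only a sketch: the real resolver
is hand-written assembly, and the enum, the function names and the byte
widths returned here are hypothetical names made up for illustration, not
glibc identifiers. It also assumes the compiler keeps the branches in source
order, which C does not guarantee. The point is that both orderings pick the
same save width for every input and differ only in which case is the
fallthrough path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how much vector state the caller has live.  */
enum vec_state
{
  VEC_XMM_ONLY,   /* only %xmm0 - %xmm7 are used */
  VEC_YMM_UPPER,  /* upper halves of %ymm0 - %ymm7 are used */
  VEC_ZMM_UPPER   /* upper halves of %zmm0 - %zmm7 are used */
};

/* Old layout: the widest save is the fallthrough path.  */
static size_t
save_width_old (enum vec_state s)
{
  if (s == VEC_XMM_ONLY)
    return 16;                  /* preserve %xmm0 - %xmm7 */
  if (s == VEC_YMM_UPPER)
    return 32;                  /* preserve %ymm0 - %ymm7 */
  return 64;                    /* fallthrough: preserve %zmm0 - %zmm7 */
}

/* New layout: the narrowest save is the fallthrough path.  */
static size_t
save_width_new (enum vec_state s)
{
  if (s == VEC_ZMM_UPPER)
    return 64;                  /* preserve %zmm0 - %zmm7 */
  if (s == VEC_YMM_UPPER)
    return 32;                  /* preserve %ymm0 - %ymm7 */
  return 16;                    /* fallthrough: preserve %xmm0 - %xmm7 */
}

/* Both layouts must choose the same width for every state; only the
   branch order, and hence the fallthrough path, differs.  */
static void
check_layouts_agree (void)
{
  for (int s = VEC_XMM_ONLY; s <= VEC_ZMM_UPPER; s++)
    assert (save_width_old ((enum vec_state) s)
            == save_width_new ((enum vec_state) s));
}
```

Calling check_layouts_agree () from a test program confirms that the
reordering is purely a layout change. Any frequency effect described in the
patch comes from which path is executed speculatively, which this model does
not attempt to measure.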