This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
On Fri, Jun 23, 2017 at 12:27 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/23/2017 09:12 PM, H.J. Lu wrote:
>
>>> movzbl -1(%rdi, %rdx), %edi
>>> movzbl -1(%rsi, %rdx), %esi
>>> orl %edi, %eax
>>> orl %esi, %ecx
>>>
>>> into
>>>
>>> movb -1(%rdi, %rdx), %al
>>> movb -1(%rsi, %rdx), %cl
>>
>> Here is the benchmark result on Haswell.
>>
>> [hjl@gnu-6 glibc-test]$ make
>> ./test
>> movb : 19937666
>> movzbl: 21518186
>> [hjl@gnu-6 glibc-test]$
>
> Interesting. So there isn't a steep penalty for partial register writes
> anymore? Your patch is a nice improvement then.
The Intel Optimization Guide says:

  In Intel microarchitecture code name Sandy Bridge, partial register
  access is handled in hardware by inserting a micro-op that merges the
  partial register with the full register in the following cases:

  • After a write to one of the registers AH, BH, CH or DH and before a
    following read of the 2-, 4- or 8-byte form of the same register.
    In these cases a merge micro-op is inserted.  The insertion consumes
    a full allocation cycle in which other micro-ops cannot be allocated.

  • After a micro-op with a destination register of 1 or 2 bytes, which
    is not a source of the instruction (or the register's bigger form),
    and before a following read of a 2-, 4- or 8-byte form of the same
    register.  In these cases the merge micro-op is part of the flow.
Neither of these cases applies here on Haswell and Skylake.
I am checking in my patch now.
--
H.J.