This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]

On Fri, Jun 23, 2017 at 12:27 PM, Florian Weimer <> wrote:
> On 06/23/2017 09:12 PM, H.J. Lu wrote:
>>> movzbl -1(%rdi, %rdx), %edi
>>> movzbl -1(%rsi, %rdx), %esi
>>> orl %edi, %eax
>>> orl %esi, %ecx
>>> into
>>> movb -1(%rdi, %rdx), %al
>>> movb -1(%rsi, %rdx), %cl
>> Here is the benchmark result on Haswell.
>> [hjl@gnu-6 glibc-test]$ make
>> ./test
>> movb  : 19937666
>> movzbl: 21518186
>> [hjl@gnu-6 glibc-test]$
> Interesting.  So there isn't a steep penalty for partial register writes
> anymore?  Your patch is a nice improvement then.

The Intel Optimization Guide says:

In Intel microarchitecture code name Sandy Bridge, partial register access is handled in hardware by inserting a micro-op that merges the partial register with the full register in the following cases:

• After a write to one of the registers AH, BH, CH or DH and before a following read of the 2-, 4- or 8-byte form of the same register. In these cases a merge micro-op is inserted. The insertion consumes a full allocation cycle in which other micro-ops cannot be allocated.

• After a micro-op with a destination register of 1 or 2 bytes, which is not a source of the instruction (or the register's bigger form), and before a following read of a 2-, 4- or 8-byte form of the same register. In these cases the merge micro-op is part of the flow.

Neither case applies to the code here on Haswell and Skylake.

I am checking in my patch now.

