This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
On Fri, Jun 23, 2017 at 12:27 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/23/2017 09:12 PM, H.J. Lu wrote:
>
>>> movzbl -1(%rdi, %rdx), %edi
>>> movzbl -1(%rsi, %rdx), %esi
>>> orl %edi, %eax
>>> orl %esi, %ecx
>>>
>>> into
>>>
>>> movb -1(%rdi, %rdx), %al
>>> movb -1(%rsi, %rdx), %cl
>>
>> Here is the benchmark result on Haswell.
>>
>> [hjl@gnu-6 glibc-test]$ make
>> ./test
>> movb : 19937666
>> movzbl: 21518186
>> [hjl@gnu-6 glibc-test]$
>
> Interesting. So there isn't a steep penalty for partial register writes
> anymore? Your patch is a nice improvement then.
The Intel Optimization Guide says:

  In Intel microarchitecture code name Sandy Bridge, partial register
  access is handled in hardware by inserting a micro-op that merges the
  partial register with the full register in the following cases:

  • After a write to one of the registers AH, BH, CH or DH and before a
    following read of the 2-, 4- or 8-byte form of the same register.
    In these cases a merge micro-op is inserted.  The insertion consumes
    a full allocation cycle in which other micro-ops cannot be allocated.

  • After a micro-op with a destination register of 1 or 2 bytes, which
    is not a source of the instruction (or the register's bigger form),
    and before a following read of a 2-, 4- or 8-byte form of the same
    register.  In these cases the merge micro-op is part of the flow.
Neither of these cases applies here on Haswell and Skylake.
I am checking in my patch now.
--
H.J.