This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
- From: Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, Matt Turner <mattst88 at gmail dot com>, Andreas Jaeger <aj at suse dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 12 Jul 2013 10:12:34 +0400
- Subject: Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
- References: <CAMe9rOreowCOEH+6zRaRNk_p9sYe3T2bhwPRbKpybW9cO0BhJA at mail dot gmail dot com> <1373419029-19125-1-git-send-email-mattst88 at gmail dot com> <51DCE51F dot 7000001 at suse dot com> <CAMe9rOqb3_DnhSh0jPh9=suJo5c+WjegxfDh1+1go6pY+7+PLA at mail dot gmail dot com> <CAEdQ38Go4UY=k==nYT_6S86-tsOoxOO=Wn=8_pNk+LkkxSxU_Q at mail dot gmail dot com> <CAMe9rOpgaNgGSdoM5rXdhLT-TqVEJjGMyHgKRP=t+2LrSTpFAA at mail dot gmail dot com> <CAEdQ38FBeyuJpQ1eSHnM5w=8MHD3cfFjgWekkXnRFHO+Aathnw at mail dot gmail dot com> <CAMe9rOompuMMzQm+RX=ejoPMX0uWmXarvSZa_fp-Fi1p_-8o1Q at mail dot gmail dot com> <CAHjhQ91+RSKU=1F4vQ1XrJ=1j1wAv6HuQJh_s9BzcBOOTP8BDg at mail dot gmail dot com> <20130712030150 dot GA7461 at domone dot PAOCY>
Do you mean AMD? For Intel, there is no machine without SSE4_1 where the
sse2 unaligned version is faster than ssse3.
--
Liubov
On Fri, Jul 12, 2013 at 7:01 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Jul 11, 2013 at 06:07:49PM +0400, Liubov Dmitrieva wrote:
>> My Silvermont patch in the latest edition doesn't touch memcmp and
>> wmemcmp at all because I didn't see good boost from switching SSE42
>> off for these 2 functions.
>> Now I see why. There are no SSE4.2 instructions there. :)
>> The patch looks good. I will just check performance regressions for Penryn.
>>
> Now the question is whether this is also good for other archs. SSE 4.1 is not
> really needed; we can just replace ptest with a pmovmskb/test pair, and
> performance will be nearly identical, so it is worth checking whether old
> cores benefit. (I see possible optimizations which I will send later.)
>
>> --
>> Liubov
>>
>> On Wed, Jul 10, 2013 at 10:23 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Wed, Jul 10, 2013 at 11:19 AM, Matt Turner <mattst88@gmail.com> wrote:
>> >> On Wed, Jul 10, 2013 at 11:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>> On Wed, Jul 10, 2013 at 10:41 AM, Matt Turner <mattst88@gmail.com> wrote:
>> >>>> On Wed, Jul 10, 2013 at 8:30 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>>>> On Tue, Jul 9, 2013 at 9:37 PM, Andreas Jaeger <aj@suse.com> wrote:
>> >>>>>> On 07/10/2013 03:17 AM, Matt Turner wrote:
>> >>>>>>> It uses SSE 4.1 instructions (ptest) but no SSE 4.2 instructions.
>> >>>>>>
>> >>>>>> There are two parts to this: It should only run on cpus with those
>> >>>>>> instructions, but we also need to ensure that it gives better
>> >>>>>> performance on such cpus. HJ, Matt, please do run performance tests on a
>> >>>>>> variety of affected cpus to show that this change really helps in all cases.
>> >>>>>>
>> >>>>>> Andreas
>> >>>>>
>> >>>>> Only Penryn has SSE4.1 without SSE4.2. Liubov, can
>> >>>>> you compare performance of memcmp-sse4.S vs
>> >>>>> memcmp-ssse3.S on Penryn?
>> >>>>
>> >>>> Is it also the case that this path would now be used on Silvermont?
>> >>>
>> >>> It is used on Silvermont since it supports SSE4.2
>> >>>
>> >>> --
>> >>> H.J.
>> >>
>> >> To confirm, setting bit_Slow_SSE4_2 on Silvermont (which we do)
>> >> wouldn't prevent this path from executing?
>> >
>> > I don't think so. Liubov, can you verify it?
>> >
>> > --
>> > H.J.
>
> --
>
> The cord jumped over and hit the power switch.
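
The substitution suggested in the thread, replacing ptest with a pmovmskb-based check so that SSE 4.1 is no longer required, can be illustrated roughly with C intrinsics. This is only a sketch of the idea for a single 16-byte block, not the code in memcmp-sse4.S: the function names are hypothetical, the compare step (pcmpeqb) is an assumed companion to pmovmskb, and the per-function target attribute assumes GCC 4.9 or later, or Clang, on x86-64.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 (ptest) */

/* Hypothetical helper: SSE4.1 variant.  XOR the two blocks, then let
   ptest report whether the result is all zero.  */
__attribute__((target("sse4.1")))
static int block16_equal_ptest(const void *a, const void *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i diff = _mm_xor_si128(va, vb);
    /* _mm_testz_si128 returns 1 when (diff & diff) has no bit set.  */
    return _mm_testz_si128(diff, diff);
}

/* Hypothetical helper: SSE2-only variant.  pcmpeqb sets each equal byte
   to 0xFF, pmovmskb gathers the 16 sign bits into an integer mask, and
   an ordinary integer compare replaces ptest.  */
static int block16_equal_pmovmskb(const void *a, const void *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);
    /* All 16 bytes equal means all 16 mask bits are set.  */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Both helpers return 1 for identical blocks and 0 otherwise. The SSE2 form also leaves a per-byte mask behind, which a memcmp implementation could use to locate the first differing byte. Whether the two forms really perform identically on older cores is the open question raised in the thread and would need measuring.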