[PATCH v1 3/3] x86: Add sse42 implementation to strcmp's ifunc
Sunil Pandey
skpgkp2@gmail.com
Thu Jul 14 02:54:14 GMT 2022
On Tue, Jun 14, 2022 at 6:09 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Jun 14, 2022 at 5:25 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > This has been missing since the ifuncs were added.
> >
> > The performance of SSE4.2 is preferable to SSE2.
> >
> > Measured on Tigerlake with N = 20 runs.
> > Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906
> > ---
> > sysdeps/x86_64/multiarch/strcmp.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
> > index a248c2a6e6..9c1677724c 100644
> > --- a/sysdeps/x86_64/multiarch/strcmp.c
> > +++ b/sysdeps/x86_64/multiarch/strcmp.c
> > @@ -28,6 +28,7 @@
> >
> > extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> > extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) attribute_hidden;
> > +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
> > extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> > extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> > extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> > @@ -52,6 +53,10 @@ IFUNC_SELECTOR (void)
> > return OPTIMIZE (avx2);
> > }
> >
> > + if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
> > + && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
> > + return OPTIMIZE (sse42);
> > +
> > if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load))
> > return OPTIMIZE (sse2_unaligned);
> >
> > --
> > 2.34.1
> >
>
> LGTM.
>
> Thanks.
>
> --
> H.J.
I would like to backport this patch to release branches.
Any comments or objections?
--Sunil
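
For readers following the selection order, the hunk quoted above can be modeled as a small pure function. This is a simplified sketch, not glibc's code: `struct features`, `enum impl`, and `select_strcmp` are hypothetical names, the EVEX and AVX2-RTM branches are left out, and the final SSE2 fallback is an assumption since it is not visible in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU feature bits the selector consults.
   This is NOT glibc's struct cpu_features.  */
struct features
{
  bool avx2_usable;          /* models the existing AVX2 branch */
  bool sse4_2_usable;        /* models CPU_FEATURE_USABLE_P (..., SSE4_2) */
  bool slow_sse4_2;          /* models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2) */
  bool fast_unaligned_load;  /* models CPU_FEATURES_ARCH_P (..., Fast_Unaligned_Load) */
};

/* Hypothetical labels for the candidate implementations.  */
enum impl
{
  IMPL_SSE2,
  IMPL_SSE2_UNALIGNED,
  IMPL_SSE42,
  IMPL_AVX2
};

/* Simplified priority order: AVX2 first, then the SSE4.2 check added by
   the patch, then unaligned SSE2, then plain SSE2 (assumed fallback).  */
static enum impl
select_strcmp (const struct features *f)
{
  assert (f != NULL);

  if (f->avx2_usable)
    return IMPL_AVX2;

  /* The check the patch adds: SSE4.2 must be usable and not marked slow.  */
  if (f->sse4_2_usable && !f->slow_sse4_2)
    return IMPL_SSE42;

  if (f->fast_unaligned_load)
    return IMPL_SSE2_UNALIGNED;

  return IMPL_SSE2;
}
```

In this model, a CPU with usable SSE4.2 that is flagged Slow_SSE4_2 still falls through to the unaligned SSE2 path, which matches the intent of the added condition.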