This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
On Wed, 2018-10-31 at 15:36 -0300, Adhemerval Zanella wrote:
> > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S
> > b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > new file mode 100644
> > index 00000000000..b0623564276
> > --- /dev/null
> > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > @@ -0,0 +1,275 @@
> > +/* strcat with AVX2
> Is this really a gain in real-world usage compared to a generic strcat
> (strcpy (dest + strlen (dest), src)) assuming optimized strcpy /
The speedup is briefly summarized in the commit description, but the
comparison was done between SSE2 Unaligned and AVX2 (current code).
Both SSE2 and AVX2 are much faster than the generic strcat (I can provide
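For reference, the generic composition mentioned in the question, strcpy (dest + strlen (dest), src), can be sketched in C as follows. This is a minimal illustration, not glibc's actual generic strcat source, and the name generic_strcat is hypothetical.

```c
#include <string.h>

/* Hypothetical name: generic_strcat.  Illustrates the composition
   strcpy (dest + strlen (dest), src) discussed in the thread.
   The caller must ensure dest has room for both strings plus the NUL.  */
static char *
generic_strcat (char *dest, const char *src)
{
  /* strlen locates the terminating NUL of dest; strcpy then copies
     src, including its own NUL, starting at that position.  */
  strcpy (dest + strlen (dest), src);
  return dest;
}
```

Calling generic_strcat on a buffer holding "foo" with src "bar" leaves "foobar" in the buffer. Both passes scan memory, so the cost is one full scan of dest plus one scan/copy of src; any speedup of a dedicated strcat has to come from doing better than those two calls.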
> Wouldn't it be simpler and more i-cache friendly to use a custom generic
> implementation that calls AVX2 strcpy/strlen (such as powerpc64
I believe this is what we have right now through IFUNC_SELECTOR.
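A rough C sketch of the idea under discussion: a generic strcat that only calls whichever strlen/strcpy variant was chosen for the running CPU. It uses plain function pointers plus the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports as a stand-in for glibc's IFUNC machinery, which instead resolves the symbol once at binding time. The *_stub functions, select_impls and dispatch_strcat are hypothetical names, and the stubs simply call the C library rather than the real SSE2/AVX2 assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the SSE2 and AVX2 assembly variants;
   they defer to the C library so the sketch stays portable.  */
static size_t strlen_sse2_stub (const char *s) { return strlen (s); }
static size_t strlen_avx2_stub (const char *s) { return strlen (s); }
static char *strcpy_sse2_stub (char *d, const char *s) { return strcpy (d, s); }
static char *strcpy_avx2_stub (char *d, const char *s) { return strcpy (d, s); }

typedef size_t (*strlen_fn) (const char *);
typedef char *(*strcpy_fn) (char *, const char *);

/* Returns nonzero when the CPU reports AVX2; always 0 off x86-64.  */
static int
cpu_has_avx2 (void)
{
#if defined (__x86_64__) && (defined (__GNUC__) || defined (__clang__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2");
#else
  return 0;
#endif
}

static strlen_fn selected_strlen;
static strcpy_fn selected_strcpy;

/* Choose the variants once, similar in spirit to an IFUNC resolver.  */
static void
select_impls (void)
{
  int avx2 = cpu_has_avx2 ();
  selected_strlen = avx2 ? strlen_avx2_stub : strlen_sse2_stub;
  selected_strcpy = avx2 ? strcpy_avx2_stub : strcpy_sse2_stub;
}

/* Generic strcat built only from the selected strlen/strcpy.
   Note: the lazy initialization here is not thread-safe.  */
static char *
dispatch_strcat (char *dest, const char *src)
{
  if (selected_strcpy == NULL)
    select_impls ();
  selected_strcpy (dest + selected_strlen (dest), src);
  return dest;
}
```

The lazy initialization is the weak point of this function-pointer approach; with IFUNC the resolver runs when the dynamic linker binds the symbol, so callers never see an unresolved pointer and pay no per-call check.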