This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2


On Wed, 2018-10-31 at 15:36 -0300, Adhemerval Zanella wrote:
> 
> 
> > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S
> > b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > new file mode 100644
> > index 00000000000..b0623564276
> > --- /dev/null
> > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > @@ -0,0 +1,275 @@
> > +/* strcat with AVX2
> 
> Is this really a gain in real-world usage compared to a generic strcat
> (strcpy (dest + strlen (dest), src)), assuming optimized strcpy /
> strlen?

The speedup is briefly summarized in the commit description, but that
comparison was done between SSE2 Unaligned and AVX2 (current code).
Both SSE2 and AVX2 are much faster than the generic strcat (I can provide
numbers).
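
For reference, the generic strcat from the quoted question can be
sketched as below. This is only an illustration of the expression
strcpy (dest + strlen (dest), src); the name generic_strcat is
hypothetical and this is not the glibc source.

```c
#include <string.h>

/* Hypothetical name, for illustration only: a generic strcat built on
   top of whatever strlen and strcpy the C library provides.  */
char *
generic_strcat (char *dest, const char *src)
{
  /* strlen locates the terminating NUL of dest; strcpy then copies src,
     including its own terminating NUL, starting at that position.  */
  strcpy (dest + strlen (dest), src);
  return dest;
}
```

The caller must make sure dest has room for both strings plus the
terminating NUL, exactly as with the standard strcat.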


> Wouldn't it be simpler and more i-cache friendly to use a custom generic
> implementation that calls AVX2 strcpy/strlen (as powerpc64
> does)?

I believe this is what we have right now through IFUNC_SELECTOR.
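
As a rough sketch of the idea behind selecting an implementation by CPU
feature: the code below picks a routine once and then calls through a
function pointer. It does not use the glibc-internal IFUNC_SELECTOR
macro, and every name in it (strcat_generic, strcat_avx2_standin,
select_strcat, my_strcat) is hypothetical. The AVX2 routine is only a
stand-in that forwards to the library strcat, and the feature check
relies on the GCC/Clang builtin __builtin_cpu_supports, which is
assumed to be available on x86 builds.

```c
#include <string.h>

typedef char *(*strcat_fn) (char *, const char *);

/* Generic fallback: strlen followed by strcpy.  */
static char *
strcat_generic (char *dest, const char *src)
{
  strcpy (dest + strlen (dest), src);
  return dest;
}

/* Stand-in for a hand-written AVX2 routine (assumption: the real one
   would live in an assembly file).  Here it just forwards to strcat.  */
static char *
strcat_avx2_standin (char *dest, const char *src)
{
  return strcat (dest, src);
}

/* Returns nonzero when the CPU reports AVX2.  On non-x86 targets or
   other compilers this sketch conservatively reports 0.  */
static int
cpu_has_avx2 (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2");
#else
  return 0;
#endif
}

/* Pick one implementation based on the CPU feature check.  */
strcat_fn
select_strcat (void)
{
  return cpu_has_avx2 () ? strcat_avx2_standin : strcat_generic;
}

/* Public entry point: resolve on first use, then call through the
   cached pointer.  */
char *
my_strcat (char *dest, const char *src)
{
  static strcat_fn impl;
  if (impl == NULL)
    impl = select_strcat ();
  return impl (dest, src);
}
```

A real ifunc resolves the symbol at load time in the dynamic linker
rather than on first call, so callers pay no extra check per call.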

