This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
- From: Leonardo Sandoval <leonardo dot sandoval dot gonzalez at linux dot intel dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Date: Thu, 01 Nov 2018 11:29:59 -0600
- Subject: Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
On Wed, 2018-10-31 at 17:59 -0300, Adhemerval Zanella wrote:
> On 31/10/2018 17:23, Leonardo Sandoval wrote:
> > On Wed, 2018-10-31 at 15:36 -0300, Adhemerval Zanella wrote:
> > >
> > > > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > > > new file mode 100644
> > > > index 00000000000..b0623564276
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > > > @@ -0,0 +1,275 @@
> > > > +/* strcat with AVX2
> > >
> > > Is this really a gain in real-world usage compared to a generic
> > > strcat (strcpy (dest + strlen (dest), src)), assuming optimized
> > > strcpy / strlen?
> > The speedup is briefly summarized in the commit description, but the
> > comparison was done between SSE2 Unaligned and AVX2 (current code).
> > Both SSE2 and AVX2 are much faster than the generic strcat (I can
> > provide numbers).
> As Rich put it, this strcat optimization seems to use the very same
> strategy as strcpy/strlen and basically brings a constant gain, with
> the downside of i-cache overhead plus a maintainability cost (since a
> bug in the strcpy/strlen implementation will need to be checked
> against strcat as well).
> I also tend to agree with Rich that strcat is an antipattern, and I
> would prefer we use a simpler generic optimization for such cases.
> > > Wouldn't it be simpler and more i-cache friendly to use a custom
> > > generic implementation that calls AVX2 strcpy/strlen (such as
> > > powerpc64 does)?
> > I believe this is what we have right now through IFUNC_SELECTOR.
> What I meant was to use something like:
> char *strcat_avx2 (char *dest, const char *src)
> {
>   __strcpy_avx2 (dest + __strlen_avx2 (dest), src);
>   return dest;
> }
Thanks Rich and Adhemerval. I agree with all the comments. V2 will be out soon.
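
For reference, the wrapper approach suggested above can be sketched in
portable C. This is only an illustration: it assumes the standard strcpy
and strlen stand in for the internal __strcpy_avx2 and __strlen_avx2, and
the name strcat_via_strcpy is hypothetical, not a glibc symbol.

```c
#include <string.h>

/* Hypothetical name, for illustration only; not a glibc symbol.
   The standard strcpy/strlen stand in for __strcpy_avx2 and
   __strlen_avx2 from the message above.  */
char *
strcat_via_strcpy (char *dest, const char *src)
{
  /* strlen locates the terminating NUL of dest; strcpy then copies
     src, including its own terminating NUL, starting at that byte.  */
  strcpy (dest + strlen (dest), src);

  /* Like strcat, return the start of the destination buffer.  */
  return dest;
}
```

As with strcat itself, the caller must make sure dest has room for both
strings plus the terminating NUL. The appeal of this form, as discussed in
the thread, is that it adds no separate assembly to maintain beyond the
existing strcpy and strlen.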