This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2


This patch went to a V2, but I found that the later version is
not as fast as V1 [1].

Is this ready for merging?

[1] https://sourceware.org/ml/libc-alpha/2018-12/msg00261.html


On Wed, 2018-10-31 at 17:59 -0300, Adhemerval Zanella wrote:
> 
> On 31/10/2018 17:23, Leonardo Sandoval wrote:
> > On Wed, 2018-10-31 at 15:36 -0300, Adhemerval Zanella wrote:
> > > 
> > > > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > > > new file mode 100644
> > > > index 00000000000..b0623564276
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > > > @@ -0,0 +1,275 @@
> > > > +/* strcat with AVX2
> > > 
> > > Is this really a gain in real-world usage compared to a generic
> > > strcat (strcpy (dest + strlen (dest), src)), assuming optimized
> > > strcpy / strlen?
> > 
> > The speedup is briefly summarized in the commit description, but the
> > comparison was done between SSE2 Unaligned and AVX2 (current code).
> > Both SSE2 and AVX2 are much faster than the generic strcat (I can
> > provide numbers).
> > 
> 
> As Rich put it, this strcat optimization seems to use the very same
> strategy as strcpy/strlen and basically brings a constant gain, with
> the downside of i-cache overhead plus a maintainability cost (since a
> possible bug in the strcpy/strlen implementation will need to be
> checked against strcat as well).
> 
> I also tend to agree with Rich that strcat is an antipattern, and I
> would prefer we use simpler generic optimization for such cases.
> 
> > > Wouldn't it be simpler and more i-cache friendly to use a custom
> > > generic implementation that calls AVX2 strcpy/strlen (as
> > > powerpc64 does)?
> > 
> > I believe this is what we have right now through IFUNC_SELECTOR.
> > 
> 
> What I meant was to use something like:
> 
> char *strcat_avx2 (char *dest, const char *src)
> {
>   __strcpy_avx2 (dest + __strlen_avx2 (dest), src);
>   return dest;
> }
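
For readers who want to try the composition suggested above outside of glibc, here is a minimal, self-contained sketch. The names strlen_impl, strcpy_impl and strcat_composed are hypothetical stand-ins introduced only for illustration: inside glibc the wrapper would call the internal __strlen_avx2 and __strcpy_avx2 symbols, which are not reachable from ordinary user code, so this sketch forwards to the standard strlen and strcpy instead. It says nothing about relative performance.

```c
#include <string.h>

/* Hypothetical stand-ins (assumption): in glibc these would be the
   internal __strlen_avx2 and __strcpy_avx2 symbols.  Here they simply
   forward to the standard library so the sketch compiles anywhere.  */
static size_t
strlen_impl (const char *s)
{
  return strlen (s);
}

static char *
strcpy_impl (char *dest, const char *src)
{
  return strcpy (dest, src);
}

/* strcat built from strlen + strcpy: locate the terminating NUL of
   DEST, copy SRC (including its NUL) starting there, return DEST.  */
char *
strcat_composed (char *dest, const char *src)
{
  strcpy_impl (dest + strlen_impl (dest), src);
  return dest;
}
```

As with strcat itself, the caller must make sure dest has room for both strings plus the terminating NUL, and the two buffers must not overlap. The point of this design is that strcat adds no string-scanning code of its own, so any fix or speedup in the two underlying routines carries over automatically.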

