This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2


On Wed, Oct 31, 2018 at 03:36:10PM -0300, Adhemerval Zanella wrote:
>
> > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > new file mode 100644
> > index 00000000000..b0623564276
> > --- /dev/null
> > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > @@ -0,0 +1,275 @@
> > +/* strcat with AVX2
> 
> Is this really a gain in real-world usage compared to a generic strcat
> (strcpy (dest + strlen (dest), src)), assuming optimized strcpy / strlen?
> Wouldn't it be simpler and more i-cache friendly to use a custom generic
> implementation that calls the AVX2 strcpy/strlen (as powerpc64 does)?

I second this, and fail to see the advantage of increasing the volume
of asm without a good reason. In this case specifically:

- The improvement over the trivial strcpy(dest+strlen(dest),src),
  assuming those functions are optimized, is at best a constant
  reduction in overhead, against the O(m+n) runtime of the operation.

- Use of strcat at all is a major antipattern, typically leading to
  O(n²) time and buffer overflows. Thus optimizing it at all seems
  dubious (further encouraging its use "because it's fast").
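
To make both points concrete, here is a minimal sketch in C. The names
generic_strcat, join_quadratic and join_linear are hypothetical and
exist only for illustration; none of this is taken from the patch or
from glibc's sources. The first function is the trivial composition
mentioned above. The other two contrast the usual strcat-in-a-loop
pattern, which rescans the whole buffer on every call, with a version
that remembers where the string ends and checks the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: strcat expressed as strlen + strcpy.  Any speedup
   in strlen/strcpy carries over; what a dedicated strcat can save on
   top of this is a fixed amount of call overhead.  */
char *generic_strcat(char *dest, const char *src)
{
    strcpy(dest + strlen(dest), src);
    return dest;
}

/* The antipattern: every iteration rescans all of buf to find its end,
   so joining count pieces costs O(n^2) in the total length n.  There
   is also no capacity check, so buf can overflow.  */
void join_quadratic(char *buf, const char *const *parts, size_t count)
{
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++)
        generic_strcat(buf, parts[i]);
}

/* Linear alternative: keep track of the end and check the capacity.
   Returns 0 on success, -1 if the result would not fit in cap bytes
   (buf then holds the pieces that did fit, still NUL-terminated).  */
int join_linear(char *buf, size_t cap, const char *const *parts,
                size_t count)
{
    size_t used = 0;

    assert(buf != NULL && parts != NULL);
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used)   /* no room for len bytes plus NUL */
            return -1;
        memcpy(buf + used, parts[i], len + 1);
        used += len;
    }
    return 0;
}
```

With parts = { "ab", "cd", "ef" }, both join functions produce "abcdef"
in a large enough buffer, but only join_linear refuses a 6-byte buffer
instead of writing past its end.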

Rich

