This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
On Wed, Oct 31, 2018 at 03:36:10PM -0300, Adhemerval Zanella wrote:
>
> > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > new file mode 100644
> > index 00000000000..b0623564276
> > --- /dev/null
> > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > @@ -0,0 +1,275 @@
> > +/* strcat with AVX2
>
> Is this really a gain in real-world usage compared to a generic strcat
> (strcpy (dest + strlen (dest), src)), assuming optimized strcpy / strlen?
> Wouldn't it be simpler and more i-cache friendly to use a custom generic
> implementation that calls the AVX2 strcpy/strlen (as powerpc64 does)?

I second this, and fail to see the advantage of increasing the volume
of asm without a good reason. In this case specifically:

- Improvement over trivial strcpy(dest+strlen(dest),src), assuming
  those functions are optimized, is at best a constant difference in
  overhead, vs the O(m+n) runtime of the operation.

- Use of strcat at all is a major antipattern, typically leading to
  O(n²) time and buffer overflows. Thus optimizing it at all seems
  dubious (further encouraging its use "because it's fast").

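
To make the two points concrete, here is a minimal C sketch, not code from the patch or from glibc. The names generic_strcat, join_rescan and join_tracked are hypothetical and exist only for illustration. The first function is the trivial composition discussed above, the second shows how repeated strcat-style appends rescan the destination each time, and the third shows one possible linear-time alternative that tracks the end of the string and checks capacity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the trivial strcat built from strlen and strcpy.
   Any speedup in those two functions carries over directly. */
char *generic_strcat(char *dest, const char *src)
{
    strcpy(dest + strlen(dest), src);
    return dest;
}

/* The antipattern: every append walks dest from the start again, so
   joining many pieces of total length n can cost O(n^2) time. There
   is also no bounds check, so dest can overflow. */
void join_rescan(char *dest, const char *const *parts, size_t count)
{
    dest[0] = '\0';
    for (size_t i = 0; i < count; i++)
        generic_strcat(dest, parts[i]);
}

/* One possible alternative: remember how many bytes are in use, so each
   piece is measured and copied once (linear overall), and refuse to
   write past cap. Returns 0 on success, -1 if the result would not fit;
   on failure dest holds the pieces that did fit. */
int join_tracked(char *dest, size_t cap, const char *const *parts,
                 size_t count)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    dest[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used)      /* no room for piece plus NUL */
            return -1;
        memcpy(dest + used, parts[i], len + 1);
        used += len;
    }
    return 0;
}
```

With join_tracked the cost is dominated by one strlen and one memcpy per piece, which is why a faster strcat only changes the constant factor and does nothing about the rescanning in join_rescan.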
Rich