[PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex

Noah Goldstein goldstein.w.n@gmail.com
Tue Sep 14 19:23:50 GMT 2021


On Tue, Sep 14, 2021 at 9:40 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> On Mon, Sep 13, 2021 at 11:30 PM Noah Goldstein <goldstein.w.n@gmail.com>
> wrote:
> >
> > No bug. This commit adds support for an optimized bcmp implementation.
> > Support is for sse2, sse4_1, avx2, and evex.
> >
> > All string tests passing and build succeeding.
>
> memcmp can be a little slower than bcmp.  But bcmp isn't a standard C
> function.  All new codes should use memcmp.  Can you improve memcmp
> instead?


> Thanks.
>


There are some small improvements to memcmp I could imagine:

  - Use vptestm{b|d} instead of vpcmp for zero tests.
  - Align to 64 bytes so that target alignments/placements can be better
    optimized.

But for the most part, the biggest optimization in bcmp is simply skipping
the work memcmp has to do to compute 1/-1 for the result.
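
To make that concrete, here is a rough scalar sketch in C. It is illustrative
only: the function names are hypothetical and this is not the vectorized code
from the patch.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not glibc code. */

/* memcmp-style: must locate the first differing byte and report
   which side is smaller.  */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}

/* bcmp-style: only needs to know whether any byte differs, so the
   position and ordering of the mismatch are never computed.  */
int sketch_bcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(p[i] ^ q[i]);
    return diff != 0;
}
```

The same split applies in principle to the vector versions: bcmp only has to
establish that a mismatch exists, while memcmp also has to find the first
mismatching byte and compare it.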

I think, however, that since GLIBC supports bcmp it makes sense to offer the
best version we can, especially since compilers (Clang at least) will use it
when possible to optimize memcmp usage.
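
For context, the compiler case looks roughly like this. When a memcmp result
is only compared against zero, the sign is never used, so a compiler may emit
a call to bcmp instead. A minimal sketch (the function name is hypothetical,
and whether the substitution happens depends on the compiler, its version,
and the target):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration.  Only equality is observed
   here, so the ordering information memcmp computes is discarded and
   the call is a candidate for being lowered to bcmp.  */
int buffers_equal(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```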

I don't fully understand the concern with adding it.  AFAICT, if GLIBC
decides it no longer wants to support bcmp we can remove it, but essentially
the same work would need to be done regardless.  Can you elaborate on the
concern?


>
>
> --
> H.J.
>

