This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] Fixing strcmp performance on power7 for unaligned loads.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 20 Aug 2015 08:58:02 +0200
- Subject: Re: [RFC] Fixing strcmp performance on power7 for unaligned loads.
- Authentication-results: sourceware.org; auth=none
- References: <20150818211826 dot GA8700 at domone> <55D4E200 dot 9060003 at linaro dot org>
On Wed, Aug 19, 2015 at 05:07:28PM -0300, Adhemerval Zanella wrote:
> Hi
>
> Thanks for checking on that. Comments below:
>
> On 18-08-2015 18:18, Ondřej Bílka wrote:
> > Hi,
> >
> > As I said before, benchmarks are useless unless someone reads them, so I
> > looked at the powerpc ones. I noticed that the power7 strcmp and strncmp are
> > about five times slower than memcmp in the unaligned case.
> >
> > That is too much, so I could easily improve performance by 50% in that case by
> > implementing strcmp as a strnlen+memcmp loop, despite the overhead of strnlen.
> > Because of that overhead the loop is still a lot slower than the aligned
> > case, so it should be fixed in assembly by changing the unaligned case to
> > follow the pattern in the following C code.
> >
> [...]
> > +
> > +# include "libc-internal.h"
> > +int __strcmp_power7b(const char *a, const char *b)
> > +{
> > +  size_t len;
> > +  int ret;
> > +  len = __strnlen_power7 (a, 64);
> > +  len = __strnlen_power7 (b, len);
> > +  if (len != 64)
> > +    {
> > +      return __memcmp_power7 (a, b, len + 1);
> > +    }
> > +  ret = __memcmp_power7 (a, b, 64);
> > +  if (ret)
> > +    return ret;
> > +
> > +  const char *a_old = a;
> > +  a = PTR_ALIGN_DOWN (a + 64, 64);
> > +  b += a - a_old;
> > +
> > +  while (1)
> > +    {
> > +      len = __strnlen_power7 (b, 64);
> > +      if (len != 64)
> > +        {
> > +          return __memcmp_power7 (a, b, len + 1);
> > +        }
> > +
> > +      ret = __memcmp_power7 (a, b, 64);
> > +      if (ret)
> > +        return ret;
> > +      a+=64;
> > +      b+=64;
> > +    }
> > +}
> >
> > libc_ifunc (strcmp,
> > (hwcap2 & PPC_FEATURE2_ARCH_2_07)
> >
>
> Indeed this seems a better strategy, although I am not convinced that aligning
> the 'a' source will gain much. strnlen does take the source alignment
> into consideration (aligned and unaligned inputs take the same path), and the
> memcmp implementation will take the unaligned path anyway (since although 'a' is
> aligned, 'b' won't be).
>
You need that to be able to use memcmp at all: it could segfault by reading
past the end of the string, which cannot happen in the aligned case. The
exception is a memcmp that guarantees it does not fault by reading across a
page boundary, which several implementations do.
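
To illustrate the alignment argument, here is a minimal sketch in plain C. The
helper names span_crosses_page and align_down_addr are hypothetical, and the
page size is passed in as a parameter; the only assumption is that it is a
power of two and a multiple of 64. Under that assumption a 64-byte read that
starts on a 64-byte boundary always stays inside one page, while an unaligned
64-byte read may straddle two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if an n-byte read starting at addr touches
   more than one page.  page_size is assumed to be a power of two.  */
static int
span_crosses_page (uintptr_t addr, size_t n, size_t page_size)
{
  if (n == 0)
    return 0;
  uintptr_t last = addr + n - 1;
  return (addr / page_size) != (last / page_size);
}

/* Hypothetical helper: round addr down to a multiple of align (a power of
   two).  This is the same idea as PTR_ALIGN_DOWN, written on integers.  */
static uintptr_t
align_down_addr (uintptr_t addr, size_t align)
{
  return addr & ~((uintptr_t) align - 1);
}
```

For example, with 4096-byte pages a 64-byte read at address 4090 crosses into
the next page, but after rounding 4090 + 64 down to a 64-byte boundary the
read starts at 4096 and stays within one page.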
> Using a similar strategy as you did:
>
> int __strcmp_power7c (const char *a, const char *b)
> {
>   if (IS_ALIGN(a, 8) && IS_ALIGN(b, 8))
>     return __strcmp_power7 (a, b);
>
>   while (1)
>     {
>       size_t len = __strnlen_power7 (b, 64);
>       if (len != 64)
>         {
>           return __memcmp_power7 (a, b, len + 1);
>         }
>
>       int ret = __memcmp_power7 (a, b, 64);
>       if (ret)
>         return ret;
>       a+=64;
>       b+=64;
>     }
> }
>
As for strncmp: I was tired when I wrote the previous mail, so the corrected
implementation follows. The bug was that I forgot to check for the terminating
null within the limit.
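int __strncmp_power7b (const char *a, const char *b, size_t l)
{
  size_t len;
  int ret;
  if (l == 0)
    return 0;
  /* From here on, l + 1 bytes remain to be compared.  */
  l--;
  /* Find the first null in either string within the first 64 bytes.  */
  len = strnlen (a, l < 64 ? l : 64);
  len = strnlen (b, len);
  if (len != 64)
    {
      return memcmp (a, b, len + 1);
    }
  ret = memcmp (a, b, 64);
  if (ret)
    return ret;
  /* Advance a to the next 64-byte boundary; move b and l to match.  */
  const char *a_old = a;
  a = PTR_ALIGN_DOWN (a + 64, 64);
  b += a - a_old;
  l -= a - a_old;
  while (1)
    {
      /* a is aligned, so only b needs a bounded length check.  */
      len = strnlen (b, l < 64 ? l : 64);
      if (len != 64)
        {
          return memcmp (a, b, len + 1);
        }
      ret = memcmp (a, b, 64);
      if (ret)
        return ret;
      a += 64;
      b += 64;
      l -= 64;
    }
}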
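
For anyone who wants to try the idea away from a POWER7 box, below is a rough
portable sketch of the same strnlen+memcmp pattern, together with a helper that
compares it against the C library strncmp. Everything named here is
hypothetical: bounded_len stands in for __strnlen_power7 (it uses memchr so it
builds as plain ISO C), strncmp_sketch is the routine itself, and
agrees_with_strncmp is the checker. The alignment is done with uintptr_t
arithmetic instead of PTR_ALIGN_DOWN. Like the code above, it reads up to the
end of an aligned 64-byte block of the first string even after its terminator,
which is fine at the hardware level but outside what ISO C promises, so the
checker copies the strings into zero-padded buffers. It says nothing about
performance.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for __strnlen_power7: length of s, capped at max.  */
static size_t
bounded_len (const char *s, size_t max)
{
  const char *nul = memchr (s, '\0', max);
  return nul != NULL ? (size_t) (nul - s) : max;
}

/* Portable sketch of the strnlen+memcmp strncmp (hypothetical name).  */
static int
strncmp_sketch (const char *a, const char *b, size_t n)
{
  if (n == 0)
    return 0;
  n--;				/* n + 1 bytes remain to be compared.  */

  size_t len = bounded_len (a, n < 64 ? n : 64);
  len = bounded_len (b, len);
  if (len != 64)
    return memcmp (a, b, len + 1);
  int ret = memcmp (a, b, 64);
  if (ret != 0)
    return ret;

  /* Advance a to the next 64-byte boundary and move b by the same amount.  */
  const char *a_old = a;
  a = (const char *) (((uintptr_t) a + 64) & ~(uintptr_t) 63);
  b += a - a_old;
  n -= (size_t) (a - a_old);

  for (;;)
    {
      len = bounded_len (b, n < 64 ? n : 64);
      if (len != 64)
	return memcmp (a, b, len + 1);
      ret = memcmp (a, b, 64);
      if (ret != 0)
	return ret;
      a += 64;
      b += 64;
      n -= 64;
    }
}

/* Reduce a comparison result to -1, 0 or 1.  */
static int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Return 1 if the sketch and strncmp agree in sign on (s, t, n) for every
   offset 0..63 of the first string; s and t must be shorter than 900.  */
static int
agrees_with_strncmp (const char *s, const char *t, size_t n)
{
  static char buf_a[1024];
  static char buf_b[1024];
  for (size_t off = 0; off < 64; off++)
    {
      memset (buf_a, 0, sizeof buf_a);
      memset (buf_b, 0, sizeof buf_b);
      strcpy (buf_a + off, s);
      strcpy (buf_b + 1, t);
      if (sign (strncmp_sketch (buf_a + off, buf_b + 1, n))
	  != sign (strncmp (buf_a + off, buf_b + 1, n)))
	return 0;
    }
  return 1;
}
```

Calling agrees_with_strncmp on pairs that differ before, at, and after the
limit, and on strings longer than 64 bytes, exercises both the initial
unaligned step and the aligned loop.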