This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [PATCH][AArch64] Enable _STRING_ARCH_unaligned


> Andrew Pinski wrote:
> On Thu, Aug 20, 2015 at 10:24 PM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
> > +
> > +/* AArch64 implementations support efficient unaligned access.  */
> > +#define _STRING_ARCH_unaligned 1
> 
> I don't think this is 100% true.  On ThunderX, an unaligned store or
> load takes an extra 8 cycles (a full pipeline flush) as all unaligned
> load/stores have to be replayed.
> I think we should also benchmark there to find out if this is a win,
> because I doubt it is, but I could be proved wrong.

That's bad indeed, but it would still be better than doing everything
one byte at a time. E.g. resolv/arpa/nameser.h does:

#define NS_GET32(l, cp) do { \
        const u_char *t_cp = (const u_char *)(cp); \
        (l) = ((u_int32_t)t_cp[0] << 24) \
            | ((u_int32_t)t_cp[1] << 16) \
            | ((u_int32_t)t_cp[2] << 8) \
            | ((u_int32_t)t_cp[3]) \
            ; \
        (cp) += NS_INT32SZ; \
} while (0)

With _STRING_ARCH_unaligned this becomes a single unaligned load plus a
byte swap, which should be faster even on ThunderX.
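
For illustration, here is a rough sketch of the equivalent code (not
glibc source; get32_be is a hypothetical helper, and it assumes a
little-endian target such as AArch64, where GCC folds the memcpy into
one unaligned load and the bswap into a rev instruction):

#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from glibc: load a big-endian 32-bit value
   from a possibly unaligned pointer.  */
static inline uint32_t
get32_be (const unsigned char *cp)
{
  uint32_t v;
  memcpy (&v, cp, sizeof v);      /* one (possibly unaligned) load */
  return __builtin_bswap32 (v);   /* byte swap on little-endian */
}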

> Are there benchmarks for each of the uses of _STRING_ARCH_unaligned
> so I can do the benchmarking on ThunderX?

I don't believe there are.

> Also I don't see any benchmark results even for any of the other
> AARCH64 processors.

It's obviously a big win for most of the uses of _STRING_ARCH_unaligned.
E.g. consider the hash code in crypt/md5.c:

#if !_STRING_ARCH_unaligned
      if (UNALIGNED_P (buffer))
        while (len > 64)
          {
            __md5_process_block (memcpy (ctx->buffer, buffer, 64), 64, ctx);
            buffer = (const char *) buffer + 64;
            len -= 64;
          }
      else
#endif

So basically you end up doing an extra memcpy if unaligned access is not
supported. This means you'll not only do the unaligned loads anyway (inside
memcpy), but you'll also do an extra aligned store to and load from
ctx->buffer.
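
For contrast, the block that follows the else above processes the caller's
buffer in place when _STRING_ARCH_unaligned is 1 (roughly paraphrased from
the same function in crypt/md5.c):

        {
          /* Hash complete blocks straight from the caller's buffer;
             no staging copy through ctx->buffer is needed.  */
          __md5_process_block (buffer, len & ~63, ctx);
          buffer = (const char *) buffer + (len & ~63);
          len &= 63;
        }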

GLIBC's use of _STRING_ARCH_unaligned is quite messy and would benefit
from a major cleanup; however, it's quite clear that enabling it is a win
overall.

Wilco


