This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] aarch64: optimize the unaligned case of memcmp


Hi,

So I had a look at it using glibc's bench-memcmp.c. I quickly got the
unaligned loop to go faster than the aligned one using ccmp, so I had to
tune the aligned loop too... This uses a trick similar to the one in your
byte loop to remove a branch and 1-2 ALU operations per iteration.

This gives a 24% speedup on both Cortex-A53 and Cortex-A72 for
the aligned loop, and about 18% for the unaligned loop on top of your
patch. Aligning either src1 or src2 appears best, as there isn't enough
work in the loops to hide an unaligned access.
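For readers without the patch at hand, the overall structure can be sketched
in C (this is an illustrative model, not the actual aarch64 assembly: the
ccmp fusion that folds the length check into the data compare has no direct
C equivalent, so a single conditional per iteration stands in for it, and
memcmp_sketch is a made-up name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the loop structure: align src1 with a short byte loop, then
   compare 8 bytes per iteration while src2 may remain unaligned.  In the
   real patch the equality test and loop bound are fused with ccmp so the
   hot loop carries one conditional branch; the single `if` models that. */
static int memcmp_sketch (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1, *p2 = s2;

  /* Prologue: byte-compare until src1 is 8-byte aligned (or n runs out). */
  while (n > 0 && ((uintptr_t) p1 & 7) != 0)
    {
      int d = *p1 - *p2;
      if (d != 0)
        return d;
      p1++; p2++; n--;
    }

  /* Main loop: one 8-byte compare per iteration.  src1 accesses are
     aligned after the prologue; src2 accesses may be unaligned.  */
  while (n >= 8)
    {
      uint64_t a, b;
      memcpy (&a, p1, 8);
      memcpy (&b, p2, 8);
      if (a != b)
        break;              /* Differing word: finish with the byte loop.  */
      p1 += 8; p2 += 8; n -= 8;
    }

  /* Tail: byte-compare the remainder (or the differing word).  */
  while (n > 0)
    {
      int d = *p1 - *p2;
      if (d != 0)
        return d;
      p1++; p2++; n--;
    }
  return 0;
}
```

The prologue mirrors the "align one source" observation above: only src1 is
forced to alignment, since there isn't enough work per iteration to justify
aligning both.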

Wilco
