This is the mail archive of the mailing list for the glibc project.
Re: [PATCH] aarch64: optimize the unaligned case of memcmp
On Tue, Jun 27, 2017 at 10:11 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> So I had a look at it using the GLIBC bench-memcmp.c. I quickly got the
> unaligned loop to go faster than the aligned one using ccmp, so I had to
> tune the unaligned loop too... This uses a trick similar to your byte loop's to
> remove a branch and 1-2 ALU operations per iteration.
> This gives a 24% speedup on both Cortex-A53 and Cortex-A72 for
> the aligned loop, and about 18% for the unaligned loop on top of your
> patch. Aligning either src1 or src2 appears best as there isn't enough
> work in the loops to hide an unaligned access.
This looks good. Could you please send a patch?
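The core idea discussed above is reducing per-iteration branching in the memcmp main loop. The actual glibc implementation is AArch64 assembly using ccmp; as a rough illustration only, here is a hypothetical C sketch of the word-at-a-time structure with one branch per 8-byte chunk (function name and details are mine, not from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: compare 8 bytes per iteration with a single
 * branch on the fast path, instead of branching on every byte.
 * The real AArch64 code fuses comparisons with ccmp instead. */
static int memcmp_words(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;

    while (n >= 8) {
        uint64_t a, b;
        memcpy(&a, p1, 8);   /* memcpy keeps unaligned loads well-defined in C */
        memcpy(&b, p2, 8);
        if (a != b)
            break;           /* mismatch somewhere in this 8-byte chunk */
        p1 += 8; p2 += 8; n -= 8;
    }
    /* Byte tail: handles the remainder and locates the first
     * differing byte inside a mismatched chunk. */
    while (n--) {
        if (*p1 != *p2)
            return *p1 - *p2;
        p1++; p2++;
    }
    return 0;
}
```

The quoted message also notes that aligning one of the two source pointers before entering the loop pays off on Cortex-A53/A72, since these loops are too short to hide the cost of unaligned accesses.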