This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] aarch64: Optimized memcmp for medium to large sizes
- From: Siddhesh Poyarekar <siddhesh at sourceware dot org>
- To: Marcus Shawcroft <marcus dot shawcroft at gmail dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>, szabolcs dot nagy at arm dot com, Wilco dot Dijkstra at arm dot com
- Date: Wed, 21 Feb 2018 14:46:59 +0530
- Subject: Re: [PATCH] aarch64: Optimized memcmp for medium to large sizes
- Authentication-results: sourceware.org; auth=none
- References: <firstname.lastname@example.org> <CAFqB+Pym-BndY93fCDvU9dFtMk3hvbgDZGDP_f3=KWTrSstHMQ@mail.gmail.com> <email@example.com>
- Reply-to: siddhesh at sourceware dot org
On Monday 12 February 2018 07:41 PM, Siddhesh Poyarekar wrote:
> On Monday 12 February 2018 05:07 PM, Marcus Shawcroft wrote:
>> Thanks for sharing the performance numbers on these two
>> u-architectures. Have you looked at the impact of this patch on
>> performance of the various other aarch64 u-architectures? If so
>> please share your findings.
> I don't have ready access to any other u-arch (unless you want me to
> share more performance numbers on other Qualcomm cores, but I suspect
> you don't ;)), but I'll ask around and let you know.
Sorry this took a while, but I was finally able to get my hands on a
HiKey960 (which has A73 and A53 cores) and set it up. I ran
bench-memcmp isolated on each type of core, one at a time, and both do
fairly well with the new memcmp.
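For anyone wanting to reproduce the per-core isolation, one possible
approach on Linux is to restrict the process to a single CPU before
running the benchmark. This is only a sketch: the message does not say
how the isolation was actually done, the helper name pin_to_cpu is
hypothetical, and which CPU numbers correspond to which core type is
board-specific and has to be checked on the target.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU (Linux only).

    Returns the previous affinity set so the caller can restore it.
    pin_to_cpu is a hypothetical helper, not part of glibc's benchtests.
    """
    previous = os.sched_getaffinity(0)  # 0 means "the calling process"
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous
```

Child processes inherit the affinity mask, so a benchmark binary
launched after the call stays on the chosen CPU.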
On the A73 core, compares of 128 bytes to 4K take 11%-33% less time.
On the A53 core, the same range takes 8%-30% less time. Numbers for
smaller sizes are unstable, fluctuating between -30% and +30% for the
same inputs. I have not found a way to stabilize these numbers in the
benchmark.
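To make the "less time" figures concrete, here is a minimal sketch of
how such a percentage can be derived from two timings of the same
input. The function name and the sample timings are hypothetical and
are not taken from the benchmark output.

```python
def percent_less_time(old_time, new_time):
    """Percentage by which new_time is shorter than old_time.

    Positive means the new memcmp is faster, negative means slower.
    """
    if old_time <= 0:
        raise ValueError("old_time must be positive")
    return (old_time - new_time) / old_time * 100.0


# Hypothetical timings (arbitrary units) for one input size.
old, new = 100.0, 67.0
print(f"{percent_less_time(old, new):.0f}% less time")
```

With this sign convention, a swing between -30% and +30% on identical
inputs means the same compare sometimes looks slower and sometimes
faster, which is why the small-size numbers are treated as noise.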
So now we have numbers for 4 micro-architectures, all showing positive
gains for medium sizes and no significant change in performance for
the smaller sizes.