This is the mail archive of the mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: Will Newton <will dot newton at linaro dot org>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>, Ondřej Bílka <neleai at seznam dot cz>, Siddhesh Poyarekar <siddhesh at redhat dot com>
- Date: Mon, 2 Sep 2013 15:18:28 +0100
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- Authentication-results: sourceware.org; auth=none
- References: <520894D5 dot 7060207 at linaro dot org> <CANu=DmiBHoymFKTvaW_VsdhWZEYwkfViz1tTeRgj7H80f0FntA at mail dot gmail dot com> <5220D30B dot 9080306 at redhat dot com> <CANu=DmiXLL9v1Z1KS0sBOs-pL8csEUGc9YE829_-tidKd-GruQ at mail dot gmail dot com> <5220F1F0 dot 80501 at redhat dot com>
On 30 August 2013 20:26, Carlos O'Donell <email@example.com> wrote:
> On 08/30/2013 02:48 PM, Will Newton wrote:
>> On 30 August 2013 18:14, Carlos O'Donell <firstname.lastname@example.org> wrote:
>> Hi Carlos,
>>>>> A small change to the entry to the aligned copy loop improves
>>>>> performance slightly on A9 and A15 cores for certain copies.
>>>>> 2013-08-07 Will Newton <email@example.com>
>>>>> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
>>>>> on entry to aligned copy loop for improved performance.
>>>>> ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>> How did you test the performance?
>>> glibc has a performance microbenchmark, did you use that?
>> No, I used the cortex-strings package developed by Linaro for
>> benchmarking various string functions against one another.
>> I haven't checked the glibc benchmarks but I'll look into that. It's
>> quite a specific case that shows the problem, however, so it may not
>> be obvious which one is better.
> If it's not obvious how is someone supposed to review this patch? :-)
>>  https://launchpad.net/cortex-strings
> There are 2 benchmarks. One appears to be Dhrystone 2.1, which isn't a string
> test in and of itself and should not be used for benchmarking or changing
> string functions. The other is called "multi" and appears to run some functions
> in a loop and time them.
> I would not call `multi' exhaustive, and while the glibc performance
> benchmark tests are not exhaustive either, they have received review from
> the glibc community and are our preferred way of demonstrating performance
> gains when posting performance patches.
> I would really really like to see you post the results of running your new
> implementation with this benchmark and show the numbers that claim this is
> faster. Is that possible?
The mailing list server does not seem to accept image attachments so I
have uploaded the performance graph here:
Toolchain Working Group, Linaro