This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][AArch64] Optimized memcmp
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd at arm dot com
- Date: Tue, 08 Aug 2017 17:09:08 +0100
- Subject: Re: [PATCH][AArch64] Optimized memcmp
- References: <AM5PR0802MB2610D22D6378E8876E9B77DD83AA0@AM5PR0802MB2610.eurprd08.prod.outlook.com> <DB6PR0801MB205393A98B4357550962759B838A0@DB6PR0801MB2053.eurprd08.prod.outlook.com>
On 08/08/17 17:03, Wilco Dijkstra wrote:
> From: Wilco Dijkstra
> Sent: 07 July 2017 16:11
> To: email@example.com
> Cc: nd; Szabolcs Nagy
> Subject: [PATCH][AArch64] Optimized memcmp
>
> This is an optimized memcmp for AArch64. It is a complete rewrite
> using a different algorithm. The previous version split into separate
> cases for aligned inputs, mutually aligned inputs, and unaligned
> inputs, the last handled with a byte loop. The new version combines
> all these cases, while small inputs of fewer than 8 bytes are handled
> separately.
>
> This allows the main code to be sped up using unaligned loads, since
> there are now at least 8 bytes to compare. After the first 8 bytes,
> the first input is aligned. This ensures each iteration does at most
> one unaligned access and that mutually aligned inputs behave as
> aligned. After the main loop, the last 8 bytes are processed using
> unaligned accesses.
>
> This improves the performance of (mutually) aligned cases by 25% and
> of unaligned cases by more than 500% (yes, more than 6 times faster)
> on large inputs.
>
> 2017-07-07  Wilco Dijkstra  <firstname.lastname@example.org>
>
> 	* sysdeps/aarch64/memcmp.S (memcmp): Rewrite of optimized memcmp.
ok to commit.
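
For readers following the description above, here is a minimal C sketch of the structure it outlines: small inputs byte by byte, an unaligned 8-byte head, an aligned main loop, and an unaligned 8-byte tail. This is an illustration under stated assumptions, not the AArch64 assembly in the patch; the names memcmp_sketch and load64 are made up for the sketch.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Load 8 possibly-unaligned bytes; memcpy compiles to a single
       unaligned load on targets such as AArch64.  */
    static uint64_t
    load64 (const unsigned char *p)
    {
      uint64_t v;
      memcpy (&v, p, sizeof v);
      return v;
    }

    int
    memcmp_sketch (const void *s1, const void *s2, size_t n)
    {
      const unsigned char *p1 = s1;
      const unsigned char *p2 = s2;

      /* Small inputs of fewer than 8 bytes are handled separately.  */
      if (n < 8)
        {
          for (; n != 0; n--, p1++, p2++)
            if (*p1 != *p2)
              return *p1 - *p2;
          return 0;
        }

      /* First 8 bytes, using unaligned loads.  */
      if (load64 (p1) != load64 (p2))
        goto differ;

      /* Align the first input: each loop iteration then does at most
         one unaligned access, and mutually aligned inputs behave as
         aligned.  */
      size_t skew = 8 - ((uintptr_t) p1 & 7);
      p1 += skew;
      p2 += skew;
      n -= skew;

      while (n > 8)
        {
          if (load64 (p1) != load64 (p2))
            goto differ;
          p1 += 8;
          p2 += 8;
          n -= 8;
        }

      /* Last 8 bytes, possibly overlapping the loop's range, using
         unaligned accesses.  */
      p1 += n;
      p2 += n;
      if (load64 (p1 - 8) == load64 (p2 - 8))
        return 0;
      p1 -= 8;
      p2 -= 8;

     differ:
      /* An 8-byte window is known to differ, so this terminates within
         8 iterations.  */
      for (;; p1++, p2++)
        if (*p1 != *p2)
          return *p1 - *p2;
    }

The final byte scan keeps the sketch endianness-independent; an optimized implementation would typically derive the return value directly from the differing 8-byte words instead.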