This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][AArch64] Optimized memset
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: 'GNU C Library' <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, "Marcus Shawcroft" <Marcus dot Shawcroft at arm dot com>
- Date: Fri, 15 Apr 2016 12:39:42 +0000
- Subject: Re: [PATCH][AArch64] Optimized memset
- References: <AM3PR08MB0088F02B88A2C6614843F2A383EE0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>
ping
________________________________________
From: Wilco Dijkstra
Sent: 15 December 2015 16:39
To: 'GNU C Library'
Cc: nd
Subject: Re: [PATCH][AArch64] Optimized memset
ping
-----Original Message-----
From: Wilco Dijkstra [mailto:wdijkstr@arm.com]
Sent: 31 July 2015 16:02
To: 'GNU C Library'
Subject: [PATCH][AArch64] Optimized memset
This is an optimized memset for AArch64. Memset is split into 4 main cases: small sets of up to 16 bytes; medium sets of 16..96 bytes, which are fully unrolled; and large sets of more than 96 bytes, which align the destination and use an unrolled loop processing 64 bytes per iteration. Memsets of zero of more than 256 bytes use the dc zva instruction, and there are faster versions for the common ZVA sizes of 64 and 128 bytes. STP of Q registers is used to reduce code size without loss of performance.
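For readers who want the case split without reading the assembly, here is a minimal C sketch of the dispatch described above. It is an illustration only, not the patch: the real code uses overlapping unaligned stores, fully unrolled STP sequences, and the actual dc zva instruction, none of which appear here; sketch_memset and the plain loops/memset calls standing in for those store sequences are my own placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the size-based dispatch in the optimized memset.
   Thresholds mirror the description: <= 16 bytes, 16..96 bytes,
   > 96 bytes (aligned 64-byte loop), and a zeroing path for
   > 256 bytes standing in for the dc zva case.  */
static void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    if (n <= 16) {
        /* Small case: a few overlapping stores in the real code;
           a plain byte loop here for clarity.  */
        for (size_t i = 0; i < n; i++)
            p[i] = byte;
        return dst;
    }
    if (n <= 96) {
        /* Medium case: fully unrolled stores in the real code.  */
        memset(p, byte, n);
        return dst;
    }
    if (byte == 0 && n > 256) {
        /* Zeroing path: the real code clears whole cache lines with
           dc zva (common ZVA sizes are 64 or 128 bytes).  */
        memset(p, 0, n);
        return dst;
    }
    /* Large case: align the destination to 64 bytes, then an
       unrolled loop writing 64 bytes per iteration, plus a tail.  */
    size_t head = (64 - ((uintptr_t)p & 63)) & 63;
    memset(p, byte, head);
    p += head;
    n -= head;
    while (n >= 64) {
        memset(p, byte, 64);
        p += 64;
        n -= 64;
    }
    memset(p, byte, n);
    return dst;
}
```

The point of the split is that each branch trades setup cost against throughput: tiny sets avoid any loop, medium sets avoid alignment work entirely, and only large sets pay for aligning the destination before entering the 64-byte-per-iteration loop.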
Speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53. On a random test with varying sizes and alignments the new version is 50% faster.
OK for commit?
ChangeLog:
2015-07-31  Wilco Dijkstra  <wdijkstr@arm.com>

	* sysdeps/aarch64/memset.S (__memset): Rewrite of optimized memset.