[patch,arm] strcmp implementation using LDRD
Wed Feb 8 09:59:00 GMT 2012
The attached patch provides a new implementation of strcmp for ARM, using
LDRD instead of LDR whenever possible.
For older architectures that do not support LDRD, this implementation uses
the same algorithm as before.
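The sketch below illustrates the idea behind using LDRD. It is a minimal C sketch, not the assembly from the patch. It compares two strings eight bytes at a time, which is the amount one LDRD loads into a register pair. It falls back to a byte-wise comparison only in the block that contains a difference or the terminating NUL. The function name `strcmp_dword_sketch` is hypothetical. The precondition that both strings start on an 8-byte boundary and sit in buffers padded to whole 8-byte blocks is an assumption made to keep the example short. A real strcmp must handle arbitrary offsets.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of x is 0x00 (classic bit trick). */
static uint64_t has_zero_byte(uint64_t x)
{
    return (x - UINT64_C(0x0101010101010101)) & ~x & UINT64_C(0x8080808080808080);
}

/* Hypothetical sketch: compare eight bytes per iteration, in the spirit of
   one LDRD (two 32-bit words) per string.
   Assumption: s1 and s2 are 8-byte aligned and their buffers are padded so
   that reading whole 8-byte blocks is valid. */
int strcmp_dword_sketch(const char *s1, const char *s2)
{
    for (;;) {
        uint64_t w1, w2;
        memcpy(&w1, s1, sizeof w1);   /* stands in for one LDRD from s1 */
        memcpy(&w2, s2, sizeof w2);   /* stands in for one LDRD from s2 */
        if (w1 != w2 || has_zero_byte(w1)) {
            /* A difference or the terminator lies in this block: finish
               byte by byte, which also avoids endianness questions when
               deciding which string orders first. */
            for (int i = 0; i < 8; i++) {
                unsigned char c1 = (unsigned char)s1[i];
                unsigned char c2 = (unsigned char)s2[i];
                if (c1 != c2 || c1 == '\0')
                    return c1 - c2;
            }
        }
        s1 += 8;
        s2 += 8;
    }
}
```

Wider loads pay off because the common case costs one equality test and one zero-byte test per eight bytes. That common case is an equal block that contains no terminator.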
This patch replaces strcmp.c with strcmp.S. The large inline-assembly block
from strcmp.c was converted into plain assembly and included in strcmp.S,
guarded by the appropriate predefined macros.
Testing and benchmarking:
* Validation: successfully passes a test that compares different strings of
length 1-128 at offsets 0-8 from a word boundary. Checked on qemu/A15/A9,
in ARM/Thumb mode, big/little endian. This test is also added to the newlib
testsuite as part of this patch.
* Integration with gcc: no regressions on qemu for arm-none-eabi with
--with-cpu a15/a9 and --with-mode arm/thumb.
* Performance (relative to the current strcmp in newlib, only in ARM mode):
On Dhrystone, the new implementation (ldrd) is 22% faster on the Cortex-A15
FPGA and 16% faster on the Cortex-A9 VE2.
On synthetic benchmarks that measure the average number of cycles for
strcmp on equal strings of length 4 to 128K, at offsets 0, 1, 2, 3, 4 and 8
from a word boundary, the new implementation is three times faster for long
strings when both inputs have the same offset from a word boundary. It is up
to 30% faster in other cases. These results hold on both the A15 FPGA and the A9.
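The validation test described above can be sketched in plain C. The sketch builds equal and nearly equal strings of every length from 1 to 128. It places them at every pair of offsets 0 to 8 from an 8-byte-aligned buffer. It then checks that `strcmp` agrees in sign with a simple byte-wise reference. This is a minimal sketch, not the `strcmp-1.c` added by the patch. The names `ref_strcmp`, `same_sign` and `check_strcmp`, and the string contents, are hypothetical.

```c
#include <string.h>

#define MAX_LEN 128  /* string lengths 1..128 */
#define MAX_OFF 8    /* offsets 0..8 from an aligned boundary */

/* Hypothetical oracle: byte-at-a-time comparison of unsigned chars,
   which is how the C standard defines strcmp ordering. */
static int ref_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return *p - *q;
}

/* strcmp only guarantees the sign of its result, so compare signs. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

static int same_sign(const char *s1, const char *s2)
{
    return sign(strcmp(s1, s2)) == sign(ref_strcmp(s1, s2));
}

/* Returns the number of cases where strcmp disagrees with the oracle. */
static int check_strcmp(void)
{
    static _Alignas(8) char buf1[MAX_OFF + MAX_LEN + 1];
    static _Alignas(8) char buf2[MAX_OFF + MAX_LEN + 1];
    int failures = 0;

    for (int o1 = 0; o1 <= MAX_OFF; o1++) {
        for (int o2 = 0; o2 <= MAX_OFF; o2++) {
            for (int len = 1; len <= MAX_LEN; len++) {
                char *s1 = buf1 + o1;
                char *s2 = buf2 + o2;

                /* Equal strings of length len at offsets o1 and o2. */
                for (int i = 0; i < len; i++)
                    s1[i] = s2[i] = (char)('a' + i % 26);
                s1[len] = s2[len] = '\0';
                failures += !same_sign(s1, s2);

                /* Difference in the last character, in both orders. */
                s2[len - 1] = '~';
                failures += !same_sign(s1, s2) + !same_sign(s2, s1);

                /* A byte with the top bit set must compare as unsigned. */
                s2[len - 1] = (char)0xE9;
                failures += !same_sign(s1, s2) + !same_sign(s2, s1);

                /* s2 becomes a proper prefix of s1. */
                s2[len - 1] = '\0';
                failures += !same_sign(s1, s2) + !same_sign(s2, s1);
            }
        }
    }
    return failures;
}
```

Testing both operand orders matters for a word-at-a-time implementation. It derives the result from the first differing byte inside a register, and which byte counts as first depends on endianness. This may explain why checking both big and little endian is worthwhile. On a host, the loop exercises the C library's strcmp. On a target, it would exercise the new one.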
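A small helper shows what "the same offset from a word boundary" means for the benchmark results above. The sketch assumes 4-byte words. The names `word_offset`, `same_word_offset` and `bytes_to_word_boundary` are hypothetical, not taken from strcmp.S. When two strings share an offset, both reach a word boundary after the same number of leading bytes. From then on, both can be read with aligned loads. Otherwise, one string stays misaligned, and the comparison typically has to shift and merge words. That extra work is consistent with the smaller speedup reported for that case.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u  /* assumption: 32-bit ARM words */

/* Offset of p from the preceding word boundary (0..3). */
static unsigned word_offset(const void *p)
{
    return (unsigned)((uintptr_t)p % WORD_BYTES);
}

/* Nonzero if both strings sit at the same offset from a word boundary, so
   after the same number of leading bytes both can use aligned loads. */
static int same_word_offset(const char *s1, const char *s2)
{
    return word_offset(s1) == word_offset(s2);
}

/* Leading bytes to compare one at a time before s becomes word-aligned. */
static size_t bytes_to_word_boundary(const char *s)
{
    return (size_t)((WORD_BYTES - word_offset(s)) % WORD_BYTES);
}
```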
2012-02-08 Greta Yorsh <Greta.Yorsh@arm.com>
* libc/machine/arm/strcmp.S: New file.
* libc/machine/arm/strcmp.c: Deleted.
* libc/machine/arm/Makefile.am: Replace strcmp.c with strcmp.S.
* libc/machine/arm/Makefile.in: Regenerated.
* testsuite/newlib.string/strcmp-1.c: New file.