[PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
Will Newton
will.newton@linaro.org
Thu Apr 11 20:14:00 GMT 2013
On 11 April 2013 18:23, Ramana Radhakrishnan <ramrad01@arm.com> wrote:
> On 04/03/13 17:43, Will Newton wrote:
>>
>>
>> * libc/machine/arm/memcpy-stub.c: Use generic memcpy if unaligned
>> access is not enabled.
>> * libc/machine/arm/memcpy.S: Faster memcpy implementation for
>> Cortex A15 cores using NEON and VFP if available.
>>
>> Signed-off-by: Will Newton <will.newton@linaro.org>
>
>
>
> You know this looks really good but what benefit does it provide on A15 ?
>
>
> While this
> http://sourceware.org/ml/newlib/2013/msg00176.html
>
> mentions improvements on a pandaboard - the subject indicates that it
> improves performance on an A15 :)
I'm glad you asked, Ramana! ;-)
Below are some numbers from a Cortex-A15 I just acquired. The pattern
seems pretty similar: faster in the majority of cases for the
cortex-strings benchmarks.
newlib:memcpy:8:1000000000:1:18.354787: took 18.354787 s for
1000000000 calls to memcpy of 8 bytes. ~813.352 MB/s corrected.
this:memcpy:8:1000000000:1:12.757507: took 12.757507 s for 1000000000
calls to memcpy of 8 bytes. ~2016.807 MB/s corrected.
newlib:memcpy:8:1000000000:2:18.288893: took 18.288893 s for
1000000000 calls to memcpy of 8 bytes. ~819.106 MB/s corrected.
this:memcpy:8:1000000000:2:10.625291: took 10.625291 s for 1000000000
calls to memcpy of 8 bytes. ~4621.940 MB/s corrected.
newlib:memcpy:8:1000000000:4:11.211078: took 11.211078 s for
1000000000 calls to memcpy of 8 bytes. ~3411.343 MB/s corrected.
this:memcpy:8:1000000000:4:10.988971: took 10.988971 s for 1000000000
calls to memcpy of 8 bytes. ~3787.482 MB/s corrected.
newlib:memcpy:8:1000000000:8:11.208891: took 11.208891 s for
1000000000 calls to memcpy of 8 bytes. ~3414.683 MB/s corrected.
this:memcpy:8:1000000000:8:12.976570: took 12.976570 s for 1000000000
calls to memcpy of 8 bytes. ~1906.410 MB/s corrected.
newlib:memcpy:16:1000000000:1:19.468156: took 19.468156 s for
1000000000 calls to memcpy of 16 bytes. ~1454.110 MB/s corrected.
this:memcpy:16:1000000000:1:13.570801: took 13.570801 s for 1000000000
calls to memcpy of 16 bytes. ~3319.870 MB/s corrected.
newlib:memcpy:16:1000000000:2:19.467872: took 19.467872 s for
1000000000 calls to memcpy of 16 bytes. ~1454.150 MB/s corrected.
this:memcpy:16:1000000000:2:12.080183: took 12.080183 s for 1000000000
calls to memcpy of 16 bytes. ~4913.341 MB/s corrected.
newlib:memcpy:16:1000000000:4:12.388744: took 12.388744 s for
1000000000 calls to memcpy of 16 bytes. ~4469.287 MB/s corrected.
this:memcpy:16:1000000000:4:13.568817: took 13.568817 s for 1000000000
calls to memcpy of 16 bytes. ~3321.303 MB/s corrected.
newlib:memcpy:16:1000000000:8:12.388740: took 12.388740 s for
1000000000 calls to memcpy of 16 bytes. ~4469.292 MB/s corrected.
this:memcpy:16:1000000000:8:11.215983: took 11.215983 s for 1000000000
calls to memcpy of 16 bytes. ~6807.755 MB/s corrected.
newlib:memcpy:20:1000000000:1:19.467851: took 19.467851 s for
1000000000 calls to memcpy of 20 bytes. ~1817.691 MB/s corrected.
this:memcpy:20:1000000000:1:13.568949: took 13.568949 s for 1000000000
calls to memcpy of 20 bytes. ~4151.510 MB/s corrected.
newlib:memcpy:20:1000000000:2:19.467863: took 19.467863 s for
1000000000 calls to memcpy of 20 bytes. ~1817.689 MB/s corrected.
this:memcpy:20:1000000000:2:12.028355: took 12.028355 s for 1000000000
calls to memcpy of 20 bytes. ~6245.911 MB/s corrected.
newlib:memcpy:20:1000000000:4:17.108270: took 17.108270 s for
1000000000 calls to memcpy of 20 bytes. ~2345.004 MB/s corrected.
this:memcpy:20:1000000000:4:13.568996: took 13.568996 s for 1000000000
calls to memcpy of 20 bytes. ~4151.468 MB/s corrected.
newlib:memcpy:20:1000000000:8:17.108325: took 17.108325 s for
1000000000 calls to memcpy of 20 bytes. ~2344.988 MB/s corrected.
this:memcpy:20:1000000000:8:13.060007: took 13.060007 s for 1000000000
calls to memcpy of 20 bytes. ~4668.687 MB/s corrected.
newlib:memcpy:31:1000000000:1:41.277268: took 41.277268 s for
1000000000 calls to memcpy of 31 bytes. ~915.216 MB/s corrected.
this:memcpy:31:1000000000:1:12.347025: took 12.347025 s for 1000000000
calls to memcpy of 31 bytes. ~8766.365 MB/s corrected.
newlib:memcpy:31:1000000000:2:41.242007: took 41.242007 s for
1000000000 calls to memcpy of 31 bytes. ~916.216 MB/s corrected.
this:memcpy:31:1000000000:2:12.458796: took 12.458796 s for 1000000000
calls to memcpy of 31 bytes. ~8485.143 MB/s corrected.
newlib:memcpy:31:1000000000:4:36.780817: took 36.780817 s for
1000000000 calls to memcpy of 31 bytes. ~1063.212 MB/s corrected.
this:memcpy:31:1000000000:4:15.926538: took 15.926538 s for 1000000000
calls to memcpy of 31 bytes. ~4252.614 MB/s corrected.
newlib:memcpy:31:1000000000:8:37.720629: took 37.720629 s for
1000000000 calls to memcpy of 31 bytes. ~1028.452 MB/s corrected.
this:memcpy:31:1000000000:8:15.925489: took 15.925489 s for 1000000000
calls to memcpy of 31 bytes. ~4253.255 MB/s corrected.
newlib:memcpy:32:1000000000:1:41.131749: took 41.131749 s for
1000000000 calls to memcpy of 32 bytes. ~949.014 MB/s corrected.
this:memcpy:32:1000000000:1:16.516581: took 16.516581 s for 1000000000
calls to memcpy of 32 bytes. ~4046.361 MB/s corrected.
newlib:memcpy:32:1000000000:2:41.271609: took 41.271609 s for
1000000000 calls to memcpy of 32 bytes. ~944.904 MB/s corrected.
this:memcpy:32:1000000000:2:16.509494: took 16.509494 s for 1000000000
calls to memcpy of 32 bytes. ~4050.167 MB/s corrected.
newlib:memcpy:32:1000000000:4:17.697097: took 17.697097 s for
1000000000 calls to memcpy of 32 bytes. ~3498.720 MB/s corrected.
this:memcpy:32:1000000000:4:16.514984: took 16.514984 s for 1000000000
calls to memcpy of 32 bytes. ~4047.218 MB/s corrected.
newlib:memcpy:32:1000000000:8:17.696875: took 17.696875 s for
1000000000 calls to memcpy of 32 bytes. ~3498.809 MB/s corrected.
this:memcpy:32:1000000000:8:16.516808: took 16.516808 s for 1000000000
calls to memcpy of 32 bytes. ~4046.239 MB/s corrected.
newlib:memcpy:63:1000000000:1:26.545693: took 26.545693 s for
1000000000 calls to memcpy of 63 bytes. ~3419.337 MB/s corrected.
this:memcpy:63:1000000000:1:17.114647: took 17.114647 s for 1000000000
calls to memcpy of 63 bytes. ~7380.975 MB/s corrected.
newlib:memcpy:63:1000000000:2:26.545417: took 26.545417 s for
1000000000 calls to memcpy of 63 bytes. ~3419.390 MB/s corrected.
this:memcpy:63:1000000000:2:17.105119: took 17.105119 s for 1000000000
calls to memcpy of 63 bytes. ~7389.625 MB/s corrected.
newlib:memcpy:63:1000000000:4:23.006271: took 23.006271 s for
1000000000 calls to memcpy of 63 bytes. ~4281.848 MB/s corrected.
this:memcpy:63:1000000000:4:16.225055: took 16.225055 s for 1000000000
calls to memcpy of 63 bytes. ~8286.581 MB/s corrected.
newlib:memcpy:63:1000000000:8:23.006052: took 23.006052 s for
1000000000 calls to memcpy of 63 bytes. ~4281.915 MB/s corrected.
this:memcpy:63:1000000000:8:17.128080: took 17.128080 s for 1000000000
calls to memcpy of 63 bytes. ~7368.814 MB/s corrected.
newlib:memcpy:64:1000000000:1:26.545284: took 26.545284 s for
1000000000 calls to memcpy of 64 bytes. ~3473.693 MB/s corrected.
this:memcpy:64:1000000000:1:21.279753: took 21.279753 s for 1000000000
calls to memcpy of 64 bytes. ~4960.130 MB/s corrected.
newlib:memcpy:64:1000000000:2:26.545591: took 26.545591 s for
1000000000 calls to memcpy of 64 bytes. ~3473.632 MB/s corrected.
this:memcpy:64:1000000000:2:21.236653: took 21.236653 s for 1000000000
calls to memcpy of 64 bytes. ~4977.564 MB/s corrected.
newlib:memcpy:64:1000000000:4:24.775816: took 24.775816 s for
1000000000 calls to memcpy of 64 bytes. ~3862.687 MB/s corrected.
this:memcpy:64:1000000000:4:21.236529: took 21.236529 s for 1000000000
calls to memcpy of 64 bytes. ~4977.615 MB/s corrected.
newlib:memcpy:64:1000000000:8:15.337350: took 15.337350 s for
1000000000 calls to memcpy of 64 bytes. ~9592.576 MB/s corrected.
this:memcpy:64:1000000000:8:18.287111: took 18.287111 s for 1000000000
calls to memcpy of 64 bytes. ~6554.103 MB/s corrected.
newlib:memcpy:100:1000000000:1:48.936478: took 48.936478 s for
1000000000 calls to memcpy of 100 bytes. ~2386.460 MB/s corrected.
this:memcpy:100:1000000000:1:25.660808: took 25.660808 s for
1000000000 calls to memcpy of 100 bytes. ~5715.345 MB/s corrected.
newlib:memcpy:100:1000000000:2:48.840026: took 48.840026 s for
1000000000 calls to memcpy of 100 bytes. ~2392.234 MB/s corrected.
this:memcpy:100:1000000000:2:25.660839: took 25.660839 s for
1000000000 calls to memcpy of 100 bytes. ~5715.334 MB/s corrected.
newlib:memcpy:100:1000000000:4:21.236431: took 21.236431 s for
1000000000 calls to memcpy of 100 bytes. ~7777.585 MB/s corrected.
this:memcpy:100:1000000000:4:26.545690: took 26.545690 s for
1000000000 calls to memcpy of 100 bytes. ~5427.519 MB/s corrected.
newlib:memcpy:100:1000000000:8:24.775759: took 24.775759 s for
1000000000 calls to memcpy of 100 bytes. ~6035.470 MB/s corrected.
this:memcpy:100:1000000000:8:21.236774: took 21.236774 s for
1000000000 calls to memcpy of 100 bytes. ~7777.368 MB/s corrected.
newlib:memcpy:200:1000000000:1:34.214364: took 34.214364 s for
1000000000 calls to memcpy of 200 bytes. ~7556.920 MB/s corrected.
this:memcpy:200:1000000000:1:33.625015: took 33.625015 s for
1000000000 calls to memcpy of 200 bytes. ~7737.592 MB/s corrected.
newlib:memcpy:200:1000000000:2:34.214418: took 34.214418 s for
1000000000 calls to memcpy of 200 bytes. ~7556.903 MB/s corrected.
this:memcpy:200:1000000000:2:33.633800: took 33.633800 s for
1000000000 calls to memcpy of 200 bytes. ~7734.836 MB/s corrected.
newlib:memcpy:200:1000000000:4:31.854609: took 31.854609 s for
1000000000 calls to memcpy of 200 bytes. ~8336.311 MB/s corrected.
this:memcpy:200:1000000000:4:33.634330: took 33.634330 s for
1000000000 calls to memcpy of 200 bytes. ~7734.670 MB/s corrected.
newlib:memcpy:200:1000000000:8:27.135389: took 27.135389 s for
1000000000 calls to memcpy of 200 bytes. ~10502.565 MB/s corrected.
this:memcpy:200:1000000000:8:30.106632: took 30.106632 s for
1000000000 calls to memcpy of 200 bytes. ~9025.865 MB/s corrected.
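For anyone who wants to compare the two implementations case by case rather than by eye, here is a minimal sketch that pairs up the "newlib" and "this" records from the output above and prints the ratio of elapsed times. The field layout (variant:function:size:calls:alignment:elapsed) is my reading of the log lines rather than something the benchmark documents here, and the helper names parse_records and speedups are hypothetical. The sample embeds only two of the pairs above; paste in the full log to see all of them.

```python
import re
from collections import defaultdict

# Assumed layout of each record header (inferred from the log lines):
#   variant:memcpy:size:calls:alignment:elapsed_seconds:
RECORD = re.compile(
    r"(?P<variant>\w+):memcpy:(?P<size>\d+):(?P<calls>\d+):"
    r"(?P<align>\d+):(?P<elapsed>\d+\.\d+):"
)


def parse_records(text):
    """Return {(size, align): {variant: elapsed_seconds}} from benchmark output."""
    results = defaultdict(dict)
    # finditer scans the whole text, so the 72-column line wrapping is harmless.
    for m in RECORD.finditer(text):
        key = (int(m["size"]), int(m["align"]))
        results[key][m["variant"]] = float(m["elapsed"])
    return dict(results)


def speedups(results, baseline="newlib", candidate="this"):
    """Return {(size, align): baseline_time / candidate_time}.

    A ratio above 1 means the candidate took less time than the baseline.
    """
    return {
        key: times[baseline] / times[candidate]
        for key, times in results.items()
        if baseline in times and candidate in times
    }


# Two pairs copied verbatim from the results above.
SAMPLE = """\
newlib:memcpy:8:1000000000:1:18.354787: took 18.354787 s for
1000000000 calls to memcpy of 8 bytes. ~813.352 MB/s corrected.
this:memcpy:8:1000000000:1:12.757507: took 12.757507 s for 1000000000
calls to memcpy of 8 bytes. ~2016.807 MB/s corrected.
newlib:memcpy:64:1000000000:8:15.337350: took 15.337350 s for
1000000000 calls to memcpy of 64 bytes. ~9592.576 MB/s corrected.
this:memcpy:64:1000000000:8:18.287111: took 18.287111 s for 1000000000
calls to memcpy of 64 bytes. ~6554.103 MB/s corrected.
"""

if __name__ == "__main__":
    ratios = speedups(parse_records(SAMPLE))
    for (size, align), ratio in sorted(ratios.items()):
        print(f"size={size:>3} align={align}: {ratio:.2f}x")
```

Note that this compares raw elapsed times, not the "corrected" MB/s figures, so the ratios understate the difference wherever fixed per-call overhead dominates (the small sizes in particular).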