This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- From: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, "siddhesh at gotplt dot org" <siddhesh at gotplt dot org>, jiangyikun <jiangyikun at huawei dot com>, "yikunkero at gmail dot com" <yikunkero at gmail dot com>
- Cc: nd <nd at arm dot com>
- Date: Sat, 26 Oct 2019 13:46:19 +0000
- Subject: Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
Hi Wilco, sorry for the delay in replying; we ran a lot of tests after modifying memmove.
> In order to select the right memmove implementation,
> multiarch/memmove.c needs similar changes as multiarch/memcpy.c.
That's true; we missed this change and will include it in the next version of the patch.
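For context on the selector change: glibc's aarch64 multiarch files pick one implementation per function based on the CPU it runs on, so a new Kunpeng variant of memmove is only reachable if multiarch/memmove.c tests for Kunpeng the same way multiarch/memcpy.c does. The sketch below mimics that pattern in portable C with a plain function pointer rather than glibc's IFUNC machinery; `cpu_is_kunpeng`, `select_memcpy`, `select_memmove` and the `*_generic`/`*_kunpeng` functions are hypothetical stand-ins, not glibc symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by memcpy and memmove. */
typedef void *(*copy_fn) (void *, const void *, size_t);

/* Hypothetical CPU check, hard-wired to false in this sketch. */
static bool cpu_is_kunpeng (void) { return false; }

/* Hypothetical stand-ins for per-CPU variants; all defer to the C library. */
static void *memcpy_generic (void *d, const void *s, size_t n) { return memcpy (d, s, n); }
static void *memcpy_kunpeng (void *d, const void *s, size_t n) { return memcpy (d, s, n); }
static void *memmove_generic (void *d, const void *s, size_t n) { return memmove (d, s, n); }
static void *memmove_kunpeng (void *d, const void *s, size_t n) { return memmove (d, s, n); }

/* Both selectors must test the same CPU condition.  If only the memcpy
   selector knows about Kunpeng, memmove silently stays generic.  */
static copy_fn select_memcpy (void)
{
  return cpu_is_kunpeng () ? memcpy_kunpeng : memcpy_generic;
}

static copy_fn select_memmove (void)
{
  return cpu_is_kunpeng () ? memmove_kunpeng : memmove_generic;
}
```

Because the CPU check always returns false here, both selectors fall back to the generic variants.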
> Also since the memmove entry sequence does both check for medium
> and large cases, the full overlap check should be done in both.
> Currently only sizes 96-512 benefit, not the move_long case:
Yes, we will add the full overlap check to the move_long case in the next version.
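As background for the overlap discussion: a memmove can use a forward (memcpy-style) copy whenever the destination does not start inside the source range. A common way to test this with one comparison is to compute `dst - src` as an unsigned value and compare it with `n`; if it is at least `n`, a forward copy cannot overwrite source bytes before they are read. The sketch below shows this in plain C with byte loops; `sketch_memmove` is a hypothetical name, and this is not the code from the patch, whose medium and large paths copy in much larger blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-at-a-time memmove illustrating the entry check.
   Forward copying is safe unless dst lies inside [src, src + n).
   Computing (dst - src) in unsigned arithmetic folds both "dst below src"
   (wraps to a huge value) and "dst at or past src + n" into one test.  */
static void *sketch_memmove (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if ((uintptr_t) d - (uintptr_t) s >= n)
    {
      /* No harmful overlap: copy forward, as memcpy would.  */
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    }
  else
    {
      /* dst starts inside the source: copy backward so every source
         byte is read before it is overwritten.  */
      for (size_t i = n; i > 0; i--)
        d[i - 1] = s[i - 1];
    }
  return dst;
}
```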
What confuses us now is this: following your earlier comments, we removed the dst_unaligned code from memcpy, and our tests showed that this did not affect performance in the memcpy benchmarks. However, when memmove is called and takes its memcpy path, the unaligned cases are significantly slower than the aligned ones, according to the first half of the memmove-walk results shown below. So do you still think we should remove the dst_unaligned code?
Our analysis is that the extra condition checks at the beginning of memmove may reduce the processor's ability to handle this case, which is why the dst_unaligned code makes a difference there.
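To make the dst_unaligned discussion concrete: code of this kind typically copies a short head so that the destination becomes aligned, runs the main loop with aligned stores, and finishes with a tail. The sketch below shows that shape in portable C for a forward, non-overlapping copy; the 16-byte alignment, the 16-byte block size and the name `copy_align_dst` are assumptions for illustration, not the details of the Kunpeng or ThunderX2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DST_ALIGN 16 /* Assumed alignment target and block size.  */

/* Hypothetical forward copy for non-overlapping buffers that aligns
   the destination before the main loop.  */
static void *copy_align_dst (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Head: copy 0..DST_ALIGN-1 bytes until d is aligned.  */
  size_t head = (DST_ALIGN - ((uintptr_t) d & (DST_ALIGN - 1))) & (DST_ALIGN - 1);
  if (head > n)
    head = n;
  for (size_t i = 0; i < head; i++)
    *d++ = *s++;
  n -= head;

  /* Body: every block store now starts on an aligned address.  The
     source may still be unaligned, so memcpy does the 16-byte moves.  */
  while (n >= DST_ALIGN)
    {
      assert (((uintptr_t) d & (DST_ALIGN - 1)) == 0);
      memcpy (d, s, DST_ALIGN);
      d += DST_ALIGN;
      s += DST_ALIGN;
      n -= DST_ALIGN;
    }

  /* Tail: the remaining 0..DST_ALIGN-1 bytes.  */
  while (n--)
    *d++ = *s++;
  return dst;
}
```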
> Well it looks the dst_unaligned code (which deals with a specific
> issue on ThunderX2) ...
I remember you mentioned this ThunderX2-specific issue before; could you tell us more about it?
Function: memmove
Variant: walk
                    __memmove_thunderx   __memmove_thunderx2      __memmove_falkor    __memmove_kunpeng2  __memmove_generic
===========================================================================================================================
length=128:            33.99 (-73.69%)       18.65 (  4.67%)       17.75 (  9.29%)       19.21 (  1.80%)              19.57
length=129:            35.41 (  2.08%)       37.43 ( -3.51%)       35.87 (  0.79%)       34.71 (  4.01%)              36.16
length=256:            45.55 (-37.95%)       32.61 (  1.23%)       35.59 ( -7.79%)       32.95 (  0.20%)              33.02
length=257:            66.36 (  4.20%)       69.50 ( -0.33%)       68.03 (  1.80%)       68.53 (  1.08%)              69.27
length=512:            82.77 (-34.10%)       65.67 ( -6.41%)       65.61 ( -6.30%)       60.13 (  2.57%)              61.72
length=513:           146.19 (  3.90%)      132.98 ( 12.59%)      132.28 ( 13.05%)      151.50 (  0.41%)             152.12
length=1024:          155.75 (-26.13%)      142.74 (-15.60%)      126.97 ( -2.83%)      121.58 (  1.53%)             123.48
length=1025:          289.15 (  4.72%)      318.71 ( -5.02%)      262.97 ( 13.35%)      307.00 ( -1.16%)             303.48
length=2048:          298.85 (-22.16%)      233.98 (  4.35%)      249.71 ( -2.08%)      245.37 ( -0.30%)             244.63
length=2049:          409.46 ( 14.62%)      399.08 ( 16.78%)      508.64 ( -6.07%)      465.79 (  2.87%)             479.54
length=4096:          543.10 (-11.30%)      445.35 (  8.73%)      491.40 ( -0.71%)      435.61 ( 10.73%)             487.95
length=4097:          680.95 ( 18.96%)      593.99 ( 29.31%)      990.52 (-17.89%)      882.91 ( -5.08%)             840.23
length=8192:         1047.46 ( -8.01%)      867.03 ( 10.59%)      977.80 ( -0.83%)      850.57 ( 12.29%)             969.74
length=8193:         1224.46 ( 21.97%)      979.34 ( 37.59%)     1981.71 (-26.29%)     1714.96 ( -9.29%)            1569.12
length=16384:        2055.73 ( -5.42%)     1701.01 ( 12.77%)     1944.38 (  0.29%)     1683.51 ( 13.67%)            1950.11
length=16385:        2314.62 ( 23.38%)     1774.44 ( 41.26%)     3967.45 (-31.34%)     3385.52 (-12.07%)            3020.82
length=32768:        5153.99 (-32.25%)     3426.50 ( 12.08%)     3875.16 (  0.56%)     3338.91 ( 14.32%)            3897.16
length=32769:        5343.41 (  9.64%)     3375.50 ( 42.92%)     7925.06 (-34.01%)     6716.28 (-13.57%)            5913.72
length=65536:       10361.70 (-35.90%)     6768.32 ( 11.23%)     7759.75 ( -1.78%)     6658.73 ( 12.66%)            7624.32
length=65537:       10284.00 ( 12.00%)     6528.85 ( 44.13%)    15844.40 (-35.58%)    13437.90 (-14.98%)           11686.80
length=131072:      20539.30 (-34.71%)    13672.50 ( 10.33%)    15567.10 ( -2.10%)    13325.60 ( 12.60%)           15247.50
length=131073:      20868.20 ( 10.97%)    12807.80 ( 45.36%)    31605.90 (-34.83%)    26788.20 (-14.28%)           23440.70
length=262144:      41304.50 (-35.25%)    26883.30 ( 11.97%)    31038.70 ( -1.63%)    26533.40 ( 13.12%)           30539.40
length=262145:      41157.90 ( 12.84%)    25568.20 ( 45.85%)    63229.00 (-33.90%)    53525.00 (-13.35%)           47220.50
length=524288:      81777.00 (-32.88%)    54133.00 ( 12.04%)    61853.30 ( -0.51%)    52869.40 ( 14.09%)           61542.20
length=524289:      81986.90 ( 14.71%)    50562.00 ( 47.40%)   126255.00 (-31.33%)   105969.00 (-10.23%)           96132.70
length=1048576:    163628.00 (-33.00%)   107776.00 ( 12.00%)   123819.00 ( -1.00%)   105831.00 ( 14.00%)          123170.00
length=1048577:    177503.00 ( 12.00%)    98680.60 ( 51.09%)   253068.00 (-26.00%)   211155.00 ( -5.00%)          201763.00
length=2097152:    336756.00 (-34.00%)   224097.00 ( 11.00%)   254575.00 ( -1.00%)   219864.00 ( 13.00%)          253124.00
length=2097153:    373590.00 (  9.00%)   214822.00 ( 48.00%)   506479.00 (-23.00%)   426299.00 ( -3.00%)          414899.00
length=4194304:    662606.00 (-35.00%)   437195.00 ( 11.00%)   497288.00 ( -2.00%)   427614.00 ( 13.00%)          491729.00
length=4194305:    697910.00 (  9.00%)   417656.00 ( 45.00%)  1020670.00 (-32.62%)   856051.00 (-12.00%)          769599.00
length=8388608:   1307990.00 (-34.88%)   852030.00 ( 12.00%)   983092.00 ( -2.00%)   834918.00 ( 13.00%)          969712.00
length=8388609:   1416420.00 (  8.70%)   821262.00 ( 47.06%)  2030660.00 (-30.89%)  1708360.00 (-10.11%)         1551450.00
length=16777216:  2586380.00 (-33.02%)  1702120.00 ( 12.46%)  1970000.00 ( -1.32%)  1676900.00 ( 13.76%)         1944360.00
length=16777217:  2796060.00 ( 13.29%)  1627720.00 ( 49.52%)  4079100.00 (-26.51%)  3410640.00 ( -5.77%)         3224440.00
length=33554432:  5241680.00 (-33.96%)  3488860.00 ( 10.84%)  4890730.00 (-24.99%)  3474630.00 ( 11.20%)         3912900.00
length=33554433:  5666550.00 ( 14.71%)  3357520.00 ( 49.46%)  8039630.00 (-21.01%)  6824230.00 ( -2.72%)         6643780.00
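The percentage columns in the table above can be checked against the raw times: each matches (generic - variant) / generic * 100, with __memmove_generic as the baseline, to within the rounding of the printed values. A positive percentage therefore means that variant took less time than the generic one. The small helper below reproduces a few entries; the name `pct_vs_generic` is ours, not part of glibc's benchtests.

```c
#include <assert.h>

/* Relative difference against the __memmove_generic baseline, in percent.
   Positive: the variant took less time than the baseline.  */
static double pct_vs_generic (double generic, double variant)
{
  return (generic - variant) / generic * 100.0;
}
```

For example, __memmove_kunpeng2 at length=16384 gives about +13.67%, while at length=16385 it gives about -12.07%, which is the aligned/unaligned gap discussed above.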