[PATCH] aarch64: revert memcpy optimze for kunpeng to avoid performance degradation

Adhemerval Zanella adhemerval.zanella@linaro.org
Thu Jan 21 16:41:42 GMT 2021


On 20/01/2021 22:55, Zhangxuelei (Derek) wrote:
> Hi,
> 
> They are my colleagues, and we verified these results together. Given the performance regression on a specific product, it would be better to revert the original selection. We will keep working on a better or more balanced version of memcpy for Kunpeng.
> 
> Thank you~

This is ok for 2.33, please commit.

> 
> -----Original Message-----
> From: Adhemerval Zanella [mailto:adhemerval.zanella@linaro.org] 
> Sent: January 20, 2021 21:09
> To: wangshuo (AF) <wangshuo47@huawei.com>; Zhangxuelei (Derek) <zhangxuelei4@huawei.com>; libc-alpha@sourceware.org
> Cc: Hushiyuan <hushiyuan@huawei.com>; liqingqing (C) <liqingqing3@huawei.com>
> Subject: Re: [PATCH] aarch64: revert memcpy optimze for kunpeng to avoid performance degradation
> 
> Hi,
> 
> Since I don't have access to this specific hardware, it would be good if the original author of the change, Xuelei Zhang, could confirm that this reversion is ok.
> 
> It should be ok during the freeze since it is just a selection of an already tested implementation for a specific chip.
> 
> On 20/01/2021 04:20, Shuo Wang wrote:
>> In commit 863d775c481704baaa41855fc93e5a1ca2dc6bf6, kunpeng920 was
>> added to the default memcpy version; however, performance degrades when the copy size is large, e.g. 100k.
>> These are the results, tested on glibc-2.28:
>>              before backport  after backport  Performance improvement
>> memcpy_1k      0.005            0.005            0.00%
>> memcpy_10k     0.032            0.029            10.34%
>> memcpy_100k    0.356            0.429            -17.02%
>> memcpy_1m      7.470            11.153           -33.02%
>>
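For reference, the last column appears to be computed relative to the post-backport time, i.e. (before - after) / after. This is inferred from the numbers in the table, not stated in the patch. A minimal C sketch, with `improvement_percent` as a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: relative change in percent,
   measured against the "after backport" time.  Negative means slower.  */
double improvement_percent(double before, double after)
{
    assert(after > 0.0);  /* a zero or negative timing makes no sense here */
    return (before - after) / after * 100.0;
}
```

For the memcpy_100k row, `improvement_percent(0.356, 0.429)` gives roughly -17.02, which matches the table.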
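>> This is the demo:
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> char a[1024*1024] = {12};
>> char b[1024*1024] = {13};
>>
>> /* Usage: ./memcpy <iterations> <size in KiB, at most 1024> */
>> int main(int argc, char *argv[])
>> {
>>     if (argc < 3) {
>>         fprintf(stderr, "usage: %s <iterations> <size_kib>\n", argv[0]);
>>         return 1;
>>     }
>>     int i = atoi(argv[1]);
>>     int j;
>>     int size = atoi(argv[2]);
>>
>>     /* Reject sizes that would overrun the 1 MiB buffers. */
>>     if (size < 0 || size > 1024)
>>         return 1;
>>
>>     for (j = 0; j < i; j++)
>>         memcpy(b, a, size*1024);
>>     return 0;
>> }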
>>
>> # gcc -g -O0 memcpy.c -o memcpy
>> # time taskset -c 10 ./memcpy 100000 1024
>>
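The `time` measurement above also includes process startup. A possible variant, sketched here under the assumption of a POSIX system and GCC or Clang, times only the copy loop with `clock_gettime`. The helper name `time_memcpy_seconds` is hypothetical and not part of the patch:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: elapsed wall-clock seconds for `iterations`
   copies of `size` bytes from src to dst.  */
double time_memcpy_seconds(char *dst, const char *src, size_t size,
                           long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long j = 0; j < iterations; j++) {
        memcpy(dst, src, size);
        /* Compiler barrier (GCC/Clang extension) so the copy is not
           dropped when building with optimization enabled.  */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
           + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it as `time_memcpy_seconds(b, a, 1024 * 1024, 100000)` would correspond to the `./memcpy 100000 1024` run, still pinned with `taskset`.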
>> Co-authored-by: liqingqing <liqingqing3@huawei.com>
>>
>> ---
>>  sysdeps/aarch64/multiarch/memcpy.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
>> index 27259d3386..0e0a5cbcfb 100644
>> --- a/sysdeps/aarch64/multiarch/memcpy.c
>> +++ b/sysdeps/aarch64/multiarch/memcpy.c
>> @@ -37,7 +37,7 @@ extern __typeof (__redirect_memcpy) __memcpy_falkor attribute_hidden;
>>  libc_ifunc (__libc_memcpy,
>>              (IS_THUNDERX (midr)
>>  	     ? __memcpy_thunderx
>> -	     : (IS_FALKOR (midr) || IS_PHECDA (midr) || IS_KUNPENG920 (midr)
>> +	     : (IS_FALKOR (midr) || IS_PHECDA (midr)
>>  		? __memcpy_falkor
>>  		: (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr)
>>  		  ? __memcpy_thunderx2
>>
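To illustrate the effect of the one-line change, here is a simplified model of the selection chain shown in the hunk. It is only a sketch: the `enum chip` values stand in for the MIDR-based `IS_*` predicates, `select_memcpy` and `kunpeng_uses_falkor` are hypothetical names, and the final fallback is a placeholder because the rest of the chain is not visible in the diff.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the IS_* (midr) checks in the real code.  */
enum chip {
    CHIP_OTHER,
    CHIP_THUNDERX,
    CHIP_FALKOR,
    CHIP_PHECDA,
    CHIP_KUNPENG920,
    CHIP_THUNDERX2,
    CHIP_THUNDERX2PA,
    CHIP_COUNT
};

/* Returns the name of the variant the chain in the hunk would pick.
   kunpeng_uses_falkor != 0 models the code before this patch,
   kunpeng_uses_falkor == 0 models the code after it.  */
const char *select_memcpy(enum chip c, int kunpeng_uses_falkor)
{
    assert((int) c >= 0 && (int) c < CHIP_COUNT);
    return (c == CHIP_THUNDERX
            ? "__memcpy_thunderx"
            : (c == CHIP_FALKOR || c == CHIP_PHECDA
               || (kunpeng_uses_falkor && c == CHIP_KUNPENG920)
               ? "__memcpy_falkor"
               : (c == CHIP_THUNDERX2 || c == CHIP_THUNDERX2PA
                  ? "__memcpy_thunderx2"
                  /* Placeholder: remaining branches are not shown.  */
                  : "<rest of chain>")));
}
```

With the patch applied (`kunpeng_uses_falkor == 0`), `CHIP_KUNPENG920` no longer maps to `__memcpy_falkor` and falls through to the later branches, while Falkor and Phecda are unaffected.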

