This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor


Hi Derek,

>> There shouldn't be any performance difference between the two cases.

> With the help of our professors, we have found the reason for the different performance
> in the unaligned case. The streaming-write feature of the Kunpeng processor is not
> triggered in the unaligned case, so the data must first be read and then written,
> which is time-consuming.

So is there a way to force write streaming, for example by aligning the source rather
than the destination or use particular instructions?
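A minimal sketch of the two alignment strategies in the question, assuming a 64-byte line size and using `memcpy` as a stand-in for the actual AArch64 load/store sequences. `copy_aligned`, `head_to_align` and `LINE` are hypothetical names, and nothing here shows whether Kunpeng's write streaming is actually triggered; it only shows which side of the copy ends up aligned in the main loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache-line size; hypothetical, not taken from the thread. */
#define LINE 64

/* Bytes needed to advance p to the next LINE boundary (0 if aligned). */
static size_t head_to_align(const void *p)
{
    return (size_t)(-(uintptr_t)p & (LINE - 1));
}

/* Hypothetical copy routine: align the destination (align_src == 0) or
   the source (align_src != 0), then move whole LINE-sized blocks.
   memcpy stands in for the real LDP/STP sequences. */
static void copy_aligned(unsigned char *dst, const unsigned char *src,
                         size_t n, int align_src)
{
    size_t head = head_to_align(align_src ? (const void *)src
                                          : (const void *)dst);
    assert(head < LINE);
    if (head > n)
        head = n;

    memcpy(dst, src, head);          /* unaligned head */
    dst += head;
    src += head;
    n -= head;

    while (n >= LINE) {              /* the chosen side is now LINE-aligned */
        memcpy(dst, src, LINE);
        dst += LINE;
        src += LINE;
        n -= LINE;
    }
    memcpy(dst, src, n);             /* remaining tail, < LINE bytes */
}
```

With `align_src == 0` every store in the loop covers one aligned line; with `align_src != 0` the loads are aligned but the stores stay misaligned whenever `src` and `dst` differ modulo `LINE`, which is the trade-off the question is about.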

> And the implementation of the v1 patch happens to avoid this problem, which seems the
> better choice for the Kunpeng processor for now.

I don't believe it always helps; there is still a large factor between good and bad cases,
as in these results from the v1 memcpy:

   length=1048578:    305883.00 (  0.00%)	   182002.00 ( 40.00%)	   120063.00 ( 60.00%)	   292063.00 (  4.00%)	   306638.00

Here the Falkor variant is 2.4 times faster...
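A small sketch of the arithmetic behind these figures. It assumes (this is not stated in the message) that the bracketed percentages are improvements over the first column and that the 2.4x figure compares the third column (taken to be Falkor) with the fourth. `pct_vs_first` and `speedup` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Timings from the length=1048578 line above, lower is better.
   Column order is as printed; which variant each column is, apart from
   the text's mention of Falkor, is not stated and is assumed here. */
static const double t[] = { 305883.00, 182002.00, 120063.00,
                            292063.00, 306638.00 };
#define NCOLS (sizeof t / sizeof t[0])

/* Percentage improvement of column i over column 0. */
static double pct_vs_first(size_t i)
{
    assert(i < NCOLS);
    return (t[0] - t[i]) / t[0] * 100.0;
}

/* How many times faster column `fast` is than column `slow`. */
static double speedup(size_t slow, size_t fast)
{
    assert(slow < NCOLS && fast < NCOLS);
    return t[slow] / t[fast];
}
```

`speedup(3, 2)` gives about 2.43, and `pct_vs_first` gives about 40.5, 60.7 and 4.5 for columns 1 to 3, which agrees with the printed 40.00%, 60.00% and 4.00% if those are truncated to whole percents.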

Wilco
