This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor


On 14/10/2019 04:44, Xuelei Zhang wrote:
> This is an optimized implementation of the memcpy and memmove on the
> Huawei Kunpeng processor.
> 
> Based on the prefetch mechanism on Kunpeng arch, branch to handle 96
> to 2K bytes in memcpy is written without prfm instruction. Hence,
> memcpy has an optimization effect above 128 bytes, 18% improvement
> for copies above 2K bytes, and 38% for larger bytes, such as 32M
> bytes around.
> 
> And for memmove, there are two main changes: i) Q register is used
> instead of  X register. ii) dst address is aligned instead of src
> address aligned to improve store operation. Hence, memmove
> implementation also has improvement above 128 bytes, that about 30%
> for 2k to 8M bytes, and about 50% for 32M or more.

i'd like to work on a generic memcpy that's acceptable
instead of minor variations of memcpy per uarch, i'll
have to take a look why this one is different from all
the others.

it would be nice to see the memcpy-random benchmarks too.

stopping prefetch at 2k is surprising.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]