This is the mail archive of the mailing list for the glibc project.

Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor


> Sz
> i'd like to work on a generic memcpy that's acceptable
> instead of minor variations of memcpy per uarch, i'll
> have to take a look why this one is different from all
> the others.

Yes, it seems the key thing we need is a generic Q-register memcpy.

> it would be nice to see the memcpy-random benchmarks too.
> stopping prefetch at 2k is surprising.

It's also odd that it's based on the ThunderX2 variant rather than the Falkor
one, particularly since misaligned accesses are cheap for large copies.

Looking briefly at the data for memcpy, the Falkor results seem to be typically
faster for large copy sizes, e.g. from 512KB to 4MB.

On the other hand, the memmove results for Kunpeng look genuinely faster than
the existing implementations - and that is without prefetching or special code to
handle unaligned cases. That suggests to me that prefetching and special-casing
unaligned copies don't help much, and all we need is code that does Q-register
copies/moves (clearly using LDP/STP, as that is where the memmove seems to win).
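
To make the idea concrete, here is a rough portable C sketch of what a
Q-register LDP/STP move could look like for one size class. It is not the
Kunpeng patch or any existing glibc code: the real routines are AArch64
assembly, and the q128 type and the load_pair, store_pair and move_32_to_64
names are made up for illustration. Each 16-byte struct stands in for one Q
register, and each helper stands in for one LDP or STP of a register pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit Q register. */
typedef struct { unsigned char b[16]; } q128;

/* Models "ldp qA, qB, [p]": load two adjacent 16-byte values. */
static void load_pair(const unsigned char *p, q128 *a, q128 *b)
{
    memcpy(a, p, 16);
    memcpy(b, p + 16, 16);
}

/* Models "stp qA, qB, [p]": store two adjacent 16-byte values. */
static void store_pair(unsigned char *p, const q128 *a, const q128 *b)
{
    memcpy(p, a, 16);
    memcpy(p + 16, b, 16);
}

/* Move n bytes, where 32 <= n <= 64 (the caller must guarantee this).
   The first 32 and the last 32 bytes are loaded before anything is
   stored, so the two windows may overlap each other and src may
   overlap dst. No alignment handling and no prefetching. */
static void move_32_to_64(void *dst, const void *src, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    q128 head0, head1, tail0, tail1;

    load_pair(s, &head0, &head1);            /* bytes [0, 32)     */
    load_pair(s + n - 32, &tail0, &tail1);   /* bytes [n - 32, n) */

    store_pair(d, &head0, &head1);
    store_pair(d + n - 32, &tail0, &tail1);
}
```

Because every load happens before the first store, the same code serves as
both memcpy and memmove for this size range, with no direction check and no
special unaligned path. Larger sizes would need a loop, which this sketch
leaves out.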

