[PATCH][AArch64] Tune memcpy

Andrew Pinski pinskia@gmail.com
Fri Nov 6 15:00:00 GMT 2015

On Fri, Nov 6, 2015 at 10:34 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> pinskia@gmail.com wrote:
>> > def_fn memcpy p2align=6
>> > +    prfm    PLDL1KEEP, [src]
>> Why keep rather than strm for the prefetches?
> It improves small copies by prefetching the input immediately,  so setting
> it to streaming would have an adverse effect as it claims the line will not
> be used again. For huge copies the initial prefetch has no effect.

This might be true on ARM's cores, but not on all AArch64 cores.
ThunderX has no hardware prefetcher, and STRM immediately starts
fetching the cache line while marking that line in L2 as not going
to be used again any time afterwards. So STRM could improve
performance there over KEEP.

That is also how I read the AArch64 spec: it matches the way
ThunderX implements STRM.
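
To make the two hints easy to compare outside the assembly, here is a
minimal C sketch. It is not part of the patch: copy_with_hint and the
prefetch_hint enum are hypothetical names, and it relies on the GCC/Clang
__builtin_prefetch builtin. I am assuming the compiler lowers locality 3
to PLDL1KEEP and locality 0 to PLDL1STRM on AArch64, so check the
generated assembly before drawing conclusions from it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only; not from the patch. */
enum prefetch_hint {
    HINT_STREAM = 0,  /* no temporal locality (assumed to map to PLDL1STRM) */
    HINT_KEEP   = 3   /* high temporal locality (assumed to map to PLDL1KEEP) */
};

/* Issue one read prefetch on the start of src, then copy n bytes. */
static void *copy_with_hint(void *dst, const void *src, size_t n,
                            enum prefetch_hint hint)
{
    /* The rw and locality arguments of __builtin_prefetch must be
       compile-time constants, hence the branch instead of passing hint. */
    if (hint == HINT_KEEP)
        __builtin_prefetch(src, 0, 3);
    else
        __builtin_prefetch(src, 0, 0);

    return memcpy(dst, src, n);
}
```

The prefetch is only a hint, so both variants copy the same bytes; any
difference would show up only in timing, and only on cores that treat
the two hints differently.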


> Wilco

More information about the Newlib mailing list