[PATCH][AArch64] Tune memcpy
Fri Nov 6 15:00:00 GMT 2015
On Fri, Nov 6, 2015 at 10:34 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> firstname.lastname@example.org wrote:
>> > def_fn memcpy p2align=6
>> > + prfm PLDL1KEEP, [src]
>> Why keep rather than strm for the prefetches?
> It improves small copies by prefetching the input immediately, so setting
> it to streaming would have an adverse effect as it claims the line will not
> be used again. For huge copies the initial prefetch has no effect.
This might be true on ARM's cores, but not on all AArch64 cores.
For ThunderX, we don't have a hardware prefetcher, and STRM
immediately starts fetching the cache line and marks the line in L2
as not going to be used any time afterwards.
So STRM could perform better than KEEP there.
That is also how I read the AArch64 spec, and it matches how ThunderX
implements STRM.
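
To make the two hints easy to compare on a given core, here is a minimal
sketch in C. It is not part of the patch: copy_with_hint and its
streaming parameter are hypothetical names used only for illustration.
It relies on the GCC/Clang builtin __builtin_prefetch, whose third
argument is a temporal-locality hint from 0 (none) to 3 (high). My
understanding is that GCC on AArch64 lowers 3 to PLDL1KEEP and 0 to
PLDL1STRM, but treat that as an assumption and check the generated
assembly for your compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the patch: issue one prefetch of the
   first source line, then copy n bytes with the library memcpy.
   The rw and locality arguments of __builtin_prefetch must be
   compile-time constants, hence the two separate calls. */
static void *copy_with_hint(void *dst, const void *src, size_t n,
                            int streaming)
{
  if (streaming)
    __builtin_prefetch(src, 0, 0);  /* read, no temporal locality (STRM-like) */
  else
    __builtin_prefetch(src, 0, 3);  /* read, high temporal locality (KEEP-like) */

  /* A prefetch is only a hint; the result of the copy is the same
     whichever variant is used. */
  return memcpy(dst, src, n);
}
```

Timing both variants over small and large sizes on each core would show
whether KEEP or STRM is the better choice there.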