[PATCH][AArch64] Tune memcpy
Andrew Pinski
pinskia@gmail.com
Fri Nov 6 15:00:00 GMT 2015
On Fri, Nov 6, 2015 at 10:34 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> pinskia@gmail.com wrote:
>> > def_fn memcpy p2align=6
>> > + prfm PLDL1KEEP, [src]
>>
>> Why keep rather than strm for the prefetches?
>
> It improves small copies by prefetching the input immediately, so setting
> it to streaming would have an adverse effect as it claims the line will not
> be used again. For huge copies the initial prefetch has no effect.
This might be true on ARM's cores, but not on all AArch64 cores.
ThunderX has no hardware prefetcher, and STRM immediately starts
fetching the cache line while marking that line in L2 as not going
to be used again afterwards.
So STRM could perform better than KEEP there.
That is also how I read the AArch64 spec, and it matches how
ThunderX implements STRM.
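
To make the two hints concrete, here is a minimal C sketch, not the
patch itself. The helper names (prefetch_keep, prefetch_stream,
copy_with_hint) are hypothetical. It assumes GCC or Clang, where
__builtin_prefetch(p, 0, 3) is expected to become PRFM PLDL1KEEP and
__builtin_prefetch(p, 0, 0) PRFM PLDL1STRM on AArch64; check the
generated assembly, since that mapping is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read prefetch with high temporal locality.
 * Assumed to lower to "prfm pldl1keep, [p]" on AArch64. */
static inline void prefetch_keep(const void *p)
{
    __builtin_prefetch(p, 0, 3);
}

/* Hypothetical helper: read prefetch with no temporal locality.
 * Assumed to lower to "prfm pldl1strm, [p]" on AArch64. */
static inline void prefetch_stream(const void *p)
{
    __builtin_prefetch(p, 0, 0);
}

/* Issue one prefetch on the first source line, then copy.
 * The hint only affects cache behaviour, never the copied bytes. */
void *copy_with_hint(void *dst, const void *src, size_t n, int streaming)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (streaming)
        prefetch_stream(src);
    else
        prefetch_keep(src);
    return memcpy(dst, src, n);
}
```

Both variants return the same data; which one is faster depends on
the core, so it would need to be measured per target.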
Thanks,
Andrew
>
> Wilco
>
>