This is the mail archive of the
newlib@sourceware.org
mailing list for the newlib project.
Re: [PATCH][AArch64] Tune memcpy
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: "<newlib at sourceware dot org>" <newlib at sourceware dot org>
- Date: Fri, 6 Nov 2015 22:42:17 +0800
- Subject: Re: [PATCH][AArch64] Tune memcpy
- Authentication-results: sourceware.org; auth=none
- References: <000001d117fc$6fafe190$4f0fa4b0$ at arm dot com> <565E3296-5669-441A-AB9E-1E4A06239BC2 at gmail dot com> <000601d118a0$387b8c70$a972a550$ at arm dot com>
On Fri, Nov 6, 2015 at 10:34 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> pinskia@gmail.com wrote:
>> > def_fn memcpy p2align=6
>> > + prfm PLDL1KEEP, [src]
>>
>> Why keep rather than strm for the prefetches?
>
> It improves small copies by prefetching the input immediately, so setting
> it to streaming would have an adverse effect as it claims the line will not
> be used again. For huge copies the initial prefetch has no effect.
This might be true on ARM's cores, but not on all AArch64 cores.
On ThunderX we don't have a hardware prefetcher, and STRM
immediately starts fetching that cache line and marks the
cache line in L2 as not going to be used again afterwards.
So STRM could perform better there than KEEP.
That is also how I read the AArch64 spec: it matches the way
ThunderX implements STRM.
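To make the two hints concrete, here is a minimal C sketch of the choice
being discussed. It is not the newlib memcpy and not the patch under
review. The helper copy_with_hint and the enum prefetch_hint are
hypothetical names used only for illustration. It assumes GCC or Clang,
where __builtin_prefetch(addr, 0, 3) is expected to produce PLDL1KEEP
and __builtin_prefetch(addr, 0, 0) is expected to produce PLDL1STRM when
targeting AArch64; that mapping belongs to the compiler, so check the
generated assembly before relying on it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hint selector, for illustration only. The values mirror
   the locality argument of __builtin_prefetch: 0 = streaming (no
   expected reuse), 3 = keep (expected reuse). */
enum prefetch_hint { HINT_STRM = 0, HINT_KEEP = 3 };

/* Hypothetical helper: issue one read prefetch on the start of the
   source buffer, then copy n bytes. The locality argument of
   __builtin_prefetch must be a compile-time constant, hence the branch. */
static void *copy_with_hint(void *dst, const void *src, size_t n,
                            enum prefetch_hint hint)
{
    if (hint == HINT_KEEP)
        __builtin_prefetch(src, 0, 3);  /* assumed: PLDL1KEEP on AArch64 */
    else
        __builtin_prefetch(src, 0, 0);  /* assumed: PLDL1STRM on AArch64 */

    /* The prefetch is only a hint; the copy result is the same either way. */
    return memcpy(dst, src, n);
}
```

Both variants copy the same bytes. Which hint is faster depends on the
core, so it has to be measured on each target.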
Thanks,
Andrew
>
> Wilco