This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH][AArch64] Tune memcpy


On Fri, Nov 6, 2015 at 10:34 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> pinskia@gmail.com wrote:
>> > def_fn memcpy p2align=6
>> > +    prfm    PLDL1KEEP, [src]
>>
>> Why keep rather than strm for the prefetches?
>
> It improves small copies by prefetching the input immediately, so setting
> it to streaming would have an adverse effect as it claims the line will not
> be used again. For huge copies the initial prefetch has no effect.

This might be true on ARM's cores, but not on all AArch64 cores.
On ThunderX we don't have a hardware prefetcher, and STRM does
immediately start fetching that cache line; it also marks the
cache line in L2 as not going to be used again any time soon.
So STRM could perform better than KEEP there.

That is also how I read the AArch64 spec, and it matches how
ThunderX implements STRM.
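
For anyone who wants to compare the two hints outside of the assembly
version, here is a minimal sketch in C, assuming GCC or Clang. The helper
names and copy_with_hint() are hypothetical and are not part of the patch.
With GCC on AArch64, a read prefetch with locality 3 is normally emitted as
prfm PLDL1KEEP and one with locality 0 as prfm PLDL1STRM, but check the
generated assembly for your compiler before relying on that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read prefetch, high temporal locality.
   Expected (assumption) to become "prfm PLDL1KEEP" on AArch64. */
static inline void prefetch_keep(const void *p)
{
    __builtin_prefetch(p, 0, 3);
}

/* Hypothetical helper: read prefetch, no temporal locality.
   Expected (assumption) to become "prfm PLDL1STRM" on AArch64. */
static inline void prefetch_stream(const void *p)
{
    __builtin_prefetch(p, 0, 0);
}

/* Hypothetical wrapper: issue one prefetch for the start of the
   source buffer, as the patch does on entry, then do the copy.
   The hint only affects caching behaviour, never the result. */
void *copy_with_hint(void *dst, const void *src, size_t n, int streaming)
{
    if (streaming)
        prefetch_stream(src);
    else
        prefetch_keep(src);
    return memcpy(dst, src, n);
}
```

Timing both variants over a range of copy sizes on each core of interest
would show whether the choice matters in practice; the copied data is
identical either way.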


Thanks,
Andrew

>
> Wilco
>
>

