This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] aarch64: Optimized implementation of memmove for Qualcomm Falkor


On Thursday 05 October 2017 05:27 PM, Szabolcs Nagy wrote:
> i'd expect memmove to do the same thing as memcpy
> if there is no overlap or the overlap is dst < src.
> 
> why memcpy is not optimal for those cases?

It is, but I'd have to add an extra condition in there specifically for
that and jump to memcpy.  The current check only tells us if the src is
before the dest or after, so it is not a pure overlap check.
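
To illustrate the distinction, here is a minimal C sketch, not the Falkor
assembly from the patch.  The names dst_after_src, regions_overlap and
sketch_memmove are hypothetical and exist only for this illustration.  The
first helper is a direction-only test of the kind described above; the second
is the extra condition a pure overlap check would need before it is safe to
hand off to memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Direction-only test: true when dst lies after src.
   It says nothing about whether the two regions overlap. */
static int dst_after_src(const void *dst, const void *src)
{
    return (uintptr_t)dst > (uintptr_t)src;
}

/* Pure overlap test: true when the n-byte regions share at least one byte. */
static int regions_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    uintptr_t distance = d > s ? d - s : s - d;
    return distance < n;
}

/* Hypothetical dispatcher: the extra condition sends non-overlapping
   copies to memcpy; overlapping ones pick a direction. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (!regions_overlap(dst, src, n))
        return memcpy(dst, src, n);

    if (dst_after_src(dst, src)) {
        /* dst is after src: copy backward so unread source bytes survive. */
        while (n--)
            d[n] = s[n];
    } else {
        /* dst is before (or equal to) src: a forward copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

The point of the sketch is only that the overlap test is a second comparison
on top of the direction test; in hand-written assembly that is the extra
branch mentioned above.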

> i don't quite understand the prefetching and loop
> size problems.

I have described that in the code comment.  The hardware prefetcher on
the Falkor is trained using register numbers, so memcpy works best when
only one register is used for the copy.  Unrolling the loop to copy 64
bytes per iteration works very well, much like the generic memcpy.

memmove, though, cannot use the same algorithm, since it needs to
prefetch ahead to avoid overwriting overlapping locations.  As a result
it can only copy 32 bytes at a time, using two registers that alias to
the same hardware prefetcher and thus train it effectively.
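
As a rough C illustration of the 32-byte step (an assumption-laden sketch, not
the code in the patch): each iteration reads a full 32-byte block into two
16-byte temporaries before storing anything, so a store can never clobber
source bytes that the same iteration still needs.  The block16 type and
sketch_copy_forward32 are hypothetical names; block16 merely stands in for a
16-byte vector register, and C cannot express which hardware registers are
used or how the Falkor prefetcher is trained.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for one 16-byte vector register (hypothetical type). */
typedef struct { unsigned char b[16]; } block16;

/* Forward copy in 32-byte blocks.  Valid when dst is at or before src,
   or when the regions do not overlap. */
static void *sketch_copy_forward32(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 32) {
        block16 a, b;
        /* Load the whole block first... */
        memcpy(&a, s, 16);
        memcpy(&b, s + 16, 16);
        /* ...then store it, so overlapping source bytes are already read. */
        memcpy(d, &a, 16);
        memcpy(d + 16, &b, 16);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* Tail of fewer than 32 bytes, copied byte by byte in the same direction. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```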

> i think sharing code between memmove and memcpy is
> useful for instruction cache and code maintenance too.
> if that cannot be done for some reason then that
> should be spelled out more clearly in the commit
> message.

I agree that sharing code would have been nice (I tried that, and I'll
continue looking for a way to unify them in the future if possible), but
there is a non-trivial difference in performance between memcpy and
memmove and I can't justify giving up that performance.  If the above
explanation is clear enough for you, I'll put it into the commit message.

Siddhesh

