This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] aarch64: thunderx2 memmove performance improvements


Hi Anton,

> The check inside the loop is free as it
> is done while the data are being brought from the memory.

The loop is used both for small copies and for the tail of
very large copies.  The small copies might well be in the
cache, while the large copies are prefetched.

> I can move the check to prologue/epilogue and while this
> can still be free the codepath will look less clear as I
> need to handle 2 "asymmetric" cases in the epilogue (if
> the excessive writebacks are to be avoided): one for <64
> bytes and the other >=64 bytes. This also makes the tails
> longer.
> I can probably merge the tails, making the longer case
> fall through to the shorter one, but this makes things even
> less clear.
> As there is no real performance or clarity benefit from using
> a single-branch loop, I am inclined to leave the 128-byte loop
> as it is now. Do you think this is reasonable, or am I missing
> something?

It's reasonable for now (and Szabolcs has already approved your
latest version). But further improvement is feasible: the memmove
loop does 64 bytes per iteration, so if that is fast enough, it may
be a simpler way to handle this loop too.
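
To make the shape of such a loop concrete, here is a minimal C sketch
of a 64-bytes-per-iteration copy whose loop has a single branch. It is
not the glibc assembly: the name copy64_sketch is hypothetical, and it
assumes non-overlapping buffers and n >= 64. The tail is copied
relative to the end of the buffer, so it can rewrite bytes the loop
already stored, which may be the kind of extra store the quoted text
wants to avoid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not glibc code: copy n bytes from src to dst,
   64 bytes per iteration.  Assumes n >= 64 and that the buffers do
   not overlap.  */
static void
copy64_sketch (unsigned char *dst, const unsigned char *src, size_t n)
{
  unsigned char *dend = dst + n;
  const unsigned char *send = src + n;

  /* Single-branch loop: the only test is the remaining count.  */
  while (n > 64)
    {
      memcpy (dst, src, 64);
      dst += 64;
      src += 64;
      n -= 64;
    }

  /* Tail: copy the last 64 bytes relative to the end.  This may
     overlap bytes the loop already wrote, trading a redundant store
     for the absence of any length check inside the loop.  */
  memcpy (dend - 64, send - 64, 64);
}
```

Whether the redundant store in the tail is acceptable on a given core
is a measurement question; the sketch only shows how the in-loop check
can be dropped.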

Wilco
    
