This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Optimized strlen() for x64 (SSE2 based strlen)


On Sat, May 12, 2018 at 05:54:47AM -0700, H.J. Lu wrote:
 
> Your implementation is good, especially at shorter strings.   For strings
> longer than 128 bytes, yours is slower.
> 
No, that implementation isn't good. It is pretty much the same as the original implementation
from ages ago, which we have improved several times since. So I will briefly recapitulate the
state of affairs.

These benchmarks are worthless because they ignore branch misprediction,
which is what matters most.

With my profiler, the new implementation is 10% slower than the current one
on gcc. This is measured by interposing the strlen calls made during a gcc
compilation and timing each variant. Results with randomized size/alignment
are similar; the results and the profiler are here:

https://iuuk.mff.cuni.cz/~ondra/strlen_profile/results_gcc/result.html
https://iuuk.mff.cuni.cz/~ondra/strlen_profile/results_rand/result.html
https://iuuk.mff.cuni.cz/~ondra/strlen_profile180512.tar.bz2

Checking with aligned loads, as done here, is a terrible idea because it
inherently introduces misprediction, mainly in the case where only a few
bytes are loaded. The unaligned loads in the current implementation are
there for a reason: the biggest performance improvement comes from always
checking 16 bytes at once (except in the rare page-boundary case), which is
a much more predictable check.
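The unaligned-load check can be sketched with SSE2 intrinsics as follows. This is a minimal illustration, not glibc's implementation (which is hand-written assembly); in particular it omits the page-boundary special case mentioned above, so it may over-read past the terminator:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Sketch: scan 16 bytes per step with an UNALIGNED load, so every
   iteration runs the same predictable "any zero byte among 16?" check.
   Caveat: a production version must special-case loads that would cross
   into an unmapped page; this sketch omits that. */
static size_t strlen_unaligned16(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;
    for (;;) {
        /* unaligned 16-byte load, regardless of where the string starts */
        __m128i chunk = _mm_loadu_si128((const __m128i *)p);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)  /* some byte in the chunk was '\0' */
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
        p += 16;
    }
}
```

Because the loop body is identical on every iteration and the branch is taken the same way until the terminator is found, the branch predictor handles it well even for short strings.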

The second improvement is that we unroll the loop 4 times to gain better
performance on larger inputs, but that is less important since most inputs
are short in practice.
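The 4x unrolling can be sketched like this (again an illustration in C intrinsics, not glibc's assembly; the byte-by-byte prologue is a deliberate simplification of the real alignment handling):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Sketch of a 4x-unrolled main loop: once the pointer is 64-byte aligned,
   each iteration loads four 16-byte chunks and folds them with an
   unsigned byte minimum, so a single branch decides whether any of the
   64 bytes is zero. */
static size_t strlen_unrolled4(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Simplified prologue: scan bytes until 64-byte aligned, so the
       unrolled loop never reads across a page boundary (a 64-byte block
       always fits within one page). */
    while (((uintptr_t)p & 63) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        __m128i a = _mm_load_si128((const __m128i *)p);
        __m128i b = _mm_load_si128((const __m128i *)(p + 16));
        __m128i c = _mm_load_si128((const __m128i *)(p + 32));
        __m128i d = _mm_load_si128((const __m128i *)(p + 48));
        /* the unsigned minimum contains a 0 iff some chunk has a 0 byte */
        __m128i m = _mm_min_epu8(_mm_min_epu8(a, b), _mm_min_epu8(c, d));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(m, zero)) != 0)
            break;
        p += 64;
    }

    /* A zero byte lies somewhere in this 64-byte block; locate it. */
    for (int i = 0; i < 4; i++) {
        __m128i v = _mm_load_si128((const __m128i *)(p + 16 * i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
        if (mask != 0)
            return (size_t)(p + 16 * i - s) + (size_t)__builtin_ctz(mask);
    }
    return 0; /* unreachable: the fold above guaranteed a zero byte */
}
```

Folding the four chunks with pminub before a single movemask-and-branch is what keeps the per-iteration branch count at one, which is why the unrolled loop helps on larger inputs.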
