[PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h

Thu Apr 28 18:03:37 GMT 2022

On Wed, 27 Apr 2022, Noah Goldstein via Libc-alpha wrote:

> I think it is the way you're doing your analysis as a loop-carried
> dependency. I.e really 7c per iteration with no unroll (although
> its fair the loads on address can speculate ahead so it will
> indeed be faster) vs 9c per 2x iterations.

Hm? Right, the CPU will issue loads speculatively, so you shouldn't count
load latency as part of the critical path.
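
To make the dependency argument concrete, here is a rough sketch (not the
code from the patch). It assumes the usual GNU hash recurrence
h = h * 33 + c starting from 5381; the names hash_simple, hash_unrolled2
and check_equivalence are made up for illustration. In both loops the load
addresses depend only on the pointer, never on h, so the loads can run
ahead and the only loop-carried chain is the multiply-add on h. The 2x
variant folds two characters into one step on that chain.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward loop: one multiply-add on h per character.
   The load of *s depends only on s, not on h.  */
uint32_t
hash_simple (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h;
}

/* Illustrative 2x unroll: (h * 33 + c0) * 33 + c1 is rewritten as
   h * (33 * 33) + (c0 * 33 + c1).  The second term does not involve h,
   so it stays off the loop-carried chain.  */
uint32_t
hash_unrolled2 (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  uint32_t h = 5381;
  for (;;)
    {
      unsigned char c0 = s[0];
      if (c0 == '\0')
        break;
      unsigned char c1 = s[1];
      if (c1 == '\0')
        {
          /* Odd length: fold in the last character the simple way.  */
          h = h * 33 + c0;
          break;
        }
      h = h * (33 * 33) + ((uint32_t) c0 * 33 + c1);
      s += 2;
    }
  return h;
}

/* Hypothetical helper: both variants must agree on any input.  */
void
check_equivalence (const char *str)
{
  assert (hash_simple (str) == hash_unrolled2 (str));
}
```

Calling check_equivalence on a few strings of odd and even length is a
cheap sanity check that the rewrite only shortens the dependency chain and
does not change the result.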

I don't understand how you get a 2x improvement on long strings. Did you
run the benchmark with rdtscp timing, i.e. with

    make USE_RDTSCP=1 bench

?

Alexander