[PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
Alexander Monakov
amonakov@ispras.ru
Thu Apr 28 18:03:37 GMT 2022
On Wed, 27 Apr 2022, Noah Goldstein via Libc-alpha wrote:
> I think it is the way you're doing your analysis as a loop-carried
> dependency. I.e really 7c per iteration with no unroll (although
> its fair the loads on address can speculate ahead so it will
> indeed be faster) vs 9c per 2x iterations.
Hm? Right, the CPU will issue loads speculatively, so you shouldn't count
load latency as part of the critical path.
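To make the dependency argument concrete, here is a rough sketch (not the
code from the patch; the function names new_hash_simple and new_hash_unroll2
are made up for illustration) of the plain h = h * 33 + c loop and a 2x
unrolled variant. In both, the only loop-carried value besides the pointer
is h. The byte loads depend only on s, so they can run ahead of the
arithmetic. In the unrolled form, c0 * 33 + c1 does not depend on h either,
so one multiply-add on h covers two input bytes.

```c
#include <stdint.h>

/* Illustrative sketch only; names are hypothetical, not glibc's.  */

/* Plain loop: one multiply-add on h per input byte.  */
uint32_t
new_hash_simple (const char *s)
{
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h;
}

/* 2x unrolled: (h * 33 + c0) * 33 + c1 == h * 1089 + (c0 * 33 + c1),
   so the chain through h is one multiply-add per two bytes.  */
uint32_t
new_hash_unroll2 (const char *s)
{
  uint32_t h = 5381;
  for (;;)
    {
      unsigned char c0 = s[0];
      if (c0 == '\0')
        return h;
      unsigned char c1 = s[1];
      if (c1 == '\0')
        return h * 33 + c0;
      /* c0 * 33u + c1 is independent of h and can be computed early.  */
      h = h * (33u * 33u) + (c0 * 33u + c1);
      s += 2;
    }
}
```

Both functions return the same value for any input; only the length of the
dependency chain through h differs.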
I don't understand how you get a 2x improvement on long strings. Did you
run the benchmark with rdtscp timing, i.e. with
    make USE_RDTSCP=1 bench
?
Alexander