On Thu, Jul 06, 2006 at 07:53:29PM +0000, djamel anonymous wrote:
> Hello, I am writing to you this time about the first variant. After
> looking at the benchmark results I noted that there has been a reduction
> in the number of L1 cache misses; each avoided L1 cache miss is a win of
> 12 cycles, the difference between the latency of the L2 cache and that
> of the L1 cache (15 - 3). On the other hand, replacing a division by a
> binary AND (&) is a win of at least 25 cycles, so I think that avoiding
> the division in the common case may improve performance.
I don't think it is the modulo that matters, but rather the smaller
footprint of .gnu.hash in that case. I have implemented what I think you
meant, and the numbers actually convinced me.
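
For readers following along, avoiding the division in the common case
could look roughly like the sketch below. This is only an illustration
and is not taken from the attached patches: the helper name bucket_index
is hypothetical, and it assumes the common case is a bucket count that is
a power of two, so that the modulo reduces to a single AND with a mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, NOT from the patch: map a symbol hash to a
   bucket, using a mask instead of a division when possible.  */
static uint32_t
bucket_index (uint32_t hash, uint32_t nbuckets)
{
  /* Assumption: the table always has at least one bucket.  */
  assert (nbuckets != 0);

  /* A power of two has exactly one bit set, so clearing the lowest
     set bit leaves zero.  In that case hash % nbuckets is the same
     as hash & (nbuckets - 1), with no division needed.  */
  if ((nbuckets & (nbuckets - 1)) == 0)
    return hash & (nbuckets - 1);

  /* Fallback for any other bucket count: ordinary modulo.  */
  return hash % nbuckets;
}
```

With nbuckets == 1024 the lookup takes the mask path; with nbuckets == 7
it falls back to the division. Both paths give the same result as a plain
modulo, so only the cost differs.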
So here is the new set of patches and new statistics.
take1 is the state of things as of 2006-06-28, take2 as of 2006-07-03,
take3 as of 2006-07-05, and take4 is what is attached here.