This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Faster strlen
On Tue, Oct 09, 2012 at 06:51:15AM -0700, Andi Kleen wrote:
> Ondřej Bílka <neleai@seznam.cz> writes:
> >
> > I also benchmarked atom and added variant which is identical to
> > strlen-sse2-pminub except bsf is replaced by table lookup.
>
> Is your micro benchmark just a tight loop or does it fill the caches?
The starting position is random within an 8MB interval, and sizes are
chosen randomly within the same order of magnitude.
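
To make the setup concrete, here is a minimal sketch of how such inputs
could be generated. It is not the actual benchmark code: the helper names
(next_random, pick_length, place_string), the xorshift generator, and the
reading of "same order of magnitude" as a length in [base, 2*base) are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE (8u << 20)  /* 8MB interval, as described above */

/* Hypothetical helper: small xorshift PRNG so runs are reproducible. */
static uint32_t rng_state = 2463534242u;

static uint32_t next_random(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Assumption: "same order of magnitude" is modelled as [base, 2*base). */
static size_t pick_length(size_t base)
{
    assert(base > 0);
    return base + next_random() % base;
}

/* Write a NUL-terminated string of len bytes at a random offset in
   the pool and return a pointer to its first byte. */
static char *place_string(char *pool, size_t len)
{
    assert(len < POOL_SIZE);
    size_t start = next_random() % (POOL_SIZE - len);
    memset(pool + start, 'a', len);
    pool[start + len] = '\0';
    return pool + start;
}
```

A timing loop would then call strlen on the pointer returned by
place_string for each freshly picked length.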
>
> I have doubts that table lookups are a good idea if it blows away
> the working set in L1 for the application.
It does not have this problem. It does lookups only for powers of 2,
which fit in 11 cache lines.
However, it has the problem that the Atom L2 cache has high latency.
When I add 8 random reads between calls, performance becomes the same
as pminub.
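
For readers following along, here is a sketch of one way a table lookup
can stand in for bsf. It is an illustration, not the code from the patch:
the table layout, the names (bit_index_table, lowest_set_bit,
touched_cache_lines), the 16-bit mask width, and the 64-byte cache line
size are assumptions. With a byte table indexed directly by the isolated
lowest bit, and the table aligned to a cache line, the 16 possible indices
land in 11 distinct lines, which is consistent with the figure above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: the byte at index 2^k holds k. Only those 16
   entries are ever read; the rest of the table is never touched. */
static uint8_t bit_index_table[(1u << 15) + 1];

static void init_bit_index_table(void)
{
    for (unsigned k = 0; k < 16; k++)
        bit_index_table[1u << k] = (uint8_t)k;
}

/* Index of the lowest set bit of a nonzero 16-bit mask, without bsf. */
static unsigned lowest_set_bit(uint16_t mask)
{
    assert(mask != 0);
    unsigned lowest = mask & (0u - mask);  /* isolate lowest set bit */
    return bit_index_table[lowest];
}

/* Count the distinct 64-byte lines hit by the 16 possible lookups,
   assuming the table starts on a cache line boundary. */
static unsigned touched_cache_lines(void)
{
    unsigned count = 0;
    long last_line = -1;
    for (unsigned k = 0; k < 16; k++) {
        long line = (long)((1u << k) / 64);
        if (line != last_line) {
            count++;
            last_line = line;
        }
    }
    return count;
}
```

Indices 1 through 32 share the first line, and each of 64, 128, ..., 32768
sits in a line of its own, so touched_cache_lines returns 11 under these
assumptions.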
>
> Microbenchmarks that do not use caches much can be very misleading
> here. Even if it's slightly slower not doing table lookups
> is usually preferred for functions like this, simply because it lessens
> the impact on the caches.
>
> I would recommend to measure what happens both if the microbenchmark
> stresses data cache and icache. Otherwise you risk winning
> benchmarks, but making real apps slower.
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only
--
Typo in the code