This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Optimizing hash table lookup in symbol binding

From: Florian Weimer <fweimer at redhat dot com>
To: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
Cc: "libc-alpha\@sourceware.org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
Date: Mon, 18 Nov 2019 18:34:41 +0100
Subject: Re: Optimizing hash table lookup in symbol binding
References: <87lfsd477i.fsf@oldenburg2.str.redhat.com> <0dca7a5c-fc7b-8aa9-48cf-338e5b2c313f@arm.com>

* Szabolcs Nagy:

> On 18/11/2019 13:58, Florian Weimer wrote:
>> My primary interest was the % operator because it turns out that it
>> actually shows up in some profiles stressing symbol binding during
>> program lookup.  In most cases, however the search for the right mapping
>> dominates and the preceding bitmask check fails most of the time.  But
>> with shallow library dependencies, we can end up in a situation where
>> the division actually matters.
>> 
>> Strategies for optimizing integer division are discussed in Hacker's
>> Delight and here:
>> 
>> <http://ridiculousfish.com/blog/posts/labor-of-division-episode-i.html>
>> <http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html>
>> 
>> (I have written to the author to get some of the math fixed in minor
>> ways, but I think the general direction is solid.)
>> 
>> The algorithm from the first episode looks like this:
>
> note that _itoa.c already uses something like this.
>
> (i think the itoa code is unnecessary: there is no reason
> for optimizing anything but base 10 and 16, and the compiler
> can do a better job at those than trying something at runtime,
> but you may look at that implementation if it's any better).

I don't think so.  It doesn't show to generate magic values for
arbitrary divisors, which is the tricky part anyway.  It uses both the
Episode 1 and Episode 3 algorithms.  The uint64_t case for 32-bit
architectures is very complicated because it avoids the __umoddi3 call.
This is *not* something that current GCC would do on its own.  GCC does
not even fold the division and modulo into one libcall.

For base 10, it could still result in better code to take a few
__udivdi3 calls to reduce the value into 9-digit pieces, and then use
the 32-bit code to process that efficiently.  I'm not sure if we need
anything else for glibc (only base 8, 10, 16, maybe base 2 somewhere).

People have also posted vectorized implementations of this operation,
including base 10.

But that's not really realted to the hash table optimization. 8-)

Thanks,
Florian

References:
- Optimizing hash table lookup in symbol binding
  - From: Florian Weimer
- Re: Optimizing hash table lookup in symbol binding
  - From: Szabolcs Nagy

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]