This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, RFC]: Re-implement gas symbol table hash routines


On Sat, Jul 30, 2016 at 02:00:50PM +0900, Gianni Tedesco wrote:
> Hi,
> 
> I had a problem assembling large (on the order of 100MB's to gigabytes
> -- yes, yes, insane, I know) source files. The main culprit seemed to

Well, I have some 20mb .s files lying around from real projects, so
large, but not *that* large.

> be the hash table implementation. Even when I hacked gas to use very
> large tables (bigger than the largest settable on command-line) it
> didn't help. Profiles would always show it spending consistently 85-95% 
> of the time in hash_lookup().
> 
> So I implemented an open-addressing scheme using linear probing and
> robin-hood replacement strategy which reduces variance in load-factors, 
> which helps tremendously with load-factors vs lookup times.
> 
> It implements the strategy of maintaining a target load-factor (0.8125
> here) and doubling the hash table when we would exceed it. It uses
> similar amounts of memory as before so there's no real time/space
> trade-off here for smaller systems.
> 
> It might be nice to just use a bigger load factor when --reduce-memory-
> overheads is selected. The code should work, all be it a bit slower, up
> at load factors past 0.9 but it does become unbearably slow at 1.0.
> 
> This code brings in CityHash as a good modern hash function not
> requiring prime-number size tables.
> 
> CAVEATS:
> 
> I can address these things pretty easily if there is interest in the
> patch (not sure, since it's obviously addressed at a niche use case).

It seems worthwhile at least in principal to speed up hash tables.  That
said I'd really like to see consolidation between gas's hash table and
the one in libiberty.

>  - On the largest assembly file I have (2.4GB, producing a 1.5GB 
>    binary, time is reduced from 38mins to 4mins
>  - On a wide variety of other, smaller inputs, run-time was either 
>    indistinguishable from the original, or better. Same for memory.
> 
> Thanks for reading.

Thanks for looking into it.

Trev


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]