Bug 12682 - [PATCH] high memory usage when linking many small object files.
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: 2.21
Importance: P2 normal
Target Milestone: ---
Assignee: unassigned
Reported: 2011-04-18 17:36 UTC by Bertram Felgenhauer
Modified: 2011-06-03 16:19 UTC


Attachments
patch against Debian's binutils-2.21.0.20110327 (722 bytes, patch)
2011-04-18 17:36 UTC, Bertram Felgenhauer

Description Bertram Felgenhauer 2011-04-18 17:36:20 UTC
Created attachment 5675
patch against Debian's binutils-2.21.0.20110327

GNU ld uses a lot of memory when linking the Glasgow Haskell Compiler (ghc) and many programs produced by this compiler. These programs are characterized by consisting of many (literally thousands of) small object files with perhaps a dozen symbols each.

Version: GNU ld (GNU Binutils) 2.21.0.20110327

This topic comes up occasionally on Haskell mailing lists; see, for example, http://www.mail-archive.com/glasgow-haskell-users@haskell.org/msg18215.html

I have a patch (attached) that reduces the memory usage for linking ghc from almost 440 MB to 190 MB on x86_64. I did not see any negative impact on performance, though admittedly I did not try very hard to measure it. I really expect no discernible impact: the extra work is limited to a couple of allocations and some copying of memory, and ld does quite a bit of other, heavier lifting. The primary effect, namely the reduced working size of the program, can only help performance. For ghc, link time improved very slightly, from about 1.9 to 1.8 seconds.

The patch works by changing the default hash table size of libbfd from 4k entries to 31 entries, and by increasing the granularity of hash table sizes through additional intermediate sizes. No code is changed at all. There is nothing magical about the value 31, except that going below it did not seem to improve memory usage any further.
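
To illustrate the idea, here is a minimal sketch in C of rounding a requested size up to the next prime in a size table, together with the cost of an empty bucket array. It is not the code from bfd/hash.c: the names sketch_primes, sketch_higher_prime and sketch_bucket_bytes are hypothetical, and the table entries were chosen only for illustration. The point is that a table with more small entries lets a small hash table stay small.

```c
#include <stddef.h>

/* Hypothetical size table for illustration only; this is NOT the table
   from bfd/hash.c. All entries are prime and sorted in ascending order. */
static const unsigned long sketch_primes[] = {
    31, 61, 127, 251, 509, 1021, 2039, 4093
};

#define SKETCH_NPRIMES (sizeof sketch_primes / sizeof sketch_primes[0])

/* Return the smallest table entry strictly greater than n,
   or 0 if n is at or beyond the largest entry. */
unsigned long sketch_higher_prime(unsigned long n)
{
    size_t lo = 0, hi = SKETCH_NPRIMES;

    /* Binary search for the first entry that exceeds n. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (n >= sketch_primes[mid])
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < SKETCH_NPRIMES ? sketch_primes[lo] : 0;
}

/* Bytes used by an empty table's bucket array, assuming one pointer
   per bucket (an assumption of this sketch). */
size_t sketch_bucket_bytes(unsigned long buckets)
{
    return (size_t) buckets * sizeof(void *);
}
```

Under these assumptions, on a target with 8-byte pointers a 31-bucket array occupies 248 bytes, while an array of roughly 4k buckets occupies about 32 KB. If a link creates many such tables and most of them hold only a handful of symbols, that per-table difference adds up.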
Comment 1 Sourceware Commits 2011-06-03 16:16:39 UTC
CVSROOT:	/cvs/src
Module name:	src
Changes by:	nickc@sourceware.org	2011-06-03 16:16:32

Modified files:
	bfd            : ChangeLog hash.c 

Log message:
	PR ld/12682
	* hash.c (higher_prime_number): Add more, small, prime numbers.
	(bfd_hash_set_default_size): Likewise.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&r1=1.5367&r2=1.5368
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/hash.c.diff?cvsroot=src&r1=1.33&r2=1.34
Comment 2 Nick Clifton 2011-06-03 16:19:40 UTC
Hi Bertram,

 Thanks for reporting this problem.  I have applied the second half of your patch, which adds more small prime values to the arrays in hash.c.  But there is no need for the first part of the patch (changing the value of DEFAULT_SIZE).  You can do this from the linker command line, viz.:

  --hash-size=31

You might also wish to consider adding the --reduce-memory-overheads option to the linker command line.

Cheers
  Nick