When building an arhive with lots of .o files (eg over 10,000) GNU ar takes a very long time and lots of memory. One might hope that the 'q' "quick append" option might operate in linear time and constant memory (since all it needs to do is to append to the archive without looking at the existing entries). However the GNU ar man page says that it is an alias for the 'r' option and so the possible performance advantages of the 'q' option are lost. This is a real problem for building ghc on machines with little RAM (http://haskell.org/ghc/). When building ghc's libHSbase.a (with 12746 .o files), GNU ar takes over 500Mb of memory on my system. On machines with less memory this operation can take many minutes. There may be good reasons for alising the 'q' mode to 'r' however it does also mean that it's impossible to build large archives on weaker machines.
I tried using the 'S' modifier which did make adding archive members faster (by not building the symbol index). However it also missed members out for some reason which I have not figured out yet. Further tests revel that it is most likely that it is the ranlib / ar s part that is taking so long. I wrote a little program (100loc) to create GNU ar archives without the symbol table (which unsprisingly is very quick). Then running ranlib on this archive still took 500Mb for a short time near the end of the run. This problem with ranlib/ar memory consumption is made worse by the fact that we have to use xargs and invoke ar multiple times if we have a large number of archive members. This is of course because of restrictions on the size of command line arguments. When using "... | xargs ar q libfoo.a" we end up re-generating the symbol index multiple times (eg about 10 for the ghc libHSbase.a case). If ar supported -input then this would be better.
Adding some instrumantation shows that the spike in memory use occurs within the function _bfd_compute_and_write_armap() in bfd's archive.c module. It maps over each object file in the archive and allocates symbol tables for each one. This is where the linear memory use is comming from. Then when it's done collecting symbols it writes them out. Perhaps it'd be possible to change the algorithm so that we only need to keep the bfd resources for one archive member at a time. Afterall the code already copies out the information it will need for the final symbol index rather than leaving it shared with the archive member bfd structures. So as far as I can see it should be possible to only allocate bfd resources for one member of the archive at a time rather than allocating bfd resources for every archive member simultaniously. The size of the final symbol index is not that big (only a few Mb) so if the memory use were a small multiple of that size rather that a large multiple (x20) of the size of the whole archive then the memory savings would be quite significant. [And we wouldn't get complaints from users about the OOM killer kicking in while they build libHSbase.a ! :-) ]
Interestingly, this bit has no noticable effect on memory use. /* Now ask the BFD to free up any cached information, so we don't fill all of memory with symbol tables. */ if (! bfd_free_cached_info (current)) goto error_return;
Subject: Re: "ar q" gives quadratic memory use On Fri, Mar 17, 2006 at 12:46:53AM -0000, dcoutts at gentoo dot org wrote: > This problem with ranlib/ar memory consumption is made worse by the fact that we > have to use xargs and invoke ar multiple times if we have a large number of > archive members. This is of course because of restrictions on the size of > command line arguments. When using "... | xargs ar q libfoo.a" we end up > re-generating the symbol index multiple times (eg about 10 for the ghc > libHSbase.a case). If ar supported -input then this would be better. Recent versions support @file for this situation. I don't know if this was in the last release - I don't think so.
Changing summary to be more accurate. The quadratic thing is a misleading since we only get that behaviour with xargs (where we're basically running ranlib a linear number of times on an archive that is growing linearly). The real problem is with the fact that ranlib allocates bfd resources for each archive member simultaniously while building the symbol armap. It seems possible that it could be done by only allocating bfd resources for one archive member at a time (+ space for the symbol map that is being accumulated).
After looking at the code a bit more I have these observations: as we map over each element in the archive, it is the call to: bfd_check_format (current, bfd_object); that makes the current bfd swell in memory use. If we close each archive member bfd after use then the memory use remains constant. However we can't actually close each archive member since they need to be open for when the archive gets written. On the other hand writing out the archive doesn't need all the auxillary information in each bfd that bfd_check_format allocates. If we omit the loop over archive members entrely then writing the archive doesn't cause a mamory spike. So what we need is something to release the auxillary info that gets allocated for each archive member bfd after we're done with it but without actually closing the bfds. Sadly, neither bfd_cache_close nor bfd_free_cached_info serve this purpose.
The reason that bfd_free_cached_info does nothing is because on elf targets it is #defined to be _bfd_generic_bfd_free_cached_info which is #defined to be bfd_true. As a complete hack I tried calling this instead of bfd_free_cached_info: bfd_hash_table_free (¤t->section_htab); objalloc_free ((struct objalloc *) current->memory); This works. That is the memory use remains relatively constant and the resulting archive is the byte for byte the same as an unpatched ranlib. Now obviously this is a hack, we shouldn't be using this target-specific code in target-independent code like archive.c. Perhaps we could have something along these lines in bfd_free_cached_info on elf targets.
To try and make this patch less of a hack I tried this: add a new _bfd_generic_bfd_free_cached_info and in libbfd.h change the #define _bfd_generic_bfd_free_cached_info bfd_true to a proper prototype: bfd_boolean _bfd_generic_bfd_free_cached_info (bfd *); Then in libbfd.c I added the implementaion: bfd_boolean _bfd_generic_bfd_free_cached_info (bfd *abfd) { bfd_hash_table_free (&abfd->section_htab); objalloc_free ((struct objalloc *) abfd->memory); return TRUE; } So I need someone else woh knows about bfd to say if this code is appropriate for the generic implementation or if it's only appropriate for some targets/backends. Also I don't know if this would mess up other users of bfd_free_cached_info, though as far as I can see it's only called from archive.c.
Created attachment 926 [details] Possible patch to make _bfd_generic_bfd_free_cached_info effective Just to be concrete, here's a patch that implements what I've been trying. As I've said, it needs to be reviewed by someone who knows the bfd library as I'm just guesing.
Fixed by http://sourceware.org/ml/binutils/2006-04/msg00095.html