Summary: | "ar q" / ranlib has large memory use (linear in archive size) | ||
---|---|---|---|
Product: | binutils | Reporter: | Duncan Coutts <dcoutts> |
Component: | binutils | Assignee: | unassigned |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bug-binutils |
Priority: | P2 | ||
Version: | 2.16 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Possible patch to make _bfd_generic_bfd_free_cached_info effective |
Description
Duncan Coutts
2006-03-16 19:43:39 UTC
I tried using the 'S' modifier which did make adding archive members faster (by not building the symbol index). However it also missed members out for some reason which I have not figured out yet. Further tests revel that it is most likely that it is the ranlib / ar s part that is taking so long. I wrote a little program (100loc) to create GNU ar archives without the symbol table (which unsprisingly is very quick). Then running ranlib on this archive still took 500Mb for a short time near the end of the run. This problem with ranlib/ar memory consumption is made worse by the fact that we have to use xargs and invoke ar multiple times if we have a large number of archive members. This is of course because of restrictions on the size of command line arguments. When using "... | xargs ar q libfoo.a" we end up re-generating the symbol index multiple times (eg about 10 for the ghc libHSbase.a case). If ar supported -input then this would be better. Adding some instrumantation shows that the spike in memory use occurs within the function _bfd_compute_and_write_armap() in bfd's archive.c module. It maps over each object file in the archive and allocates symbol tables for each one. This is where the linear memory use is comming from. Then when it's done collecting symbols it writes them out. Perhaps it'd be possible to change the algorithm so that we only need to keep the bfd resources for one archive member at a time. Afterall the code already copies out the information it will need for the final symbol index rather than leaving it shared with the archive member bfd structures. So as far as I can see it should be possible to only allocate bfd resources for one member of the archive at a time rather than allocating bfd resources for every archive member simultaniously. The size of the final symbol index is not that big (only a few Mb) so if the memory use were a small multiple of that size rather that a large multiple (x20) of the size of the whole archive then the memory savings would be quite significant. [And we wouldn't get complaints from users about the OOM killer kicking in while they build libHSbase.a ! :-) ] Interestingly, this bit has no noticable effect on memory use. /* Now ask the BFD to free up any cached information, so we don't fill all of memory with symbol tables. */ if (! bfd_free_cached_info (current)) goto error_return; Subject: Re: "ar q" gives quadratic memory use
On Fri, Mar 17, 2006 at 12:46:53AM -0000, dcoutts at gentoo dot org wrote:
> This problem with ranlib/ar memory consumption is made worse by the fact that we
> have to use xargs and invoke ar multiple times if we have a large number of
> archive members. This is of course because of restrictions on the size of
> command line arguments. When using "... | xargs ar q libfoo.a" we end up
> re-generating the symbol index multiple times (eg about 10 for the ghc
> libHSbase.a case). If ar supported -input then this would be better.
Recent versions support @file for this situation. I don't know if this
was in the last release - I don't think so.
Changing summary to be more accurate. The quadratic thing is a misleading since we only get that behaviour with xargs (where we're basically running ranlib a linear number of times on an archive that is growing linearly). The real problem is with the fact that ranlib allocates bfd resources for each archive member simultaniously while building the symbol armap. It seems possible that it could be done by only allocating bfd resources for one archive member at a time (+ space for the symbol map that is being accumulated). After looking at the code a bit more I have these observations: as we map over each element in the archive, it is the call to: bfd_check_format (current, bfd_object); that makes the current bfd swell in memory use. If we close each archive member bfd after use then the memory use remains constant. However we can't actually close each archive member since they need to be open for when the archive gets written. On the other hand writing out the archive doesn't need all the auxillary information in each bfd that bfd_check_format allocates. If we omit the loop over archive members entrely then writing the archive doesn't cause a mamory spike. So what we need is something to release the auxillary info that gets allocated for each archive member bfd after we're done with it but without actually closing the bfds. Sadly, neither bfd_cache_close nor bfd_free_cached_info serve this purpose. The reason that bfd_free_cached_info does nothing is because on elf targets it is #defined to be _bfd_generic_bfd_free_cached_info which is #defined to be bfd_true. As a complete hack I tried calling this instead of bfd_free_cached_info: bfd_hash_table_free (¤t->section_htab); objalloc_free ((struct objalloc *) current->memory); This works. That is the memory use remains relatively constant and the resulting archive is the byte for byte the same as an unpatched ranlib. Now obviously this is a hack, we shouldn't be using this target-specific code in target-independent code like archive.c. Perhaps we could have something along these lines in bfd_free_cached_info on elf targets. To try and make this patch less of a hack I tried this: add a new _bfd_generic_bfd_free_cached_info and in libbfd.h change the #define _bfd_generic_bfd_free_cached_info bfd_true to a proper prototype: bfd_boolean _bfd_generic_bfd_free_cached_info (bfd *); Then in libbfd.c I added the implementaion: bfd_boolean _bfd_generic_bfd_free_cached_info (bfd *abfd) { bfd_hash_table_free (&abfd->section_htab); objalloc_free ((struct objalloc *) abfd->memory); return TRUE; } So I need someone else woh knows about bfd to say if this code is appropriate for the generic implementation or if it's only appropriate for some targets/backends. Also I don't know if this would mess up other users of bfd_free_cached_info, though as far as I can see it's only called from archive.c. Created attachment 926 [details]
Possible patch to make _bfd_generic_bfd_free_cached_info effective
Just to be concrete, here's a patch that implements what I've been trying.
As I've said, it needs to be reviewed by someone who knows the bfd library as
I'm just guesing.
|