Bug 2467

Summary:	"ar q" / ranlib has large memory use (linear in archive size)
Product:	binutils	Reporter:	Duncan Coutts <dcoutts>
Component:	binutils	Assignee:	unassigned
Status:	RESOLVED FIXED
Severity:	normal	CC:	bug-binutils
Priority:	P2
Version:	2.16
Target Milestone:	---
Host:		Target:
Build:		Last reconfirmed:
Attachments:	Possible patch to make _bfd_generic_bfd_free_cached_info effective

Description Duncan Coutts 2006-03-16 19:43:39 UTC

When building an arhive with lots of .o files (eg over 10,000) GNU ar takes a
very long time and lots of memory.

One might hope that the 'q' "quick append" option might operate in linear time
and constant memory (since all it needs to do is to append to the archive
without looking at the existing entries). However the GNU ar man page says that
it is an alias for the 'r' option and so the possible performance advantages of
the 'q' option are lost.

This is a real problem for building ghc on machines with little RAM
(http://haskell.org/ghc/). When building ghc's libHSbase.a (with 12746 .o
files), GNU ar takes over 500Mb of memory on my system. On machines with less
memory this operation can take many minutes.

There may be good reasons for alising the 'q' mode to 'r' however it does also
mean that it's impossible to build large archives on weaker machines.

Comment 1 Duncan Coutts 2006-03-17 00:46:52 UTC

I tried using the 'S' modifier which did make adding archive members faster (by
not building the symbol index). However it also missed members out for some
reason which I have not figured out yet.

Further tests revel that it is most likely that it is the ranlib / ar s part
that is taking so long. I wrote a little program (100loc) to create GNU ar
archives without the symbol table (which unsprisingly is very quick). Then
running ranlib on this archive still took 500Mb for a short time near the end of
the run.

This problem with ranlib/ar memory consumption is made worse by the fact that we
have to use xargs and invoke ar multiple times if we have a large number of
archive members. This is of course because of restrictions on the size of
command line arguments. When using "... | xargs ar q libfoo.a" we end up
re-generating the symbol index multiple times (eg about 10 for the ghc
libHSbase.a case). If ar supported -input then this would be better.

Comment 2 Duncan Coutts 2006-03-17 02:57:18 UTC

Adding some instrumantation shows that the spike in memory use occurs within the
function _bfd_compute_and_write_armap() in bfd's archive.c module.

It maps over each object file in the archive and allocates symbol tables for
each one. This is where the linear memory use is comming from. Then when it's
done collecting symbols it writes them out.

Perhaps it'd be possible to change the algorithm so that we only need to keep
the bfd resources for one archive member at a time. Afterall the code already
copies out the information it will need for the final symbol index rather than
leaving it shared with the archive member bfd structures. So as far as I can see
it should be possible to only allocate bfd resources for one member of the
archive at a time rather than allocating bfd resources for every archive member
simultaniously.

The size of the final symbol index is not that big (only a few Mb) so if the
memory use were a small multiple of that size rather that a large multiple (x20)
of the size of the whole archive then the memory savings would be quite significant.

[And we wouldn't get complaints from users about the OOM killer kicking in while
they build libHSbase.a ! :-) ]

Comment 3 Duncan Coutts 2006-03-17 03:08:17 UTC

Interestingly, this bit has no noticable effect on memory use.

          /* Now ask the BFD to free up any cached information, so we
             don't fill all of memory with symbol tables.  */
          if (! bfd_free_cached_info (current))
            goto error_return;

Comment 4 drow@false.org 2006-03-17 03:18:42 UTC

Subject: Re:  "ar q" gives quadratic memory use

On Fri, Mar 17, 2006 at 12:46:53AM -0000, dcoutts at gentoo dot org wrote:
> This problem with ranlib/ar memory consumption is made worse by the fact that we
> have to use xargs and invoke ar multiple times if we have a large number of
> archive members. This is of course because of restrictions on the size of
> command line arguments. When using "... | xargs ar q libfoo.a" we end up
> re-generating the symbol index multiple times (eg about 10 for the ghc
> libHSbase.a case). If ar supported -input then this would be better.

Recent versions support @file for this situation.  I don't know if this
was in the last release - I don't think so.

Comment 5 Duncan Coutts 2006-03-17 12:33:45 UTC

Changing summary to be more accurate. The quadratic thing is a misleading since
we only get that behaviour with xargs (where we're basically running ranlib a
linear number of times on an archive that is growing linearly).

The real problem is with the fact that ranlib allocates bfd resources for each
archive member simultaniously while building the symbol armap. It seems possible
that it could be done by only allocating bfd resources for one archive member at
a time (+ space for the symbol map that is being accumulated).

Comment 6 Duncan Coutts 2006-03-17 15:18:48 UTC

After looking at the code a bit more I have these observations:

as we map over each element in the archive, it is the call to:
bfd_check_format (current, bfd_object);
that makes the current bfd swell in memory use. If we close each archive member
bfd after use then the memory use remains constant. However we can't actually
close each archive member since they need to be open for when the archive gets
written. On the other hand writing out the archive doesn't need all the
auxillary information in each bfd that bfd_check_format allocates. If we omit
the loop over archive members entrely then writing the archive doesn't cause a
mamory spike.

So what we need is something to release the auxillary info that gets allocated
for each archive member bfd after we're done with it but without actually
closing the bfds.

Sadly, neither bfd_cache_close nor bfd_free_cached_info serve this purpose.

Comment 7 Duncan Coutts 2006-03-17 16:25:46 UTC

The reason that bfd_free_cached_info does nothing is because on elf targets it
is #defined to be _bfd_generic_bfd_free_cached_info which is #defined to be
bfd_true.

As a complete hack I tried calling this instead of bfd_free_cached_info:
          bfd_hash_table_free (&current->section_htab);
          objalloc_free ((struct objalloc *) current->memory);

This works. That is the memory use remains relatively constant and the resulting
archive is the byte for byte the same as an unpatched ranlib.

Now obviously this is a hack, we shouldn't be using this target-specific code in
target-independent code like archive.c. Perhaps we could have something along
these lines in bfd_free_cached_info on elf targets.

Comment 8 Duncan Coutts 2006-03-17 17:15:57 UTC

To try and make this patch less of a hack I tried this:

add a new _bfd_generic_bfd_free_cached_info and in libbfd.h change the
#define _bfd_generic_bfd_free_cached_info bfd_true
to a proper prototype:
bfd_boolean _bfd_generic_bfd_free_cached_info (bfd *);

Then in libbfd.c I added the implementaion:

bfd_boolean
_bfd_generic_bfd_free_cached_info (bfd *abfd) {
        bfd_hash_table_free (&abfd->section_htab);
        objalloc_free ((struct objalloc *) abfd->memory);
        return TRUE;
}

So I need someone else woh knows about bfd to say if this code is appropriate
for the generic implementation or if it's only appropriate for some
targets/backends. Also I don't know if this would mess up other users of
bfd_free_cached_info, though as far as I can see it's only called from archive.c.

Comment 9 Duncan Coutts 2006-03-17 18:41:13 UTC

Created attachment 926 [details]
Possible patch to make _bfd_generic_bfd_free_cached_info effective

Just to be concrete, here's a patch that implements what I've been trying.
As I've said, it needs to be reviewed by someone who knows the bfd library as
I'm just guesing.

Comment 10 H.J. Lu 2006-04-26 13:40:18 UTC

Fixed by

http://sourceware.org/ml/binutils/2006-04/msg00095.html