[RFC/RFA] dangling bfd pointer in archive cache...

Joel Brobecker brobecker@adacore.com
Tue Oct 2 14:14:00 GMT 2012


I am trying to fix a crash that's occurring when running a program
from GDB on ppc-aix. Take any program that uses threading, and:

    (gdb) run
    Starting program: /[...]/task_switch
    zsh: 46458 segmentation fault (core dumped)  gdb-head -q task_switch

This is related to a recent patch that started counting references
to BFDs, and closing them when the reference count reached zero.
Here is what is happening in chronological order:

During the startup phase, GDB receives notification that libthread.a
has been mapped. It creates an archive BFD for it, and starts going
through its object files. It looks at the first one by calling:

    bfd *result = bfd_openr_next_archived_file (archive, previous);

(where previous is NULL).

Following what archive.c:bfd_openr_next_archived_file does, we
find that it calls coff-rs6000.c:_bfd_xcoff_openr_next_archived_file,
which eventually calls archive.c:_bfd_get_elt_at_filepos. This
routine first checks the archive's cache for our bfd, and creates
a new one if not found. At the end of the element's creation, it
then adds it to the archive BFD's cache:

  if (_bfd_add_bfd_to_archive_cache (archive, filepos, n_nfd))
    return n_nfd;
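
As a rough illustration of the lookup-or-create pattern described above,
here is a toy model. It is not the BFD code: toy_elt, toy_archive and the
toy_* functions are made-up names, and a small fixed array stands in for
the real archive cache.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CACHE_SIZE 8

/* Hypothetical stand-in for an element BFD.  */
struct toy_elt
{
  long filepos;
};

/* Hypothetical stand-in for an archive BFD and its element cache.  */
struct toy_archive
{
  struct toy_elt *cache[TOY_CACHE_SIZE];
  int n_cached;
};

/* Return the cached element at FILEPOS, or NULL if there is none.  */
static struct toy_elt *
toy_cache_lookup (struct toy_archive *arch, long filepos)
{
  for (int i = 0; i < arch->n_cached; i++)
    if (arch->cache[i]->filepos == filepos)
      return arch->cache[i];
  return NULL;
}

/* Check the cache first; on a miss, create the element and cache it.
   Returns NULL if the toy cache is full or allocation fails.  */
static struct toy_elt *
toy_get_elt_at_filepos (struct toy_archive *arch, long filepos)
{
  struct toy_elt *elt = toy_cache_lookup (arch, filepos);

  if (elt != NULL)
    return elt;
  if (arch->n_cached == TOY_CACHE_SIZE)
    return NULL;

  elt = malloc (sizeof *elt);
  if (elt == NULL)
    return NULL;
  elt->filepos = filepos;
  arch->cache[arch->n_cached++] = elt;
  return elt;
}
```

In this model, asking twice for the same filepos returns the same pointer,
which is exactly why a cached entry must not outlive the element it
points to.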

Back in GDB: it looks at our elt bfd, finds that it's not the one
it is looking for, gets the next one using the same function, and
then unrefs the first one.  As the ref count of that first element
has now reached zero, GDB calls bfd_close on it.

This is when things start going wrong, as bfd_close frees the memory
allocated to our elt bfd, but does not remove it from the archive's
cache. As a result, the next time we query the first elt of our archive,
we find the reference in the cache, and return that - a pointer to
freed memory, which eventually leads to a crash.

Looking further into this, I went back and forth between different
approaches, until I found that archive.c defines a function that
does the cleanup: archive.c:_bfd_archive_close_and_cleanup. I don't
think it should be hooked up to the target vector to be called
automatically, since it's not entirely a target property, more like
a "construct" property. So I've simply added a call to it from
bfd_close.
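
To show the shape of the problem and of the fix, here is a self-contained
toy model. It is only a sketch, not the attached patch: toy_bfd,
toy_archive and the toy_* functions are hypothetical names, and the
do_cleanup flag exists purely to contrast closing with and without the
cleanup call.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CACHE_SIZE 8

struct toy_archive;

/* Hypothetical element BFD: remembers which archive (if any) caches it.  */
struct toy_bfd
{
  struct toy_archive *my_archive;
  long filepos;
};

/* Hypothetical archive BFD with a small element cache.  */
struct toy_archive
{
  struct toy_bfd *cache[TOY_CACHE_SIZE];
  int n_cached;
};

/* Add ELT to ARCH's cache.  Returns 1 on success, 0 if the cache is full.  */
static int
toy_cache_add (struct toy_archive *arch, struct toy_bfd *elt)
{
  if (arch->n_cached == TOY_CACHE_SIZE)
    return 0;
  arch->cache[arch->n_cached++] = elt;
  elt->my_archive = arch;
  return 1;
}

/* Return the cached element at FILEPOS, or NULL if there is none.  */
static struct toy_bfd *
toy_cache_lookup (struct toy_archive *arch, long filepos)
{
  for (int i = 0; i < arch->n_cached; i++)
    if (arch->cache[i]->filepos == filepos)
      return arch->cache[i];
  return NULL;
}

/* Cleanup step: drop ELT from its parent archive's cache, if it has one.  */
static void
toy_archive_close_and_cleanup (struct toy_bfd *elt)
{
  struct toy_archive *arch = elt->my_archive;

  if (arch == NULL)
    return;
  for (int i = 0; i < arch->n_cached; i++)
    if (arch->cache[i] == elt)
      {
        /* Move the last entry into the hole and shrink the cache.  */
        arch->cache[i] = arch->cache[arch->n_cached - 1];
        arch->n_cached--;
        arch->cache[arch->n_cached] = NULL;
        break;
      }
  elt->my_archive = NULL;
}

/* Close ELT.  With DO_CLEANUP == 0 the cache entry is left behind and
   becomes a dangling pointer; with DO_CLEANUP != 0 it is removed first.  */
static void
toy_bfd_close (struct toy_bfd *elt, int do_cleanup)
{
  if (do_cleanup)
    toy_archive_close_and_cleanup (elt);
  free (elt);
}
```

With do_cleanup set, a later lookup at the same filepos misses the cache
and a fresh element can be created. Without it, the cache still counts an
entry whose memory has been freed, and any lookup that touches it is
undefined behaviour.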

This fixes the problem on ppc-aix, tested using the gdb-testsuite
on ppc-aix and x86_64-linux.  I'd be happy to do more testing if
someone told me which testsuites to run for this type of change.
But before doing so, I thought I'd make sure I am making the correct
type of change...



        * opncls.c (bfd_close): Add call to _bfd_archive_close_and_cleanup.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: opncls.diff
Type: text/x-diff
Size: 870 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/gdb-patches/attachments/20121002/fe59ce00/attachment.bin>