This is the mail archive of the
mailing list for the elfutils project.
Re: Detecting separate debuginfo
- From: Mark Wielaard <mjw at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Sun, 30 Mar 2014 11:23:50 +0200
- Subject: Re: Detecting separate debuginfo
On Fri, Mar 28, 2014 at 02:00:49PM +0100, Florian Weimer wrote:
> I maintain a database which extracts symbol information from ELF objects
> (among other things). I would like to enrich that with DWARF producer
> data, and perhaps additional DWARF information in the future.
> I'd really like to avoid importing the ELF symbol information twice,
> once from the real object file, and once from the separate debuginfo.
Note that in general the main ELF file contains only a subset of the ELF
symbols: those in .dynsym, plus possibly those in the compressed
.gnu_debugdata section (which only contains function symbols). The
separate .debug file contains the full .symtab. To get the full symbol
table you can ignore the main ELF file if you know there should be a
corresponding .debug file (because the main ELF file has a .gnu_debuglink
section).
> The database performs content-based deduplication, this means I do not
> have path name information during extraction. This means I cannot use
> file system paths to disambiguate the real thing and its debugging
> information. Both files are loaded separately and not necessarily at
> the same time. I don't want to change that if possible because this
> would result in a scalability issue eventually. I don't want to assume
> that *all* debuginfo data has been separated, either.
You can use the build-id to check whether two files describe the same
module. Use eu-unstrip -n -e <file> to see it, along with the separate
.debug file if there is one (that shows a file-based location, but
at least you know whether one should exist).
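
Since you don't have path names at extraction time, here is a minimal sketch of reading the build-id straight from the file contents, so the two halves can be matched up later. It is not how eu-unstrip does it; it just walks the section headers by hand and looks for an NT_GNU_BUILD_ID note. The function names (build_id, same_module) are made up for illustration, and it assumes the section header table is present and does not use extended section numbering.

```python
import struct

SHT_NOTE = 7          # section type of note sections
NT_GNU_BUILD_ID = 3   # note type of the GNU build-id


def _section_headers(data):
    """Return (byte-order prefix, [(sh_name, sh_type, sh_offset, sh_size), ...])."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    end = "<" if data[5] == 1 else ">"            # EI_DATA: 1 = little-endian
    if data[4] == 2:                              # EI_CLASS: 2 = ELFCLASS64
        (shoff,) = struct.unpack_from(end + "Q", data, 40)
        counts_at, fmt = 58, end + "IIQQQQ"
    else:                                         # ELFCLASS32
        (shoff,) = struct.unpack_from(end + "I", data, 32)
        counts_at, fmt = 46, end + "IIIIII"
    shentsize, shnum, _ = struct.unpack_from(end + "HHH", data, counts_at)
    rows = [struct.unpack_from(fmt, data, shoff + i * shentsize)
            for i in range(shnum)]
    # Fields 4 and 5 are sh_offset and sh_size in both ELF classes.
    return end, [(r[0], r[1], r[4], r[5]) for r in rows]


def build_id(data):
    """Return the GNU build-id as a hex string, or None if there is none."""
    end, sections = _section_headers(data)
    for _, sh_type, off, size in sections:
        if sh_type != SHT_NOTE:
            continue
        pos, stop = off, off + size
        while pos + 12 <= stop:
            namesz, descsz, ntype = struct.unpack_from(end + "III", data, pos)
            name_at = pos + 12
            desc_at = name_at + ((namesz + 3) & ~3)   # name is padded to 4 bytes
            if (ntype == NT_GNU_BUILD_ID
                    and data[name_at:name_at + namesz] == b"GNU\0"):
                return data[desc_at:desc_at + descsz].hex()
            pos = desc_at + ((descsz + 3) & ~3)       # desc is padded to 4 bytes
    return None


def same_module(data_a, data_b):
    """True only if both images carry a build-id and the two are equal."""
    a, b = build_id(data_a), build_id(data_b)
    return a is not None and a == b
```

You would call it on the raw bytes, e.g. build_id(open(path, "rb").read()), and store the result as the key that ties a main ELF file to its .debug file.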
> Based on the previous discussion around program interpreter reporting in
> readelf, there is no easy way to detect separate debuginfo to trigger
> special processing for it (e.g., do not extract symbols, only
> DW_AT_producer data).
It isn't easy to detect whether the program headers of an ELF file are
valid, although Roland suggested a heuristic to detect if they are.
But it is easy to detect whether a file has debuginfo (just check for a
.debug_info section, or just try opening the Elf with libdw's dwarf_begin).
And if it doesn't, then just check whether there is a .gnu_debuglink
section to see if it has separate debuginfo (and a separate full symbol
table).
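
The two checks above can be sketched without libdw by looking only at section names. This is an illustration, not what dwarf_begin does internally; the helper names (read_sections, classify) and the three result strings are invented here. It assumes a present section header table without extended section numbering, and it does not look for old-style compressed .zdebug_info sections.

```python
import struct


def read_sections(data):
    """Map section name -> sh_type for an ELF image held in bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    end = "<" if data[5] == 1 else ">"            # EI_DATA: 1 = little-endian
    if data[4] == 2:                              # EI_CLASS: 2 = ELFCLASS64
        (shoff,) = struct.unpack_from(end + "Q", data, 40)
        counts_at, fmt = 58, end + "IIQQQ"
    else:                                         # ELFCLASS32
        (shoff,) = struct.unpack_from(end + "I", data, 32)
        counts_at, fmt = 46, end + "IIIII"
    shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, counts_at)
    rows = [struct.unpack_from(fmt, data, shoff + i * shentsize)
            for i in range(shnum)]
    if not rows:
        return {}
    strtab = rows[shstrndx][4]                    # sh_offset of .shstrtab

    def name(n):
        start = strtab + n
        return data[start:data.index(b"\0", start)].decode("ascii", "replace")

    return {name(r[0]): r[1] for r in rows}


def classify(data):
    """Apply the two checks in order: .debug_info first, then .gnu_debuglink."""
    names = read_sections(data)
    if ".debug_info" in names:
        return "has-debuginfo"          # DWARF is in this file
    if ".gnu_debuglink" in names:
        return "separate-debuginfo"     # DWARF and full .symtab live in a .debug file
    return "no-debuginfo"
```

With something like this, the extraction code could skip symbol import for "has-debuginfo" files that turn out to be .debug files, and only pull DW_AT_producer from them.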
> One thing that would help me as well if there is a way to get the exact
> same set of exported symbols from the real file and its separate
> debuginfo. Then I could deduplicate based on that, and processing both
> files would not matter anymore. eu-readelf shows quite different output
> for the two files, so I'm not sure how to achieve that.
Yes, as explained above, the main ELF file, if it has separate debuginfo,
will have its full symbol table (.symtab) in the .debug file. So the
main ELF file will just contain the minimal .dynsym symbols needed at
runtime (and might have some extra function symbols in the .gnu_debugdata
section, see https://sourceware.org/gdb/onlinedocs/gdb/MiniDebugInfo.html;
you can read that compressed section with eu-readelf --elf-section).
> I don't actually use eu-readelf output (but my extraction code is
> derived from it), and I'm open to suggestions to look at particular
> sections/headers to get matching output. I'm mainly interested in
> public symbols and undefined symbols. Internal symbols from debugging
> information could be ignored for the time being.
I think you just want the .dynsym symbols then. eu-readelf -s will show
whichever of .dynsym and .symtab exist. eu-nm -D will only show the
dynamic symbols. If the ELF file isn't stripped it will have both.
A separate .debug file will only have the full .symtab table (its .dynsym
section will have type NOBITS).
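
To make the NOBITS point concrete, here is a rough sketch of how extraction code could decide which symbol tables in a given file actually carry data. The names (read_sections, usable_symbol_tables) are hypothetical, the parsing is done by hand rather than through libelf, and it assumes a section header table without extended section numbering.

```python
import struct

SHT_NOBITS = 8   # section occupies no space in the file


def read_sections(data):
    """Map section name -> sh_type for an ELF image held in bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    end = "<" if data[5] == 1 else ">"            # EI_DATA: 1 = little-endian
    if data[4] == 2:                              # EI_CLASS: 2 = ELFCLASS64
        (shoff,) = struct.unpack_from(end + "Q", data, 40)
        counts_at, fmt = 58, end + "IIQQQ"
    else:                                         # ELFCLASS32
        (shoff,) = struct.unpack_from(end + "I", data, 32)
        counts_at, fmt = 46, end + "IIIII"
    shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, counts_at)
    rows = [struct.unpack_from(fmt, data, shoff + i * shentsize)
            for i in range(shnum)]
    if not rows:
        return {}
    strtab = rows[shstrndx][4]                    # sh_offset of .shstrtab

    def name(n):
        start = strtab + n
        return data[start:data.index(b"\0", start)].decode("ascii", "replace")

    return {name(r[0]): r[1] for r in rows}


def usable_symbol_tables(data):
    """List the symbol-table sections that are present and not NOBITS."""
    types = read_sections(data)
    return [n for n in (".dynsym", ".symtab")
            if n in types and types[n] != SHT_NOBITS]
```

A stripped main file would then give [".dynsym"], its .debug file [".symtab"], and an unstripped file both, so importing only .dynsym where it is usable avoids loading the same public symbols twice.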
Hope that helps,