This is the mail archive of the
mailing list for the elfutils project.
RE: [patch 2/2] Fix loading core files without build-ids
- From: Bruce Dawson <bruced at valvesoftware dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Tue, 04 Jun 2013 22:12:49 +0000
- Subject: RE: [patch 2/2] Fix loading core files without build-ids
Ubuntu is misconfigured so that build IDs are missing from core files. A bug was filed but it doesn't look like much has been done with it.
So yeah, it is unfortunately important to handle this case.
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Jan Kratochvil
Sent: Sunday, May 19, 2013 9:39 AM
To: Mark Wielaard
Subject: Re: [patch 2/2] Fix loading core files without build-ids
there is still some work on this patch as I have found from your review, see below.
On Sat, 18 May 2013 00:00:58 +0200, Mark Wielaard wrote:
> BTW. How often does it happen that build-id are missing in core files?
> Is that just when a distro "misconfigures" the kernel dumping settings?
> We should of course handle it correctly anyway, just wondering.
CORE_DUMP_DEFAULT_ELF_HEADERS is in linux-2.6.git since
Author: Roland McGrath <firstname.lastname@example.org>
Date: Sat Oct 18 20:28:23 2008 -0700
and it is default Y since
Author: Roland McGrath <email@example.com>
Date: Wed Oct 27 15:34:09 2010 -0700
So anything older than 2010 or anything non-Linux or any GDB gcore-generated core file (still not fixed, mea culpa). Besides that someone can also unconfigure it, I guess some ultra-mini kernel configs may illogically do it.
OTOH for practical Fedora/RHEL cases this patch is probably irrelevant.
> > The testcase run-unstrip-n.sh had to be updated:
> > (a) It was currently reporting filenames like /lib/libc.so.6 but those were
> > a false match, the system files do not match build-ids from the core file,
> > therefore there should be no successful match to system files.
> But now you don't know the name at all? That is a pity since they are
> a good hint. Would be nice if we could provide them in some way as
> hint as you see with /proc/self/exe -> /bin/foo (deleted), or by
> putting them between [brackets].
This is a real bug, thanks for noticing it.
New dwfl_link_map_report does not report modules for which it cannot find its file on disk. This is correct as one does not know the module boundaries just from a link map entry. The module boundaries were found in such case by dwfl_segment_report_module and names filled in later by dwfl_link_map_report.
Recent Linux kernels now produce also core note NT_FILE but in such case one usually has also build-id headers already so this patch is not needed.
My current plan is:
(1) Save filenames for non-existing files during initial dwfl_link_map_report.
(2) Run dwfl_segment_report_module even if dwfl_link_map_report found something.
(2b) dwfl_segment_report_module should omit conflicting modules.
(3) Update found anonymous "[dso]" modules by the saved filenames.
> > * core-file.c (dwfl_core_file_report): Move raw segments reporting
> > lower. New variables vdso_bias, vdso_l_ld and exec_vaddr. Call also
> > dwfl_segment_report_module for EXEC_VADDR and VDSO_L_LD. Call
> > dwfl_segment_report_module for raw segments only if LISTED is zero.
> Is the FIXME: Use vdso_bias. just informational, or were you going to
> do that?
I am not going to do that. I was trying to refactor the code to make it possible but I find too many changes needed in a code I do not understand well. Primarily I believe the bias auto-detection in dwfl_segment_report_module should always work correctly for vDSO.
> So you don't sniff at all if dwfl_link_map_report () found something.
> Would it hurt at this point to sniff?
That's the question. Sniffing will always bring in possibly incorrect data.
I believe it would be useful only in the case link_map (r_debug->r_map) is corrupted in the middle and some of the libraries do not get reported (as I have seen in real world core files from ABRT). Maybe report_r_debug could report corrupted link_map - it currently internally checks
++iterations < dwfl->lookup_elts
but it should also check l_prev does match the previous entry like GDB does.
Still core files with corrupted link_map I find more a corner case and it could be added as an add-on patch IMO.
> It is a pity dwfl_segment_report_module () doesn't have documentation.
> I am still staring at it to fully understand what it does.
Yes, I also had this problem, this is why I used it only as a 2nd pass if dwfl_link_map_report did not find anything.
One needs to decide there when it is really safe to add a new non-conflicting module by dwfl_segment_report_module but I admit I did not investigate it much.
> > * dwfl_module_build_id.c (check_notes): Move into
> > __libdwfl_find_elf_build_id.
> > (__libdwfl_find_build_id): Rename to ...
> > (__libdwfl_find_elf_build_id): ... here. Add parameters build_id_bits,
> > build_id_elfaddr and build_id_len. Verify MOD vs. ELF.
> > (__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
> > set, rename data_vaddr to data_elfaddr. Do not call found_build_id.
> > (__libdwfl_find_elf_build_id): Update the check_notes caller, do not
> > adjust its data_elfaddr parameter.
> > (__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.
> > * libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
> OK, as written here the moving around confused me a little. But
> reading the code it looks correct. Since __libdwfl_find_build_id ()
> adds the main bias afterwards, you don't adjusting vaddr in the other
> places anymore.
Technically I just split out the libelf/ part of __libdwfl_find_build_id into __libdwfl_find_elf_build_id. But in the end I found I cannot easily make it really libelf/-clean as __libdwfl_find_elf_build_id needs __libdwfl_relocate_value for ET_REL and separating the call out of it would complicate the code more than worth moving it to libelf/. But still I could refactor it more to really it move __libdwfl_find_elf_build_id to libelf/ .
elfutils-devel mailing list