This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



libdwfl --core crashes


wrt https://bugzilla.redhat.com/show_bug.cgi?id=559836

The proximate cause of the crash was that elf_strptr returned a bogus
pointer.  This happened because the offset was bogus, but so was the strtab
section's sh_size, so the offset validation didn't fire.

Because the data was already mapped, the section's rawdata_base pointer was
already set and elf_strptr just used it (plus the bogus offset).  This
rawdata_base was invalid because it was derived from a bogus sh_offset
(which came with a bogus sh_size to boot).  In a non-mapped file, these
offsets get tried in __libelf_set_rawdata_wrlock and are found wanting.

Even for a mapped file when rawdata_base is not already set, the code in
__libelf_set_rawdata_wrlock will validate the offset and size before
setting it.  I'm not sure offhand how this path can ever be reached in the
0.144 code.  If it were, it would diagnose the bogus values there.

But elf_begin (file_read_elf) prepopulated the pointers into the mapped
file without checking the sh_offset and sh_size values.  AFAICT this
precludes the __libelf_set_rawdata_wrlock path (that does appropriate
checking) from ever being taken.  That's certainly what's happening in the
test case.

So I fixed that in commit 429502f.  This simply validates the sh_offset and
sh_size values in file_read_elf and does not set up the pointers if they
don't point inside the mapped file.  Then an attempt to use the bogus
section's data will indeed go to the __libelf_set_rawdata_wrlock path and
diagnose the error there.
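
As a rough illustration of the kind of bounds check described above (a
sketch only, not the code from commit 429502f): the helper names
section_in_image and section_data_or_null are made up for this example, and
SHT_NOBITS sections, which occupy no file space, are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the libelf code: true when the byte range
   [sh_offset, sh_offset + sh_size) lies entirely inside a mapped image
   of map_size bytes.  Written so that no addition can overflow.  */
static bool
section_in_image (uint64_t sh_offset, uint64_t sh_size, uint64_t map_size)
{
  return sh_offset <= map_size && sh_size <= map_size - sh_offset;
}

/* Hypothetical helper: return a pointer to the section's bytes in the
   mapped image, or NULL when the header values point outside the image,
   so that the caller can fall back to a path that reports the error.  */
static const unsigned char *
section_data_or_null (const unsigned char *map, uint64_t map_size,
                      uint64_t sh_offset, uint64_t sh_size)
{
  assert (map != NULL);
  if (!section_in_image (sh_offset, sh_size, map_size))
    return NULL;
  return map + sh_offset;
}
```

Writing the size test as a subtraction rather than comparing
sh_offset + sh_size against map_size matters here, since a wild sh_offset
plus a wild sh_size can wrap around and pass a naive check.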

This fix just masks the underlying problem with core file reading--the
problem that is producing these section headers to begin with.

I had been thinking the second fix would be to make elf_from_remote_memory
drop the section headers (i.e. clear e_shoff, as it does when they are
invisible or truncated in the memory image) when they are invalid.  I'd
imagined doing some quick sanity checks, such as verifying that sh_offset
is not wild, or maybe just doing that check on the e_shstrndx shdr.  That
still might be a sensible idea for elf_from_remote_memory in theory, though
I think being fully robust to totally-bogus section headers at other levels
instead is plenty good enough.  But, elf_from_remote_memory is not involved
in core file reading at all!  Instead, we use elf_begin_rand, which is
built around the idea of a verbatim file image embedded in the core file.
So it doesn't fit the model there to clear some header fields when they're
suspicious, as we do in elf_from_remote_memory--we get the ELF file image
that is embedded in the core file, verbatim, end of story.

So, the elf_begin_rand/elf_memory part of it is actually OK.  Those give an
accurate (and now adequately robust) ELF format reading of the data that's
there.  If the data's lossy, that's what the robustness is for.

But, this data is indeed bogus.  It looks like an ELF file image, but isn't
really.  Why?  What happened is that it's the data page of a small DSO.

Here are some of the relevant core segments:

  LOAD           0x313000 0x00007f9b04d3d000 0x0000000000000000 0x001000 0x001000 R E 0x1000
  LOAD           0x314000 0x00007f9b04d3e000 0x0000000000000000 0x000000 0x1ff000     0x1000
  LOAD           0x314000 0x00007f9b04f3d000 0x0000000000000000 0x001000 0x001000 RW  0x1000

What we got was a module identified as:

0x7f9b04f3d000+0x201000 d713b06ee3ed93aef5a253fd1d5d5888a10f7916@0x7f9b04f3d1a0 . - libomnibook.so

Right away, something is fishy: a module should start with its text
segment mapping, which is "R E", and here we have "RW ".

Every normal DSO has two PT_LOAD segments.  Unless the DSO's text happens
to end on a page boundary, the normal layout puts the data immediately
after in the file, so that the file page that contains the data segment's
p_offset is mapped in two places--the data segment's p_vaddr, and the text
segment's last page.  When the whole text segment fits inside a page, then
the first text page is the last text page.  This is the case we have here.
So, 0x7f9b04f3d000 is the data segment, but it's mapped from the first page
of the file (p_offset = 0).  This means it starts with the ELF header and
phdrs, along with the rest of the text.  
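
The page arithmetic behind that can be sketched as follows.  This is only
an illustration: the struct and function names are made up, and the phdr
values used in the usage note are assumptions chosen to be consistent with
the 0x201000 module size, not values read from the real libomnibook.so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a PT_LOAD program header.  */
struct load_segment
{
  uint64_t p_offset;
  uint64_t p_vaddr;
  uint64_t p_filesz;
};

/* Round VALUE down to a multiple of PAGE_SIZE (a power of two).  */
static uint64_t
page_floor (uint64_t value, uint64_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  return value & ~(page_size - 1);
}

/* True when the first page mapped for the data segment is file page 0,
   i.e. the page that also holds the ELF header and phdrs.  */
static bool
data_page_holds_elf_header (const struct load_segment *data,
                            uint64_t page_size)
{
  return page_floor (data->p_offset, page_size) == 0;
}

/* Address at which the data segment's first page is mapped, given the
   load bias (the difference between run-time and link-time addresses).  */
static uint64_t
data_mapping_address (uint64_t load_bias, const struct load_segment *data,
                      uint64_t page_size)
{
  return load_bias + page_floor (data->p_vaddr, page_size);
}
```

With an assumed data segment of p_offset 0x7c8 and p_vaddr 0x2007c8, a load
bias of 0x7f9b04d3d000 and 4K pages, data_page_holds_elf_header is true and
data_mapping_address gives 0x7f9b04f3d000, the "RW " segment above.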

Hence, the generic module sniffer (dwfl_segment_report_module) looking at
this page will decide that it starts a new module.  It takes the bounds of
the module from the phdrs therein, and extrapolates.  So the core dump's
memory corresponding to 0x7f9b04f3d000 + e_shoff is being taken as section
header data.  What's actually in that memory as found in the dump is some
other data, because 0x7f9b04f3d000 is the second mapping of the DSO's first
page, not the one that corresponds to the first phdr as we extrapolated.
In fact, we get some random other stuff including the ELF header of the
next DSO higher in memory, so that ELFMAG appears as part of the bogus
section header data.

What's supposed to avoid all that is the sniffer having found the proper
start of that DSO at 0x00007f9b04d3d000.  When it sniffs that segment, it
will find the first copy of our DSO's first file page.  Here it sees the
ELF header and phdrs that tell it that these three core segments are all
covered by the address range of this DSO.  So, we should skip over the
second and third core segments shown above, and start sniffing again after
that.  This is where we're going wrong.

The dwfl_segment_report_module sniffer was returning the index of the last
segment involved in the module just sniffed, rather than the next segment
after that.  I fixed that in commit ca84a55.  Now, sniffing skips the core
segments already consumed by a module.
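
To make the off-by-one concrete, here is a small sketch of the two
behaviors.  It is not the libdwfl code; the struct and both function names
are invented for illustration, and the interface is much simpler than that
of dwfl_segment_report_module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one core file PT_LOAD.  */
struct core_segment
{
  uint64_t vaddr;
  uint64_t memsz;
};

/* A module was sniffed at segs[first] and its phdrs say it ends at
   module_end.  Return the index of the first segment that the module
   does not cover, which is where sniffing should resume.  */
static size_t
next_unconsumed_segment (const struct core_segment *segs, size_t nsegs,
                         size_t first, uint64_t module_end)
{
  assert (first < nsegs && segs[first].vaddr < module_end);
  size_t i = first;
  while (i < nsegs && segs[i].vaddr < module_end)
    ++i;
  return i;
}

/* The behavior before the fix, as described above: the index of the
   last segment covered by the module, so that segment is sniffed again.  */
static size_t
last_consumed_segment (const struct core_segment *segs, size_t nsegs,
                       size_t first, uint64_t module_end)
{
  return next_unconsumed_segment (segs, nsegs, first, module_end) - 1;
}
```

For the three core segments shown earlier and a module covering
0x7f9b04d3d000+0x201000, last_consumed_segment yields the "RW " segment at
0x7f9b04f3d000, which then gets sniffed as if it began a new module, while
next_unconsumed_segment steps past it.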

With just the libelf fix, the test case will no longer crash.
This diff shows the effect of the libdwfl fix.

--- /tmp/a	2010-02-17 02:56:38.863889807 -0800
+++ /tmp/b	2010-02-17 02:57:00.448016272 -0800
@@ -1,5 +1,3 @@
-0x7f9b04331000+0x201000 da2781a960a4f7bc1c3581c8eda2a059c6fc7ec3@0x7f9b043311a0 - - libsonypi.so
-0x7f9b04d3d000+0x201000 d713b06ee3ed93aef5a253fd1d5d5888a10f7916@0x7f9b04d3d1a0 - - libomnibook.so
 0x400000+0x21a000 b5c4ce98ec4e21c38772861d5da78ab441790f6c@0x40024c - - [exe]
 0x7fff5cdff000+0x1000 a0c1eaa68bfe20c3020d7dde068719f298aa7ddf@0x7fff5cdff2f8 . - linux-vdso.so.1
 0x3bdbc00000+0x29e000 26381c0e78d6547be47efe210d1fba2b27d5ee3a@0x3bdbc001a0 /usr/lib64/libgnomeui-2.so.0 - libgnomeui-2.so.0
@@ -94,11 +92,13 @@
 0x7f9b05544000+0x202000 f71367b4026bb4404bd07a61f9aea5060715caf0@0x7f9b055441a0 /usr/lib64/libsensors-applet-plugin.so.0 - libsensors-applet-plugin.so.0
 0x7f9b05342000+0x202000 c80dfbf734de0b8c01aa4bc28a9ad60029466e5c@0x7f9b053421a0 /usr/lib64/sensors-applet/plugins//libsmu-sys.so - libsmu-sys.so
 0x7f9b05140000+0x202000 d174306cbc20c50614d54097cd5a4f9b3016aeaa@0x7f9b051401a0 /usr/lib64/sensors-applet/plugins//libhddtemp.so - libhddtemp.so
-0x7f9b04f3d000+0x201000 d713b06ee3ed93aef5a253fd1d5d5888a10f7916@0x7f9b04f3d1a0 . - libomnibook.so
+0x7f9b04f3e000+0x202000 491fae5ee5a93017b33edae501f95f9741631492@0x7f9b04f3e1a0 /usr/lib64/sensors-applet/plugins//libi8k.so - libi8k.so
+0x7f9b04d3d000+0x201000 d713b06ee3ed93aef5a253fd1d5d5888a10f7916@0x7f9b04d3d1a0 /usr/lib64/sensors-applet/plugins//libomnibook.so - libomnibook.so
 0x7f9b04b3b000+0x202000 9ce5478a29debe806965c11d96b6a5f269792f9d@0x7f9b04b3b1a0 /usr/lib64/sensors-applet/plugins//libeee.so - libeee.so
 0x7f9b04939000+0x202000 82a0c007aa5bacadf2a68a8521ae958a0725ad4a@0x7f9b049391a0 /usr/lib64/sensors-applet/plugins//libnvidia.so - libnvidia.so
 0x7f9b04734000+0x205000 51e43e9ef2fb68c4f01eb1b6e55643b04ae180d8@0x7f9b047341a0 /usr/lib64/libXNVCtrl.so.0 - libXNVCtrl.so.0
-0x7f9b04531000+0x201000 da2781a960a4f7bc1c3581c8eda2a059c6fc7ec3@0x7f9b045311a0 . - libsonypi.so
+0x7f9b04532000+0x202000 33988f0efe1847cf007684faf7702d258252e9fa@0x7f9b045321a0 /usr/lib64/sensors-applet/plugins//libpmu-sys.so - libpmu-sys.so
+0x7f9b04331000+0x201000 da2781a960a4f7bc1c3581c8eda2a059c6fc7ec3@0x7f9b043311a0 /usr/lib64/sensors-applet/plugins//libsonypi.so - libsonypi.so
 0x7f9b0412f000+0x202000 2b3a3c87b27a0b297d429e1362c490a2f94f5f99@0x7f9b0412f1a0 /usr/lib64/sensors-applet/plugins//liblibsensors.so - liblibsensors.so
 0x7f9b03f1f000+0x210000 a1337343fe13867b87b1b7c73f190b3e8cb6dc7c@0x7f9b03e001a0 /usr/lib64/libsensors.so.4 - libsensors.so.4
 0x7f9b03d1d000+0x202000 f4160b6a86ea0beb95d562d996aeba9182804a6c@0x7f9b03d1d1a0 /usr/lib64/sensors-applet/plugins//libacpi.so - libacpi.so

The most telling bit is the first two lines, two modules reported before
the fix.  Note how these are the only DSOs that precede [exe] in the list.

Remember the way that the core-file sniffing works.  There are two phases.
The first is the pure segment-content module sniffing, which is what we are
debugging now.  That reports the modules we find from seeing ELF headers,
in the order we come across them (i.e. ascending address order).  The
second phase follows the struct link_map chain from the DT_DEBUG pointer.
In normal circumstances, this list (the dynamic linker's data structure)
shows every module, and (given partial segment dumps) shows only those
that we've already found in the pure ELF phase.  When this phase points to
a module we know from the first phase, we adjust the name details of that
module in case link_map gave us more info than we had, and we reorder it in
the Dwfl module list to match the link_map ordering.  

Because of the reordering, any content-sniffed modules that were not found
in the link_map list show up as first in the list, before the executable,
which would otherwise always be first.  So, in looking at an eu-unstrip -n
listing, we can immediately be suspicious when we see modules reported
before the executable.  
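
That heuristic is easy to mechanize.  The following is a minimal sketch,
assuming the listing has already been split into lines without trailing
newlines, that the module name is the last space-separated field of each
line, and that module names contain no spaces.  Both function names are
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the last space-separated field of LINE, which in
   an eu-unstrip -n listing is the module name.  */
static const char *
last_field (const char *line)
{
  assert (line != NULL);
  const char *space = strrchr (line, ' ');
  return space != NULL ? space + 1 : line;
}

/* Count the listing lines that come before the "[exe]" line.  Anything
   above zero is a reason to look closer.  Returns nlines when there is
   no "[exe]" line at all.  */
static size_t
modules_before_exe (const char *const *lines, size_t nlines)
{
  for (size_t i = 0; i < nlines; ++i)
    if (strcmp (last_field (lines[i]), "[exe]") == 0)
      return i;
  return nlines;
}
```

Run over the "before" listing in the diff above, this would count two
modules ahead of [exe]; over the "after" listing it would count none.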

Those extra modules are gone after the fix.  Instead, there are a couple of
proper-looking modules that reappear.  Those were being skipped over by the
sniffer because its extrapolation, based on phdrs read at the wrong start
address, covered the first page of the following DSO.

Petr, please review the libdwfl change (and this explanation) and verify
that it makes sense.


Thanks,
Roland
