[PATCH 3/4] Provide access to non SEC_HAS_CONTENTS core file sections

Pedro Alves palves@redhat.com
Sun Mar 29 13:18:46 GMT 2020


Hi Kevin,

On 3/5/20 12:42 AM, Kevin Buettner wrote:
> Consider the following program:
> 
> Change-Id: I1adbb4e9047baad7cae7eab9c72e6d2b16f87d73
> 

This Change-Id line should be at the bottom of the commit log.
Or removed entirely since we're not relying on it anymore.

> --- mkmmapcore.c ---
> 
> #include <stdlib.h>
> #include <sys/mman.h>
> 
> static char *buf;
> 
> int
> main (int argc, char **argv)
> {
>   buf = mmap (NULL, 8192, PROT_READ | PROT_WRITE,
>               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>   abort ();
> }
> --- end mkmmapcore.c ---
> 
> Compile it like this:
> 
> gcc -g -o mkmmapcore mkmmapcore.c
> 
> Now let's run it from GDB.  I've already placed a breakpoint on the
> line with the abort() call and have run to that breakpoint.
> 
> Breakpoint 1, main (argc=1, argv=0x7fffffffd678) at mkmmapcore.c:11
> 11	  abort ();
> (gdb) x/x buf
> 0x7ffff7fcb000:	0x00000000
> 
> Note that we can examine the memory allocated via the call to mmap().
> 
> Now let's try debugging a core file created by running this program.
> Depending on your system, in order to make a core file, you may have to
> run the following as root (or using sudo):
> 
>     echo core > /proc/sys/kernel/core_pattern
> 
> It may also be necessary to do:
> 
>     ulimit -c unlimited
> 
> I'm using Fedora 31. YMMV if you're using one of the BSDs or some other
> (non-Linux) system.
> 
> This is what things look like when we debug the core file:
> 
>     [kev@f31-1 tmp]$ gdb -q ./mkmmapcore core.304767
>     Reading symbols from ./mkmmapcore...
>     [New LWP 304767]
>     Core was generated by `/tmp/mkmmapcore'.
>     Program terminated with signal SIGABRT, Aborted.
>     #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>     50	  return ret;
>     (gdb) x/x buf
>     0x7ffff7fcb000:	Cannot access memory at address 0x7ffff7fcb000
> 
> Note that we can no longer access the memory region allocated by mmap().
> 
> Back in 2007, a hack for GDB was added to _bfd_elf_make_section_from_phdr()
> in bfd/elf.c:
> 
> 	  /* Hack for gdb.  Segments that have not been modified do
> 	     not have their contents written to a core file, on the
> 	     assumption that a debugger can find the contents in the
> 	     executable.  We flag this case by setting the fake
> 	     section size to zero.  Note that "real" bss sections will
> 	     always have their contents dumped to the core file.  */
> 	  if (bfd_get_format (abfd) == bfd_core)
> 	    newsect->size = 0;
> 
> You can find the entire patch plus links to other discussion starting
> here:
> 
>     https://sourceware.org/ml/binutils/2007-08/msg00047.html
> 
> This hack sets the size of certain BFD sections to 0, which
> effectively causes GDB to ignore them.  I think it's likely that the
> bug described above existed even before this hack was added, but I
> have no easy way to test this now.
> 
> The output from objdump -h shows the result of this hack:
> 
>  25 load13        00000000  00007ffff7fcb000  0000000000000000  00013000  2**12
>                   ALLOC
> 
> (The first field, after load13, shows the size of 0.)
> 
> Once the hack is removed, the output from objdump -h shows the correct
> size:
> 
>  25 load13        00002000  00007ffff7fcb000  0000000000000000  00013000  2**12
>                   ALLOC
> 
> (This is a digression, but I think it's good that objdump will now show
> the correct size.)
> 
> If we remove the hack from bfd/elf.c, but do nothing to GDB, we'll
> see the following regression:
> 
> FAIL: gdb.base/corefile.exp: print coremaker_ro
> 
> The reason for this is that all sections which have the BFD flag
> SEC_ALLOC set, but for which SEC_HAS_CONTENTS is not set no longer
> have zero size.  Some of these sections have data that can (and should)
> be read from the executable.  

Removing the bfd hack alone fixes your new test for me.

> But, due to the way that the target
> strata are traversed when attempting to access memory, the
> non-SEC_HAS_CONTENTS sections will be read as zeroes from the
> process_stratum (which in this case is the core file stratum) without
> first checking the file stratum, which is where the data might actually
> be found.

I've applied your patch #1 only, and ran the corefile.exp test, but
it still passes cleanly for me.  I don't see any "print coremaker_ro"
FAIL here.  :-/  That makes it a bit harder for me to understand all
of this.  I'm on Fedora 27.

Can you expand a bit more on this following part?

> Some of these sections have data that can (and should) be read
> from the executable.

I'd like to understand and explore this a little bit better.

Are these cases truly indistinguishable from the cases where data
shouldn't be read from the executable?  I don't mean from the current
bfd data structures, but from the data in the core file and the executable.
It's kind of fascinating that that's the case, and if so, it would sound
like a nasty bug in either the core format or in the Linux kernel for
producing such cores with which we have to apply heuristics.

For the NON-split fake sections case (by split I mean the loadXXXa/loadXXXb
sections that map to a single segment), how come we end up with such sections
in the core in the first place if they weren't modified at run time?

Diffing "objdump -h" results from before and after the hack removal on the
corefile.exp core dump, I see this case, for example:

 - 18 load6         00000000  00007fd61476a000  0000000000000000  00027000  2**12
 + 18 load6         001ff000  00007fd61476a000  0000000000000000  00027000  2**12
                    ALLOC, READONLY

This is a case of a segment that is not split in two sections like some
others (note no trailing "a" and "b").  So this is a "!split" case in
_bfd_elf_make_section_from_phdr.  Trying to disassemble that address, with
the whole patch series applied, results in:

(gdb) disassemble 0x00007fd61476a000,+10
Dump of assembler code from 0x7fd61476a000 to 0x7fd61476a00a:
   0x00007fd61476a000:  /home/pedro/gdb/binutils-gdb/src/gdb/target.c:1271: internal-error: target_xfer_status target_xfer_partial(target_ops*, target_object, const char*, gdb_byte*, const gdb_byte*, ULONGEST, ULONGEST, ULONGEST*): Assertion `*xfered_len > 0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) 

It'd be good to also cover this in the testsuite somehow.

Using "info proc mappings" and "readelf -l" we can see that the address
belongs to a file-backed mapping with p_filesz=0.  I'm puzzled about
why we ended up with a p_filesz=0 load segment in the core for
this memory range (load6).

> 
> What we should be doing is this:
> 
> - Attempt to access core file data for SEC_HAS_CONTENTS sections.
> - Attempt to access executable file data if the above fails.
> - Attempt to access core file data for non SEC_HAS_CONTENTS sections, if
>   both of the above fail.
> 
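Just so we're talking about the same ordering, here's a standalone
sketch of that three-step fallback (hypothetical types and names,
nothing from the actual tree, with a per-section fill byte standing in
for real contents):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a section table entry.
struct section { uint64_t start, size; bool has_contents; uint8_t fill; };

static std::optional<uint8_t>
read_from (const std::vector<section> &secs, uint64_t addr,
	   bool want_contents)
{
  for (const section &s : secs)
    if (s.has_contents == want_contents
	&& addr >= s.start && addr < s.start + s.size)
      return s.fill;
  return std::nullopt;
}

// The proposed order: core SEC_HAS_CONTENTS sections first, then the
// executable, then core !SEC_HAS_CONTENTS sections as a last resort.
std::optional<uint8_t>
xfer_memory (const std::vector<section> &core,
	     const std::vector<section> &exec, uint64_t addr)
{
  if (auto v = read_from (core, addr, true))	/* 1. core, HAS_CONTENTS */
    return v;
  if (auto v = read_from (exec, addr, true))	/* 2. executable */
    return v;
  return read_from (core, addr, false);		/* 3. core, !HAS_CONTENTS */
}
```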

This seems to end up in line with Daniel's suggestion back in 2007 at:

  https://sourceware.org/legacy-ml/binutils/2007-08/msg00045.html

Except it uses the section flags as proxy for the p_filesz/p_memsz
checks.
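In other words, if I understand the proxy correctly, it boils down to
something like this (hypothetical helper, not actual BFD code):

```cpp
#include <cstdint>

// Stand-in for the two program header fields of interest.
struct fake_phdr { uint64_t p_filesz, p_memsz; };

// The file-backed head of a PT_LOAD segment (p_filesz bytes) maps to a
// section carrying SEC_HAS_CONTENTS...
static bool
head_has_contents (fake_phdr p)
{
  return p.p_filesz > 0;
}

// ...while any p_memsz tail beyond p_filesz maps to a section without
// SEC_HAS_CONTENTS (zero-fill at run time).
static bool
has_no_contents_tail (fake_phdr p)
{
  return p.p_memsz > p.p_filesz;
}
```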

I'm still not fully sure this is the right thing to do given I'm not
clear on all the details, but if there truly is no other way to
distinguish the segments that need to be read from the executable
compared to segments that need to be read from the core, I suppose
this is the way to go.  

I'm not fully convinced about splitting the sections though, compared to
just walking the core sections list twice with a predicate.
section_table_xfer_memory_partial already has a predicate parameter,
the 'section_name' parameter; we would just need to generalize
it to a gdb::function_view callback instead.
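Something along these lines, sketched with standalone hypothetical types
(std::function standing in for gdb::function_view, and a fake section
table), just to show the shape of the generalization:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins, not the actual GDB types.
struct target_section { std::string name; bool has_contents; };

// The callback replaces the old 'const char *section_name' filter, so a
// caller can select sections by any criterion, e.g. SEC_HAS_CONTENTS.
using section_pred = std::function<bool (const target_section &)>;

static int
count_matching (const std::vector<target_section> &table,
		const section_pred &match)
{
  int n = 0;
  for (const auto &s : table)
    if (!match || match (s))	/* Null predicate means "all sections".  */
      ++n;
  return n;
}
```

A caller would then do one pass with a predicate selecting the
SEC_HAS_CONTENTS sections, and a later pass with the negated predicate,
over the same single list.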

Alternatively, if you prefer two separate lists, I don't understand
why we'd build a single list and then split it in two right after.
Wouldn't it be preferable to make core_target::core_target() build the
two separate lists from the get-go?
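I.e., a single pass in the constructor, along these lines (again with
hypothetical standalone types, purely illustrative):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a BFD section, not the actual type.
struct bfd_sect { bool has_contents; int id; };

struct core_tables
{
  std::vector<bfd_sect> with_contents;
  std::vector<bfd_sect> without_contents;

  // Build both tables in one pass over the sections, instead of
  // building one table and splitting it afterwards.
  explicit core_tables (const std::vector<bfd_sect> &all)
  {
    for (const bfd_sect &s : all)
      (s.has_contents ? with_contents : without_contents).push_back (s);
  }
};
```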

BTW, that TARGET_OBJECT_MEMORY case in core_target::xfer_partial
is getting largish; it might be worth moving that chunk to a
separate core_target::xfer_memory method.

But really just walking the single sections list in place would
be simpler, I think.  I don't think this is a bottleneck.

>  enum target_xfer_status
> @@ -741,12 +767,52 @@ core_target::xfer_partial (enum target_object object, const char *annex,

> +      /* If none of the above attempts worked to access the memory in
> +	 question, return TARGET_XFER_UNAVAILABLE.  Due to the fact
> +	 that the exec file stratum has already been considered, we
> +	 want to prevent it from being examined yet again (at a higher
> +	 level).  */
> +      if (xfer_status == TARGET_XFER_OK)
> +	return TARGET_XFER_OK;
> +      else
> +	return TARGET_XFER_UNAVAILABLE;

This returning ...UNAVAILABLE seems like the wrong thing to do.  If
we want to prevent continuing to the next layer, then we could
just make core_target::has_all_memory() return true.

Effectively that would mean we could eliminate that method, since it
only exists for core files, here, in raw_memory_xfer_partial:

      /* We want to continue past core files to executables, but not
	 past a running target's memory.  */
      if (ops->has_all_memory ())
	break;

At the very least, that comment should be updated.

Trying it out locally, like this, on top of your whole series:

diff --git c/gdb/corelow.c w/gdb/corelow.c
index 7a711740622..d449efb74b9 100644
--- c/gdb/corelow.c
+++ w/gdb/corelow.c
@@ -90,7 +90,7 @@ class core_target final : public process_stratum_target
 
   const char *thread_name (struct thread_info *) override;
 
-  bool has_all_memory () override { return false; }
+  bool has_all_memory () override { return true; }
   bool has_memory () override;
   bool has_stack () override;
   bool has_registers () override;
@@ -804,15 +804,7 @@ core_target::xfer_partial (enum target_object object, const char *annex,
                       m_core_no_contents_section_table.sections_end,
                       NULL);
 
-      /* If none of the above attempts worked to access the memory in
-        question, return TARGET_XFER_UNAVAILABLE.  Due to the fact
-        that the exec file stratum has already been considered, we
-        want to prevent it from being examined yet again (at a higher
-        level).  */
-      if (xfer_status == TARGET_XFER_OK)
-       return TARGET_XFER_OK;
-      else
-       return TARGET_XFER_UNAVAILABLE;
+      return xfer_status;
 
     case TARGET_OBJECT_AUXV:
       if (readbuf)

Fixes the assertion (different address since this was another
core dump from another test run):

 - 17 load6         00000000  00007ffff7884000  0000000000000000  001d3000  2**12
 + 17 load6         001ff000  00007ffff7884000  0000000000000000  001d3000  2**12
                    ALLOC, READONLY

 (gdb) disassemble /r 0x7ffff7884000,+10
 Dump of assembler code from 0x7ffff7884000 to 0x7ffff788400a:
    0x00007ffff7884000:  00 00   add    %al,(%rax)
    0x00007ffff7884002:  00 00   add    %al,(%rax)
    0x00007ffff7884004:  00 00   add    %al,(%rax)
    0x00007ffff7884006:  00 00   add    %al,(%rax)
    0x00007ffff7884008:  00 00   add    %al,(%rax)
 End of assembler dump.

But I'm not sure (yet, anyway) whether reading that section
as all zeroes is really the right thing to do.

Running "info proc mappings" when debugging the core shows that
this address comes from libc.so.  It's the second libc-2.26.so
mapping below, see "THIS ONE":

(gdb) info proc mappings 
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x401000     0x1000        0x0 /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/corefile/corefile
            0x600000           0x601000     0x1000        0x0 /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/corefile/corefile
            0x601000           0x602000     0x1000     0x1000 /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/corefile/corefile
      0x7ffff76d7000     0x7ffff7884000   0x1ad000        0x0 /usr/lib64/libc-2.26.so
      0x7ffff7884000     0x7ffff7a83000   0x1ff000   0x1ad000 /usr/lib64/libc-2.26.so   <<< THIS ONE
      0x7ffff7a83000     0x7ffff7a87000     0x4000   0x1ac000 /usr/lib64/libc-2.26.so
      0x7ffff7a87000     0x7ffff7a89000     0x2000   0x1b0000 /usr/lib64/libc-2.26.so
      0x7ffff7a8d000     0x7ffff7bd7000   0x14a000        0x0 /usr/lib64/libm-2.26.so
      0x7ffff7bd7000     0x7ffff7dd6000   0x1ff000   0x14a000 /usr/lib64/libm-2.26.so
      0x7ffff7dd6000     0x7ffff7dd7000     0x1000   0x149000 /usr/lib64/libm-2.26.so
      0x7ffff7dd7000     0x7ffff7dd8000     0x1000   0x14a000 /usr/lib64/libm-2.26.so
      0x7ffff7dd8000     0x7ffff7dfd000    0x25000        0x0 /usr/lib64/ld-2.26.so
      0x7ffff7ff5000     0x7ffff7ff7000     0x2000        0x0 /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/coremmap.data
      0x7ffff7ffc000     0x7ffff7ffd000     0x1000    0x24000 /usr/lib64/ld-2.26.so


So, I tried comparing a live process to a core dump one.  Since we need
a kernel-generated core, what I did was load the corefile program
under GDB, let it run until just before the abort() call, and then do

 (gdb) print fork ()

This makes the program fork, and the fork child crashes and aborts.
Now I'm still debugging the parent, and I have a kernel-generated core
with the same memory map as the still-running inferior.

I loaded the core dump as a second inferior under gdb.
(add-inferior -no-connection; inferior 2; file ...; core ...)
Remember that this now works with multi-target.

 (gdb) p fork ()
 [Detaching after fork from child process 19488]
 $2 = 19488
 (gdb) add-inferior -no-connection
 [New inferior 2]
 Added inferior 2
 (gdb) inferior 2
 [Switching to inferior 2 [<null>] (<noexec>)]
 (gdb) file /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/corefile/corefile
 Reading symbols from /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/corefile/corefile...
 (gdb) core core.19488 
 [New LWP 19488]
 Core was generated by `/home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/co'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00007fffffffd23f in ?? ()

Find the address in question:

 (gdb) shell objdump -h core.19488
 ...
 17 load6         00000000  00007ffff7884000  0000000000000000  001d3000  2**12
                  ALLOC, READONLY
 ...

Disassemble it in inferior 2, the core dump:

 (gdb) disassemble /r 0x00007ffff7884000,+10
 Dump of assembler code from 0x7ffff7884000 to 0x7ffff788400a:
    0x00007ffff7884000:  00 00   add    %al,(%rax)
    0x00007ffff7884002:  00 00   add    %al,(%rax)
    0x00007ffff7884004:  00 00   add    %al,(%rax)
    0x00007ffff7884006:  00 00   add    %al,(%rax)
    0x00007ffff7884008:  00 00   add    %al,(%rax)
 End of assembler dump.

Now let's try disassembling the same address in the live inferior,
inferior 1:

(gdb) inferior 1
[Switching to inferior 1 [process 19451] (/home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/corefile/corefile)]
[Switching to thread 1.1 (process 19451)]
#0  main (argc=1, argv=0x7fffffffd3b8) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/coremaker.c:155
155       func1 ();
(gdb) disassemble /r 0x00007ffff7884000,+10
Dump of assembler code from 0x7ffff7884000 to 0x7ffff788400a:
   0x00007ffff7884000:  20 c6   and    %al,%dh
   0x00007ffff7884002:  07      (bad)  
   0x00007ffff7884003:  00 00   add    %al,(%rax)
   0x00007ffff7884005:  00 00   add    %al,(%rax)
   0x00007ffff7884007:  00 40 b8        add    %al,-0x48(%rax)
End of assembler dump.
(gdb) 

They should result in the same contents, but clearly the
core case read all zeroes, while the live one didn't.

If we unwind all the patches in this series and try pristine master,
we hit the original:

 (gdb) disassemble /r 0x00007ffff7884000,+10
 Dump of assembler code from 0x7ffff7884000 to 0x7ffff788400a:
    0x00007ffff7884000:  Cannot access memory at address 0x7ffff7884000

So GDB doesn't find this section's contents in the executable or shared
libraries, even though the file-backed mappings suggest we should be able
to read it from libc-2.26.so.  Maybe on your system you'll have different
results and gdb manages to find the data in the executable somehow?

Thanks,
Pedro Alves


