25631 – GDB cannot access unwritten-to mmap'd buffer from core file

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	corefiles
Version:	HEAD

Importance:	P2 normal
Target Milestone:	10.1
Assignee:	Kevin Buettner

Reported:	2020-03-05 00:18 UTC by Kevin Buettner
Modified:	2020-09-01 01:55 UTC
CC List:	1 user

Attachments
Test case demonstrating mmap core file bug (175 bytes, text/x-csrc) 2020-03-05 00:18 UTC, Kevin Buettner

Description Kevin Buettner 2020-03-05 00:18:44 UTC

Created attachment 12347
Test case demonstrating mmap core file bug

Compile the test case as follows:

gcc -g -o mkmmapcore mkmmapcore.c

Next, make sure that your system can create core files in a local directory.  This might require executing the following command as root or using sudo:

echo core > /proc/sys/kernel/core_pattern

Also, it may be necessary to do "ulimit -c unlimited".
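
For a program that should enable core dumps on its own, the effect of "ulimit -c unlimited" can be approximated from C with getrlimit() and setrlimit(). The following is a minimal sketch and is not part of the attached test case; raise_core_limit is a hypothetical helper name. It raises only the soft limit up to the hard limit, so it is equivalent to "unlimited" only when the hard limit is itself unlimited.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for the
   calling process.  Returns 0 on success, -1 on error.  */
int
raise_core_limit (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_CORE, &rl) != 0)
    return -1;

  /* An unprivileged process may raise the soft limit, but only as
     far as the hard limit.  */
  rl.rlim_cur = rl.rlim_max;
  return setrlimit (RLIMIT_CORE, &rl);
}
```

Calling raise_core_limit () early in main () affects the calling process and its children. The core_pattern setting is system-wide and still has to be changed separately.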

Now, run the program using GDB...

(gdb) b 11
Breakpoint 1 at 0x401171: file mkmmapcore.c, line 11.
(gdb) run
Starting program: /tmp/mkmmapcore 

Breakpoint 1, main (argc=1, argv=0x7fffffffd678) at mkmmapcore.c:11
11	  abort ();
(gdb) x/x buf
0x7ffff7fcb000:	0x00000000
(gdb) c
Continuing.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	  return ret;
(gdb) q
A debugging session is active.

	Inferior 1 [process 304383] will be killed.

Quit anyway? (y or n) y

In the above, note that GDB is able to access the contents of the buffer created using mmap().

Now, debug the program again, this time using the core file:

[kev@f31-1 tmp]$ gdb -q ./mkmmapcore core.304767
Reading symbols from ./mkmmapcore...
[New LWP 304767]
Core was generated by `/tmp/mkmmapcore'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	  return ret;
(gdb) x/x buf
0x7ffff7fcb000:	Cannot access memory at address 0x7ffff7fcb000
(gdb) q

This demonstrates the bug; using the core file, GDB should be able to access the same memory region as when the process was live.

Comment 1 Kevin Buettner 2020-03-05 00:19:37 UTC

I have a fix for this bug; I'll be posting a patch set soon.

Comment 2 Kevin Buettner 2020-03-05 00:45:26 UTC

Patch series can be found here:

https://sourceware.org/ml/gdb-patches/2020-03/msg00106.html

Comment 3 Sourceware Commits 2020-07-22 19:54:29 UTC

The master branch has been updated by Kevin Buettner <kevinb@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=678c7a56ced1828d37a554ec97f672496f135054

commit 678c7a56ced1828d37a554ec97f672496f135054
Author: Kevin Buettner <kevinb@redhat.com>
Date:   Tue May 12 17:44:19 2020 -0700

    Adjust corefile.exp test to show regression after bfd hack removal
    
    In his review of my BZ 25631 patch series, Pedro was unable to
    reproduce the regression which should occur after patch #1, "Remove
    hack for GDB which sets the section size to 0", is applied.
    
    Pedro was using an ld version older than 2.30.  Version 2.30
    introduced the linker option -z separate-code.  Here's what the man
    page has to say about it:
    
        Create separate code "PT_LOAD" segment header in the object.  This
        specifies a memory segment that should contain only instructions
        and must be in wholly disjoint pages from any other data.
    
    In ld version 2.31, use of separate-code became the default for
    Linux/x86.  So, really, 2.31 or later is required in order to see the
    regression that occurs in recent Linux distributions when only the
    bfd hack removal patch is applied.
    
    For the test case in question, use of the separate-code linker option
    means that the global variable "coremaker_ro" ends up in a separate
    load segment (though potentially with other read-only data).  The
    upshot of this is that when only patch #1 is applied, GDB won't be
    able to correctly access coremaker_ro.  The reason is that this
    section will now have a non-zero size, but the core file will not
    contain the contents needed to find this data.
    So GDB will ask BFD for the contents and BFD will respond with
    zeroes for anything from those sections.  GDB should instead be
    looking in the executable for this data.  Failing that, it can
    then ask BFD for a reasonable value.  This is what a later patch
    in this series does.
    
    When using ld versions earlier than 2.31 (or 2.30 w/ the
    -z separate-code option explicitly provided to the linker), there is
    the possibility that coremaker_ro ends up being placed near other data
    which is recorded in the core file.  That means that the correct value
    will end up in the core file, simply because it resides on a page that
    the kernel chooses to put in the core file.  This is why Pedro wasn't
    able to reproduce the regression that should occur after fixing the
    BFD hack.
    
    This patch places a big chunk of memory, two pages worth on x86, in
    front of "coremaker_ro" to attempt to force it onto another page
    without requiring use of that new-fangled linker switch.
    
    Speaking of which, I considered changing the test to use
    -z separate-code, but this won't work because it didn't
    exist prior to version 2.30.  The linker would probably complain
    of an unrecognized switch.  Also, it likely won't be available in
    other linkers not based on current binutils.  I.e. it probably won't
    work in FreeBSD, NetBSD, etc.
    
    To make this more concrete, this is what *should* happen when
    attempting to access coremaker_ro when only patch #1 is applied:
    
        Core was generated by `/mesquite2/sourceware-git/f28-coresegs/bld/gdb/testsuite/outputs/gdb.base/coref'.
        Program terminated with signal SIGABRT, Aborted.
        #0  0x00007f68205deefb in raise () from /lib64/libc.so.6
        (gdb) p coremaker_ro
        $1 = 0
    
    Note that this result is wrong; 201 should have been printed instead.
    But that's the point of the rest of the patch series.
    
    However, without this commit, or when using an old Linux distro with
    a pre-2.31 ld, this is what you might see instead:
    
        Core was generated by `/mesquite2/sourceware-git/f28-coresegs/bld/gdb/testsuite/outputs/gdb.base/coref'.
        Program terminated with signal SIGABRT, Aborted.
        #0  0x00007f63dd658efb in raise () from /lib64/libc.so.6
        (gdb) p coremaker_ro
        $1 = 201
    
    I.e. it prints the right answer, which sort of makes it seem like the
    rest of the series isn't required.
    
    Now, back to the patch itself... what should be the size of the memory
    chunk placed before coremaker_ro?
    
    It needs to be at least as big as the page size (PAGE_SIZE) from
    the kernel.  For x86 and several other architectures this value is
    4096.  I used MAPSIZE which is defined to be 8192 in coremaker.c.
    So it's twice as big as what's currently needed for most Linux
    architectures.  The constant PAGE_SIZE is available from <sys/user.h>,
    but this isn't portable either.  In the end, it seemed simpler to
    just pick a value and hope that it's big enough.  (Running a separate
    program which finds the page size via sysconf(_SC_PAGESIZE) and then
    passes it to the compilation via a -D switch seemed like overkill
    for a case which is rendered moot by recent linker versions.)
    
    Further information can be found here:
    
       https://sourceware.org/pipermail/gdb-patches/2020-May/168168.html
       https://sourceware.org/pipermail/gdb-patches/2020-May/168170.html
    
    Thanks to H.J. Lu for telling me about the '-z separate-code' linker
    switch.
    
    gdb/testsuite/ChangeLog:
    
            * gdb.base/coremaker.c (filler_ro): New global constant.
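
The filler approach described in the commit message above can be sketched as follows. This is an illustration only: the names MAPSIZE, filler_ro and coremaker_ro and the values 8192 and 201 come from the commit message, while the initializer of filler_ro and the helper filler_covers_page are assumptions made for this sketch. Note also that C does not guarantee the relative placement of globals, so the filler is an attempt to separate the pages rather than a guarantee.

```c
#include <stddef.h>
#include <unistd.h>

#define MAPSIZE 8192

/* Read-only filler intended to land in front of coremaker_ro.  The
   initializer is an assumption made for this sketch.  */
const char filler_ro[MAPSIZE] = { 1, 2, 3, 4, 5, 6, 7, 8 };
const int coremaker_ro = 201;

/* Return nonzero if SIZE bytes of filler cover at least one full page
   on the running system, as reported by sysconf.  */
int
filler_covers_page (size_t size)
{
  long page_size = sysconf (_SC_PAGESIZE);

  return page_size > 0 && size >= (size_t) page_size;
}
```

On a system with 4096-byte pages, filler_covers_page (sizeof filler_ro) is nonzero. On a system with larger pages it may not be, which is the portability risk the commit message accepts.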

Comment 4 Sourceware Commits 2020-07-22 19:54:39 UTC

The master branch has been updated by Kevin Buettner <kevinb@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=2735d4218ea81ea83458007a80e4132fa6e73668

commit 2735d4218ea81ea83458007a80e4132fa6e73668
Author: Kevin Buettner <kevinb@redhat.com>
Date:   Wed Mar 4 17:42:42 2020 -0700

    Provide access to non SEC_HAS_CONTENTS core file sections
    
    Consider the following program:
    
    - - - mkmmapcore.c - - -
    
    #include <stdlib.h>
    #include <sys/mman.h>
    
    static char *buf;
    
    int
    main (int argc, char **argv)
    {
      buf = mmap (NULL, 8192, PROT_READ | PROT_WRITE,
                  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
      abort ();
    }
    - - - end mkmmapcore.c - - -
    
    Compile it like this:
    
    gcc -g -o mkmmapcore mkmmapcore.c
    
    Now let's run it from GDB.  I've already placed a breakpoint on the
    line with the abort() call and have run to that breakpoint.
    
    Breakpoint 1, main (argc=1, argv=0x7fffffffd678) at mkmmapcore.c:11
    11        abort ();
    (gdb) x/x buf
    0x7ffff7fcb000: 0x00000000
    
    Note that we can examine the memory allocated via the call to mmap().
    
    Now let's try debugging a core file created by running this program.
    Depending on your system, in order to make a core file, you may have to
    run the following as root (or using sudo):
    
        echo core > /proc/sys/kernel/core_pattern
    
    It may also be necessary to do:
    
        ulimit -c unlimited
    
    I'm using Fedora 31. YMMV if you're using one of the BSDs or some other
    (non-Linux) system.
    
    This is what things look like when we debug the core file:
    
        [kev@f31-1 tmp]$ gdb -q ./mkmmapcore core.304767
        Reading symbols from ./mkmmapcore...
        [New LWP 304767]
        Core was generated by `/tmp/mkmmapcore'.
        Program terminated with signal SIGABRT, Aborted.
        #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        50    return ret;
        (gdb) x/x buf
        0x7ffff7fcb000:     Cannot access memory at address 0x7ffff7fcb000
    
    Note that we can no longer access the memory region allocated by mmap().
    
    Back in 2007, a hack for GDB was added to _bfd_elf_make_section_from_phdr()
    in bfd/elf.c:
    
              /* Hack for gdb.  Segments that have not been modified do
                 not have their contents written to a core file, on the
                 assumption that a debugger can find the contents in the
                 executable.  We flag this case by setting the fake
                 section size to zero.  Note that "real" bss sections will
                 always have their contents dumped to the core file.  */
              if (bfd_get_format (abfd) == bfd_core)
                newsect->size = 0;
    
    You can find the entire patch plus links to other discussion starting
    here:
    
        https://sourceware.org/ml/binutils/2007-08/msg00047.html
    
    This hack sets the size of certain BFD sections to 0, which
    effectively causes GDB to ignore them.  I think it's likely that the
    bug described above existed even before this hack was added, but I
    have no easy way to test this now.
    
    The output from objdump -h shows the result of this hack:
    
     25 load13        00000000  00007ffff7fcb000  0000000000000000  00013000  2**12
                      ALLOC
    
    (The first field, after load13, shows the size of 0.)
    
    Once the hack is removed, the output from objdump -h shows the correct
    size:
    
     25 load13        00002000  00007ffff7fcb000  0000000000000000  00013000  2**12
                      ALLOC
    
    (This is a digression, but I think it's good that objdump will now show
    the correct size.)
    
    If we remove the hack from bfd/elf.c, but do nothing to GDB, we'll
    see the following regression:
    
    FAIL: gdb.base/corefile.exp: print coremaker_ro
    
    The reason for this is that all sections which have the BFD flag
    SEC_ALLOC set, but for which SEC_HAS_CONTENTS is not set, no longer
    have zero size.  Some of these sections have data that can (and should)
    be read from the executable.  (Sections for which SEC_HAS_CONTENTS
    is set should be read from the core file; sections which do not have
    this flag set need to either be read from the executable or, failing
    that, from the core file using whatever BFD decides is the best value
    to present to the user - it uses zeros.)
    
    At present, due to the way that the target strata are traversed when
    attempting to access memory, the non-SEC_HAS_CONTENTS sections will be
    read as zeroes from the process_stratum (which in this case is the
    core file stratum) without first checking the file stratum, which is
    where the data might actually be found.
    
    What we should be doing is this:
    
    - Attempt to access core file data for SEC_HAS_CONTENTS sections.
    - Attempt to access executable file data if the above fails.
    - Attempt to access core file data for non SEC_HAS_CONTENTS sections, if
      both of the above fail.
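
The three-step order above can be modeled with a small, self-contained sketch. This is a toy model and not GDB's implementation: struct region, enum source, find_region and read_byte are hypothetical names, a NULL data pointer stands in for a core file section without SEC_HAS_CONTENTS, and executable regions are assumed to always have data.

```c
#include <stddef.h>
#include <stdint.h>

/* Where a byte was ultimately read from.  */
enum source { SRC_NONE, SRC_CORE_CONTENTS, SRC_EXEC_FILE, SRC_CORE_ZERO };

/* A contiguous address range.  For core file regions, DATA is NULL
   when the section lacks SEC_HAS_CONTENTS.  */
struct region
{
  uint64_t start;
  uint64_t size;
  const unsigned char *data;
};

/* Return the region containing ADDR, or NULL if there is none.  */
static const struct region *
find_region (const struct region *r, size_t n, uint64_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= r[i].start && addr - r[i].start < r[i].size)
      return &r[i];
  return NULL;
}

/* Read one byte at ADDR into *OUT, trying the sources in order.  */
enum source
read_byte (const struct region *core, size_t n_core,
           const struct region *exec, size_t n_exec,
           uint64_t addr, unsigned char *out)
{
  const struct region *c = find_region (core, n_core, addr);
  const struct region *e;

  /* 1. Core file section that has contents.  */
  if (c != NULL && c->data != NULL)
    {
      *out = c->data[addr - c->start];
      return SRC_CORE_CONTENTS;
    }

  /* 2. Executable file data.  */
  e = find_region (exec, n_exec, addr);
  if (e != NULL)
    {
      *out = e->data[addr - e->start];
      return SRC_EXEC_FILE;
    }

  /* 3. Core file section without contents: read as zero.  */
  if (c != NULL)
    {
      *out = 0;
      return SRC_CORE_ZERO;
    }

  return SRC_NONE;
}
```

In this model, a read-only variable such as coremaker_ro is found in step 2, while the unwritten-to mmap buffer falls through to step 3 and reads as zero instead of failing.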
    
    This corresponds to the analysis of Daniel Jacobowitz back in 2007
    when the hack was added to BFD:
    
        https://sourceware.org/legacy-ml/binutils/2007-08/msg00045.html
    
    The difference, observed by Pedro in his review of my v1 patches, is
    that I'm using "the section flags as proxy for the p_filesz/p_memsz
    checks."
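
For comparison, the p_filesz/p_memsz check that the section flags stand in for can be sketched directly against an ELF program header. This is an illustration and not code from the patch: bytes_without_contents is a hypothetical helper, and it assumes that the part of a PT_LOAD segment beyond p_filesz is what BFD presents as a section with SEC_ALLOC but without SEC_HAS_CONTENTS.

```c
#include <elf.h>
#include <stdint.h>

/* Return the number of bytes of the PT_LOAD segment PH that occupy
   memory but are not stored in the core file.  Returns 0 for other
   segment types and for segments that are fully stored.  */
uint64_t
bytes_without_contents (const Elf64_Phdr *ph)
{
  if (ph->p_type != PT_LOAD || ph->p_memsz <= ph->p_filesz)
    return 0;
  return ph->p_memsz - ph->p_filesz;
}
```

For a segment shaped like load13 above (p_memsz of 0x2000 and, by assumption, p_filesz of 0), the helper returns 0x2000, which matches the section size that objdump -h reports once the hack is removed.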
    
    gdb/ChangeLog:
    
            PR corefiles/25631
            * corelow.c (core_target::xfer_partial): Revise
            TARGET_OBJECT_MEMORY case to consider non-SEC_HAS_CONTENTS
            case after first checking the stratum beneath the core
            target.
            (has_all_memory): Return true.
            * target.c (raw_memory_xfer_partial): Revise comment
            regarding use of has_all_memory.

Comment 5 Sourceware Commits 2020-07-22 19:54:44 UTC

The master branch has been updated by Kevin Buettner <kevinb@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=94c265d790b88e691b9ea0173b7000a54a3eb0a0

commit 94c265d790b88e691b9ea0173b7000a54a3eb0a0
Author: Kevin Buettner <kevinb@redhat.com>
Date:   Wed Mar 4 17:42:43 2020 -0700

    Test ability to access unwritten-to mmap data in core file
    
    gdb/testsuite/ChangeLog:
    
            PR corefiles/25631
            * gdb.base/corefile.exp (accessing anonymous, unwritten-to mmap data):
            New test.
            * gdb.base/coremaker.c (buf3): New global.
            (mmapdata): Add mmap call which uses MAP_ANONYMOUS and MAP_PRIVATE
            flags.

Comment 6 Kevin Buettner 2020-07-22 20:20:00 UTC

Final (v5) patch series fixing this bug plus several other core file problems can be found starting here:

https://sourceware.org/pipermail/gdb-patches/2020-July/170686.html

It's upstream now.

Comment 7 Kevin Buettner 2020-07-22 20:22:13 UTC

Closing this bug now.

Comment 8 Sourceware Commits 2020-09-01 01:55:21 UTC

The master branch has been updated by Kevin Buettner <kevinb@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=973695d6bb824a1e724d5ea24e7ece013109dc74

commit 973695d6bb824a1e724d5ea24e7ece013109dc74
Author: Kevin Buettner <kevinb@redhat.com>
Date:   Fri Aug 7 13:07:44 2020 -0700

    Work around incorrect/broken pathnames in NT_FILE note
    
    Luis Machado reported some regressions after I pushed recent core file
    related patches fixing BZ 25631:
    
        FAIL: gdb.base/corefile.exp: backtrace in corefile.exp
        FAIL: gdb.base/corefile.exp: core-file warning-free
        FAIL: gdb.base/corefile.exp: print func2::coremaker_local
        FAIL: gdb.base/corefile.exp: up in corefile.exp
        FAIL: gdb.base/corefile.exp: up in corefile.exp (reinit)
    
    This commit fixes these regressions.  Thanks to Luis for testing
    an earlier version of the patch.  (I was unable to reproduce these
    regressions in various test environments that I created.)
    
    Luis is testing in a docker container which is using the AUFS storage
    driver.  It turns out that the kernel is placing docker host paths in
    the NT_FILE note instead of paths within the container.
    
    I've made a similar docker environment (though apparently not similar
    enough to reproduce the regressions).  This is one of the paths that
    I see mentioned in the warning messages printed while loading the
    core file during NT_FILE note processing - note that I've shortened
    the path component starting with "d07c4":
    
    /var/lib/docker/aufs/diff/d07c4...21/lib/x86_64-linux-gnu/ld-2.27.so
    
    This is a path on the docker host; it does not exist in the
    container.  In the docker container, this is the path:
    
    /lib/x86_64-linux-gnu/ld-2.27.so
    
    My first thought was to disable all NT_FILE mappings when any path was
    found to be bad.  This would have caused GDB to fall back to accessing
    memory using the file stratum as it did before I added the NT_FILE
    note loading code.  After further consideration, I realized that we
    could do better than this.  For file-backed memory access, we can
    still use the NT_FILE mappings when available, and then attempt to
    access memory using the file stratum constrained to those address
    ranges corresponding to the "broken" mappings.
    
    In order to test it, I made some additions to corefile2.exp in which
    the test case's executable is renamed.  The core file is then loaded;
    due to the fact that the executable has been renamed, those mappings
    will be unavailable.  After loading the core file, the executable is
    renamed back to its original name at which point it is loaded using
    GDB's "file" command.  The "interesting" tests are then run.  These
    tests will print out values in file-backed memory regions along with
    mmap'd regions placed within/over the file-backed regions.  Despite
    the fact that the executable could not be found during the NT_FILE
    note processing, these tests still work correctly due to the fact that
    memory is available from the file stratum combined with the fact that
    the broken NT_FILE mappings are used to prevent file-backed access
    outside of the "broken" mappings.
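
The idea of constraining file stratum access to the "broken" mappings can be illustrated with a short sketch. It is a simplified model and not the code in corelow.c: struct mem_range and file_stratum_len are hypothetical names, and the ranges are assumed to be half-open and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* An address range [start, end) whose NT_FILE path could not be
   opened.  */
struct mem_range
{
  uint64_t start;
  uint64_t end;
};

/* If ADDR lies inside one of the N unavailable RANGES, return how many
   of the LEN requested bytes may be read from the file stratum without
   leaving that range.  Return 0 if ADDR is in none of them.  */
uint64_t
file_stratum_len (const struct mem_range *ranges, size_t n,
                  uint64_t addr, uint64_t len)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= ranges[i].start && addr < ranges[i].end)
      {
        uint64_t room = ranges[i].end - addr;

        /* Clip the request so it does not run past the mapping.  */
        return len < room ? len : room;
      }
  return 0;
}
```

A result of 0 means the address is outside every unavailable mapping, so the file stratum should not be consulted for it through this path. A nonzero result is the length of a request that stays inside one mapping.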
    
    gdb/ChangeLog:
    
            * corelow.c (unordered_set): Include.
            (class core_target): Add field 'm_core_unavailable_mappings'.
            (core_target::build_file_mappings): Print only one warning
            per inaccessible file.  Add unavailable/broken mappings
            to m_core_unavailable_mappings.
            (core_target::xfer_partial): Call...
            (core_target::xfer_memory_via_mappings): New method.
    
    gdb/testsuite/ChangeLog:
    
            * gdb.base/corefile2.exp (renamed binfile): New tests.