Bug 11608 - gcore does not support build-id
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: corefiles
Version: 7.5
Importance: P2 normal
Target Milestone: 7.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 16092
Reported: 2010-05-17 13:17 UTC by Jan Kratochvil
Modified: 2019-04-25 18:29 UTC

See Also:
Host:
Target: x86_64-fedora12-linux-gnu
Build:
Last reconfirmed: 2013-11-13 00:00:00


Description Jan Kratochvil 2010-05-17 13:17:55 UTC
gcore omits whole readonly-executable-code segments.

It should dump their first page so that:
eu-unstrip -n --core=corefile
works as for the kernel-dumped core files.

GDB:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000b10 0x0000000000400000 0x0000000000000000 0x000000 0x006000 R E 0x1

Linux kernel:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000000400000 0x0000000000000000 0x001000 0x006000 R E 0x1000
Comment 1 Jan Kratochvil 2012-08-09 08:41:34 UTC
In practice it works now, although only because of something that happens by accident in the Linux kernel:

echo 'const int i[2000]={0};void _start(void){}'|gcc -Wall -nostdlib -fno-asynchronous-unwind-tables -Wl,--build-id -x c -;gdb -nx ./a.out -ex 'b *_start' -ex r -ex 'gcore core' -ex 'set confirm no' -ex q;eu-unstrip -n --core=core
[...]
0x400000+0x400000 2fb48d92cfa19eb24524f14211565853e3da3deb@0x400284 - - [exe]

It works since:
http://sourceware.org/ml/gdb-patches/2012-08/msg00225.html

But this is more of an accident:
$ cat /proc/22440/smaps
00400000-00401000 r-xp 00000000 fd:02 15079087 /home/jkratoch/t/a.out
Shared_Dirty:          0 kB
Private_Dirty:         4 kB
Anonymous:             4 kB
Swap:                  0 kB

Although the mapping is r-x, the Linux kernel had to write some data there; see Private_Dirty and Anonymous.  I do not know why; Linux kernel hackers could advise.

gcc (GCC) 4.7.2 20120809 (prerelease)
GNU gdb (GDB) 7.5.50.20120809-cvs
binutils-2.22.52.0.4-8.fc18.x86_64
kernel-3.4.6-1.fc16.x86_64

The right fix would be to see the '[exe]' line even with gdb-7.5 or earlier (not only with FSF GDB HEAD, where the patch for PR 11804 above has been committed).  Earlier GDBs did not pay attention to those 4 lines in smaps, so they did not dump the build-id page and the '[exe]' line was not visible.
Comment 2 Jan Kratochvil 2013-11-13 06:23:54 UTC
At least with Fedora 19 x86_64 /usr/bin/sleep it does not work:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000fc4 0x0000000000400000 0x0000000000000000
                 0x0000000000000000 0x0000000000007000  R E    1

Shared libraries also do not have the first page dumped:
/lib64/libc.so.6:
  LOAD           0x0000000000047fc4 0x0000003347200000 0x0000000000000000
                 0x0000000000000000 0x00000000001b6000  R E    1

Although some shared libraries have it dumped:
/lib64/ld-linux-x86-64.so.2
  LOAD           0x0000000000023fc4 0x0000003346e00000 0x0000000000000000
                 0x0000000000021000 0x0000000000021000  R E    1

Bug 16092 is related.
Comment 3 Sourceware Commits 2019-04-25 18:23:54 UTC
The master branch has been updated by Sergio Durigan Junior <sergiodj@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=57e5e645010430b3d73f8c6a757d09f48dc8f8d5

commit 57e5e645010430b3d73f8c6a757d09f48dc8f8d5
Author: Sergio Durigan Junior <sergiodj@redhat.com>
Date:   Tue Apr 23 18:17:57 2019 -0400

    Implement dump of mappings with ELF headers by gcore
    
    This patch has a long story, but it all started back in 2015, with
    commit df8411da087dc05481926f4c4a82deabc5bc3859 ("Implement support
    for checking /proc/PID/coredump_filter").  The purpose of that commit
    was to bring GDB's corefile generation closer to what the Linux kernel
    does.  However, back then, I did not implement the full support for
    the dumping of memory mappings containing ELF headers (like mappings
    of DSOs or executables).  These mappings were being dumped most of the
    time, though, because the default value of /proc/PID/coredump_filter
    is 0x33, which would cause anonymous private mappings (DSOs/executable
    code mappings have this type) to be dumped.  Well, until something
    happened on binutils...
    
    A while ago, I noticed something strange was happening with one of our
    local testcases on Fedora GDB: it was failing due to some strange
    build-id problem.  On Fedora GDB, we (unfortunately) carry a bunch of
    "local" patches, and some of these patches actually extend upstream's
    build-id support in order to generate more useful information for the
    user of a Fedora system (for example, when the user loads a corefile
    into GDB, we detect whether the executable that generated that
    corefile is present, and if it's not we issue a warning suggesting
    that it should be installed, while also providing the build-id of the
    executable).  A while ago, Fedora GDB stopped printing those warnings.
    
    I wanted to investigate this right away, and spent some time trying to
    determine what was going on, but other things happened and I got
    sidetracked.  Meanwhile, the bug started to be noticed by some of our
    users, and its priority started changing.  Then, someone on IRC also
    mentioned the problem, and when I tried helping him, I noticed he
    wasn't running Fedora.  Hm...  So maybe the bug was *also* present
    upstream.
    
    After "some" time investigating, and with a lot of help from Keith and
    others, I was finally able to determine that yes, the bug is also
    present upstream, and that even though it started with a change in ld,
    it is indeed a GDB issue.
    
    So, as I said, the problem started with binutils, more specifically
    after the following commit was pushed:
    
      commit f6aec96dce1ddbd8961a3aa8a2925db2021719bb
      Author: H.J. Lu <hjl.tools@gmail.com>
      Date:   Tue Feb 27 11:34:20 2018 -0800
    
          ld: Add --enable-separate-code
    
    This commit makes ld use "-z separate-code" by default on x86-64
    machines.  What this means is that code pages and data pages are now
    separated in the binary, which is confusing GDB when it tries to decide
    what to dump.
    
    BTW, Fedora 28 binutils doesn't have this code, which means that
    Fedora 28 GDB doesn't have the problem.  From Fedora 29 on, binutils
    was rebased and incorporated the commit above, which started causing
    Fedora GDB to fail.
    
    Anyway, the first thing I tried was to pass "-z max-page-size" and
    specify a bigger page size (I saw a patch that did this and was
    proposed to Linux, so I thought it might help).  Obviously, this
    didn't work, because the real "problem" is that ld will always use
    separate pages for code and data.  So I decided to look into how GDB
    dumped the pages, and that's where I found the real issue.
    
    What happens is that, because of "-z separate-code", the first two pages
    of the ELF binary are (from /proc/PID/smaps):
    
      00400000-00401000 r--p 00000000 fc:01 799548                             /file
      Size:                  4 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      Rss:                   4 kB
      Pss:                   4 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:         4 kB
      Private_Dirty:         0 kB
      Referenced:            4 kB
      Anonymous:             0 kB
      LazyFree:              0 kB
      AnonHugePages:         0 kB
      ShmemPmdMapped:        0 kB
      Shared_Hugetlb:        0 kB
      Private_Hugetlb:       0 kB
      Swap:                  0 kB
      SwapPss:               0 kB
      Locked:                0 kB
      THPeligible:    0
      VmFlags: rd mr mw me dw sd
      00401000-00402000 r-xp 00001000 fc:01 799548                             /file
      Size:                  4 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      Rss:                   4 kB
      Pss:                   4 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:         0 kB
      Private_Dirty:         4 kB
      Referenced:            4 kB
      Anonymous:             4 kB
      LazyFree:              0 kB
      AnonHugePages:         0 kB
      ShmemPmdMapped:        0 kB
      Shared_Hugetlb:        0 kB
      Private_Hugetlb:       0 kB
      Swap:                  0 kB
      SwapPss:               0 kB
      Locked:                0 kB
      THPeligible:    0
      VmFlags: rd ex mr mw me dw sd
    
    Whereas before, we had only one:
    
      00400000-00401000 r-xp 00000000 fc:01 798593                             /file
      Size:                  4 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      Rss:                   4 kB
      Pss:                   4 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:         0 kB
      Private_Dirty:         4 kB
      Referenced:            4 kB
      Anonymous:             4 kB
      LazyFree:              0 kB
      AnonHugePages:         0 kB
      ShmemPmdMapped:        0 kB
      Shared_Hugetlb:        0 kB
      Private_Hugetlb:       0 kB
      Swap:                  0 kB
      SwapPss:               0 kB
      Locked:                0 kB
      THPeligible:    0
      VmFlags: rd ex mr mw me dw sd
    
    Notice how we have "Anonymous" data mapped into the page.  This will be
    important.
    
    So, the way GDB decides which pages it should dump was revamped
    by my patch in 2015, and now it takes the contents of
    /proc/PID/coredump_filter into account.  The default value for Linux
    is 0x33, which means:
    
      Dump anonymous private, anonymous shared, ELF headers and HugeTLB
      private pages.
    
    Or:
    
      filter_flags filterflags = (COREFILTER_ANON_PRIVATE
    			      | COREFILTER_ANON_SHARED
    			      | COREFILTER_ELF_HEADERS
    			      | COREFILTER_HUGETLB_PRIVATE);
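
To make the 0x33 value concrete, here is a minimal Python sketch (an illustration only, not code from GDB or from this patch) that decodes a coredump_filter value into the mapping types it selects.  The bit positions follow the core(5) manual page; the table and the helper name decode_coredump_filter are hypothetical.

```python
# Bit positions as documented for /proc/PID/coredump_filter in core(5).
# The dictionary and function names are illustrative, not GDB identifiers.
COREDUMP_FILTER_BITS = {
    0: "anonymous private",
    1: "anonymous shared",
    2: "file-backed private",
    3: "file-backed shared",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(value):
    """Return the names of the mapping types selected by a filter value."""
    return [name for bit, name in sorted(COREDUMP_FILTER_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    for name in decode_coredump_filter(0x33):
        print(name)
```

Note that bit 2 (file-backed private) is not set in 0x33, which matters for the file-backed data page discussed in this message.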
    
    Now, it is important to keep in mind that GDB doesn't always have *all*
    of the necessary information to exactly determine the type of a page, so
    the whole algorithm is based on heuristics (you can take a look at
    linux-tdep.c:dump_mapping_p and
    linux-tdep.c:linux_find_memory_regions_full for more info).
    
    Before the patch to make ld use "-z separate-code", the (single) page
    containing data and code was being flagged as an anonymous (due to the
    non-zero "Anonymous:" field) private (due to the "r-xp" permission),
    which means that it was being dumped into the corefile.  That's why it
    was working fine.
    
    Now, as you can imagine, when "-z separate-code" is used, the *data*
    page (which is where the ELF notes are, including the build-id one) now
    doesn't have any "Anonymous:" mapping, so the heuristic is flagging it
    as file-backed private, which is *not* dumped by default.
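
The classification described above can be sketched in Python as follows.  This is a deliberately simplified illustration of the heuristic as this message describes it (private from the "p" permission character, anonymous from a non-zero "Anonymous:" field); the real logic in linux-tdep.c considers more cases, and the function names here are hypothetical.

```python
def parse_smaps_entry(text):
    """Return (perms, offset, anonymous_kb) for one /proc/PID/smaps entry."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    perms, offset = header[1], int(header[2], 16)
    anonymous_kb = 0
    for line in lines[1:]:
        key, _, rest = line.partition(":")
        if key.strip() == "Anonymous":
            anonymous_kb = int(rest.split()[0])
    return perms, offset, anonymous_kb


def classify_mapping(perms, anonymous_kb):
    """Simplified heuristic: sharing from perms[3], backing from Anonymous."""
    sharing = "private" if perms[3] == "p" else "shared"
    backing = "anonymous" if anonymous_kb > 0 else "file-backed"
    return f"{backing} {sharing}"


# Abridged copies of the two smaps entries quoted earlier in this message.
DATA_PAGE = """\
00400000-00401000 r--p 00000000 fc:01 799548    /file
Private_Clean:         4 kB
Anonymous:             0 kB
"""
CODE_PAGE = """\
00401000-00402000 r-xp 00001000 fc:01 799548    /file
Private_Dirty:         4 kB
Anonymous:             4 kB
"""

if __name__ == "__main__":
    for entry in (DATA_PAGE, CODE_PAGE):
        perms, offset, anon_kb = parse_smaps_entry(entry)
        print(perms, hex(offset), classify_mapping(perms, anon_kb))
```

Under this simplification the r--p page at offset 0 comes out as file-backed private and the r-xp page as anonymous private, matching the behaviour described in the text.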
    
    The next question I had to answer was: how come a corefile generated by
    the Linux kernel was correct?  Well, the answer is that GDB, unlike
    Linux, doesn't actually implement the COREFILTER_ELF_HEADERS support.
    On Linux, even though the data page is also treated as a file-backed
    private mapping, it is also checked to see if there are any ELF headers
    in the page, and then, because we *do* have ELF headers there, it is
    dumped.
    
    So, after more time trying to think of ways to fix this, I was able to
    implement an algorithm that reads the first few bytes of the memory
    mapping being processed, and checks to see if the ELF magic code is
    present.  This is basically what Linux does as well, except that, if
    it finds the ELF magic code, it just dumps one page to the corefile,
    whereas GDB will dump the whole mapping.  But I don't think that's a
    big issue, to be honest.
    
    It's also important to explain that we *only* perform the ELF magic
    code check if:
    
      - The algorithm has decided *not* to dump the mapping so far, and;
      - The mapping is private, and;
      - The mapping's offset is zero, and;
      - The user has requested us to dump mappings with ELF headers.
    
    IOW, we're not going to blindly check every mapping.
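
The four conditions above can be sketched as a small Python function.  This is an illustration under stated assumptions, not the code added to dump_mapping_p: the function name and the read_memory callable (which stands in for reading the first bytes of the mapping from the inferior) are hypothetical, and the bit value for COREFILTER_ELF_HEADERS is taken from bit 4 in core(5).

```python
ELF_MAGIC = b"\x7fELF"
COREFILTER_ELF_HEADERS = 1 << 4  # bit 4 of coredump_filter, per core(5)


def dump_for_elf_header(dump_so_far, perms, offset, filter_flags, read_memory):
    """Decide whether a mapping should be dumped, adding the ELF magic check.

    read_memory(n) is a hypothetical callable returning the first n bytes
    of the mapping; it is only called when every other condition holds.
    """
    if dump_so_far:
        return True  # an earlier rule already selected this mapping
    if perms[3] != "p":
        return False  # only private mappings are checked
    if offset != 0:
        return False  # an ELF header can only sit at file offset zero
    if not filter_flags & COREFILTER_ELF_HEADERS:
        return False  # the user did not ask for ELF-header mappings
    return read_memory(len(ELF_MAGIC)) == ELF_MAGIC


if __name__ == "__main__":
    first_page = ELF_MAGIC + b"\x02\x01\x01" + bytes(9)
    print(dump_for_elf_header(False, "r--p", 0, 0x33,
                              lambda n: first_page[:n]))
```

Ordering the cheap checks first means the memory read happens only for the few mappings that could plausibly start with an ELF header.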
    
    As for the testcase, I struggled even more trying to write it.  Since
    our build-id support on upstream GDB is not very extensive, it's not
    really possible to determine whether a corefile contains build-id
    information or not just by using GDB.  So, after thinking a lot about
    the problem, I decided to rely on an external tool, eu-unstrip, in
    order to verify whether the dump was successful.  I verified the test
    here on my machine, and everything seems to work as expected (i.e., it
    fails without the patch, and works with the patch applied).  We are
    working hard to upstream our "local" Fedora GDB patches, and we intend
    to submit our build-id extension patches "soon", so hopefully we'll be
    able to use GDB itself to perform this verification.
    
    I built and regtested this on the BuildBot, and no problems were
    found.
    
    gdb/ChangeLog:
    2019-04-25  Sergio Durigan Junior  <sergiodj@redhat.com>
    
    	PR corefiles/11608
    	PR corefiles/18187
    	* linux-tdep.c (dump_mapping_p): Add new parameters ADDR and
    	OFFSET.  Verify if current mapping contains an ELF header.
    	(linux_find_memory_regions_full): Adjust call to
    	dump_mapping_p.
    
    gdb/testsuite/ChangeLog:
    2019-04-25  Sergio Durigan Junior  <sergiodj@redhat.com>
    
    	PR corefiles/11608
    	PR corefiles/18187
    	* gdb.base/coredump-filter-build-id.exp: New file.
Comment 4 Sergio Durigan Junior 2019-04-25 18:29:01 UTC
Should be fixed now.