[PATCH 3/3] gdb: handle core files with .reg/0 section names

Kevin Buettner kevinb@redhat.com
Sat Jun 10 00:36:01 GMT 2023


On Mon,  5 Jun 2023 10:11:09 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:

> The previous commit added the test gdb.arch/core-file-pid0.exp which
> tests GDB's ability to load a core file containing threads with an
> lwpid of 0, which is something GDB can encounter when loading a
> vmcore file -- a core file generated by the Linux kernel.  The threads
> with an lwpid of 0 represent idle cores.
> 
> While the previous commit added the test, which confirms GDB doesn't
> crash when confronted with such a core file, there are still some
> problems with GDB's handling of these core files.  These problems all
> originate from the fact that the core file (once opened by bfd)
> contains multiple sections called .reg/0.  These sections each
> represent a different thread (a cpu core in the original vmcore dump),
> but GDB gets confused and thinks that all of these .reg/0 sections
> refer to the same thread.
> 
> Here is a GDB session on an x86-64 machine which loads the core file
> from the gdb.arch/core-file-pid0.exp test.  This core file contains
> two threads, both of which have a pid of 0:
> 
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
>   (gdb) core-file /tmp/x86_64-pid0-core.core
>   [New process 1]
>   [New process 1]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The current thread has terminated
>   (gdb) info threads
>     Id   Target Id         Frame
>     2    process 1         0x00000000004017c2 in ?? ()
> 
>   The current thread <Thread ID 1> has terminated.  See `help thread'.
>   (gdb) maintenance info sections
>   Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
>    [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
>    [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/0 HAS_CONTENTS
>    [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
>    [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/0 HAS_CONTENTS
>    [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
>    [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
>    [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/0 HAS_CONTENTS
>    [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
>    [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/0 HAS_CONTENTS
>    [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
>    [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/0 HAS_CONTENTS
>    [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
>    [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/0 HAS_CONTENTS
>    [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/0 HAS_CONTENTS
>    [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/0 HAS_CONTENTS
>    [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
>    [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
>    [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
>    [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
>    [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
>    [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
>    [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
>    [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
>    [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
>    [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
>    [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
>    [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
>   (gdb)
> 
> Notice when the core file is first loaded we see two lines like:
> 
>   [New process 1]
> 
> And GDB reports:
> 
>   The current thread has terminated
> 
> Which isn't what we'd expect from a core file -- the core file should
> only contain threads that are live at the point of the crash, one of
> which should be the current thread.  The above message is reported
> because GDB has deleted what we think is the current thread!
> 
> And in the 'info threads' output we are only seeing a single thread,
> again, this is because GDB has deleted one of the threads.
> 
> Finally, the 'maintenance info sections' output shows the cause of all
> our problems, two sections named .reg/0.  When GDB sees the first of
> these it creates a new thread.  But, when we see the second .reg/0 GDB
> tries to create another new thread, but this thread has the same
> ptid_t as the first thread, so GDB deletes the first thread and
> creates the second thread in its place.
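
A minimal sketch of this collision, written as a toy Python model rather
than anything taken from GDB's source.  The function name, the
dictionary keyed on (pid, lwp), and the default pid of 1 are all
assumptions made for illustration; the only thing carried over from the
description above is that a second thread with the same id displaces the
first.

```python
# Toy model (not GDB code): a thread table keyed on (pid, lwp).
# A second ".reg/0" section yields the same key, so the first entry
# is replaced instead of a second thread being added.

def load_threads(section_names, pid=1):
    """Return (threads, events) for the '.reg/N' sections in order."""
    threads = {}
    events = []
    for name in section_names:
        base, sep, lwp = name.partition("/")
        # Only per-thread general register sections create threads.
        if base != ".reg" or not sep:
            continue
        key = (pid, int(lwp))
        if key in threads:
            events.append(f"replaced {key}")
        else:
            events.append(f"new {key}")
        threads[key] = name
    return threads, events


if __name__ == "__main__":
    sections = [".reg/0", ".reg", ".reg2/0", ".reg/0", ".reg2/0"]
    threads, events = load_threads(sections)
    print(len(threads), events)
```

With two .reg/0 sections the table ends up holding a single entry, which
matches the single thread seen in the 'info threads' output above.
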
> 
> Because both these threads are created with an lwpid of 0, GDB reports
> them as 'New process NN' rather than 'New LWP NN', which is what we
> would normally expect.
> 
> The previous commit includes a little more of the history of GDB
> support in this area, but these problems were discussed on the mailing
> list a while ago in this thread:
> 
>   https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
> 
> In this commit I propose a solution to these problems.
> 
> What I propose is that GDB should spot when we have .reg/0 sections
> and, when these are found, should rename these sections using some
> unique non-zero lwpid.
> 
> Note in the above output we also have sections like .reg2/0 and
> .reg-xstate/0.  These are additional register sets, and this commit
> also renumbers these sections in line with their .reg section.
> 
> The user is warned that some section renumbering has been performed.
> 
> GDB takes care to ensure that the new numbers assigned are unique and
> don't clash with any of the pids that might already be in use.
> Remember, in a real vmcore file, 0 is used to indicate an idle core,
> while non-idle cores will have the pid of whichever process was running
> on that core, so we don't want GDB to assign an lwpid that clashes
> with an actual pid that is in use in the core file.
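
The renumbering idea can be sketched roughly as follows.  This is a toy
Python model, not the code added to gdb/corelow.c: the function name is
hypothetical, and the rule that every "/0" section belongs to the most
recent .reg/0 section is an assumption that happens to agree with the
session output quoted in this message.  Starting the search for a free
lwpid at 1 is likewise an assumption.

```python
# Toy model (not the corelow.c implementation): give each ".reg/0"
# section a unique non-zero lwpid and rename the "/0" sections that
# follow it to match.

def renumber_pid0_sections(names):
    """Return (renamed_names, assigned_lwpids)."""
    # Ids already used by non-idle threads must not be reused.
    in_use = set()
    for name in names:
        if name.startswith(".reg/"):
            in_use.add(int(name.split("/")[1]))
    in_use.discard(0)

    renamed = []
    assigned = []
    current = None   # replacement id for the thread being processed
    next_id = 1
    for name in names:
        base, sep, suffix = name.rpartition("/")
        if sep and suffix == "0":
            if base == ".reg":
                # Assumption: a ".reg/0" section starts a new thread.
                while next_id in in_use:
                    next_id += 1
                current = next_id
                in_use.add(current)
                assigned.append(current)
            if current is not None:
                name = f"{base}/{current}"
        renamed.append(name)
    return renamed, assigned


if __name__ == "__main__":
    sections = [".reg/0", ".reg", ".reg2/0", ".reg-xstate/0",
                ".reg/0", ".reg2/0", ".reg-xstate/0"]
    renamed, assigned = renumber_pid0_sections(sections)
    print(renamed)
    print("replacement ids:", ", ".join(f"LWP {n}" for n in assigned))
```

Sections without a "/N" suffix, such as .reg or .auxv, pass through
unchanged, and an id already present in the file (say .reg/1) is skipped
when choosing replacements.
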
> 
> After this commit here's the updated GDB session output:
> 
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
>   (gdb) core-file /tmp/x86_64-pid0-core.core
>   warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2
>   [New LWP 1]
>   [New LWP 2]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   #0  0x00000000004017c2 in ?? ()
>   [Current thread is 1 (LWP 1)]
>   (gdb) info threads
>     Id   Target Id         Frame
>   * 1    LWP 1             0x00000000004017c2 in ?? ()
>     2    LWP 2             0x000000000040dda5 in ?? ()
>   (gdb) maintenance info sections
>   Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
>    [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
>    [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/1 HAS_CONTENTS
>    [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
>    [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/1 HAS_CONTENTS
>    [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
>    [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
>    [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/1 HAS_CONTENTS
>    [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
>    [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/1 HAS_CONTENTS
>    [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
>    [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/1 HAS_CONTENTS
>    [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
>    [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/2 HAS_CONTENTS
>    [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/2 HAS_CONTENTS
>    [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/2 HAS_CONTENTS
>    [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
>    [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
>    [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
>    [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
>    [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
>    [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
>    [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
>    [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
>    [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
>    [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
>    [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
>    [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
>   (gdb)
> 
> Notice the new warning which is issued when the core file is being
> loaded.  The threads are announced as '[New LWP NN]', and we see two
> threads in the 'info threads' output.  The 'maintenance info sections'
> output shows the result of the section renaming.
> 
> The gdb.arch/core-file-pid0.exp test has been updated to check for the
> improved GDB output.
> ---
>  gdb/corelow.c                             | 150 ++++++++++++++++++++++
>  gdb/testsuite/gdb.arch/core-file-pid0.exp |  12 +-
>  2 files changed, 161 insertions(+), 1 deletion(-)

Another great explanation!

LGTM.

Reviewed-by: Kevin Buettner <kevinb@redhat.com>


