[PATCH 3/3] gdb: handle core files with .reg/0 section names
Kevin Buettner
kevinb@redhat.com
Sat Jun 10 00:36:01 GMT 2023
On Mon, 5 Jun 2023 10:11:09 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:
> The previous commit added the test gdb.arch/core-file-pid0.exp which
> tests GDB's ability to load a core file containing threads with an
> lwpid of 0, which is something GDB can encounter when loading a
> vmcore file -- a core file generated by the Linux kernel. The threads
> with an lwpid of 0 represent idle cores.
>
> While the previous commit added the test, which confirms GDB doesn't
> crash when confronted with such a core file, there are still some
> problems with GDB's handling of these core files. These problems all
> originate from the fact that the core file (once opened by bfd)
> contains multiple sections called .reg/0. Each of these sections
> represents a different thread (a cpu core in the original vmcore dump),
> but GDB gets confused and thinks that all of these .reg/0 sections
> refer to the same thread.
>
> Here is a GDB session on an x86-64 machine which loads the core file
> from the gdb.arch/core-file-pid0.exp test. This core file contains two
> threads, both of which have a pid of 0:
>
> $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
> (gdb) core-file /tmp/x86_64-pid0-core.core
> [New process 1]
> [New process 1]
> Failed to read a valid object file image from memory.
> Core was generated by `./segv-mt'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> The current thread has terminated
> (gdb) info threads
> Id Target Id Frame
> 2 process 1 0x00000000004017c2 in ?? ()
>
> The current thread <Thread ID 1> has terminated. See `help thread'.
> (gdb) maintenance info sections
> Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
> [0] 0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
> [1] 0x00000000->0x000000d8 at 0x0000039c: .reg/0 HAS_CONTENTS
> [2] 0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
> [3] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/0 HAS_CONTENTS
> [4] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
> [5] 0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
> [6] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/0 HAS_CONTENTS
> [7] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
> [8] 0x00000000->0x00000200 at 0x000007cc: .reg2/0 HAS_CONTENTS
> [9] 0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
> [10] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate/0 HAS_CONTENTS
> [11] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
> [12] 0x00000000->0x000000d8 at 0x00000ea4: .reg/0 HAS_CONTENTS
> [13] 0x00000000->0x00000200 at 0x00000f98: .reg2/0 HAS_CONTENTS
> [14] 0x00000000->0x00000440 at 0x000011ac: .reg-xstate/0 HAS_CONTENTS
> [15] 0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
> [16] 0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
> [17] 0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
> [18] 0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
> [19] 0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
> [20] 0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
> [21] 0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
> [22] 0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
> [23] 0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
> [24] 0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
> [25] 0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
> [26] 0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
> (gdb)
>
> Notice when the core file is first loaded we see two lines like:
>
> [New process 1]
>
> And GDB reports:
>
> The current thread has terminated
>
> Which isn't what we'd expect from a core file -- the core file should
> only contain threads that are live at the point of the crash, one of
> which should be the current thread. The above message is reported
> because GDB has deleted what we think is the current thread!
>
> And in the 'info threads' output we only see a single thread;
> again, this is because GDB has deleted one of the threads.
>
> Finally, the 'maintenance info sections' output shows the cause of all
> our problems, two sections named .reg/0. When GDB sees the first of
> these it creates a new thread. When GDB sees the second .reg/0 it
> tries to create another new thread, but this thread has the same
> ptid_t as the first thread, so GDB deletes the first thread and
> creates the second thread in its place.
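
The replacement behaviour described above can be illustrated with a toy
model.  This is only a sketch, not GDB's thread list: toy_ptid,
toy_thread_table and add_thread are hypothetical names, and the table
is assumed to be keyed purely on (pid, lwp).  Adding a second thread
with the same key drops the first, leaving a single thread whose
number is 2, which is consistent with the 'info threads' output shown
earlier.

```cpp
#include <map>
#include <utility>

// Hypothetical stand-in for ptid_t: (pid, lwp).
using toy_ptid = std::pair<int, long>;

// Toy thread table, keyed on ptid.  This is an assumption made for
// illustration, not how GDB stores its threads.
struct toy_thread_table
{
  std::map<toy_ptid, int> threads;  // ptid -> per-session thread number
  int next_num = 1;

  // Add a thread.  If a thread with the same ptid already exists it is
  // deleted first, mirroring the behaviour described in the text.
  int add_thread (toy_ptid ptid)
  {
    threads.erase (ptid);
    int num = next_num++;
    threads[ptid] = num;
    return num;
  }
};
```

With two .reg/0 sections both mapping to the same (pid, 0) key, only
the second thread survives in this model.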
>
> Because both these threads are created with an lwpid of 0 GDB reports
> these as 'New process NN' rather than 'New LWP NN' which is what we
> would normally expect.
>
> The previous commit includes a little more of the history of GDB
> support in this area, but these problems were discussed on the mailing
> list a while ago in this thread:
>
> https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
>
> In this commit I propose a solution to these problems.
>
> What I propose is that GDB should spot when we have .reg/0 sections
> and, when these are found, should rename these sections using some
> unique non-zero lwpid.
>
> Note in the above output we also have sections like .reg2/0 and
> .reg-xstate/0. These are additional register sets; this commit also
> renumbers these sections in line with their .reg section.
>
> The user is warned that some section renumbering has been performed.
>
> GDB takes care to ensure that the new numbers assigned are unique and
> don't clash with any of the pids that might already be in use --
> remember, in a real vmcore file, 0 is used to indicate an idle core;
> non-idle cores will have the pid of whichever process was running on
> that core, so we don't want GDB to assign an lwpid that clashes with
> an actual pid that is in use in the core file.
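
The renumbering scheme described above could be sketched roughly as
follows.  This is not the code from the patch: the function and helper
names are hypothetical, the sketch works on a plain list of section
names rather than on bfd sections, and it assumes that every section
whose name ends in "/0" belongs to the most recent .reg/0 section seen
before it.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: true if NAME ends with SUFFIX.
static bool
ends_with (const std::string &name, const std::string &suffix)
{
  return (name.size () >= suffix.size ()
          && name.compare (name.size () - suffix.size (), suffix.size (),
                           suffix) == 0);
}

// Hypothetical helper: return NN for a ".reg/NN" name, or -1 for any
// other name.
static long
reg_section_lwpid (const std::string &name)
{
  const std::string prefix = ".reg/";
  if (name.compare (0, prefix.size (), prefix) != 0
      || name.size () == prefix.size ())
    return -1;
  for (std::size_t i = prefix.size (); i < name.size (); ++i)
    if (!std::isdigit (static_cast<unsigned char> (name[i])))
      return -1;
  return std::stol (name.substr (prefix.size ()));
}

// Sketch: give each .reg/0 section a unique non-zero lwpid that does
// not clash with any lwpid already used by a .reg/NN section, and
// rename the "/0" sections that follow it to match.
std::vector<std::string>
renumber_pid0_sections (const std::vector<std::string> &names)
{
  // First pass: record the non-zero lwpids already in use.
  std::set<long> used;
  for (const std::string &n : names)
    {
      long id = reg_section_lwpid (n);
      if (id > 0)
        used.insert (id);
    }

  // Second pass: rename the "/0" sections.
  std::vector<std::string> result;
  long next = 1;     // Next candidate replacement lwpid.
  long current = 0;  // Replacement for the thread being walked; 0 = none.
  for (const std::string &n : names)
    {
      if (!ends_with (n, "/0"))
        {
          result.push_back (n);
          continue;
        }
      if (n == ".reg/0")
        {
          while (used.count (next) != 0)
            ++next;
          assert (used.count (next) == 0);
          current = next;
          used.insert (current);
        }
      if (current == 0)
        {
          // A "/0" section seen before any .reg/0; leave it unchanged.
          result.push_back (n);
          continue;
        }
      result.push_back (n.substr (0, n.size () - 1)
                        + std::to_string (current));
    }
  return result;
}
```

Given a list containing two .reg/0 sections and no other .reg/NN
sections, this sketch yields .reg/1 and .reg/2.  If a .reg/1 section
already exists, the two idle threads become .reg/2 and .reg/3 instead.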
>
> After this commit here's the updated GDB session output:
>
> $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
> (gdb) core-file /tmp/x86_64-pid0-core.core
> warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2
> [New LWP 1]
> [New LWP 2]
> Failed to read a valid object file image from memory.
> Core was generated by `./segv-mt'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00000000004017c2 in ?? ()
> [Current thread is 1 (LWP 1)]
> (gdb) info threads
> Id Target Id Frame
> * 1 LWP 1 0x00000000004017c2 in ?? ()
> 2 LWP 2 0x000000000040dda5 in ?? ()
> (gdb) maintenance info sections
> Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
> [0] 0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
> [1] 0x00000000->0x000000d8 at 0x0000039c: .reg/1 HAS_CONTENTS
> [2] 0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
> [3] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/1 HAS_CONTENTS
> [4] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
> [5] 0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
> [6] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/1 HAS_CONTENTS
> [7] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
> [8] 0x00000000->0x00000200 at 0x000007cc: .reg2/1 HAS_CONTENTS
> [9] 0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
> [10] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate/1 HAS_CONTENTS
> [11] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
> [12] 0x00000000->0x000000d8 at 0x00000ea4: .reg/2 HAS_CONTENTS
> [13] 0x00000000->0x00000200 at 0x00000f98: .reg2/2 HAS_CONTENTS
> [14] 0x00000000->0x00000440 at 0x000011ac: .reg-xstate/2 HAS_CONTENTS
> [15] 0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
> [16] 0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
> [17] 0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
> [18] 0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
> [19] 0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
> [20] 0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
> [21] 0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
> [22] 0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
> [23] 0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
> [24] 0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
> [25] 0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
> [26] 0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
> (gdb)
>
> Notice the new warning which is issued when the core file is being
> loaded. The threads are announced as '[New LWP NN]', and we see two
> threads in the 'info threads' output. The 'maintenance info sections'
> output shows the result of the section renaming.
>
> The gdb.arch/core-file-pid0.exp test has been updated to check for the
> improved GDB output.
> ---
> gdb/corelow.c | 150 ++++++++++++++++++++++
> gdb/testsuite/gdb.arch/core-file-pid0.exp | 12 +-
> 2 files changed, 161 insertions(+), 1 deletion(-)
Another great explanation!
LGTM.
Reviewed-by: Kevin Buettner <kevinb@redhat.com>