Bug 26828 - SIGSEGV in follow_die_offset dwarf2/read.c:22950
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: 10.1
Importance: P2 normal
Target Milestone: 10.2
Assignee: Not yet assigned to anyone
Reported: 2020-11-02 10:42 UTC by Nils Gladitz
Modified: 2021-02-23 23:32 UTC
2 users

Last reconfirmed: 2020-11-03 00:00:00


Attachments
reproducer (205.10 KB, application/x-xz)
2020-11-03 11:36 UTC, Nils Gladitz
second reproducer (175.49 KB, application/x-xz)
2020-11-08 16:16 UTC, Nils Gladitz
Proposed patch series (4.85 KB, patch)
2020-11-13 17:22 UTC, Simon Marchi

Description Nils Gladitz 2020-11-02 10:42:52 UTC
I am running GDB 10.1 on x86_64 GNU/Linux targeting ARM.

While debugging a core file with "-i=mi", repeatedly running e.g. "-stack-list-variables --thread 1 --frame 0 --simple-values" results in a segmentation fault.

The first call returns "frame" and "variables" information.
The second call returns just the "variables".
The third call segfaults in "follow_die_offset" (dwarf2/read.c:22950).

The segfaulting line reads:
   target_cu->ancestor = cu;

target_cu is apparently 0.

I can reproduce the issue with current master (e1f57067b162cba9f39e087726c7a2f2cfaae96f) too.

With GDB 9.1 the third (and subsequent) calls don't seem to segfault.
Comment 1 Simon Marchi 2020-11-02 13:25:31 UTC
Hi Nils,

Can you provide some files to reproduce the issue?  Otherwise it's not really possible to look into it.

Simon
Comment 2 Nils Gladitz 2020-11-02 15:21:24 UTC
Hello Simon,

thank you for looking into this.

"git bisect" tells me the issue might exist since:
  17ee85fc2af74471e8c57502714a32bbeac5f1ae
  "Share DWARF partial symtabs"

I don't have any reasonably shareable files yet.

I'll try to construct / reduce a test case somehow, but I don't know how long that'll take or how successful I'll be; I'm not even entirely sure how I'll go about constructing one yet.
Comment 3 Simon Marchi 2020-11-02 15:36:49 UTC
Ok, thanks.  The problematic commit is one I worked on, so I'll take a look once you upload a reproducer.
Comment 4 Nils Gladitz 2020-11-03 11:36:32 UTC
Created attachment 12941 [details]
reproducer

I've attached an archive with a test case that reproduces the issue for me.

It contains an executable (Foo), core-file (Foo-core), libpthread.so.0 from the arm target system and a shell script (repo.sh) that runs arm-linux-gnueabi-gdb with a sequence of commands that reproduces the crash on my system(s).

I had to increase the number of queries to trigger the segmentation fault but am hoping the test case is still deterministic and does not depend on anything else specific to my system.
Comment 5 Simon Marchi 2020-11-03 13:37:20 UTC
Whatever I do, I can't get a backtrace using your core.  I always get a variation of:

(gdb) bt
#0  0xb6c3809c in pthread_mutex_trylock () from /lib/libpthread.so.0
Backtrace stopped: Cannot access memory at address 0x120


I'm guessing that in your development sysroot, you have debug info for /lib/libpthread.so.0, and that makes GDB able to unwind the frames in that library.  It's probably a separate debug file.  Can you check, and if so, provide it?
Comment 6 Simon Marchi 2020-11-03 14:03:14 UTC
Actually, I made some progress.  The solib-search-path isn't sufficient; I think GDB opens my system's /lib/libpthread.so.0.  It would be more appropriate to use "set sysroot" in this situation.  When setting the sysroot to the base of what you provided, the correct /lib/libpthread.so.0 is loaded by GDB.

Another note for those trying to reproduce: if you have a --enable-targets=all GDB, it will (erroneously) detect the core as having the Symbian osabi and then tell you that it doesn't know how to handle cores.  Use "set osabi GNU/Linux" to force it to use that osabi.

So now the status is that I do get a crash, but I think it's not the same as yours, I think it's earlier.  I build with --enable-ubsan, and that looks like an undefined behavior sanitizer message:

$ ../gdb -nx --data-directory=../data-directory                                     
(gdb) set osabi GNU/Linux 
(gdb) set sysroot /home/simark/build/binutils-gdb/gdb/repo
(gdb) file Foo
Reading symbols from Foo...
(gdb) core-file Foo-core 
warning: Can't open file /media/mmcblk0p1/install/usr/bin/Foo during file-backed mapping note processing
warning: Can't open file /lib/libm-2.21.so during file-backed mapping note processing
warning: Can't open file /lib/libpthread-2.21.so during file-backed mapping note processing
warning: Can't open file /lib/libgcc_s.so.1 during file-backed mapping note processing
warning: Can't open file /media/mmcblk0p1/install/usr/lib/libstdc++.so.6 during file-backed mapping note processing
warning: Can't open file /lib/libc-2.21.so during file-backed mapping note processing
warning: Can't open file /lib/ld-2.21.so during file-backed mapping note processing
[New LWP 29367]
[New LWP 29368]
warning: Could not load shared library symbols for 5 libraries, e.g. /lib/libc.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `./Foo'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb6c3809c in pthread_cond_wait () from /home/simark/build/binutils-gdb/gdb/repo/lib/libpthread.so.0
[Current thread is 1 (LWP 29367)]
(gdb) bt
/home/simark/src/binutils-gdb/gdb/arm-tdep.c:1551:30: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
Comment 7 Simon Marchi 2020-11-03 14:32:27 UTC
I fixed the undefined shift behavior problem, and was able to reproduce your problem.  This is the command line you provided that I adapted:

../gdb -nx --data-directory=../data-directory -i mi Foo <<EOF
set sysroot /home/simark/build/binutils-gdb/gdb/repo
set osabi GNU/Linux
core-file Foo-core
-stack-list-variables --thread 2 --frame 2 --simple-values
-stack-list-variables --thread 2 --frame 2 --simple-values
-stack-list-variables --thread 2 --frame 2 --simple-values
-stack-list-variables --thread 2 --frame 2 --simple-values
-stack-list-variables --thread 2 --frame 2 --simple-values
-stack-list-variables --thread 2 --frame 2 --simple-values
EOF


This is what I get:

$ ./repo.sh
warning: Found custom handler for signal 7 (Bus error) preinstalled.
warning: Found custom handler for signal 8 (Floating point exception) preinstalled.
warning: Found custom handler for signal 11 (Segmentation fault) preinstalled.
Some signal dispositions inherited from the environment (SIG_DFL/SIG_IGN)
won't be propagated to spawned programs.
=thread-group-added,id="i1"
~"GNU gdb (GDB) 11.0.50.20201103-git\n"
~"Copyright (C) 2020 Free Software Foundation, Inc.\n"
~"License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law."
~"\nType \"show copying\" and \"show warranty\" for details.\n"
~"This GDB was configured as \"x86_64-pc-linux-gnu\".\n"
~"Type \"show configuration\" for configuration details.\n"
~"For bug reporting instructions, please see:\n"
~"<https://www.gnu.org/software/gdb/bugs/>.\n"
~"Find the GDB manual and other documentation resources online at:\n    <http://www.gnu.org/software/gdb/documentation/>."
~"\n\n"
~"For help, type \"help\".\n"
~"Type \"apropos word\" to search for commands related to \"word\"...\n"
~"Reading symbols from Foo...\n"
(gdb) 
&"set sysroot /home/simark/build/binutils-gdb/gdb/repo\n"
=cmd-param-changed,param="sysroot",value="/home/simark/build/binutils-gdb/gdb/repo"
^done
(gdb) 
&"set osabi GNU/Linux\n"
=cmd-param-changed,param="osabi",value="GNU/Linux"
^done
(gdb) 
&"core-file Foo-core\n"
&"warning: Can't open file /media/mmcblk0p1/install/usr/bin/Foo during file-backed mapping note processing\n"
&"warning: Can't open file /lib/libm-2.21.so during file-backed mapping note processing\n"
&"warning: Can't open file /lib/libpthread-2.21.so during file-backed mapping note processing\n"
&"warning: Can't open file /lib/libgcc_s.so.1 during file-backed mapping note processing\n"
&"warning: Can't open file /media/mmcblk0p1/install/usr/lib/libstdc++.so.6 during file-backed mapping note processing\n"
&"warning: Can't open file /lib/libc-2.21.so during file-backed mapping note processing\n"
&"warning: Can't open file /lib/ld-2.21.so during file-backed mapping note processing\n"
=thread-group-started,id="i1",pid="29367"
=thread-created,id="1",group-id="i1"
~"[New LWP 29367]\n"
=thread-created,id="2",group-id="i1"
~"[New LWP 29368]\n"
=library-loaded,id="/lib/libc.so.6",target-name="/lib/libc.so.6",host-name="/lib/libc.so.6",symbols-loaded="0",thread-group="i1",ranges=[{}]
=library-loaded,id="/media/mmcblk0p1/install/usr/bin/../lib/libstdc++.so.6",target-name="/media/mmcblk0p1/install/usr/bin/../lib/libstdc++.so.6",host-name="/media/mmcblk0p1/install/usr/bin/../lib/libstdc++.so.6",symbols-loaded="0",thread-group="i1",ranges=[{}]
=library-loaded,id="/lib/libgcc_s.so.1",target-name="/lib/libgcc_s.so.1",host-name="/lib/libgcc_s.so.1",symbols-loaded="0",thread-group="i1",ranges=[{}]
=library-loaded,id="/lib/libpthread.so.0",target-name="/lib/libpthread.so.0",host-name="/home/simark/build/binutils-gdb/gdb/repo/lib/libpthread.so.0",symbols-loaded="0",thread-group="i1",ranges=[{from="0xb6c30160",to="0xb6c3ec88"}]
=library-loaded,id="/lib/ld-linux.so.3",target-name="/lib/ld-linux.so.3",host-name="/lib/ld-linux.so.3",symbols-loaded="0",thread-group="i1",ranges=[{}]
=library-loaded,id="/lib/libm.so.6",target-name="/lib/libm.so.6",host-name="/lib/libm.so.6",symbols-loaded="0",thread-group="i1",ranges=[{}]
&"warning: Could not load shared library symbols for 5 libraries, e.g. /lib/libc.so.6.\nUse the \"info sharedlibrary\" command to see the complete listing.\nDo you need \"set solib-search-path\" or \"set sysroot\"?"
&"\n"
&"warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.\n"
&"warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.\n"
~"Core was generated by `./Foo'.\n"
~"Program terminated with signal SIGABRT, Aborted.\n"
~"#0  0xb6c3809c in pthread_cond_wait () from /home/simark/build/binutils-gdb/gdb/repo/lib/libpthread.so.0\n"
~"[Current thread is 1 (LWP 29367)]\n"
^done
(gdb) 
^done,variables=[{name="lock",arg="1",type="boost::asio::detail::conditionally_enabled_mutex::scoped_lock &",value="<synthetic pointer>: {<boost::asio::detail::noncopyable> = {<No data fields>}, mutex_ = @0x2cf50, locked_ = true}"},{name="this",arg="1",type="boost::asio::detail::conditionally_enabled_event * const",value="0x2cf70"}]
(gdb) 
^done,variables=[{name="lock",arg="1",type="boost::asio::detail::conditionally_enabled_mutex::scoped_lock &",value="<synthetic pointer>: {<boost::asio::detail::noncopyable> = {<No data fields>}, mutex_ = @0x2cf50, locked_ = true}"},{name="this",arg="1",type="boost::asio::detail::conditionally_enabled_event * const",value="0x2cf70"}]
(gdb) 
^done,variables=[{name="lock",arg="1",type="boost::asio::detail::conditionally_enabled_mutex::scoped_lock &",value="<synthetic pointer>: {<boost::asio::detail::noncopyable> = {<No data fields>}, mutex_ = @0x2cf50, locked_ = true}"},{name="this",arg="1",type="boost::asio::detail::conditionally_enabled_event * const",value="0x2cf70"}]
(gdb) 
^done,variables=[{name="lock",arg="1",type="boost::asio::detail::conditionally_enabled_mutex::scoped_lock &",value="<synthetic pointer>: {<boost::asio::detail::noncopyable> = {<No data fields>}, mutex_ = @0x2cf50, locked_ = true}"},{name="this",arg="1",type="boost::asio::detail::conditionally_enabled_event * const",value="0x2cf70"}]
(gdb) 
^done,variables=[{name="lock",arg="1",type="boost::asio::detail::conditionally_enabled_mutex::scoped_lock &",value="<synthetic pointer>: {<boost::asio::detail::noncopyable> = {<No data fields>}, mutex_ = @0x2cf50, locked_ = true}"},{name="this",arg="1",type="boost::asio::detail::conditionally_enabled_event * const",value="0x2cf70"}]
(gdb) 
/home/simark/src/binutils-gdb/gdb/dwarf2/read.c:22959:25: runtime error: member access within null pointer of type 'struct dwarf2_cu'
Comment 8 Nils Gladitz 2020-11-03 15:02:47 UTC
Thank you for putting in the extra work in getting this to reproduce.
Quite relieved that it isn't only reproducible on my system.

libpthread.so.0 is located elsewhere on my system which is why I presumably didn't trip there.

Also didn't know about "--enable-targets=all" yet!
Comment 9 Simon Marchi 2020-11-03 17:05:40 UTC
Nils, could you try the following patch?

The issue lies with the fact that we have a DIE in a CU A referring to a DIE in a CU B (using DW_AT_abstract_origin).  The DIEs for CU B are not loaded yet (or rather, have been unloaded due to DWARF CU aging stuff), and we try to load them using:

      /* If necessary, add it to the queue and load its DIEs.  */
      if (maybe_queue_comp_unit (cu, per_cu, per_objfile, cu->language))
	load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
			     false, cu->language);

Since the CU is already queued for expansion, maybe_queue_comp_unit returns false and we don't call load_full_comp_unit.  I think maybe_queue_comp_unit is the wrong thing to call here, because "queuing for symtab expansion" is unrelated to "loading the DIEs in memory".  We should rather check: "are the DIEs for per_cu loaded into memory yet?  If not, load them now", which is what the patch below implements.


@@ -22937,12 +22939,15 @@ follow_die_offset (sect_offset sect_off, int offset_in_dwz,
       per_cu = dwarf2_find_containing_comp_unit (sect_off, offset_in_dwz,
                                                 per_objfile);
 
-      /* If necessary, add it to the queue and load its DIEs.  */
-      if (maybe_queue_comp_unit (cu, per_cu, per_objfile, cu->language))
-       load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
-                            false, cu->language);
-
       target_cu = per_objfile->get_cu (per_cu);
+      if (target_cu == nullptr)
+       {
+         load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
+                              false, cu->language);
+         target_cu = per_objfile->get_cu (per_cu);
+       }
+
+      gdb_assert (target_cu != nullptr);
     }
   else if (cu->dies == NULL)
     {
Comment 10 Nils Gladitz 2020-11-03 19:45:27 UTC
While the attached test case now seems to pass, my original case now fails with:

../../../../binutils-gdb/gdb/dwarf2/read.c:9251: internal-error: void load_full_comp_unit(dwarf2_per_cu_data*, dwarf2_per_objfile*, dwarf2_cu*, bool, language): Assertion `cu->die_hash == NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Not sure if that means this is a related or an independent issue.
Comment 11 Simon Marchi 2020-11-03 19:52:19 UTC
(In reply to Nils Gladitz from comment #10)
> While the attached test case now seems to pass my original case now fails
> with:
> 
> ../../../../binutils-gdb/gdb/dwarf2/read.c:9251: internal-error: void
> load_full_comp_unit(dwarf2_per_cu_data*, dwarf2_per_objfile*, dwarf2_cu*,
> bool, language): Assertion `cu->die_hash == NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> 
> Not sure if that means this is a related or an independent issue.

It's possibly related.  Is that load_full_comp_unit call the one I touched in the patch, or is it a later one?  Could you provide the backtrace of this new assertion failure?
Comment 12 Nils Gladitz 2020-11-03 20:14:20 UTC
Looks like it comes from a different call:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fc878bb0859 in __GI_abort () at abort.c:79
#2  0x000056229e1f0e76 in dump_core () at ../../../../binutils-gdb/gdb/utils.c:204
#3  0x000056229e1f1799 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x56229f7c09e0 <internal_error_problem>, 
    file=0x56229e7cb208 "../../../../binutils-gdb/gdb/dwarf2/read.c", line=9251, 
    fmt=0x56229e7ca848 "%s: Assertion `%s' failed.", ap=0x7ffc18d82410)
    at ../../../../binutils-gdb/gdb/utils.c:414
#4  0x000056229e1f1890 in internal_verror (
    file=0x56229e7cb208 "../../../../binutils-gdb/gdb/dwarf2/read.c", line=9251, 
    fmt=0x56229e7ca848 "%s: Assertion `%s' failed.", ap=0x7ffc18d82410)
    at ../../../../binutils-gdb/gdb/utils.c:439
#5  0x000056229e685fda in internal_error (
    file=0x56229e7cb208 "../../../../binutils-gdb/gdb/dwarf2/read.c", line=9251, 
    fmt=0x56229e7ca848 "%s: Assertion `%s' failed.") at ../../../../binutils-gdb/gdbsupport/errors.cc:55
#6  0x000056229d7c1103 in load_full_comp_unit (this_cu=0x5622a1d2f220, per_objfile=0x5622a1351a10, 
    existing_cu=0x5622b648d2f0, skip_partial=false, pretend_language=language_minimal)
    at ../../../../binutils-gdb/gdb/dwarf2/read.c:9251
#7  0x000056229d783f35 in load_cu (per_cu=0x5622a1d2f220, per_objfile=0x5622a1351a10, 
    skip_partial=false) at ../../../../binutils-gdb/gdb/dwarf2/read.c:2389
#8  0x000056229d7840ad in dw2_do_instantiate_symtab (per_cu=0x5622a1d2f220, per_objfile=0x5622a1351a10, 
    skip_partial=false) at ../../../../binutils-gdb/gdb/dwarf2/read.c:2420
#9  0x000056229d7c0926 in dwarf2_psymtab::expand_psymtab (this=0x5622a293efa0, objfile=0x5622a1d5f4b0)
    at ../../../../binutils-gdb/gdb/dwarf2/read.c:9183
#10 0x000056229d7bf853 in dwarf2_psymtab::read_symtab (this=0x5622a293efa0, objfile=0x5622a1d5f4b0)
    at ../../../../binutils-gdb/gdb/dwarf2/read.c:9031


in load_cu:

  else
    load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
                         skip_partial, language_minimal);

  dwarf2_cu *cu = per_objfile->get_cu (per_cu);
  if (cu == nullptr)
    return nullptr;  /* Dummy CU.  */
Comment 13 Simon Marchi 2020-11-03 20:52:31 UTC
Yeah it looks like this is a consequence of my change.  load_cu doesn't like that the CU was already loaded in memory.  The obvious fix would be to make it skip the load if it already is loaded, see patch below.

Although to be sure of what happens, I would need to be able to debug it myself, if you have a chance to make a reproducer for this.


diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 7d258f30eba7..150f54335433 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -2383,13 +2383,19 @@ static dwarf2_cu *
 load_cu (dwarf2_per_cu_data *per_cu, dwarf2_per_objfile *per_objfile,
         bool skip_partial)
 {
-  if (per_cu->is_debug_types)
-    load_full_type_unit (per_cu, per_objfile);
-  else
-    load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
-                        skip_partial, language_minimal);
-
   dwarf2_cu *cu = per_objfile->get_cu (per_cu);
+
+  if (cu == nullptr)
+    {
+      if (per_cu->is_debug_types)
+       load_full_type_unit (per_cu, per_objfile);
+      else
+       load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
+                            skip_partial, language_minimal);
+
+      cu = per_objfile->get_cu (per_cu);
+    }
+
   if (cu == nullptr)
     return nullptr;  /* Dummy CU.  */
Comment 14 Nils Gladitz 2020-11-04 07:43:41 UTC
Thanks, this does seem to fix the issue for me now!

Is this sufficient feedback or do you still want a new reproducer for this?

Creating the first one was quite time consuming (chiseling down a somewhat large project and continuously deploying to ARM) and I think I might have gotten somewhat lucky too.  I'm unsure how much effort it'll take to create a new reproducer, given that I don't understand how my core dumps specifically trigger the first or the second issue, or why this doesn't come up as an issue elsewhere (none of this CU loading / unloading stuff sounds like it would be ARM-specific or in any sense rare?).
Comment 15 Simon Marchi 2020-11-04 16:53:53 UTC
A reproducer is always preferable, because I wouldn't be comfortable committing a fix that I don't understand precisely and can't explain, but I understand if you don't have time.
Comment 16 Nils Gladitz 2020-11-08 16:16:06 UTC
Created attachment 12947 [details]
second reproducer
Comment 17 Nils Gladitz 2020-11-08 16:17:15 UTC
I've attached a second test case that reproduces both issues for me (i.e. does not pass with just the first patch applied).
Comment 18 Simon Marchi 2020-11-08 16:45:59 UTC
Awesome, thanks.  I'll take a look when I have a bit of free time.
Comment 19 Simon Marchi 2020-11-10 14:15:19 UTC
Ok, I think I was able to reproduce:

~"/home/simark/src/binutils-gdb/gdb/dwarf2/read.c:9255: internal-error: void load_full_comp_unit(dwarf2_per_cu_data*, dwarf2_per_objfile*, dwarf2_cu*, bool, language): Assertion `cu->die_hash == 
NULL' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable."
~"\nQuit this debugging session? "
~"(y or n) [answered Y; input not from terminal]\n"
&"\nThis is a bug, please report it."
&"  For instructions, see:\n"
&"<https://www.gnu.org/software/gdb/bugs/>.\n\n" 
~"/home/simark/src/binutils-gdb/gdb/dwarf2/read.c:9255: internal-error: void load_full_comp_unit(dwarf2_per_cu_data*, dwarf2_per_objfile*, dwarf2_cu*, bool, language): Assertion `cu->die_hash == 
NULL' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable."
~"\nCreate a core file of GDB? "
~"(y or n) [answered Y; input not from terminal]\n"
Comment 20 cvs-commit@gcc.gnu.org 2020-11-13 17:03:32 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=9ecab40c77fd414fe408967d0f92f00494aa11b9

commit 9ecab40c77fd414fe408967d0f92f00494aa11b9
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Fri Nov 13 11:58:37 2020 -0500

    gdb/arm: avoid undefined behavior shift when decoding immediate value
    
    When loading the code file provided in PR 26828 and GDB is build with
    UBSan, we get:
    
        Core was generated by `./Foo'.
        Program terminated with signal SIGABRT, Aborted.
        #0  0xb6c3809c in pthread_cond_wait () from /home/simark/build/binutils-gdb/gdb/repo/lib/libpthread.so.0
        [Current thread is 1 (LWP 29367)]
        (gdb) bt
        /home/simark/src/binutils-gdb/gdb/arm-tdep.c:1551:30: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
    
    The sequence of instructions at pthread_cond_wait, in the
    libpthread.so.0 library, contains this instruction with an immediate
    constant with a "rotate amount" of 0:
    
        e24dd044        sub     sp, sp, #68     ; 0x44
    
    Since arm_analyze_prologue shifts by "32 - rotate amount", it does a 32
    bit shift of a 32 bit type, which is caught by UBSan.
    
    Fix it by factoring out the decoding of immediates in a new function,
    arm_expand_immediate.
    
    I added a selftest for arm_analyze_prologue that replicates the
    instruction sequence.  Without the fix, it crashes GDB if it is build
    with --enable-ubsan.
    
    I initially wanted to re-use the abstract_memory_reader class already in
    arm-tdep.c, used to make arm_process_record testable.  However,
    arm_process_record and arm_analyze_prologue don't use the same kind of
    memory reading functions.  arm_process_record uses a function that
    returns an error status on failure while arm_analyze_prologue uses one
    that throws an exception.  Since i didn't want to introduce any other
    behavior change, I decided to just introduce a separate interface
    (arm_instruction_reader).  It is derived from
    abstract_instruction_reader in aarch64-tdep.c.
    
    gdb/ChangeLog:
    
            PR gdb/26835
            * arm-tdep.c (class arm_instruction_reader): New.
            (target_arm_instruction_reader): New.
            (arm_analyze_prologue): Add instruction reader parameter and use
            it.  Use arm_expand_immediate.
            (class target_arm_instruction_reader): Adjust.
            (arm_skip_prologue): Adjust.
            (arm_expand_immediate): New.
            (arm_scan_prologue): Adjust.
            (arm_analyze_prologue_test): New.
            (class test_arm_instruction_reader): New.
    
    Change-Id: Ieb1c1799bd66f8c7421384f44f5c2777b578ff8d
Comment 21 Simon Marchi 2020-11-13 17:22:53 UTC
Created attachment 12955 [details]
Proposed patch series

Hi Nils,

Can you try this patch series and see if it's good on your side?  On my side, the second reproducer you sent works correctly, and I see no regression in GDB's testsuite.  The previous patchlets that I asked you to try were not correct, as they broke other things.

I am based on commit 9ecab40c77fd414fe408967d0f92f00494aa11b9.  The series.patch file contains all 4 patches, you can apply them with

  $ git am series.patch

The series breaks some things in the middle only to fix them in the last patch, so in the end I will probably re-order them, but the end result will be the same.
Comment 22 Nils Gladitz 2020-11-13 18:13:31 UTC
Works for me, thanks!
Comment 23 Simon Marchi 2020-11-17 19:13:14 UTC
Series posted here: https://sourceware.org/pipermail/gdb-patches/2020-November/173372.html
Comment 24 cvs-commit@gcc.gnu.org 2021-01-21 02:05:10 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=de53369b2efcae78adf431437cb096c5a0728f96

commit de53369b2efcae78adf431437cb096c5a0728f96
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Wed Jan 20 21:04:43 2021 -0500

    gdb/dwarf: add assertion in maybe_queue_comp_unit
    
    The symptom that leads to this is the crash described in PR 26828:
    
    /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:23478:25: runtime error: member access within null pointer of type 'struct dwarf2_cu'
    
    The line of the crash is the following, in follow_die_offset:
    
      if (target_cu != cu)
        target_cu->ancestor = cu;  <--- HERE
    
    The line that assign nullptr to `target_cu` is the `per_objfile->get_cu`
    call after having called maybe_queue_comp_unit:
    
          /* If necessary, add it to the queue and load its DIEs.  */
          if (maybe_queue_comp_unit (cu, per_cu, per_objfile, cu->language))
            load_full_comp_unit (per_cu, per_objfile, per_objfile->get_cu (per_cu),
                                 false, cu->language);
    
          target_cu = per_objfile->get_cu (per_cu);  <--- HERE
    
    Some background: there is an invariant, documented in
    maybe_queue_comp_unit's doc, that if a CU is queued for expansion
    (present in dwarf2_per_bfd::queue), then its DIEs are loaded in memory.
    "its DIEs are loaded in memory" is a synonym for saying that a dwarf2_cu
    object exists for this CU.  Yet another way to say it is that
    `per_objfile->get_cu (per_cu)` returns something not nullptr for that
    CU.
    
    The crash documented in PR 26828 triggers some hard-to-reproduce
    sequence that ends up violating the invariant:
    
    - dwarf2_fetch_die_type_sect_off gets called for a DIE in CU A
    - The DIE in CU A requires some DIE in CU B
    - follow_die_offset calls maybe_queue_comp_unit.  maybe_queue_comp_unit
      sees CU B is not queued and its DIEs are not loaded, so it enqueues it
      and returns 1 to its caller - meaning "the DIEs are not loaded, you
      should load them" - prompting follow_die_offset to load the DIEs by
      calling load_full_comp_unit
    - Note that CU B is enqueued by maybe_queue_comp_unit even if it has
      already been expanded.  It's a bit useless (and causes trouble, see
      next patch), but that's how it works right now.
    - Since we entered the dwarf2/read code through
      dwarf2_fetch_die_type_sect_off, nothing processes the queue, so we
      exit the dwarf2/read code with CU B still lingering in the queue.
    
    - dwarf2_fetch_die_type_sect_off gets called for a DIE in CU A, again
    - The DIE in CU A requires some DIE in CU B, again
    - This time, maybe_queue_comp_unit sees that CU B is in the queue.
      Because of the invariant that if a CU is in the queue, its DIEs are
      loaded in the memory, it returns 0 to its caller, meaning "you don't
      need to load the DIEs!".
    - That happens to be true, so everything is fine for now.
    
    - Time passes, some things call dwarf2_per_objfile::age_comp_units
      enough so that CU B's age becomes past the dwarf_max_cache_age
      threshold.  age_comp_units proceeds to free CU B's DIEs.  Remember
      that CU B is still lingering in the queue (oops, the invariant just
      got violated).
    
    - dwarf2_fetch_die_type_sect_off gets called for a DIE in CU A, again
    - The DIE in CU A requires some DIE in CU B, again
    - maybe_queue_comp_unit sees that CU B is in the queue, so returns to
      its caller "you don't need to load the DIEs!".  However, we know at
      this point this is false.
    - follow_die_offset doesn't load the DIEs and tries to obtain the DIEs for
      CU B:
    
        target_cu = per_objfile->get_cu (per_cu);
    
      But since they are not loaded, target_cu is nullptr, and we get the
      crash mentioned above a few lines after that.
    
    This patch adds an assertions in maybe_queue_comp_unit to verify the
    invariant, to make sure it doesn't return a falsehood to its caller.
    
    The current patch doesn't fix the issue (the next patch does), but it
    makes it so we catch the problem earlier and get this assertion failure
    instead of a segmentation fault:
    
        /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:9100: internal-error:
            int maybe_queue_comp_unit(dwarf2_cu*, dwarf2_per_cu_data*, dwarf2_per_objfile*, language):
            Assertion `per_objfile->get_cu (per_cu) != nullptr' failed.
    
    gdb/ChangeLog:
    
            PR gdb/26828
            * dwarf2/read.c (maybe_queue_comp_unit): Add assertion.
    
    Change-Id: I4e51bd7bd58773f9fadf480179cbc4bae61508fe
Comment 26 Simon Marchi 2021-02-20 20:40:22 UTC
(In reply to Davide Repetto from comment #25)
> I have a pair of new core dumps, should they be useful:
> 
> gdb coredump:
> https://www.dropbox.com/s/95p12ot0c0ya0l2/core.gdb.1000.
> 32f860ba566941c786efd4ac4c070fb2.4549.1613756314000000.zst?dl=0
> 
> gnote coredump:
> https://www.dropbox.com/s/7a9v9x2x4bl3ksm/core.gnote.1000.
> 0148951bf12149deab85f35229f2233e.108686.1613752706000000.zst?dl=0

Thanks.  I think the bug will be fixed with the series I posted here (which is pending review):

https://sourceware.org/pipermail/gdb-patches/2020-November/173372.html

Is there a chance you can test it against your use case?  If you have trouble applying patches, I can provide a git branch containing the change.
Comment 27 cvs-commit@gcc.gnu.org 2021-02-23 18:39:30 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=616c069a3f1a841e5bc63d20aec8e5b71b499f6c

commit 616c069a3f1a841e5bc63d20aec8e5b71b499f6c
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Tue Feb 23 12:07:10 2021 -0500

    gdb/dwarf: don't enqueue CU in maybe_queue_comp_unit if already expanded
    
    The previous commit log described how items could be left lingering in
    the dwarf2_per_bfd::queue and how that could cause trouble.
    
    This patch fixes the issue by changing maybe_queue_comp_unit so that it
    doesn't put a CU in the to-expand queue if that CU is already expanded.
    This will make it so that when dwarf2_fetch_die_type_sect_off calls
    follow_die_offset and maybe_queue_comp_unit, it won't enqueue the target
    CU, because it will see the CU is already expanded.
    
    This assumes that if a CU is dwarf2_fetch_die_type_sect_off's target CU,
    it will have previously been expanded.  I think it is the case, but I
    can't be 100% sure.  If that's not true, the assertions added in the
    following patch will catch it, and it means we'll have to re-think a bit
    more how things work (it wouldn't be well handled at all today anyway).
    
    This fixes something else in maybe_queue_comp_unit that looks wrong.
    Imagine the DIEs of a CU are loaded in memory, but that CU is not
    expanded.  In that case, maybe_queue_comp_unit will use this early
    return:
    
      /* If the compilation unit is already loaded, just mark it as
         used.  */
      dwarf2_cu *cu = per_objfile->get_cu (per_cu);
      if (cu != nullptr)
        {
          cu->last_used = 0;
          return 0;
        }
    
    ... so the CU won't be queued for expansion.  Whether the DIEs of a CU
    are loaded in memory and whether that CU is expanded are two orthogonal
    things, but that function appears to mix them.  So, move the queuing
    above that check / early return, so that if the CU's DIEs are loaded in
    memory but the CU is not expanded yet, it gets enqueued.
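The reordering described above can be illustrated with a toy model (all names here, cu_state and maybe_queue, are hypothetical stand-ins, not gdb's real types): the queuing decision looks only at whether the CU is expanded, and only afterwards does the function short-circuit on already-loaded DIEs.

```cpp
#include <cassert>

/* Toy model of the fixed maybe_queue_comp_unit ordering.  The names
   are illustrative; gdb's real types are dwarf2_cu etc.  */
struct cu_state
{
  bool dies_loaded = false;  /* Are the CU's DIEs in memory?  */
  bool expanded = false;     /* Has the CU's symtab been expanded?  */
  bool queued = false;       /* Is the CU on the to-expand queue?  */
};

/* Returns true if the caller must load the CU's DIEs so the queued
   expansion can proceed.  */
static bool
maybe_queue (cu_state &cu)
{
  /* Decide about queuing first, based solely on expansion state;
     loaded-ness and expanded-ness are orthogonal.  */
  bool just_queued = false;
  if (!cu.expanded && !cu.queued)
    {
      cu.queued = true;
      just_queued = true;
    }

  /* Only now short-circuit: if the DIEs are already loaded, the
     caller has nothing to load.  */
  if (cu.dies_loaded)
    return false;

  return just_queued;
}
```

With this ordering, a CU whose DIEs happen to be in memory but which was never expanded still gets queued, which the original early return prevented.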
    
    I tried to improve maybe_queue_comp_unit's documentation to clarify what
    the return value means.  By clarifying this, I noticed that two callers
    (follow_die_offset and follow_die_sig_1) access the CU's DIEs after
    calling maybe_queue_comp_unit, only relying on maybe_queue_comp_unit's
    return value to tell whether DIEs need to be loaded first or not.  As
    explained in the new comment, this is problematic:
    maybe_queue_comp_unit's return value doesn't tell whether DIEs are
    currently loaded, it means whether maybe_queue_comp_unit requires the
    caller to load them.  If the CU is already expanded but the DIEs have
    been freed, maybe_queue_comp_unit returns 0, meaning "I don't need you
    to load the DIEs".  So if these two functions (follow_die_offset and
    follow_die_sig_1) need to access the DIEs in any case, for their own
    usage, they should make sure to load them if they are not loaded
    already.  I therefore added an extra check to the condition they use,
    making it so they will always load the DIEs if they aren't already.
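The caller-side change can be sketched the same way (again with hypothetical names; gdb's real callers are follow_die_offset and follow_die_sig_1, and the real loader is a different function): the DIEs are loaded both when maybe_queue_comp_unit requests it and when they are simply absent, which covers the expanded-but-freed case.

```cpp
#include <cassert>

/* Hypothetical stand-in for a CU whose DIEs may or may not be loaded.  */
struct cu
{
  bool dies_loaded = false;
};

static void
load_dies (cu &c)
{
  c.dies_loaded = true;  /* Pretend we read the DIEs back in.  */
}

/* QUEUED stands in for maybe_queue_comp_unit's return value.  The old
   condition was just `if (queued)`; the fix adds the !dies_loaded
   check so an expanded CU whose DIEs were freed is reloaded too.  */
static void
follow_die (cu &c, bool queued)
{
  if (queued || !c.dies_loaded)
    load_dies (c);

  /* The caller can now use the DIEs safely.  */
  assert (c.dies_loaded);
}
```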
    
    From what I found, other callers don't care for the CU's DIEs, they call
    maybe_queue_comp_unit to ensure the CU gets expanded eventually, but
    don't care for it after that.
    
    gdb/ChangeLog:
    
            PR gdb/26828
            * dwarf2/read.c (maybe_queue_comp_unit): Check if CU is expanded
            to decide whether or not to enqueue it for expansion.
            (follow_die_offset, follow_die_sig_1): Ensure we load the DIEs
            after calling maybe_queue_comp_unit.
    
    Change-Id: Id98c6b60669f4b4b21b9be16d0518fc62bdf686a
Comment 28 cvs-commit@gcc.gnu.org 2021-02-23 18:39:35 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=08ac57714cd20e528efe9b4e169f3a2778cf6e30

commit 08ac57714cd20e528efe9b4e169f3a2778cf6e30
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Tue Feb 23 13:37:44 2021 -0500

    gdb/dwarf: create and destroy dwarf2_per_bfd's CUs-to-expand queue
    
    As described in the log of patch "gdb/dwarf: add assertion in
    maybe_queue_comp_unit", it would happen that a call to
    maybe_queue_comp_unit would enqueue a CU in the to-expand queue while
    nothing up the stack was processing the queue.  This is not desirable,
    as items are then left lingering in the queue when we exit the
    dwarf2/read code.  This is an inconsistent state.
    
    The normal case of using the queue is when we go through
    dw2_do_instantiate_symtab and process_queue.  As depended-on CUs are
    found, they get added to the queue.  process_queue expands CUs until the
    queue is empty.
    
    To catch these cases where things are enqueued while nothing up the
    stack is processing the queue, change dwarf2_per_bfd::queue to be an
    optional.  The optional is instantiated in dwarf2_queue_guard, just
    before where we call process_queue.  In the dwarf2_queue_guard
    destructor, the optional gets reset.  Therefore, the queue object is
    instantiated only when something up the stack is handling it.  If
    another entry point tries to enqueue a CU for expansion, an assertion
    will fail and we know we have something to fix.
    
    dwarf2_queue_guard sounds like the right place for this, as it's
    currently responsible for making sure the queue gets cleared if we exit
    due to an error.
    
    This also allows asserting that when age_comp_units or remove_all_cus
    run, the queue is not instantiated, and gives us one more level of
    assurance that we won't free the DIEs of a CU that is in the
    CUs-to-expand queue.
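The guard pattern can be sketched as follows, using std::optional in place of the optional type gdb actually uses, and with simplified, hypothetical names (per_bfd, queue_guard, enqueue): the queue exists only while a guard object is alive, so a stray enqueue trips an assertion instead of leaving entries lingering.

```cpp
#include <cassert>
#include <optional>
#include <vector>

/* Simplified stand-in for dwarf2_per_bfd: making the to-expand queue
   an optional distinguishes "queue exists" from "queue is empty".  */
struct per_bfd
{
  std::optional<std::vector<int>> queue;  /* CU indices to expand.  */
};

/* RAII guard modeled on dwarf2_queue_guard: the queue is created on
   entry and torn down on exit, including exits due to errors.  */
struct queue_guard
{
  explicit queue_guard (per_bfd &p) : m_per_bfd (p)
  { m_per_bfd.queue.emplace (); }

  ~queue_guard ()
  { m_per_bfd.queue.reset (); }

  per_bfd &m_per_bfd;
};

static void
enqueue (per_bfd &p, int cu_index)
{
  /* Fail loudly if nothing up the stack is processing the queue.  */
  assert (p.queue.has_value ());
  p.queue->push_back (cu_index);
}
```

Enqueuing is only legal inside a guard's lifetime; outside it, the assertion fires, which is exactly the "something to fix" signal the commit message describes.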
    
    gdb/ChangeLog:
    
            PR gdb/26828
            * dwarf2/read.c (dwarf2_queue_guard) <dwarf2_queue_guard>:
            Instantiate queue.
            (~dwarf2_queue_guard): Clear queue.
            (queue_comp_unit): Assert that queue is
            instantiated.
            (process_queue): Adjust.
            * dwarf2/read.h (struct dwarf2_per_bfd) <queue>: Make optional.
    
    Change-Id: I8fe3d77845bb4ad3d309eac906acebe79d9f0a9d
Comment 29 cvs-commit@gcc.gnu.org 2021-02-23 23:32:07 UTC
The gdb-10-branch branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1c8a25b683c7ea9621b7fccf6c07630bd67f071f

commit 1c8a25b683c7ea9621b7fccf6c07630bd67f071f
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Tue Feb 23 12:07:10 2021 -0500

    gdb/dwarf: don't enqueue CU in maybe_queue_comp_unit if already expanded
    
    The previous commit log described how items could be left lingering in
    the dwarf2_per_bfd::queue and how that could cause trouble.
    
    This patch fixes the issue by changing maybe_queue_comp_unit so that it
    doesn't put a CU in the to-expand queue if that CU is already expanded.
    This will make it so that when dwarf2_fetch_die_type_sect_off calls
    follow_die_offset and maybe_queue_comp_unit, it won't enqueue the target
    CU, because it will see the CU is already expanded.
    
    This assumes that if a CU is dwarf2_fetch_die_type_sect_off's target CU,
    it will have previously been expanded.  I think it is the case, but I
    can't be 100% sure.  If that's not true, the assertions added in the
    following patch will catch it, and it means we'll have to re-think a bit
    more how things work (it wouldn't be well handled at all today anyway).
    
    This fixes something else in maybe_queue_comp_unit that looks wrong.
    Imagine the DIEs of a CU are loaded in memory, but that CU is not
    expanded.  In that case, maybe_queue_comp_unit will use this early
    return:
    
      /* If the compilation unit is already loaded, just mark it as
         used.  */
      dwarf2_cu *cu = per_objfile->get_cu (per_cu);
      if (cu != nullptr)
        {
          cu->last_used = 0;
          return 0;
        }
    
    ... so the CU won't be queued for expansion.  Whether the DIEs of a CU
    are loaded in memory and whether that CU is expanded are two orthogonal
    things, but that function appears to mix them.  So, move the queuing
    above that check / early return, so that if the CU's DIEs are loaded in
    memory but the CU is not expanded yet, it gets enqueued.
    
    I tried to improve maybe_queue_comp_unit's documentation to clarify what
    the return value means.  By clarifying this, I noticed that two callers
    (follow_die_offset and follow_die_sig_1) access the CU's DIEs after
    calling maybe_queue_comp_unit, only relying on maybe_queue_comp_unit's
    return value to tell whether DIEs need to be loaded first or not.  As
    explained in the new comment, this is problematic:
    maybe_queue_comp_unit's return value doesn't tell whether DIEs are
    currently loaded, it means whether maybe_queue_comp_unit requires the
    caller to load them.  If the CU is already expanded but the DIEs have
    been freed, maybe_queue_comp_unit returns 0, meaning "I don't need you
    to load the DIEs".  So if these two functions (follow_die_offset and
    follow_die_sig_1) need to access the DIEs in any case, for their own
    usage, they should make sure to load them if they are not loaded
    already.  I therefore added an extra check to the condition they use,
    making it so they will always load the DIEs if they aren't already.
    
    From what I found, other callers don't care for the CU's DIEs, they call
    maybe_queue_comp_unit to ensure the CU gets expanded eventually, but
    don't care for it after that.
    
    gdb/ChangeLog:
    
            PR gdb/26828
            * dwarf2/read.c (maybe_queue_comp_unit): Check if CU is expanded
            to decide whether or not to enqueue it for expansion.
            (follow_die_offset, follow_die_sig_1): Ensure we load the DIEs
            after calling maybe_queue_comp_unit.
    
    Change-Id: Id98c6b60669f4b4b21b9be16d0518fc62bdf686a
Comment 30 cvs-commit@gcc.gnu.org 2021-02-23 23:32:12 UTC
The gdb-10-branch branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ad4eb5acbab1b420b346236f2c8c272230477ab7

commit ad4eb5acbab1b420b346236f2c8c272230477ab7
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Tue Feb 23 13:37:44 2021 -0500

    gdb/dwarf: create and destroy dwarf2_per_bfd's CUs-to-expand queue
    
    As described in the log of patch "gdb/dwarf: add assertion in
    maybe_queue_comp_unit", it would happen that a call to
    maybe_queue_comp_unit would enqueue a CU in the to-expand queue while
    nothing up the stack was processing the queue.  This is not desirable,
    as items are then left lingering in the queue when we exit the
    dwarf2/read code.  This is an inconsistent state.
    
    The normal case of using the queue is when we go through
    dw2_do_instantiate_symtab and process_queue.  As depended-on CUs are
    found, they get added to the queue.  process_queue expands CUs until the
    queue is empty.
    
    To catch these cases where things are enqueued while nothing up the
    stack is processing the queue, change dwarf2_per_bfd::queue to be an
    optional.  The optional is instantiated in dwarf2_queue_guard, just
    before where we call process_queue.  In the dwarf2_queue_guard
    destructor, the optional gets reset.  Therefore, the queue object is
    instantiated only when something up the stack is handling it.  If
    another entry point tries to enqueue a CU for expansion, an assertion
    will fail and we know we have something to fix.
    
    dwarf2_queue_guard sounds like the right place for this, as it's
    currently responsible for making sure the queue gets cleared if we exit
    due to an error.
    
    This also allows asserting that when age_comp_units or remove_all_cus
    run, the queue is not instantiated, and gives us one more level of
    assurance that we won't free the DIEs of a CU that is in the
    CUs-to-expand queue.
    
    gdb/ChangeLog:
    
            PR gdb/26828
            * dwarf2/read.c (dwarf2_queue_guard) <dwarf2_queue_guard>:
            Instantiate queue.
            (~dwarf2_queue_guard): Clear queue.
            (queue_comp_unit): Assert that queue is
            instantiated.
            (process_queue): Adjust.
            * dwarf2/read.h (struct dwarf2_per_bfd) <queue>: Make optional.
    
    Change-Id: I8fe3d77845bb4ad3d309eac906acebe79d9f0a9d
Comment 31 Simon Marchi 2021-02-23 23:32:58 UTC
This is hopefully fixed now.