Bug 16577

Summary: GDB crash on attempted read from deleted shared library
Product: gdb Reporter: Nathaniel.McIntosh
Component: backtrace Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: carl.lagerstedt, etesta, jeremip11, mgulick, scott, simark, ubhuqhmudpy
Priority: P2    
Version: 7.7   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: tar file containing reproducer
Updated reproducer to reproduce crash on current git tip

Description Nathaniel.McIntosh 2014-02-13 16:10:45 UTC
Created attachment 7413 [details]
tar file containing reproducer

Developers in my team are seeing a crash in GDB 7.7. The application being debugged incorporates 3rd party code (Java "JNA") that creates shared libraries on the fly, calls into them, then deletes them. This behavior is causing GDB 7.7 to crash (used to work ok with GDB 7.5.1). Representative error message:

BFD: reopening /tmp/jna/jna3319969727431950835.tmp: No such file or directory
BFD: reopening /tmp/jna/jna3319969727431950835.tmp: No such file or directory
Can't read data for section '.eh_frame' in file '/tmp/jna/jna3319969727431950835.tmp'
Debugger segmentation fault

I built a debuggable GDB 7.7 and debugged it while it was debugging our application. Here is a stack trace from the crash:

where
#0  bfd_getl32 (p=0x0) at libbfd.c:623
#1  0x0000000000616a56 in read_initial_length (bytes_read_ptr=0x7fff92d4ec4c, buf=0x0, abfd=<optimized out>) at dwarf2-frame.c:1554
#2  decode_frame_entry_1 (entry_type=<optimized out>, fde_table=0x7fff92d4ecd0, cie_table=0x7fff92d4ecc0, eh_frame_p=1, start=0x0, unit=0x9a60640) at dwarf2-frame.c:1861
#3  decode_frame_entry (unit=unit@entry=0x9a60640, start=0x0, eh_frame_p=eh_frame_p@entry=1, cie_table=cie_table@entry=0x7fff92d4ecc0, fde_table=fde_table@entry=0x7fff92d4ecd0, entry_type=entry_type@entry=EH_CIE_OR_FDE_TYPE_ID) at dwarf2-frame.c:2159
#4  0x00000000006186f9 in dwarf2_build_frame_info (objfile=objfile@entry=0x9923d00) at dwarf2-frame.c:2316
#5  0x00000000006188b8 in dwarf2_frame_find_fde (pc=pc@entry=0x7fff92d4ed68, out_offset=out_offset@entry=0x0) at dwarf2-frame.c:1777
#6  0x0000000000619492 in dwarf2_frame_sniffer (self=0x7eee00, this_frame=0x1f515960, this_cache=<optimized out>) at dwarf2-frame.c:1414
#7  0x000000000066b5c9 in frame_unwind_find_by_frame (this_frame=this_frame@entry=0x1f515960, this_cache=this_cache@entry=0x1f515978) at frame-unwind.c:112
#8  0x00000000006680cb in compute_frame_id (fi=0x1f515960) at frame.c:448
#9  get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0x1f5158a0) at frame.c:1737
#10 0x0000000000669c18 in get_prev_frame_1 (this_frame=this_frame@entry=0x1f5158a0) at frame.c:1910
#11 0x000000000066a030 in get_prev_frame (this_frame=this_frame@entry=0x1f5158a0) at frame.c:2124
#12 0x000000000066a24c in unwind_to_current_frame (ui_out=<optimized out>, args=args@entry=0x1f5158a0) at frame.c:1425
#13 0x000000000059c715 in catch_exceptions_with_msg (func_uiout=<optimized out>, func=func@entry=0x66a240 <unwind_to_current_frame>, func_args=func_args@entry=0x1f5158a0, gdberrmsg=gdberrmsg@entry=0x0, mask=mask@entry=RETURN_MASK_ERROR) at exceptions.c:476
#14 0x000000000059c83a in catch_exceptions (uiout=<optimized out>, func=func@entry=0x66a240 <unwind_to_current_frame>, func_args=func_args@entry=0x1f5158a0, mask=mask@entry=RETURN_MASK_ERROR) at exceptions.c:456
#15 0x00000000006683e0 in get_current_frame () at frame.c:1464
#16 0x0000000000589353 in normal_stop () at infrun.c:6087
#17 0x0000000000590971 in proceed (addr=addr@entry=18446744073709551615, siggnal=siggnal@entry=GDB_SIGNAL_DEFAULT, step=step@entry=0) at infrun.c:2332
#18 0x0000000000584ed2 in continue_1 (all_threads=all_threads@entry=0) at infcmd.c:729
#19 0x000000000058501a in continue_command (args=0x0, from_tty=1) at infcmd.c:821
#20 0x000000000065f464 in execute_command (p=<optimized out>, p@entry=0x123d240 "c", from_tty=1) at top.c:468
#21 0x00000000005a5521 in command_handler (command=0x123d240 "c") at event-top.c:435
#22 0x00000000005a5ecc in command_line_handler (rl=<optimized out>) at event-top.c:632
#23 0x00000000006a5df9 in rl_callback_read_char () at callback.c:220
#24 0x00000000005a5589 in rl_callback_read_char_wrapper (client_data=<optimized out>) at event-top.c:164
#25 0x00000000005a4794 in process_event () at event-loop.c:342
#26 process_event () at event-loop.c:314
#27 0x00000000005a4b1f in gdb_do_one_event () at event-loop.c:394
#28 0x00000000005a4cd5 in start_event_loop () at event-loop.c:431
#29 0x000000000059e373 in captured_command_loop (data=data@entry=0x0) at main.c:267
#30 0x000000000059c8ae in catch_errors (func=func@entry=0x59e360 <captured_command_loop>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x76759b "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:524
#31 0x000000000059ebbe in captured_main (data=data@entry=0x7fff92d4f400) at main.c:1067
#32 0x000000000059c8ae in catch_errors (func=func@entry=0x59e550 <captured_main>, func_args=func_args@entry=0x7fff92d4f400, errstring=errstring@entry=0x76759b "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:524
#33 0x000000000059f504 in gdb_main (args=args@entry=0x7fff92d4f400) at main.c:1076
#34 0x000000000045964e in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:34
(gdb)

Apparently GDB is trying to read the .eh_frame section from the missing /tmp/jna shared library, and this is then triggering the crash.

I was able to write a small test case that reproduces the problem (or some variant of it; since I don't have access to the 3rd party code that is at the root of the issue, I am guessing about the failure mode). I am attaching a tar file that contains the sources and instructions on how to reproduce. The README from the tar file follows below.

I tried the same test case with GDB 7.5.1, and it was able to handle it (no crash). 7.6.2 has the problem, as does top of trunk (gdb-7.7.50.20140213).
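
For readers without the attachment, the pattern described above (load a shared library, delete its file, then keep calling into it) might look roughly like the following C sketch. This is not the attached badmain.c: the function name call_after_unlink and its signature are assumptions made purely for illustration. Only dlopen, dlsym, dlclose and unlink are real APIs.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical signature of the function exported by the library. */
typedef int (*lib_fn)(int);

/* Illustrative helper (not from the attached reproducer): load the
   library at PATH, delete the file from disk, then call SYMBOL through
   the image that is still mapped into the process.
   Returns 0 on success and stores the callee's result in *RESULT;
   returns -1 if loading, symbol lookup, or deletion fails. */
int call_after_unlink(const char *path, const char *symbol, int arg, int *result)
{
    assert(path != NULL && symbol != NULL && result != NULL);

    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* Object-to-function pointer cast: accepted by gcc on POSIX systems. */
    lib_fn fn = (lib_fn) dlsym(handle, symbol);
    if (fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    /* Remove the file. The mapping stays usable, but anything that
       later tries to reopen PATH (such as a debugger) will fail. */
    if (unlink(path) != 0) {
        perror("unlink");
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);
    dlclose(handle);
    return 0;
}
```

If the callee blocks (for example in sleep) while a debugger is attached, the debugger has to unwind through a frame whose object file is gone, which is the situation reported here. On older glibc versions this needs to be linked with -ldl.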

README.txt:

To reproduce the problem:

1. compile with "sh -x ./badcomp.sh"

      % sh -x ./badcomp.sh
      + set -x
      + gcc -fPIC -g -c badlib.c
      + gcc -g -o badlib.so -shared badlib.o
      + gcc -g -c badmain.c
      + gcc -g -o badmain badmain.o -ldl

2. Run gdb on badmain; after you see "iter 6" in the output, interrupt
   using control-C. Then walk back up the stack. Sample output:

      % /local-ssd/gdb77/gdb-7.7/gdb/gdb -n badmain
      ...   
      GNU gdb (GDB) 7.7

      Reading symbols from badmain...done.
      (gdb) run
      Starting program: /tmp/gdbstuff/badmain 
      iter 1
      iter 2
      iter 3
      iter 4
      iter 5
      iter 6
      ^C
      Program received signal SIGINT, Interrupt.
      0x00007ffff78fabc0 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) up 10
      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      Can't read data for section '.eh_frame' in file '/tmp/gdbstuff/badlib.so'
      (gdb) up
      #1  0x00007ffff78faa50 in sleep () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) up
      Segmentation fault

3. GDB 7.5.1 seems to handle the same sequence ok (make sure
   to restore/rebuild badlib.so first, since running badmain deletes it). Ex:

      Program received signal SIGINT, Interrupt.
      0x00007ffff78fabc0 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) up
      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      #1  0x00007ffff78faa50 in sleep () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) 
      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      BFD: reopening /tmp/gdbstuff/badlib.so: No such file or directory

      Dwarf Error: Can't read DWARF data from '/tmp/gdbstuff/badlib.so'
      (gdb) 
      #3  0x00000000004006b4 in callintolib (x=7) at badmain.c:17
      17	    return fp(x);
      (gdb) 
      #4  0x00000000004006ec in main () at badmain.c:26
      26	        jj = callintolib(jj);
      (gdb)
Comment 1 Christian Stadelmann 2016-12-21 20:09:44 UTC
Same crash (similar backtrace) here when debugging a Java application that does the same. GDB prints the same error message before it dies:

Can't read data for section '.eh_frame' in file '/tmp/jna/jna5161650616368516625.tmp'

running gdb-7.12-29.fc25.x86_64 on Fedora 25.
Comment 2 Scott French 2017-07-30 22:37:01 UTC
This is still happening as of gdb 8.0.
Comment 3 Mike Gulick 2017-10-18 20:00:43 UTC
I bisected this crash and git points to commit https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=4bf44c1cf1abad13fcda09e20983757f175c6dca

Debugging gdb built from this revision gives the following stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000739dd8 in bfd_getl32 (p=0x0) at libbfd.c:622
622	  v = (unsigned long) addr[0];
(gdb) bt
#0  0x0000000000739dd8 in bfd_getl32 (p=0x0) at libbfd.c:622
#1  0x00000000006694ac in read_initial_length (abfd=0x11fe4110, buf=0x0, 
    bytes_read_ptr=0x7ffc4e4dcd84) at dwarf2-frame.c:1526
#2  0x0000000000669eb6 in decode_frame_entry_1 (unit=0x145a6410, start=0x0, 
    eh_frame_p=1, cie_table=0x7ffc4e4dcef0, fde_table=0x7ffc4e4dcee0, 
    entry_type=EH_CIE_OR_FDE_TYPE_ID) at dwarf2-frame.c:1837
#3  0x000000000066aa43 in decode_frame_entry (unit=0x145a6410, start=0x0, 
    eh_frame_p=1, cie_table=0x7ffc4e4dcef0, fde_table=0x7ffc4e4dcee0, 
    entry_type=EH_CIE_OR_FDE_TYPE_ID) at dwarf2-frame.c:2135
#4  0x000000000066af60 in dwarf2_build_frame_info (objfile=0x1484e2a0)
    at dwarf2-frame.c:2292
#5  0x0000000000669b86 in dwarf2_frame_find_fde (pc=0x7ffc4e4dd070, 
    out_offset=0x0) at dwarf2-frame.c:1749
#6  0x0000000000669245 in dwarf2_frame_sniffer (
    self=0x8a4da0 <dwarf2_frame_unwind>, this_frame=0x116fd00, 
    this_cache=0x116fd18) at dwarf2-frame.c:1382
#7  0x00000000006d735f in frame_unwind_find_by_frame (this_frame=0x116fd00, 
    this_cache=0x116fd18) at frame-unwind.c:112
#8  0x00000000006d2a1a in get_frame_id (fi=0x116fd00) at frame.c:334
#9  0x00000000005b6399 in step_1 (skip_subroutines=1, single_inst=0, 
    count_string=0x0) at infcmd.c:917
#10 0x00000000005b61bf in next_command (count_string=0x0, from_tty=1)
    at infcmd.c:855

I believe there are two separate issues to (possibly) fix here:

1. The gdb segfault when stepping the debugger.
2. The errors/warnings about not being able to open the jna file.

I have a preliminary patch to fix 1) that I will post to gdb-patches for feedback.  It is debatable whether 2) should be fixed, since that would likely involve a special case to detect and bypass jna files, and such detection could yield false positives.
Comment 4 Mike Gulick 2017-10-19 19:14:22 UTC
Created attachment 10544 [details]
Updated reproducer to reproduce crash on current git tip

See README.TXT for instructions.

Updated reproducer no longer relies on being in /tmp, and reproduces the problem instantly (no more sleep()).
Comment 5 Sourceware Commits 2018-01-17 17:59:06 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=416675305692976aca45860e24b963982a2e682a

commit 416675305692976aca45860e24b963982a2e682a
Author: Mike Gulick <mike.gulick@mathworks.com>
Date:   Mon Oct 30 18:13:44 2017 -0400

    Fix gdb segv when objfile can't be opened
    
    This fixes PR 16577.
    
    This patch changes gdb_bfd_map_section to issue a warning rather than an error
    if it is unable to read the object file, and sets the size of the section/frame
    that it attempted to read to 0 on error.
    
    The description of gdb_bfd_map_section states that it will try to read or map
    the contents of the section SECT, and if successful, the section data is
    returned and *SIZE is set to the size of the section data.  This function was
    throwing an error and leaving *SIZE as-is.  Setting the section size to 0
    indicates to dwarf2_build_frame_info that there is no data to read, otherwise
    it will try to read from an invalid frame pointer.
    
    Changing the error to a warning allows this to be handled gracefully.
    Additionally, the error was clobbering the breakpoint output indicating the
    current frame (function name, arguments, source file, and line number).  E.g.
    
    Thread 3 "foo" hit Breakpoint 1, BFD: reopening /tmp/jna-1013829440/jna2973250704389291330.tmp: No such file or directory
    BFD: reopening /tmp/jna-1013829440/jna2973250704389291330.tmp: No such file or directory
    (gdb)
    
    While the "BFD: reopening ..." messages will still appear interspersed in the
    breakpoint output, the current frame info is now displayed:
    
    Thread 3 "foo" hit Breakpoint 1, BFD: reopening /tmp/jna-1013829440/jna1875755897659885075.tmp: No such file or directory
    BFD: reopening /tmp/jna-1013829440/jna1875755897659885075.tmp: No such file or directory
    warning: Can't read data for section '.eh_frame' in file '/tmp/jna-1013829440/jna1875755897659885075.tmp'
    do_something () at file.cpp:80
    80	{
    (gdb)
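
The idea in the commit message above can be illustrated with a small stand-alone C sketch. It is not GDB's code: map_section_sketch, read_le32 and count_length_words are hypothetical names, and the real gdb_bfd_map_section and dwarf2_build_frame_info have different signatures. The sketch only shows why setting *SIZE to 0 on failure lets the caller's bounds check skip the read instead of dereferencing a null buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a section reader. BACKING is NULL when the
   underlying file can no longer be read. On failure this warns, sets
   *SIZE to 0 and returns NULL instead of leaving *SIZE untouched. */
const uint8_t *map_section_sketch(const uint8_t *backing, size_t backing_size,
                                  size_t *size)
{
    assert(size != NULL);
    if (backing == NULL) {
        fprintf(stderr, "warning: can't read data for section\n");
        *size = 0;
        return NULL;
    }
    *size = backing_size;
    return backing;
}

/* Read a 32-bit little-endian value from P (P must not be NULL). */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical caller: walks the section in 4-byte steps and returns
   how many words it read. The loop bound depends on SIZE, so a failed
   mapping (SIZE == 0) means no read is attempted at all. */
size_t count_length_words(const uint8_t *backing, size_t backing_size)
{
    size_t size = 4096; /* stale value, as if taken from the section header */
    const uint8_t *data = map_section_sketch(backing, backing_size, &size);
    size_t n = 0;

    for (size_t off = 0; off + 4 <= size; off += 4) {
        (void) read_le32(data + off);
        n++;
    }
    return n;
}
```

Had the reader left the stale size in place on failure, the loop would have called read_le32 on a null-based pointer, which matches the p=0x0 frame at the top of the backtraces in this report.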
Comment 6 Simon Marchi 2018-01-17 18:01:48 UTC
Fixed by patch mentioned above.
Comment 7 Simon Marchi 2018-03-24 22:15:39 UTC
*** Bug 22995 has been marked as a duplicate of this bug. ***