Bug 19503 - internal-error: linux_nat_resume: Assertion `lp != NULL' failed.
Status: RESOLVED DUPLICATE of bug 19461
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: 7.10
Importance: P2 normal
Target Milestone: 7.11
Assignee: Not yet assigned to anyone
Reported: 2016-01-20 20:14 UTC by smark
Modified: 2016-02-01 12:59 UTC

Attachments
gdb output with set debug infrun 1 set debug lin-lwp 1 (6.92 KB, text/plain)
2016-01-22 15:16 UTC, smark
gdb trace from eclipse cdt (36.39 KB, text/plain)
2016-01-22 16:12 UTC, smark

Description smark 2016-01-20 20:14:31 UTC
I'm using the latest Eclipse Mars CDT on the latest Xubuntu 15.10. While running my program through the debugger, during a call to popen(),

I get this:

/build/gdb-HnfxP_/gdb-7.10/gdb/linux-nat.c:1773: internal-error: linux_nat_resume: Assertion `lp != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) [answered Y; input not from terminal]
/build/gdb-HnfxP_/gdb-7.10/gdb/linux-nat.c:1773: internal-error: linux_nat_resume: Assertion `lp != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) [answered Y; input not from terminal]

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
Comment 1 Pedro Alves 2016-01-21 22:58:08 UTC
Hi.

This is:

static void
linux_nat_resume (struct target_ops *ops,
		  ptid_t ptid, int step, enum gdb_signal signo)
{
...
  if (resume_many)
    lp = find_lwp_pid (inferior_ptid);
  else
    lp = find_lwp_pid (ptid);
  gdb_assert (lp != NULL);

GDB core told the ptrace layer to resume a thread that the ptrace layer thinks doesn't exist.  This assertion is still present in master.  I've never seen it trigger before.  There's no workaround.
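To illustrate what that assertion is guarding against, here is a toy model of the lookup. It is only a sketch: `toy_lwp` and `toy_find_lwp` are hypothetical names, and gdb's real LWP list and `find_lwp_pid` are more involved than this.

```c
#include <stddef.h>

/* Toy stand-in for the ptrace layer's list of known LWPs.  The names
   here are hypothetical; gdb's real structures differ.  */
struct toy_lwp
{
  int lwpid;
};

/* Return the entry for LWPID, or NULL when the list has no such LWP.
   A NULL result is the situation the failed assertion reports.  */
const struct toy_lwp *
toy_find_lwp (const struct toy_lwp *list, size_t count, int lwpid)
{
  for (size_t i = 0; i < count; i++)
    if (list[i].lwpid == lwpid)
      return &list[i];
  return NULL;
}
```

If the core asks to resume an id that was never added to the list (or was already removed from it), the lookup comes back NULL and the assertion fires.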

It's unfortunately impossible to debug this sort of problem from the internal-error alone.  We'd need to see "set debug infrun 1 + set debug lin-lwp 1" logs, and/or have a small reproducer we could try ourselves.
Comment 2 smark 2016-01-22 15:16:37 UTC
Created attachment 8917
gdb output with  set debug infrun 1 set debug lin-lwp 1

I can reliably cause the problem, so I put this in .gdbinit and ran my program.

set debug infrun 1
set debug lin-lwp 1

Attached is the output
Comment 3 Pedro Alves 2016-01-22 15:19:53 UTC
I don't see an internal error in that log.
Comment 4 smark 2016-01-22 15:20:21 UTC
hold on, I see now it ran and didn't blow up.
lemme see if I can make it blow up.
Comment 5 Pedro Alves 2016-01-22 15:24:32 UTC
I do see several forks:

> infrun: target_wait (-1.0.0, status) =
> infrun:   15913.15977.0 [Thread 0x7fffeaffd700 (LWP 15977)],
> infrun:   status->kind = forked
> infrun: TARGET_WAITKIND_FORKED

along with:

> Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.21.so...done.
> Error in re-setting breakpoint 1: Cannot access memory at address 0x47e956
> Error in re-setting breakpoint 2: Cannot access memory at address 0x47e956
> Error in re-setting breakpoint 3: Cannot access memory at address 0x47e956
> Error in re-setting breakpoint 4: Cannot access memory at address 0x433470

So I wonder whether this series would fix your bug too:

  https://sourceware.org/ml/gdb-patches/2016-01/msg00443.html

That is pushed on the users/palves/fork-bugs branch on the upstream git repo.
Could you give that a try and see if it fixes it for you?
Comment 6 smark 2016-01-22 15:25:51 UTC
Yeah, so now it's working fine and the breakpoints (which were my original problem) are working correctly too. Not sure what to tell you. I'll report back if it happens again.
Comment 7 smark 2016-01-22 15:27:05 UTC
okay, I'll give that a go, thanks.
Comment 8 Pedro Alves 2016-01-22 15:29:11 UTC
The bug described at that URL, which causes the breakpoint errors, can lead to all sorts of odd things in the ptrace backend (linux-nat.c, where your assertion triggered).  So I really wouldn't be that surprised if that series fixes it for you.
Comment 9 smark 2016-01-22 16:02:26 UTC
So I checked out that fork-bugs branch and rebuilt and ran, and I still get those "cannot access memory at address..." errors.


Although now there are line breaks, whereas they were on the same line before. So that's something.


infrun: target_wait (-1.0.0, status) =
infrun:   16975.16975.0 [process 16975],
infrun:   status->kind = execd
process 16975 is executing new program: /bin/dash
infrun: TARGET_WAITKIND_EXECD
infrun: Switching context from Thread 0x7ffff7fc77c0 (LWP 16925) to process 16975
Error in re-setting breakpoint 1: Warning:
Cannot insert breakpoint 6.
Cannot access memory at address 0x436825
Cannot insert breakpoint 2.
Cannot access memory at address 0x436845

Error in re-setting breakpoint 2: Warning:
Cannot insert breakpoint 6.
Cannot access memory at address 0x436825

Error in re-setting breakpoint 3: Warning:
Cannot insert breakpoint 6.
Cannot access memory at address 0x436825

Error in re-setting breakpoint 4: Warning:
Cannot insert breakpoint 6.
Cannot access memory at address 0x436825

Error in re-setting breakpoint 5: Warning:
Cannot insert breakpoint 6.
Cannot access memory at address 0x436825
Comment 10 Pedro Alves 2016-01-22 16:08:23 UTC
Huh.  Do you see them always after TARGET_WAITKIND_EXECD?  Can you share more details on how you ran gdb (non-stop vs all-stop, commands used, etc.)?

I think the warning may be legit if you installed the breakpoint by address, like "b *0x436825", and that address isn't actually mapped after the exec.  But given the low address, it sounds like the kind of address a function in the main executable would have?
Comment 11 smark 2016-01-22 16:12:10 UTC
Created attachment 8918
gdb trace from eclipse cdt
Comment 12 smark 2016-01-22 16:12:32 UTC
Well, to be fair, the program is running, not blowing up, and the breakpoints seem to be working. And I haven't seen the assertion problem again...

I'm running this from cdt, and there's a whole lotta stuff that it does, so I attached the trace.
Comment 13 Pedro Alves 2016-01-22 16:18:49 UTC
Thanks:

 013,782 4-gdb-set breakpoint pending on
 013,782 5-gdb-set detach-on-fork off
 013,784 =cmd-param-changed,param="follow-fork-mode",value="child"
 013,784 15-gdb-set target-async on
 013,784 16-gdb-set pagination off
 013,785 17-gdb-set non-stop on

So non-stop=on, detach-on-fork=off, and follow-fork=child.
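For anyone who wants to try this outside of CDT, a program with the same general shape (a popen() call, which forks and then execs a shell) might look like the sketch below. This is an assumption about what a reproducer could look like, not the reporter's code; `first_line_from_popen` is a hypothetical helper, and it has not been confirmed to trigger the assertion.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper (not from the report): run CMD through popen(),
   which forks and execs a shell, and copy the first line of its output
   into BUF.  Returns 0 on success, -1 on failure.  */
int
first_line_from_popen (const char *cmd, char *buf, int size)
{
  FILE *pipe = popen (cmd, "r");
  if (pipe == NULL)
    return -1;

  char *line = fgets (buf, size, pipe);
  int status = pclose (pipe);

  return (line != NULL && status != -1) ? 0 : -1;
}
```

Calling it from main() with something like "echo hello", and running under gdb with "set non-stop on", "set detach-on-fork off" and "set follow-fork-mode child", should at least exercise the same fork and exec events that show up in the trace.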
Comment 14 Pedro Alves 2016-01-22 16:33:48 UTC
We first see the program fork:

016,901 &"infrun: target_wait (-1.0.0, status) =\ninfrun:   25213.25259.0 [Thread 0x7fffeaffd700 (LW\
P 25259)],\ninfrun:   status->kind = forked\n"
016,901 &"infrun: TARGET_WAITKIND_FORKED\n"
016,901 &"infrun: Switching context from Thread 0x7ffff7fc77c0 (LWP 25213) to Thread 0x7fffeaffd700 \
(LWP 25259)\n"
016,901 &"Attaching after Thread 0x7fffeaffd700 (LWP 25259) fork to child process 25260.\n"
016,901 =thread-group-added,id="i2"
016,901 =thread-group-started,id="i2",pid="25260"
016,901 =thread-created,id="7",group-id="i2"
016,901 ~"[New process 25260]\n"

... which results in breakpoint 5 gaining a new location for the new inferior (i2) :

017,063 =breakpoint-modified,bkpt={number="5",type="breakpoint",disp="keep",enabled="y",addr="<MULTI\
PLE>",times="0",original-location="/home/smark/git/dlad/Src/AsyncWriterThread.cpp:95"},{number="5.1"\
,enabled="y",addr="0x0000000000436845",func="dla::AsyncWriterThread::run()",file="../Src/AsyncWriter\
Thread.cpp",fullname="/home/smark/git/dlad/Src/AsyncWriterThread.cpp",line="95",thread-groups=["i1"]\
},{number="5.2",enabled="y",addr="0x0000000000436845",func="dla::AsyncWriterThread::run()",file="../\
Src/AsyncWriterThread.cpp",fullname="/home/smark/git/dlad/Src/AsyncWriterThread.cpp",line="95",threa\
d-groups=["i2"]}

... then on comes the exec:

017,067 &"infrun: target_wait (-1.0.0, status) =\ninfrun:   25260.25260.0 [process 25260],\ninfrun: \
  status->kind = execd\n"
017,067 &"infrun: TARGET_WAITKIND_EXECD\n"
017,067 &"infrun: Switching context from Thread 0x7ffff7fc77c0 (LWP 25213) to process 25260\n"
017,067 ~"process 25260 is executing new program: /bin/dash\n"

So the process execs dash.

... this is followed by a bunch of unload events:

017,071 =library-unloaded,id="/usr/lib/x86_64-linux-gnu/libhx509.so.5",target-name="/usr/lib/x86_64-\
linux-gnu/libhx509.so.5",host-name="/usr/lib/x86_64-linux-gnu/libhx509.so.5",thread-group="i2"
017,071 =library-unloaded,id="/lib/x86_64-linux-gnu/libcrypt.so.1",target-name="/lib/x86_64-linux-gn\
u/libcrypt.so.1",host-name="/lib/x86_64-linux-gnu/libcrypt.so.1",thread-group="i2"

... getting rid of symbols etc. for the DSOs that were loaded in the process before the exec, and then we see the warnings:

 017,073 &"Error in re-setting breakpoint 1: Warning:\n"
 017,073 &"Cannot insert breakpoint 5.\n"
 017,073 &"Cannot access memory at address 0x436845\n"
 017,073 &"\n"

Unfortunately, gdb's log output isn't very clear here, but it looks like gdb is wrongly trying to insert a breakpoint that was originally set on inferior 2, before the exec, on inferior 2's new fresh set of memory pages, after the exec, even though the breakpoint's symbol and address don't make sense for the post-exec program (dash).  On my system's dash (x86_64 F20), that address is not mapped in:

$ gdb /bin/dash
...
(gdb) start
...
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x41a000    0x1a000        0x0 /usr/bin/dash
            0x619000           0x61a000     0x1000    0x19000 /usr/bin/dash
            0x61a000           0x61b000     0x1000    0x1a000 /usr/bin/dash
            0x61b000           0x61e000     0x3000        0x0 [heap]

Likely on yours it's similar.  That would explain the warning -- you can't write to unmapped memory addresses... 
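The range check can be spelled out as a small sketch. It uses only the four ranges quoted above (a live process has further mappings that are not listed here), and `toy_mapping` and `toy_address_is_mapped` are made-up names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One [start, end) range, as printed by "info proc mappings".  */
struct toy_mapping
{
  uint64_t start;
  uint64_t end;
};

/* Only the ranges quoted in this comment; not a complete address space.  */
static const struct toy_mapping dash_mappings[] = {
  { 0x400000, 0x41a000 },
  { 0x619000, 0x61a000 },
  { 0x61a000, 0x61b000 },
  { 0x61b000, 0x61e000 },
};

/* Return true if ADDR falls inside one of the ranges above.  */
bool
toy_address_is_mapped (uint64_t addr)
{
  size_t n = sizeof dash_mappings / sizeof dash_mappings[0];
  for (size_t i = 0; i < n; i++)
    if (addr >= dash_mappings[i].start && addr < dash_mappings[i].end)
      return true;
  return false;
}
```

Both 0x436845 and 0x436825 land in the gap between 0x41a000 and 0x619000, so against these ranges the check returns false for them.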

Looks wrong to even try to install the breakpoint, though.

And we may be trying to remove other breakpoints and not seeing any warning just because the bp address is mapped.  That'd be very bad, as it means we'd be poking the wrong instruction at the wrong address...
Comment 15 Pedro Alves 2016-01-22 16:37:51 UTC
Tentatively setting milestone to 7.11 and I want to try to reproduce and check whether this is a regression.
Comment 16 Pedro Alves 2016-01-22 16:38:07 UTC
s/and/as/...
Comment 17 smark 2016-01-22 17:04:16 UTC
I agree 100%! But since I have no idea about the internals of gdb, I'm going to trust that everything you said was true and you have my full backing. Good luck soldier. We're all in this together!
Comment 18 Pedro Alves 2016-02-01 12:39:18 UTC
I managed to reproduce this.  We need to have more than one inferior, and a breakpoint that has a location spec that matches in both inferiors.  If, say, inferior 2 execs, we'll re-set all the locations of its pspace.

The trouble is that after re-setting each individual breakpoint, we'll try to insert _all_ breakpoints immediately, before we've had a chance to re-set them all, and thus end up trying to insert a breakpoint location that will be zapped when that location owner's turn to re-set comes around...
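A toy model of that ordering, to make the effect concrete. This is not gdb code: `toy_breakpoint` and the functions below are hypothetical, each breakpoint is reduced to a single "still stale" flag, and the second ordering is shown only for contrast, not as a statement of how gdb does or should handle it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not gdb code: each breakpoint has one location in the
   inferior that just exec'd.  "stale" means the location still holds
   its pre-exec address and has not been re-set yet.  */
struct toy_breakpoint
{
  bool stale;
};

/* Count how many stale locations an "insert everything" pass touches.  */
static int
toy_insert_all (const struct toy_breakpoint *bps, size_t n)
{
  int stale_inserts = 0;
  for (size_t i = 0; i < n; i++)
    if (bps[i].stale)
      stale_inserts++;
  return stale_inserts;
}

/* Ordering described above: insert all right after each single re-set.
   Returns the total number of attempts to insert a stale location.  */
int
toy_reset_then_insert_each (struct toy_breakpoint *bps, size_t n)
{
  int stale_inserts = 0;
  for (size_t i = 0; i < n; i++)
    {
      bps[i].stale = false;
      stale_inserts += toy_insert_all (bps, n);
    }
  return stale_inserts;
}

/* Contrast: re-set everything first, then insert once.  */
int
toy_reset_all_then_insert (struct toy_breakpoint *bps, size_t n)
{
  for (size_t i = 0; i < n; i++)
    bps[i].stale = false;
  return toy_insert_all (bps, n);
}
```

In this model, with three stale breakpoints the first ordering makes 2 + 1 + 0 = 3 attempts on stale locations, while the second makes none.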

The error case comes from here:

(top-gdb) bt
#0  insert_breakpoint_locations () at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:3156
#1  0x00000000005bedcd in update_global_location_list (insert_mode=UGLL_MAY_INSERT) at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:12732
#2  0x00000000005c1948 in update_breakpoint_locations (b=0x31a6710, filter_pspace=0x335d300, sals=..., sals_end=...) at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:14381
#3  0x00000000005c1e80 in breakpoint_re_set_default (b=0x31a6710) at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:14506
#4  0x00000000005bf5ae in bkpt_re_set (b=0x31a6710) at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:13084
#5  0x00000000005c209d in breakpoint_re_set_one (bint=0x31a6710) at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:14601
#6  0x000000000064f566 in catch_errors (func=0x5c2065 <breakpoint_re_set_one>, func_args=0x31a6710, errstring=0x38951c0 "Error in re-setting breakpoint 1: ", 
    mask=RETURN_MASK_ALL) at /home/pedro/gdb/mygit/src/gdb/exceptions.c:240
#7  0x00000000005c212f in breakpoint_re_set () at /home/pedro/gdb/mygit/src/gdb/breakpoint.c:14627
#8  0x000000000078d976 in solib_add (pattern=0x0, from_tty=0, target=0xdfe9e0 <current_target>, readsyms=1) at /home/pedro/gdb/mygit/src/gdb/solib.c:1033
#9  0x00000000004aacbb in enable_break (info=0x3869060, from_tty=0) at /home/pedro/gdb/mygit/src/gdb/solib-svr4.c:2455
#10 0x00000000004ac1b5 in svr4_solib_create_inferior_hook (from_tty=0) at /home/pedro/gdb/mygit/src/gdb/solib-svr4.c:3103
#11 0x000000000078df90 in solib_create_inferior_hook (from_tty=0) at /home/pedro/gdb/mygit/src/gdb/solib.c:1276
#12 0x0000000000631c8e in follow_exec (ptid=..., execd_pathname=0x7ffd5ec29a20 "/usr/bin/dash") at /home/pedro/gdb/mygit/src/gdb/infrun.c:1245
#13 0x0000000000638f38 in handle_inferior_event_1 (ecs=0x7ffd5ec29c90) at /home/pedro/gdb/mygit/src/gdb/infrun.c:5276
#14 0x000000000063924f in handle_inferior_event (ecs=0x7ffd5ec29c90) at /home/pedro/gdb/mygit/src/gdb/infrun.c:5361
Comment 19 Pedro Alves 2016-02-01 12:59:40 UTC
I believe the original assertion has been fixed with the fix for PR 19461.  I've opened Bug 19548 for the breakpoint re-set issues, and I'm closing this one as dup.

*** This bug has been marked as a duplicate of bug 19461 ***