Bug 31727

Summary: -exec-next fails in mingw (infrun.c:2794: internal-error: resume_1: Assertion `pc_in_thread_step_range (pc, tp)' failed)
Product: gdb
Reporter: Dmitry Neverov <dmitry.neverov>
Component: gdb
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: brobecker, simon.marchi, ssbssa, tromey
Priority: P2
Version: HEAD
Target Milestone: 15.2
Host:
Target:
Build:
Last reconfirmed: 2024-06-05 00:00:00
Attachments: gdb-13.2 stepping log (works fine)
gdb-14.2 stepping log (crashes)

Description Dmitry Neverov 2024-05-10 15:50:54 UTC
I'm debugging the same binary in gdb 13.2, 14.1, 14.2 and HEAD.
I add a breakpoint and when it is reached, issue the -exec-next command.
The command works fine in 13.2, but fails in 14.1, 14.2, and HEAD.

In 14.x it fails with the error from the summary:

>47-exec-next --thread 2
<^done
<(gdb)
<47^running
<*running,thread-id="all"
<(gdb)
<~"../../gdb-14.1/gdb/infrun.c:2794: internal-error: resume_1: Assertion `pc_in_thread_step_range (pc, tp)' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable."
<~"\nQuit this debugging session? "
<~"(y or n) [answered Y; input not from terminal]\n"
<&"\nThis is a bug, please report it."
<&"  For instructions, see:\n"
<&"<https://www.gnu.org/software/gdb/bugs/>.\n\n"
<~"../../gdb-14.1/gdb/infrun.c:2794: internal-error: resume_1: Assertion `pc_in_thread_step_range (pc, tp)' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable."
<~"\nCreate a core file of GDB? "
<~"(y or n) [answered Y; input not from terminal]\n"

If I run 'info line' before -exec-next, in 13.2 it outputs:

(gdb) info line
Line 106 of "../../Samples/Games/TP/Source/TP_5_3_2\TP_5_3_2Character.cpp" starts at address 0x17ad3dac <ATP_5_3_2Character::Move(FInputActionValue const&)+268> and ends at 0x17ad3dc8 <ATP_5_3_2Character::Move(FInputActionValue const&)+296>.

In gdb 14.x and HEAD:

(gdb) info line
warning: (Internal error: pc 0x800cffe in read in CU, but not in symtab.)
warning: (Error: pc 0x800cffe in address map, but not in symtab.)
Line 106 of "../../Samples/Games/TP/Source/TP_5_3_2\TP_5_3_2Character.cpp" starts at address 0x17ad3dac <ATP_5_3_2Character::Move(FInputActionValue const&)+268> and ends at 0x800cffe.

In HEAD -exec-next fails a little bit differently:

>47-exec-next --thread 2
<^done
<(gdb)
<47^running
<*running,thread-id="all"
<47^error,msg="Protocol error: QThreadEvents (thread-events) conflicting enabled responses."
<(gdb)
Comment 1 Dmitry Neverov 2024-05-10 17:41:03 UTC
Gdb runs on x86_64-w64-mingw32 and attaches to a remote aarch64 target.
Comment 2 Hannes Domani 2024-05-18 11:24:47 UTC
If you just start gdb with your executable, and do 'info line TP_5_3_2Character.cpp:106', do you get the same warnings?

And would it be possible to share that executable (or some other simple reproducer)?
Comment 3 Dmitry Neverov 2024-05-21 14:19:24 UTC
When I run 'info line' after loading the binary, there is no warning in 14.2 and master:

(gdb) info line TP_5_3_2Character.cpp:106
Line 106 of "../../Samples/Games/TP/Source/TP_5_3_2\TP_5_3_2Character.cpp"
   starts at address 0xfac6dac <_ZN18ATP_5_3_2Character4MoveERK17FInputActionValue+268>
   and ends at 0xfac6dc8 <_ZN18ATP_5_3_2Character4MoveERK17FInputActionValue+296>.
(gdb)

> And would it be possible to share that executable (or some other simple reproducer)?

The executable where I get the error is larger than 2 GB, so I'm not sure I can share it. I have been trying to reproduce it with a smaller program, with no success so far. Maybe I can run more commands on the binary and report the results?
Comment 4 Simon Marchi 2024-05-21 16:38:10 UTC
You could run with "set debug infrun on" and attach the logs.  Also, a backtrace at the point of the crash would be useful.


> If I run 'info line' before -exec-next, in 13.2 it outputs:
> 
> (gdb) info line
> Line 106 of "../../Samples/Games/TP/Source/TP_5_3_2\TP_5_3_2Character.cpp"
> starts at address 0x17ad3dac <ATP_5_3_2Character::Move(FInputActionValue
> const&)+268> and ends at 0x17ad3dc8
> <ATP_5_3_2Character::Move(FInputActionValue const&)+296>.
> 
> In gdb 14.x and HEAD:
> 
> (gdb) info line
> warning: (Internal error: pc 0x800cffe in read in CU, but not in symtab.)
> warning: (Error: pc 0x800cffe in address map, but not in symtab.)
> Line 106 of "../../Samples/Games/TP/Source/TP_5_3_2\TP_5_3_2Character.cpp"
> starts at address 0x17ad3dac <ATP_5_3_2Character::Move(FInputActionValue
> const&)+268> and ends at 0x800cffe.

It's hard to tell if it's related to the -exec-next or not.  You could perhaps file a separate bug for this one to get more visibility, and if we realize they are related we can link the two.

Since it looks like a clear regression, you could try to bisect to see which commit introduced the bug.  After that, it's easier to poke the author of the commit to see if they can have a look.

> 
> In HEAD -exec-next fails a little bit differently:
> 
> >47-exec-next --thread 2
> <^done
> <(gdb)
> <47^running
> <*running,thread-id="all"
> <47^error,msg="Protocol error: QThreadEvents (thread-events) conflicting
> enabled responses."
> <(gdb)

This sounds like another bug, related to:

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=65c459abebf70bd5a64dcee11d4d7d4a8498465f

I think it would be worth filing a separate bug for this one, with some details about how you reproduce: what is the server (gdbserver or other), what version, logs with "set debug remote 1" enabled.
Comment 5 Dmitry Neverov 2024-05-22 13:42:12 UTC
Created attachment 15529 [details]
gdb-13.2 stepping log (works fine)
Comment 6 Dmitry Neverov 2024-05-22 13:42:36 UTC
Created attachment 15530 [details]
gdb-14.2 stepping log (crashes)
Comment 7 Dmitry Neverov 2024-05-22 13:43:40 UTC
> You could run with "set debug infrun on" and attach the logs.  Also, a backtrace
> at the point of the crash would be useful.

Attached. The backtrace on crash in 14.2 is:

resume_1 infrun.c:2794
resume infrun.c:2810
keep_going_pass_signal infrun.c:8557
keep_going infrun.c:8576
process_event_stop_test infrun.c:7752
handle_signal_stop infrun.c:6886
handle_inferior_event infrun.c:6114
fetch_inferior_event infrun.c:4466
inferior_event_handler inf-loop.c:42
remote_async_serial_handler remote.c:14859
run_async_handler_and_reschedule ser-base.c:138
fd_event ser-base.c:189
handle_file_event event-loop.cc:573
gdb_wait_for_event event-loop.cc:716
gdb_do_one_event event-loop.cc:264
start_event_loop main.c:407
captured_command_loop main.c:471
captured_main main.c:1324
gdb_main main.c:1343
main gdb.c:39
Comment 8 Dmitry Neverov 2024-05-24 08:42:20 UTC
Judging by bisect, it was introduced by

commit 1acc9dca423f78e44553928f0de839b618c13766
Author: Tom Tromey <tom@tromey.com>
Date:   Tue Mar 7 17:37:45 2023 -0700

    Change linetables to be objfile-independent
Comment 9 Dmitry Neverov 2024-05-26 16:08:49 UTC
> I think it would be worth filing a separate bug for this one

Filed https://sourceware.org/bugzilla/show_bug.cgi?id=31801
Comment 10 Tom Tromey 2024-05-28 18:24:08 UTC
> warning: (Internal error: pc 0x800cffe in read in CU, but not in symtab.)
> warning: (Error: pc 0x800cffe in address map, but not in symtab.)

This is definitely a red flag fwiw.
Normally it means the DWARF indexer is out of sync
with the full reader somehow.
Comment 11 Tom Tromey 2024-05-28 18:46:50 UTC
I wonder if this could possibly be tripping over this:

https://inbox.sourceware.org/gdb-patches/20240217-dwarf-race-relocate-v1-7-d3d2d908c1e8@tromey.com/

Just a wild guess, since that's the only issue I've run
across in this area lately.

Otherwise I guess we'll need some way to reproduce & then
debug gdb.
Comment 12 Tom Tromey 2024-05-28 19:02:45 UTC
> Line 106 of "../../Samples/Games/TP/Source/TP_5_3_2\TP_5_3_2Character.cpp" starts at address 0x17ad3dac <ATP_5_3_2Character::Move(FInputActionValue const&)+268> and ends at 0x800cffe.

One thing I notice here is that the start address is
relocated but the end address is not.
That seems extremely peculiar.
I wonder if you could set a breakpoint in find_line_pc_range
and see what's going wrong here.
Looking at find_pc_sect_line I don't really see how
it could happen.
Comment 13 Dmitry Neverov 2024-06-05 08:51:40 UTC
I've added breakpoints in find_line_pc_range, but they are not triggered.

I've debugged find_pc_sect_line and noticed 2 things.

1. The line table contains entries with address 0xFFFFFFFFFFFFFFFE (-2).

lnp_state_machine::check_line_address() checks address -1 and
mentions https://reviews.llvm.org/D81784 in
a8caed5d7faa639a1e6769eba551d15d8ddd9510.

It looks like lld used value -2 for pre-DWARF-v5:
https://github.com/llvm/llvm-project/commit/e618ccbf431f6730edb6d1467a127c3a52fd57f7#diff-7d58449b03500d25cfeb298e5b0591bba14e8fbcf5bfb899d20dfb8007f38854

It doesn't do that any more:
https://github.com/llvm/llvm-project/commit/004be4037e1e9c6092323c5c9268acb3ecf9176c

Maybe lnp_state_machine::check_line_address should check -2 as well?


2. The linetable_entry::operator<() is not called in
symtab.c:3215 (1acc9dca423f78e44553928f0de839b618c13766). It
looks like changing the line to

      if (best && *item < *last && item->raw_pc () > best->raw_pc ()
	  && (best_end == 0 || best_end > item->pc (objfile)))
	best_end = item->pc (objfile);

fixes the crash.
Comment 14 Dmitry Neverov 2024-06-05 09:35:30 UTC
I guess the item < last comparison doesn't use the linetable_entry::operator<() intentionally.

I think the assert started to fail because, before 1acc9dca423f78e44553928f0de839b618c13766, best_end was compared to what is now called raw_pc. For entries with address -2, best_end > item->pc was false.

Now the comparison is with item->pc (objfile), and for entries with address -2, item->pc (objfile) wraps to a value below best_end, and best_end is updated.

I wonder if it is expected that the best_end can come from an item with a line and a file different than the line and the file in best?
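
The wrap described in this comment can be checked against the addresses quoted earlier in the report. The following is only a sketch, not GDB code: `addr_t`, `load_offset`, `relocate` and `shrinks_range` are hypothetical names, addresses are assumed to be 64-bit unsigned, and the load offset is inferred from the two 'info line' outputs (0xfac6dac before the program runs, 0x17ad3dac when attached).

```cpp
#include <cassert>
#include <cstdint>

// Assumption: addresses are 64-bit unsigned values.
using addr_t = std::uint64_t;

// Offset inferred from the report: line 106 starts at 0xfac6dac in the
// unrelocated binary and at 0x17ad3dac in the attached session.
constexpr addr_t load_offset = 0x17ad3dac - 0xfac6dac;

// Relocation is an unsigned addition, so it wraps modulo 2^64.
constexpr addr_t relocate(addr_t raw_pc) { return raw_pc + load_offset; }

// Simplified form of the "best_end == 0 || best_end > candidate" test.
constexpr bool shrinks_range(addr_t best_end, addr_t candidate) {
    return best_end == 0 || best_end > candidate;
}

constexpr addr_t tombstone = static_cast<addr_t>(-2);  // 0xFFFFFFFFFFFFFFFE
constexpr addr_t raw_line_end = 0xfac6dc8;             // unrelocated end of line 106

static_assert(load_offset == 0x800d000, "offset implied by the report");
// Relocating the -2 entry wraps to the bogus end address seen in 'info line'.
static_assert(relocate(tombstone) == 0x800cffe, "wrapped tombstone");
static_assert(relocate(raw_line_end) == 0x17ad3dc8, "expected end of line 106");
// Raw comparison: the tombstone is huge, so it never replaces best_end.
static_assert(!shrinks_range(raw_line_end, tombstone), "raw compare");
// Relocated comparison: the wrapped tombstone is below best_end and wins.
static_assert(shrinks_range(relocate(raw_line_end), relocate(tombstone)),
              "relocated compare");
```

Under these assumptions the wrapped value is exactly the 0x800cffe that 'info line' printed as the end of the range, which is consistent with the explanation above.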
Comment 15 Tom Tromey 2024-06-05 18:25:07 UTC
(In reply to Dmitry Neverov from comment #13)

> Maybe lnp_state_machine::check_line_address should check -2 as well?

Can you try it and see if it helps?
Comment 16 Dmitry Neverov 2024-06-06 07:10:41 UTC
I get no crash if I change the condition in lnp_state_machine::check_line_address to

  if ((address == 0 && address < unrelocated_lowpc)
      || address == (CORE_ADDR) -1 || address == (CORE_ADDR) -2)
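
For readers who want to try the condition outside of GDB, here is a minimal standalone sketch. It is not the actual lnp_state_machine::check_line_address; the function name `is_discarded_line_address` is hypothetical, and `CORE_ADDR` is assumed to be a 64-bit unsigned integer.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for GDB's CORE_ADDR; assumed to be 64-bit unsigned here.
using CORE_ADDR = std::uint64_t;

// Hypothetical helper mirroring the condition quoted above: returns true
// when a line-table address should be treated as belonging to discarded
// code rather than as a real program address.
bool is_discarded_line_address(CORE_ADDR address, CORE_ADDR unrelocated_lowpc)
{
  return (address == 0 && address < unrelocated_lowpc)
         || address == (CORE_ADDR) -1   // tombstone that was already checked
         || address == (CORE_ADDR) -2;  // additional tombstone tried here
}
```

With this sketch, `is_discarded_line_address((CORE_ADDR) -2, 0x1000)` is true, while an ordinary address such as 0xfac6dac is not flagged.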
Comment 17 Tom Tromey 2024-06-07 21:37:12 UTC
I think it would be worthwhile to submit a patch for this, then.
The code could have a comment mentioning the clang/lld work here
to explain the rationale for that -2.
Comment 18 Sourceware Commits 2024-08-25 08:44:35 UTC
The master branch has been updated by Dmitrii Neverov <neverov@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=e814012b2b108743e21b7ef2799310a0f4e0a86d

commit e814012b2b108743e21b7ef2799310a0f4e0a86d
Author: Dmitry Neverov <dmitry.neverov@jetbrains.com>
Date:   Sat Jun 8 10:41:31 2024 +0200

    Recognize -2 as a tombstone value in .debug_line
    
    Commit a8caed5d7faa639a1e6769eba551d15d8ddd9510 handled the tombstone
    value -1 used by lld (https://reviews.llvm.org/D81784).  The
    referenced lld commit also uses the tombstone value -2 for
    pre-DWARF-v5
    (https://github.com/llvm/llvm-project/commit/e618ccbf431f6730edb6d1467a127c3a52fd57f7).
    
    If not handled, -2 breaks the pc step range calculation and triggers
    the assertion:
    
      gdb/infrun.c:2794: internal-error: resume_1: Assertion
      `pc_in_thread_step_range (pc, tp)' failed.
    
    This commit adds -2 tombstone value and handles it in the same way as -1.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31727
    Approved-By: Tom Tromey <tom@tromey.com>
Comment 19 Joel Brobecker 2024-09-01 18:42:07 UTC
Set the target milestone to 15.2 for this one, as it hides an issue that is caused by a certain stub with a bug where it deviates from the protocol specification (described in https://sourceware.org/bugzilla/show_bug.cgi?id=31801 ).

Due to the _apparent_ regression aspect, we marked that other PR for release 15.2, but in fact there is no fix to be done in that other PR, so marking this one for 15.2 instead. Not critical for 15.2, but if we can manage to get it in, it will help some users.

Thus, next steps:

* Have a Global Maintainer either approve or reject the backport of this patch to the gdb-15-branch;
* If the backport is approved, cherry-pick the patch on gdb-15-branch, and then close;
* If the backport is rejected, then change the target milestone to 16.1, and then close.
Comment 20 Sourceware Commits 2024-09-04 18:38:29 UTC
The gdb-15-branch branch has been updated by Dmitrii Neverov <neverov@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=9542d1b3a05477c415d870b7b652cdde75a5c8ea

commit 9542d1b3a05477c415d870b7b652cdde75a5c8ea
Author: Dmitry Neverov <dmitry.neverov@jetbrains.com>
Date:   Sat Jun 8 10:41:31 2024 +0200

    Recognize -2 as a tombstone value in .debug_line
    
    Commit a8caed5d7faa639a1e6769eba551d15d8ddd9510 handled the tombstone
    value -1 used by lld (https://reviews.llvm.org/D81784).  The
    referenced lld commit also uses the tombstone value -2 for
    pre-DWARF-v5
    (https://github.com/llvm/llvm-project/commit/e618ccbf431f6730edb6d1467a127c3a52fd57f7).
    
    If not handled, -2 breaks the pc step range calculation and triggers
    the assertion:
    
      gdb/infrun.c:2794: internal-error: resume_1: Assertion
      `pc_in_thread_step_range (pc, tp)' failed.
    
    This commit adds -2 tombstone value and handles it in the same way as -1.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31727
    Cherry-picked from e814012b2b108743e21b7ef2799310a0f4e0a86d
    Approved-By: Tom Tromey <tom@tromey.com>
Comment 21 Joel Brobecker 2024-09-08 13:54:44 UTC
Closing, now that the patch has been backported.