Bug 21221 - gdb hangs while stepping an empty loop
Status: ASSIGNED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2017-03-06 13:20 UTC by Prakhar Bahuguna
Modified: 2023-03-09 12:49 UTC
Last reconfirmed: 2019-10-25 00:00:00


Attachments
Reproducer test case (93 bytes, text/x-csrc)
2017-03-06 13:20 UTC, Prakhar Bahuguna

Description Prakhar Bahuguna 2017-03-06 13:20:28 UTC
Created attachment 9871
Reproducer test case

Given the empty for loop in the reproducer with a large number of iterations, gdb hangs while attempting to step over a single iteration of the loop. The server hangs and must be interrupted manually.

The issue does not present itself if the loop contains an instruction - adding __asm("NOP"); inside the loop is sufficient to suppress the bug and allow the loop to be stepped correctly.

This issue can be reproduced on trunk for both ARM and x86 platforms.
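
The workaround described above can be sketched as follows. This is a minimal illustration, not the attached reproducer: the function name `spin_with_nop` and its iteration-count parameter are hypothetical, `__asm__ __volatile__` is the GCC/Clang spelling of the `__asm` mentioned above, and the `nop` mnemonic is assumed to be valid for the target (it is on both ARM and x86).

```c
/* Busy-wait loop whose body contains one real instruction.
   The inline "nop" sits on its own source line, so the compiler is
   expected to emit a separate line-table entry for the loop body,
   giving gdb a line transition to stop on. */
unsigned int spin_with_nop(unsigned int iterations)
{
    unsigned int i;

    for (i = 0U; i < iterations; i++)
    {
        __asm__ __volatile__("nop");
    }
    return i; /* number of iterations executed */
}
```

The return value is only there so the loop can be checked in isolation; the loop in the reproducer does not need it.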
Comment 1 Omair Javaid 2018-02-20 08:51:11 UTC
This is a line info generation issue and not a gdb bug.

I did a test build of attached sample code on Ubuntu 16.04 (x86_64) 

With gcc version 5.4.0:
gcc -ggdb3 -O0 -o file-gcc file.c

(gdb) info line file.c:8
Line 8 of "file.c" is at address 0x4004f0 <main+26> but contains no code.

With clang version 5.0.0-3~16.04.1:
clang-5.0 -ggdb3 -O0 -o file-clang file.c

(gdb) info line file.c:8
Line 8 of "file.c" starts at address 0x4004df <main+31>
   and ends at 0x4004e4 <main+36>.

gcc generates no line information for the empty braces and treats the for loop as a single statement, whereas clang generates line information for the empty braces as well. As a result, with the clang-built executable you will see stepping alternate between the start and the end of the inner loop. With the gcc build, a step completes only once the whole loop has finished. gdb, whether debugging natively or remotely, does not hang; stepping over the loop statement simply takes a very long time. For example, a RaspberryPi2 Model B+ completes 50 loop iterations in one second.
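
To illustrate why the gcc build appears stuck, here is a toy model of line stepping. It is a simplification for illustration only, not gdb's implementation: `steps_until_new_line` and the trace arrays are hypothetical, and each array element stands for the source line of one executed instruction.

```c
#include <stddef.h>

/* Toy model of "step": single-step instructions until one belongs to a
   source line different from the starting line.
   trace[k] is the source line of the k-th executed instruction.
   Returns the number of instructions stepped, or n if the line never
   changes within the trace (the "stuck" case). */
size_t steps_until_new_line(const int *trace, size_t n)
{
    if (n == 0)
        return 0;
    for (size_t k = 1; k < n; k++) {
        if (trace[k] != trace[0])
            return k;
    }
    return n;
}
```

A clang-like trace such as {5, 5, 5, 8, 5, 5} stops after 3 steps. A trace that stays on line 5 returns its full length, which corresponds to gdb single-stepping through every iteration of the loop.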
Comment 2 Maxim Kuvyrkov 2018-04-29 11:09:34 UTC
Hi Omair,

Do I understand correctly that GCC generates wrong line information?  If that's the case, would you please post a listing of the current debug line info (the buggy one) and what the correct line info should be.

We'll then take this to the GCC community to investigate and fix.

Thank you.
Comment 3 Omair Javaid 2018-05-07 11:09:50 UTC
GCC Code Generated for main.c:
objdump -S gcc.out

int main (void)
{
  4004d6:	55                   	push   %rbp
  4004d7:	48 89 e5             	mov    %rsp,%rbp
  while (1)
  {
    for (unsigned int i = 0U; i < 0xFFFFFU; i++)
  4004da:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
  4004e1:	eb 04                	jmp    4004e7 <main+0x11>
  4004e3:	83 45 fc 01          	addl   $0x1,-0x4(%rbp)
  4004e7:	81 7d fc fe ff 0f 00 	cmpl   $0xffffe,-0x4(%rbp)
  4004ee:	76 f3                	jbe    4004e3 <main+0xd>
    {
      ;
    }
  }
  4004f0:	eb e8                	jmp    4004da <main+0x4>
  4004f2:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4004f9:	00 00 00 
  4004fc:	0f 1f 40 00          	nopl   0x0(%rax)

GCC Line info:
objdump --dwarf=decodedline gcc.out

CU: ./main.c:
File name                            Line number    Starting address
main.c                                         2            0x4004d6
main.c                                         5            0x4004da
main.c                                         5            0x4004e3
main.c                                         5            0x4004e7
main.c                                         9            0x4004f0

GCC generates three line entries for the addresses corresponding to the loop.
No line entry is generated for:
  4004ee:	76 f3                	jbe    4004e3 <main+0xd>
This instruction should map to line 8 in this case.

Clang Code Generated for main.c:
objdump -S clang.out

int main (void)
{
  4004c0:	55                   	push   %rbp
  4004c1:	48 89 e5             	mov    %rsp,%rbp
  4004c4:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
  while (1)
  {
    for (unsigned int i = 0U; i < 0xFFFFFU; i++)
  4004cb:	c7 45 f8 00 00 00 00 	movl   $0x0,-0x8(%rbp)
  4004d2:	81 7d f8 ff ff 0f 00 	cmpl   $0xfffff,-0x8(%rbp)
  4004d9:	0f 83 13 00 00 00    	jae    4004f2 <main+0x32>
    {
      ;
    }
  4004df:	e9 00 00 00 00       	jmpq   4004e4 <main+0x24>
int main (void)
{
  while (1)
  {
    for (unsigned int i = 0U; i < 0xFFFFFU; i++)
  4004e4:	8b 45 f8             	mov    -0x8(%rbp),%eax
  4004e7:	83 c0 01             	add    $0x1,%eax
  4004ea:	89 45 f8             	mov    %eax,-0x8(%rbp)
  4004ed:	e9 e0 ff ff ff       	jmpq   4004d2 <main+0x12>
int main (void)
{
  while (1)
  4004f2:	e9 d4 ff ff ff       	jmpq   4004cb <main+0xb>
  4004f7:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  4004fe:	00 00 


Clang Line info:
objdump --dwarf=decodedline clang.out

CU: main.c:
File name                            Line number    Starting address
main.c                                         2            0x4004c0
main.c                                         5            0x4004cb
main.c                                         5            0x4004d2
main.c                                         5            0x4004d9
main.c                                         8            0x4004df
main.c                                         5            0x4004e4
main.c                                         5            0x4004ed
main.c                                         3            0x4004f2

Clang generates two separate line entries: one for the start of the loop at line 5,
and another for the closing brace at line 8, at address 0x4004df.
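
The two tables can be compared with a small lookup helper. This is a sketch under assumptions: `struct line_entry` and `line_for_pc` are hypothetical names, the rows are copied from the objdump output above, and each row is taken to cover all addresses up to the next row.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded line-table row: the source line that starts at addr. */
struct line_entry {
    uint64_t addr;
    int line;
};

/* Rows copied from the GCC and Clang listings above. */
static const struct line_entry gcc_table[] = {
    {0x4004d6, 2}, {0x4004da, 5}, {0x4004e3, 5}, {0x4004e7, 5}, {0x4004f0, 9},
};
static const struct line_entry clang_table[] = {
    {0x4004c0, 2}, {0x4004cb, 5}, {0x4004d2, 5}, {0x4004d9, 5},
    {0x4004df, 8}, {0x4004e4, 5}, {0x4004ed, 5}, {0x4004f2, 3},
};

/* Return the line of the last row with addr <= pc, or 0 if pc is
   below the first row. Rows must be sorted by ascending address. */
int line_for_pc(const struct line_entry *table, size_t n, uint64_t pc)
{
    int line = 0;
    for (size_t k = 0; k < n && table[k].addr <= pc; k++)
        line = table[k].line;
    return line;
}
```

With these rows, `line_for_pc(gcc_table, 5, 0x4004ee)` yields 5: the backward jump belongs to the loop header, so every instruction inside the GCC loop maps to line 5. For Clang, address 0x4004df yields 8, which is the line transition that lets a step finish.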
Comment 4 Alexander Fedotov 2018-12-19 20:16:05 UTC
Still reproduces on GCC 6.3 and 7.3.
Test case:

int main(void)
{
int var = 0;

for (;;)
{
  var++;
}

return 0;
}

arm-eabi-gcc --version
arm-eabi-gcc (Linaro GCC 7.3-2018.05) 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701]

arm-eabi-gcc -c -g test.c && arm-eabi-objdump.exe --dwarf=decodedline test.o

CU: test.c:
File name                            Line number    Starting address
test.c                                         2                   0
test.c                                         3                 0xc
test.c                                         7                0x14



While 4.9.3 is fine:

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 227977]

arm-none-eabi-gcc -c -g test.c && arm-none-eabi-objdump.exe --dwarf=decodedline test.o

CU: test.c:
File name                            Line number    Starting address
test.c                                         2                   0
test.c                                         3                 0xc
test.c                                         7                0x14
test.c                                         8                0x20
Comment 5 Alexander Fedotov 2018-12-19 20:37:27 UTC
Using Linaro prebuilt toolchains, I've narrowed it down to:
  gcc-linaro-5.5.0-2017.10 is fine (generates line info for line #8)
  gcc-linaro-6.1.1-2016.08 is bad (no line info for line #8)

Hope this helps :)
Comment 6 Luis Machado 2019-10-09 23:59:17 UTC
After investigating this in more detail, we're really dealing with some corner/degenerate cases here. It seems unlikely we will be able to fix all of the variations of this annoyance without some more complex changes. Then the question that comes to mind is if it is really worth the effort.

In the worst case, we have an empty loop, a jump instruction that jumps to itself. GDB won't see line changes when we next/step over this. So it will be equally "stuck" (in reality it keeps moving, as Omair said, but it appears to the user that it is stuck).

Then there are other cases where we have some tight/empty loop that is written in some particular way that will cause compilers to not generate a line transition from the for loop header to the statements in the body.

More generally, even non-corner-case code will hit this annoyance if we craft the code in a particular way. For example, if we construct a for loop, complete with header and non-empty body, but write it in a single line, GDB will also display the same behavior. But in this particular case we could argue that it is a GDB problem for not noticing the column transitions as opposed to line transitions.
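
As a rough picture of what noticing column transitions could mean, consider the comparison below. It is an illustrative sketch only, not how gdb decides where to stop: `struct src_pos` and `position_changed` are hypothetical, and treating column 0 as "no usable column" is an assumption made for this example.

```c
#include <stdbool.h>

/* A source position: line plus column. */
struct src_pos {
    int line;
    int column; /* 0 is treated here as "no usable column" (assumption) */
};

/* Decide whether stepping has reached a "new" position.
   With use_columns false, only the line is compared, which is the
   behaviour described above. With use_columns true, a column change
   on the same line also counts, provided both columns are usable. */
bool position_changed(struct src_pos from, struct src_pos to, bool use_columns)
{
    if (from.line != to.line)
        return true;
    if (use_columns && from.column != 0 && to.column != 0)
        return from.column != to.column;
    return false;
}
```

For a loop written on a single line, the header and the body share a line number, so only the column-aware comparison reports a change between them.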

In summary, each particular case may require a slightly different approach to get the compiler to output a meaningful line transition. Take, for example, the following testcase:

1  int main (void)
2  {
3    while (1)
4    {
5      for (unsigned int i = 0U; i < 0xFFFFFU; i++)
6      {
7        ;
8      }
9    }
10 }

We could force GCC to output a line transition to line 8 when we reach the instruction that jumps back to the header. Then again, is that jump really properly mapped to the for loop's body's closing brace? What would happen if we just opt to not have the opening/closing braces? What would the right transition be in that case?

Another example is the following:

1  int main(void)
2  {
3  int var = 0;
4  
5  for (;;)
6  {
7    var++;
8  }
9  
10 return 0;
11 }

The above will also confuse GDB due to not having a proper for loop header. So we jump into line 7, but then we don't have a line transition when we do the next iteration. The instruction that jumps to the header is now mapped to line 7. It doesn't map to line 8 (the for loop's closing brace) or 5 (the for loop's header).
Comment 7 Liviu Ionescu 2019-10-10 06:07:04 UTC
> if it is really worth the effort

I can't tell if it is worth the effort, but I can definitely tell that it is very annoying to lose control in a debug session and to have to plan in advance to place breakpoints after the loops.

And since clang did it, it should not be Mission Impossible.
Comment 8 Luis Machado 2019-10-10 10:37:44 UTC
I completely agree that this is very annoying and non-intuitive.

I'm still investigating this and trying to break it down into separate cases. Some of them are more complex to handle, while others should be simpler.

Clang does handle *some* cases better than GCC, but in others it still generates line number transitions that will lead GDB to skip whole loops in single-step mode.

Ideally we'd come up with a solution that handles the most common cases.
Comment 9 Luis Machado 2019-10-25 17:22:49 UTC
I'm experimenting with a GCC patch that generates additional line numbers in loop corner cases, preventing GDB from getting "stuck" while line-stepping.
Comment 10 Luis Machado 2020-05-19 04:53:26 UTC
This patch series could improve things by considering column information.

https://sourceware.org/pipermail/gdb-patches/2020-May/168673.html
Comment 11 Daniel O'Connor 2022-11-25 12:55:06 UTC
I just ran into this issue recently using a Black Magic probe to debug an STM32 and I think it is also a GDB bug.

When this is triggered you have to kill GDB because ctrl-c does not interrupt it.

See this bug report: https://github.com/blackmagic-debug/blackmagic/issues/190

Notably GDB is not hung, nor is the BMP probe - GDB is simply stuck in a loop talking to the BMP setting breakpoints etc.

I don't really know where to start looking in the GDB code to see why it blocks being interrupted though so it's hard to see how difficult it would be to mitigate the problem.

Obviously with the 'nop' work around it's not a big deal but it was quite a frustrating experience until I learnt that :)
Comment 12 Luis Machado 2022-11-25 13:02:11 UTC
GDB should always acknowledge an interrupt request, so that is clearly a bug. I'm not sure if it should be handled in this particular ticket though.

It is related to this ticket, but it seems to be a different issue. Maybe one involving the remote target advertising the proper features so GDB can properly communicate the ctrl-c sequence.
Comment 13 Daniel O'Connor 2022-11-27 02:51:19 UTC
I don't think this is an issue with the remote target / GDB communication since the trace shows GDB is still issuing commands rather than waiting for something.

I'm happy to open a new bug for GDB not being interruptible in this situation, although I think in that case this bug would be closed because it doesn't cover anything (since all that would be left is the GCC side, which looks like it is handled in a different bug system).
Comment 14 kristof mulier 2023-01-31 10:18:11 UTC
We use GDB as our debugger backend in "Embeetle IDE" (a new IDE for microcontroller coding, we developed it from scratch, it's not based on Eclipse or any other existing IDE).
Unfortunately, this "GDB hangs while stepping an empty loop" bug is blocking our progress considerably. I'm looking for someone who can fix this bug in return for a payment/donation. Please contact me. My name is Kristof Mulier. You can find my contact info on the "Embeetle IDE" website.
Comment 15 Luis Machado 2023-01-31 10:39:01 UTC
Do you have a case of a 1 line loop that debuggers get stuck on?

Are you seeing this mostly with gcc or clang as well?
Comment 16 kristof mulier 2023-01-31 10:51:39 UTC
Hi @Luis,
Thanks for your quick reply!
We have this problem with GCC, that's what we are using. Here is an example of the loop code:

    void delay_1ms(uint32_t count)
    {
        delay = count;

        while(0U != delay){
        } // <- this loop causes the problem
    }
Comment 17 Luis Machado 2023-01-31 11:40:41 UTC
(In reply to kristof mulier from comment #16)
> Hi @Luis,
> Thanks for your quick reply!
> We have this problem with GCC, that's what we are using. Here is an example
> of the loop code:
> 
>     void delay_1ms(uint32_t count)
>     {
>         delay = count;
> 
>         while(0U != delay){
>         } // <- this loop causes the problem
>     }

Right. That while case *should* be a case where we can handle things in a sane way if there is more than a single instruction. I believe clang might generate code that doesn't cause gdb to get stuck.

gcc is a different matter though. Last time I was investigating this, gcc used to get rid of opening/closing brace line entries, whereas clang doesn't. The opening/closing brace is something compilers can use to generate a line entry, which gdb would like to see for a step/next command.

Handling these cases (they are a set of multiple corner cases depending on how you write your code) in gdb is not trivial unfortunately.
Comment 18 kristof mulier 2023-01-31 15:00:41 UTC
> Handling these cases (they are a set of multiple corner cases depending on how you write your code) in gdb is not trivial unfortunately.

Indeed, we understand this is a very complicated problem to solve.

> I believe clang might generate code that doesn't cause GDB to get stuck.

Unfortunately, we need a solution that's independent of the compiler. We cannot force our users to use Clang.

Do you think a solution would be possible, that works for the current GCC versions? Or is this a fundamental problem?

Many microcontroller sample projects use these kinds of empty loops for delays. If you stop the program, there's always a big probability the PC is in such a loop. So, this "GDB hangs while stepping an empty loop" issue is really important to many GDB users.
Comment 19 Luis Machado 2023-02-03 10:20:54 UTC
When it happens, can you actually interrupt it so you regain control of gdb?
Comment 20 Daniel O'Connor 2023-02-03 10:23:24 UTC
(In reply to Luis Machado from comment #19)
> When it happens, can you actually interrupt it so you regain control of gdb?

No, it's stuck sending this to the remote debugger:
Breakpoint 1, i2c_write7_v1 (i2c=1073763328, addr=80, data=0x2001ffa4 "", n=2, start=true) at hal.c:62
62	    if (start) {
(gdb) next
63		for (tout = I2C_TIMEOUT; tout > 0 && (I2C_SR2(i2c) & I2C_SR2_BUSY); tout--)
(gdb) set debug remote 1
(gdb) next
[remote] Sending packet: $Z1,8000546,2#7c
[remote] Received Ack
[remote] Packet received: OK
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $m8000550,2#2d
[remote] Received Ack
[remote] Packet received: 7b61
[remote] Sending packet: $m800054e,4#63
[remote] Received Ack
[remote] Packet received: 404b7b61
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $Z1,8000550,2#77
[remote] Received Ack
[remote] Packet received: OK
[remote] Sending packet: $c#63
[remote] Received Ack
[remote] wait: enter
[remote] wait: exit
[remote] wait: enter
  [remote] Packet received: T05
  [remote] select_thread_for_ambiguous_stop_reply: enter
    [remote] select_thread_for_ambiguous_stop_reply: process_wide_stop = 0
    [remote] select_thread_for_ambiguous_stop_reply: first resumed thread is Thread 1
    [remote] select_thread_for_ambiguous_stop_reply: is this guess ambiguous? = 0
  [remote] select_thread_for_ambiguous_stop_reply: exit
[remote] wait: exit

(See https://github.com/blackmagic-debug/blackmagic/issues/190#issuecomment-1327190470)
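
For readers unfamiliar with the trace above: each packet has the form `$payload#xx`, where `xx` is the modulo-256 sum of the payload bytes written as two hexadecimal digits. In the GDB remote serial protocol, `Z1` inserts a hardware breakpoint, `m` reads memory, `c` continues, and `T05` is a stop reply carrying signal 5. The checksums in the log can be checked with a small helper; `rsp_checksum` is a hypothetical name used for illustration.

```c
/* Checksum of a GDB remote-protocol packet payload: the sum of all
   payload bytes modulo 256. The payload is the text between '$' and '#'. */
unsigned int rsp_checksum(const char *payload)
{
    unsigned int sum = 0U;

    while (*payload != '\0') {
        sum += (unsigned char)*payload;
        payload++;
    }
    return sum & 0xFFU;
}
```

For example, the payload `Z1,8000546,2` sums to 0x7c and `c` sums to 0x63, matching the `#7c` and `#63` suffixes in the log.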
Comment 21 Luis Machado 2023-02-03 10:32:18 UTC
Thanks for the input. That behavior sounds like a different bug in gdb. When in motion, gdb should always be interruptible so that you don't run into situations like this.

There isn't much gdb can do about the tight loops themselves, but there should be a graceful way out of it with the interruption.
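
One way to picture such a graceful way out is a stepping loop that polls an interrupt flag before every step. This is purely an illustrative sketch and not gdb code: `step_line`, `request_interrupt`, `step_fn`, `stuck_on_line_5`, and the result enum are all hypothetical.

```c
#include <signal.h>
#include <stddef.h>

enum step_result { STEP_NEW_LINE, STEP_INTERRUPTED };

/* Set asynchronously, for example from a SIGINT handler. */
static volatile sig_atomic_t interrupt_requested = 0;

void request_interrupt(int signo)
{
    (void)signo;
    interrupt_requested = 1;
}

/* Callback that executes one instruction and returns its source line. */
typedef int (*step_fn)(void *target);

/* Single-step until the source line changes, but check the interrupt
   flag before every step so that a tight loop cannot block the user. */
enum step_result step_line(step_fn step, void *target, int start_line)
{
    for (;;) {
        if (interrupt_requested) {
            interrupt_requested = 0;
            return STEP_INTERRUPTED;
        }
        if (step(target) != start_line)
            return STEP_NEW_LINE;
    }
}

/* Example target: a loop whose every instruction maps to line 5.
   target points to a step counter; after 1000 steps the callback
   stands in for the user pressing ctrl-c. */
int stuck_on_line_5(void *target)
{
    size_t *steps = target;

    (*steps)++;
    if (*steps == 1000)
        request_interrupt(SIGINT);
    return 5;
}
```

Calling `step_line(stuck_on_line_5, &steps, 5)` with a zeroed counter returns `STEP_INTERRUPTED` after 1000 steps instead of spinning forever.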

In the past there used to be a race condition between gdb sending the interrupt request and the remote replying with a stop reply, and then gdb for some reason used to disregard the interrupt request and proceed with the motion (issuing stepi/continue over and over again).

I'll have to play with this a bit more to see if I can reproduce the same situation.

Unrelated, but I did notice the black magic debugging stub is still using ACK packets though. If it isn't dealing with unreliable communication channels, that can be dropped.
Comment 22 Pedro Alves 2023-02-06 17:53:32 UTC
IME, this issue of ctrl-c-not-interrupting-tight-loop is most often caused by the stub itself not handling the interrupt request properly.  Typically, gdb sends the interrupt "packet" (\003), but the stub reports a SIGTRAP stop instead of a SIGINT stop, because of single-stepping or hitting a breakpoint, and the interrupt request ends up lost.

So I'd start by confirming whether gdb did send the \003 over the wire, and then checking whether the stub ended up dropping it for a reason similar to what I mentioned.
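
When reading such a log, the signal in the stop reply shows which case applies. In the remote protocol, `S` and `T` stop replies start with a two-digit hexadecimal signal number; in GDB's numbering 2 is SIGINT and 5 is SIGTRAP, so the `T05` in comment 20 is a SIGTRAP stop. A minimal parsing sketch follows; `stop_reply_signal` and `hex_value` are hypothetical helpers, not part of gdb.

```c
#include <ctype.h>

/* Convert one hex digit to its value; the caller guarantees isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Extract the signal number from an 'S' or 'T' stop-reply payload
   such as "T05". Returns -1 if the payload is not such a reply. */
int stop_reply_signal(const char *payload)
{
    if ((payload[0] != 'S' && payload[0] != 'T')
        || !isxdigit((unsigned char)payload[1])
        || !isxdigit((unsigned char)payload[2]))
        return -1;
    return hex_value(payload[1]) * 16 + hex_value(payload[2]);
}
```

A reply of `T02` after gdb has sent \003 would indicate the stub honoured the interrupt, while `T05` suggests the stop was reported as a trap and the interrupt may have been dropped.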