Created attachment 9871
Reproducer test case

Given the empty for loop in the reproducer with a large number of iterations, gdb hangs while attempting to step over a single iteration of the loop. The server hangs and must be interrupted manually.

The issue does not present itself if the loop contains an instruction - adding __asm("NOP"); inside the loop is sufficient to suppress the bug and allow the loop to be stepped correctly.

This issue can be reproduced on trunk for both ARM and x86 platforms.
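For readers who want to try the workaround described above, a minimal sketch follows. The function name spin_with_nop and its return value are hypothetical and are not part of the attached reproducer; the sketch assumes GCC or clang (GNU-style inline asm) and a target whose assembler accepts a "nop" mnemonic, such as x86 or ARM.

```c
/* Sketch of the NOP workaround: give the loop body one real instruction.
   spin_with_nop is a hypothetical name used only for illustration. */
unsigned int spin_with_nop(unsigned int iterations)
{
    unsigned int i;

    for (i = 0U; i < iterations; i++)
    {
        /* The volatile asm statement emits an instruction inside the body,
           so the body is no longer empty from the debugger's point of view. */
        __asm__ volatile ("nop");
    }

    /* Returning the counter lets a caller check that the loop ran fully. */
    return i;
}
```

When built without optimization, the asm statement should be attributed to its own source line, which is what gives a line-based step somewhere to stop.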
This is a line info generation issue, not a gdb bug.

I did a test build of the attached sample code on Ubuntu 16.04 (x86_64).

With gcc version 5.4.0:

  gcc -ggdb3 -O0 -o file-gcc file.c
  (gdb) info line file.c:8
  Line 8 of "file.c" is at address 0x4004f0 <main+26> but contains no code.

With clang version 5.0.0-3~16.04.1:

  clang-5.0 -ggdb3 -O0 -o file-clang file.c
  (gdb) info line file.c:8
  Line 8 of "file.c" starts at address 0x4004df <main+31> and ends at 0x4004e4 <main+36>.

gcc generates no line information for the empty braces and treats the for loop as a single statement, whereas clang generates line information for the empty braces as well. As a result, with the clang-generated executable gdb steps between the start and the end of the inner loop. With the gcc-generated executable, a step only completes once the whole loop has finished.

gdb does not hang in either native or remote debugging; stepping over the loop statement simply takes a very long time. For example, a RaspberryPi2 Model B+ completes 50 loop iterations in one second.
Hi Omair,

Do I understand correctly that GCC generates wrong line information? If that's the case, would you please post a listing of the current debug line info (the buggy one) and what the correct line info should be? We'll then take this to the GCC community to investigate and fix.

Thank you.
GCC code generated for main.c:

objdump -S gcc.out

int main (void)
{
  4004d6:	55                   	push   %rbp
  4004d7:	48 89 e5             	mov    %rsp,%rbp
  while (1)
  {
    for (unsigned int i = 0U; i < 0xFFFFFU; i++)
  4004da:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
  4004e1:	eb 04                	jmp    4004e7 <main+0x11>
  4004e3:	83 45 fc 01          	addl   $0x1,-0x4(%rbp)
  4004e7:	81 7d fc fe ff 0f 00 	cmpl   $0xffffe,-0x4(%rbp)
  4004ee:	76 f3                	jbe    4004e3 <main+0xd>
    {
      ;
    }
  }
  4004f0:	eb e8                	jmp    4004da <main+0x4>
  4004f2:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4004f9:	00 00 00
  4004fc:	0f 1f 40 00          	nopl   0x0(%rax)

GCC line info:

objdump --dwarf=decodedline gcc.out

CU: ./main.c:
File name    Line number    Starting address
main.c                 2            0x4004d6
main.c                 5            0x4004da
main.c                 5            0x4004e3
main.c                 5            0x4004e7
main.c                 9            0x4004f0

GCC generates three line entries for the addresses corresponding to the loop, all of them for line 5. No line entry is generated for:

  4004ee:	76 f3                	jbe    4004e3 <main+0xd>

which should be line 8 in this case.

Clang code generated for main.c:

objdump -S clang.out

int main (void)
{
  4004c0:	55                   	push   %rbp
  4004c1:	48 89 e5             	mov    %rsp,%rbp
  4004c4:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
  while (1)
  {
    for (unsigned int i = 0U; i < 0xFFFFFU; i++)
  4004cb:	c7 45 f8 00 00 00 00 	movl   $0x0,-0x8(%rbp)
  4004d2:	81 7d f8 ff ff 0f 00 	cmpl   $0xfffff,-0x8(%rbp)
  4004d9:	0f 83 13 00 00 00    	jae    4004f2 <main+0x32>
    {
      ;
    }
  4004df:	e9 00 00 00 00       	jmpq   4004e4 <main+0x24>
int main (void)
{
  while (1)
  {
    for (unsigned int i = 0U; i < 0xFFFFFU; i++)
  4004e4:	8b 45 f8             	mov    -0x8(%rbp),%eax
  4004e7:	83 c0 01             	add    $0x1,%eax
  4004ea:	89 45 f8             	mov    %eax,-0x8(%rbp)
  4004ed:	e9 e0 ff ff ff       	jmpq   4004d2 <main+0x12>
int main (void)
{
  while (1)
  4004f2:	e9 d4 ff ff ff       	jmpq   4004cb <main+0xb>
  4004f7:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  4004fe:	00 00

Clang line info:

objdump --dwarf=decodedline clang.out

CU: main.c:
File name    Line number    Starting address
main.c                 2            0x4004c0
main.c                 5            0x4004cb
main.c                 5            0x4004d2
main.c                 5            0x4004d9
main.c                 8            0x4004df
main.c                 5            0x4004e4
main.c                 5            0x4004ed
main.c                 3            0x4004f2

Clang generates two separate line entries for the loop: one for the start of the loop at line 5 and another for the closing brace at line 8, at address 0x4004df.
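To make the difference between the two line tables concrete, here is a small sketch that maps an address to a source line using the rows from the objdump output above. It is a simplified model, not gdb's actual lookup: the names line_entry, gcc_table, clang_table and line_for_pc are hypothetical, and details such as end-of-sequence markers and is_stmt flags are ignored.

```c
#include <stddef.h>

/* One row of a decoded line table: the source line that starts at addr. */
struct line_entry {
    int line;
    unsigned long addr;
};

/* Rows copied from the "objdump --dwarf=decodedline" listings in this comment. */
static const struct line_entry gcc_table[] = {
    {2, 0x4004d6}, {5, 0x4004da}, {5, 0x4004e3}, {5, 0x4004e7}, {9, 0x4004f0},
};

static const struct line_entry clang_table[] = {
    {2, 0x4004c0}, {5, 0x4004cb}, {5, 0x4004d2}, {5, 0x4004d9},
    {8, 0x4004df}, {5, 0x4004e4}, {5, 0x4004ed}, {3, 0x4004f2},
};

/* Return the line of the last row whose address is <= pc, or 0 if pc lies
   before the first row. Assumes the rows are sorted by ascending address. */
int line_for_pc(const struct line_entry *table, size_t n, unsigned long pc)
{
    int line = 0;

    for (size_t i = 0; i < n && table[i].addr <= pc; i++)
        line = table[i].line;

    return line;
}
```

With the GCC rows, every address of the inner loop (0x4004da through 0x4004ee) resolves to line 5, so a line-based step sees no line change until execution reaches 0x4004f0. With the clang rows, 0x4004df resolves to line 8 and 0x4004e4 resolves back to line 5, so the line changes on every iteration.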
Still reproduces on GCC 6.3 and 7.3.

Test case:

int main(void)
{
    int var = 0;

    for (;;)
    {
        var++;
    }

    return 0;
}

arm-eabi-gcc --version
arm-eabi-gcc (Linaro GCC 7.3-2018.05) 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701]

arm-eabi-gcc -c -g test.c && arm-eabi-objdump.exe --dwarf=decodedline test.o

CU: test.c:
File name    Line number    Starting address
test.c                 2                   0
test.c                 3                 0xc
test.c                 7                0x14

While 4.9.3 is fine:

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 227977]

arm-none-eabi-gcc -c -g test.c && arm-none-eabi-objdump.exe --dwarf=decodedline test.o

CU: test.c:
File name    Line number    Starting address
test.c                 2                   0
test.c                 3                 0xc
test.c                 7                0x14
test.c                 8                0x20
Using the Linaro prebuilt toolchains, I've narrowed it down to:

gcc-linaro-5.5.0-2017.10 is fine (generates line info for line #8)
gcc-linaro-6.1.1-2016.08 is bad (no line info for line #8)

Hope this helps :)
After investigating this in more detail, we're really dealing with some corner/degenerate cases here. It seems unlikely we will be able to fix all of the variations of this annoyance without some more complex changes. Then the question that comes to mind is whether it is really worth the effort.

In the worst case, we have an empty loop, a jump instruction that jumps to itself. GDB won't see line changes when we next/step over this. So it will be equally "stuck" (in reality it keeps moving, as Omair said, but it appears to the user that it is stuck).

Then there are other cases where we have some tight/empty loop that is written in some particular way that will cause compilers to not generate a line transition from the for loop header to the statements in the body.

More generally, even non-corner-case code will hit this annoyance if we craft the code in a particular way. For example, if we construct a for loop, complete with header and non-empty body, but write it in a single line, GDB will also display the same behavior. But in this particular case we could argue that it is a GDB problem for not noticing the column transitions as opposed to line transitions.

In summary, each particular case may require a slightly different approach to get the compiler to output a meaningful line transition.

Take, for example, the following testcase:

 1 int main (void)
 2 {
 3   while (1)
 4   {
 5     for (unsigned int i = 0U; i < 0xFFFFFU; i++)
 6     {
 7       ;
 8     }
 9   }
10 }

We could force GCC to output a line transition to line 8 when we reach the instruction that jumps back to the header. Then again, is that jump really properly mapped to the for loop's body's closing brace? What would happen if we just opt to not have the opening/closing braces? What would the right transition be in that case?

Another example is the following:

 1 int main(void)
 2 {
 3   int var = 0;
 4
 5   for (;;)
 6   {
 7     var++;
 8   }
 9
10   return 0;
11 }

The above will also confuse GDB due to not having a proper for loop header. So we jump into line 7, but then we don't have a line transition when we do the next iteration. The instruction that jumps to the header is now mapped to line 7. It doesn't map to line 8 (the for loop's closing brace) or 5 (the for loop's header).
> if it is really worth the effort

I can't tell if it is worth the effort, but I can definitely tell that it is very annoying to lose control in a debug session and to have to plan in advance to place breakpoints after the loops.

And since clang did it, it should not be Mission Impossible.
I completely agree that this is very annoying and non-intuitive. I'm still investigating this and trying to break it down into separate cases. Some of them are more complex to handle, while others should be simpler. Clang does handle *some* cases better than GCC, but in others it still generates line number transitions that will lead GDB to skip whole loops in single-step mode. Ideally we'd come up with a solution that handles the most common cases.
I'm experimenting with a GCC patch that generates additional line numbers in loop corner cases, preventing GDB from getting "stuck" while line-stepping.
This patch series could improve things by considering column information. https://sourceware.org/pipermail/gdb-patches/2020-May/168673.html
I just ran into this issue recently using a Black Magic probe to debug an STM32, and I think it is also a GDB bug. When this is triggered you have to kill GDB because ctrl-c does not interrupt it.

See this bug report: https://github.com/blackmagic-debug/blackmagic/issues/190

Notably, GDB is not hung, nor is the BMP probe - GDB is simply stuck in a loop talking to the BMP, setting breakpoints etc.

I don't really know where to start looking in the GDB code to see why it cannot be interrupted, so it's hard to tell how difficult it would be to mitigate the problem.

Obviously with the 'nop' workaround it's not a big deal, but it was quite a frustrating experience until I learnt that :)
GDB should always acknowledge an interrupt request, so that is clearly a bug. I'm not sure if it should be handled in this particular ticket though. It is related to this ticket, but it seems to be a different issue. Maybe one involving the remote target advertising the proper features so GDB can properly communicate the ctrl-c sequence.
I don't think this is an issue with the remote target / GDB communication, since the trace shows GDB is still issuing commands rather than waiting for something.

I'm happy to open a new bug for GDB not being interruptible in this situation, although I think in that case this bug would be closed because it would no longer cover anything (all that would be left is the GCC side, which looks like it is handled in a different bug system).
We use GDB as our debugger backend in "Embeetle IDE" (a new IDE for microcontroller coding, we developed it from scratch, it's not based on Eclipse or any other existing IDE). Unfortunately, this "GDB hangs while stepping an empty loop" bug is blocking our progress considerably. I'm looking for someone who can fix this bug in return for a payment/donation. Please contact me. My name is Kristof Mulier. You can find my contact info on the "Embeetle IDE" website.
Do you have a case of a 1 line loop that debuggers get stuck on? Are you seeing this mostly with gcc or clang as well?
Hi @Luis,
Thanks for your quick reply!
We have this problem with GCC, that's what we are using. Here is an example of the loop code:

void delay_1ms(uint32_t count)
{
    delay = count;

    while(0U != delay){
    } // <- this loop causes the problem
}
(In reply to kristof mulier from comment #16)
> Hi @Luis,
> Thanks for your quick reply!
> We have this problem with GCC, that's what we are using. Here is an example
> of the loop code:
>
> void delay_1ms(uint32_t count)
> {
>     delay = count;
>
>     while(0U != delay){
>     } // <- this loop causes the problem
> }

Right. That while case *should* be a case where we can handle things in a sane way if there is more than a single instruction. I believe clang might generate code that doesn't cause gdb to get stuck.

gcc is a different matter though. Last time I was investigating this, gcc used to get rid of opening/closing brace line entries, whereas clang doesn't. The opening/closing brace is something compilers can use to generate a line entry, which gdb would like to see for a step/next command.

Handling these cases (they are a set of multiple corner cases depending on how you write your code) in gdb is not trivial unfortunately.
> Handling these cases (they are a set of multiple corner cases depending on how you write your code) in gdb is not trivial unfortunately.

Indeed, we understand this is a very complicated problem to solve.

> I believe clang might generate code that doesn't cause GDB to get stuck.

Unfortunately, we need a solution that's independent of the compiler. We cannot force our users to use Clang.

Do you think a solution would be possible that works for the current GCC versions? Or is this a fundamental problem?

Many microcontroller sample projects use these kinds of empty loops for delays. If you stop the program, there's always a high probability that the PC is in such a loop. So, this "GDB hangs while stepping an empty loop" issue is really important to many GDB users.
When it happens, can you actually interrupt it so you regain control of gdb?
(In reply to Luis Machado from comment #19)
> When it happens, can you actually interrupt it so you regain control of gdb?

No, it's stuck sending this to the remote debugger:

Breakpoint 1, i2c_write7_v1 (i2c=1073763328, addr=80, data=0x2001ffa4 "", n=2, start=true) at hal.c:62
62	  if (start) {
(gdb) next
63	    for (tout = I2C_TIMEOUT; tout > 0 && (I2C_SR2(i2c) & I2C_SR2_BUSY); tout--)
(gdb) set debug remote 1
(gdb) next
[remote] Sending packet: $Z1,8000546,2#7c
[remote] Received Ack
[remote] Packet received: OK
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $m8000550,2#2d
[remote] Received Ack
[remote] Packet received: 7b61
[remote] Sending packet: $m800054e,4#63
[remote] Received Ack
[remote] Packet received: 404b7b61
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $m800054e,2#61
[remote] Received Ack
[remote] Packet received: 404b
[remote] Sending packet: $Z1,8000550,2#77
[remote] Received Ack
[remote] Packet received: OK
[remote] Sending packet: $c#63
[remote] Received Ack
[remote] wait: enter
[remote] wait: exit
[remote] wait: enter
[remote] Packet received: T05
[remote] select_thread_for_ambiguous_stop_reply: enter
  [remote] select_thread_for_ambiguous_stop_reply: process_wide_stop = 0
  [remote] select_thread_for_ambiguous_stop_reply: first resumed thread is Thread 1
  [remote] select_thread_for_ambiguous_stop_reply: is this guess ambiguous? = 0
[remote] select_thread_for_ambiguous_stop_reply: exit
[remote] wait: exit

(See https://github.com/blackmagic-debug/blackmagic/issues/190#issuecomment-1327190470)
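For anyone reading the trace above without the protocol at hand: each packet is framed as $payload#xx, where xx is the sum of the payload bytes modulo 256, written as two hex digits. The sketch below recomputes that checksum. The function names rsp_checksum and rsp_frame are hypothetical, and the sketch assumes plain payloads that need no escaping or run-length encoding.

```c
#include <stdio.h>
#include <string.h>

/* Checksum of a remote protocol payload: sum of its bytes modulo 256. */
unsigned int rsp_checksum(const char *payload)
{
    unsigned int sum = 0U;

    for (const char *p = payload; *p != '\0'; p++)
        sum += (unsigned char)*p;

    return sum & 0xffU;
}

/* Frame a payload as "$payload#xx" into out.
   Returns 0 on success, -1 if the buffer is too small. */
int rsp_frame(const char *payload, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "$%s#%02x", payload, rsp_checksum(payload));

    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For example, rsp_frame("c", buf, sizeof buf) produces $c#63, matching the continue packet in the trace, and rsp_checksum("Z1,8000546,2") gives 0x7c, matching the first breakpoint packet.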
Thanks for the input. That behavior sounds like a different bug in gdb. When in motion, gdb should always be interruptible so that you don't run into situations like this. There isn't much gdb can do about the tight loops themselves, but there should be a graceful way out of it with the interruption. In the past there used to be a race condition between gdb sending the interrupt request and the remote replying with a stop reply, and then gdb for some reason used to disregard the interrupt request and proceed with the motion (issuing stepi/continue over and over again). I'll have to play with this a bit more to see if I can reproduce the same situation. Unrelated, but I did notice the black magic debugging stub is still using ACK packets though. If it isn't dealing with unreliable communication channels, that can be dropped.
IME, this issue of ctrl-c-not-interrupting-tight-loop is most often caused by the stub itself not handling the interrupt request properly. Typically, gdb sends the interrupt "packet" (\003), but the stub reports a SIGTRAP stop instead of a SIGINT stop, because of single-stepping or hitting a breakpoint, and the interrupt request ends up lost.

So I'd start by confirming whether gdb did send the \003 over the wire, and then checking whether the stub ended up dropping it for a reason similar to what I mentioned.
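As a rough aid for the check suggested above, the sketch below pulls the signal number out of a stop reply such as the T05 seen earlier in this thread and flags the case where an interrupt was sent but the stub answered with SIGTRAP. The names stop_reply_signal and interrupt_looks_lost are hypothetical, the signal values 2 (SIGINT) and 5 (SIGTRAP) follow the remote protocol's numbering, and the heuristic is only an illustration of the scenario described, not how gdb itself decides anything.

```c
#include <stddef.h>

/* Signal numbers as used in remote protocol stop replies. */
enum { RSP_SIGINT = 2, RSP_SIGTRAP = 5 };

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Extract the signal number from an 'S' or 'T' stop reply such as "T05".
   Returns -1 if the reply is not a stop reply of that form. */
int stop_reply_signal(const char *reply)
{
    int hi, lo;

    if (reply == NULL || (reply[0] != 'S' && reply[0] != 'T'))
        return -1;

    hi = hex_value(reply[1]);
    if (hi < 0)
        return -1;          /* also covers a reply that ends after one char */

    lo = hex_value(reply[2]);
    if (lo < 0)
        return -1;

    return hi * 16 + lo;
}

/* Hypothetical heuristic: an interrupt was sent, yet the stub reported
   SIGTRAP rather than SIGINT, so the request may have been dropped. */
int interrupt_looks_lost(int interrupt_sent, const char *reply)
{
    return interrupt_sent && stop_reply_signal(reply) == RSP_SIGTRAP;
}
```

A SIGTRAP reply on its own is normal after a step or breakpoint; it only becomes suspicious when it is the answer to a \003 that gdb is known to have sent.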