SIGSEGV on exit from subroutines -- problem with non-stop ?
chris.hall.list@highwayman.com
Mon Mar 7 11:00:00 GMT 2011
Hi,
I am using gdb 7.2-14.fc14 to work on a large multi-threaded
application, in C, x86-64.
I have .gdbinit, per the book:
set target-async 1
set pagination off
set non-stop on
When I step using 's' or 'n', as it leaves some subroutines I keep
getting SIGSEGV, such as:
Program received signal SIGSEGV, Segmentation fault.
signal_set (signo=Cannot access memory at address
0xffffffffffffff5c)
at ...
When I 'disass' the current instruction is a leaveq. Examining the
registers I observe that rbp is zero, which is clearly nonsense.
I found one instance which was repeatable, which happened to be before
any threads were started: if I 'ni' through a particular function, it
gets to the leaveq, and gets stuck there. Each time I do ni, the rsp
and the rbp are updated by the repeated leaveq, until it goes bang.
So... I began to think this isn't something complicated to do with
multiple threads... so here is a test:
<<--test.c-----------------------------------------------
#include <stdio.h>
#include <stdlib.h>

static void
target(const char* message) {
  printf("%s ...BANG!\n", message) ;
}

int main(int argc, char* argv[]) {
  target("Light the blue touch paper") ;
  return 0 ;
}
------------------------------------------------------->>
Compiled with gcc 4.5.1, using "-g -O0".
If I do "gdb test", stepping by "n":
<<-------------------------------------------------------
(gdb) show non-stop
Controlling the inferior in non-stop mode is on.
(gdb) b target
Breakpoint 1 at 0x4004d0: file test.c, line 6.
(gdb) run
Starting program: ...........test
Breakpoint 1, target (message=0x400615 "Light the blue touch paper")
at test.c:6
6 printf("%s ...BANG!\n", message) ;
(gdb) n
Light the blue touch paper ...BANG!
7 }
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
target (message=Cannot access memory at address 0xfffffffffffffff8
) at test.c:7
7 }
(gdb) info reg
....
rbp 0x0 0x0
rsp 0x7fffffffe248 0x7fffffffe248
....
rip 0x4004e9 0x4004e9 <target+37>
....
------------------------------------------------------->>
Or, stepping by 'ni':
<<-------------------------------------------------------
(gdb) show non-stop
Controlling the inferior in non-stop mode is on.
(gdb) b target
Breakpoint 1 at 0x4004d0: file test.c, line 6.
(gdb) disass target
Dump of assembler code for function target:
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
0x00000000004004c8 <+4>: sub $0x10,%rsp
0x00000000004004cc <+8>: mov %rdi,-0x8(%rbp)
0x00000000004004d0 <+12>: mov $0x400608,%eax
0x00000000004004d5 <+17>: mov -0x8(%rbp),%rdx
0x00000000004004d9 <+21>: mov %rdx,%rsi
0x00000000004004dc <+24>: mov %rax,%rdi
0x00000000004004df <+27>: mov $0x0,%eax
0x00000000004004e4 <+32>: callq 0x4003b8 <printf@plt>
0x00000000004004e9 <+37>: leaveq
0x00000000004004ea <+38>: retq
End of assembler dump.
(gdb) disp/i $pc
(gdb) run
Starting program: .......test
Breakpoint 1, target (message=0x400615 "Light the blue touch paper")
at test.c:6
6 printf("%s ...BANG!\n", message) ;
.....
1: x/i $pc
=> 0x4004e4 <target+32>: callq 0x4003b8 <printf@plt>
(gdb) ni
Light the blue touch paper ...BANG!
7 }
1: x/i $pc
=> 0x4004e9 <target+37>: leaveq
(gdb) ni
target (message=0x100000000 <Address 0x100000000 out of bounds>) at
test.c:7
7 }
1: x/i $pc
=> 0x4004e9 <target+37>: leaveq
(gdb) ni
Cannot access memory at address 0x8
(gdb) ni
The program is not being run.
------------------------------------------------------->>
I note that if I turn off the "non-stop" option, it works. So this is
something to do with the support for debugging multi-threaded programs!
I note also that if I change the target to:
static int
target(const char* message) {
  printf("%s ...BANG!\n", message) ;
  return 0 ;
}
the problem goes away... so one extra instruction between the callq
and the leaveq makes a difference:
0x00000000004004dc <+24>: mov %rax,%rdi
0x00000000004004df <+27>: mov $0x0,%eax
0x00000000004004e4 <+32>: callq 0x4003b8 <printf@plt>
0x00000000004004e9 <+37>: mov $0x0,%eax
0x00000000004004ee <+42>: leaveq
0x00000000004004ef <+43>: retq
This goes some way to explaining why it appeared to be a sporadic
problem.
Is this me, or is this a bug ? It used to work :-(
Thanks,
Chris