Q: multithreaded tracees && clone/exit
Oleg Nesterov
oleg@redhat.com
Wed Jul 21 08:32:00 GMT 2010
On 07/20, Roland McGrath wrote:
>
> > Probably this is fine for gdb. But ugdb was started to prototype the
> > new general purpose API. Say, vAttach attaches the whole thread group,
> > there is no way to debug a single thread. Not good in general. The same
> > for D command and for W/X notifications from gdbserver.
>
> It seems fine and normal for whole process to be the granularity of
> attaching. You need to be able to control the individual threads, of
> course. But it doesn't really make a lot of sense to "debug" one thread
> and not another in the same process.
I disagree. But currently this is off-topic.
> > However, when this thread exits, gdbserver sends nothing and gdb
> > continues to wait. For what? Another (main) thread is TASK_TRACED,
> > it can do nothing unless it is SIGKILLED.
>
> Yes, it seems like gdb is confusing itself here.
> Perhaps it is not confused that way when in non-stop mode.
No, I did this testing in non-stop mode. With or without target-async.
Just in case, here is more info. gdb hangs when the sub-thread exits
(as a reminder, gdbserver sends nothing at that point).
If I press ^C, gdb sends "vCont;t:pTGID.PID" and gdbserver replies
"OK". Now this looks like a bug in gdbserver: this thread no longer
exists, it was already reaped.
So, gdb hangs again after ^C waiting for gdbserver which does nothing.
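To illustrate what "already reaped" means here, the following is a minimal sketch, not taken from gdbserver. It assumes a forked child process stands in for the cloned sub-thread (the function name reap_and_probe is made up for this example). Once the first wait collects the exit status, a second wait on the same id fails with ECHILD, which is the same error the second wait4(..., WNOHANG|__WCLONE, ...) reports in the trace that follows.

```python
import os


def reap_and_probe():
    """Fork a child that exits at once, reap it, then try to wait for it again."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately with status 0

    # The first wait collects the exit status; after this the kernel
    # releases the task and the id no longer names a child of ours.
    reaped_pid, status = os.waitpid(pid, 0)
    exited_ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

    # A second wait on the same id fails with ECHILD
    # (surfaced by Python as ChildProcessError).
    try:
        os.waitpid(pid, 0)
        still_waitable = True
    except ChildProcessError:
        still_waitable = False

    return reaped_pid == pid, exited_ok, still_waitable
```

A debugger stub that has seen the first wait succeed therefore knows the id is gone and has no further way to stop or query that task.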
This is what gdbserver does when the sub-thread exits:
select(5, [3 4], [], [3 4], NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
(the tracee exits)
read(3, 0x7fffc13431bf, 1) = -1 EAGAIN (Resource temporarily unavailable)
write(5, "+", 1) = 1
rt_sigreturn(0x5) = -1 EINTR (Interrupted system call)
select(5, [3 4], [], [3 4], NULL) = 1 (in [3])
read(3, "+", 1) = 1
read(3, 0x7fffc13434bf, 1) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|__WCLONE, NULL) = 6538
(this means release_task(), this thread doesn't exist any longer)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
wait4(-1, 0x7fffc134356c, WNOHANG|__WCLONE, NULL) = -1 ECHILD (No child processes)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(5, [3 4], [], [3 4], NULL <unfinished ...>
So, it sends nothing to gdb. When I press ^C, gdb sends vCont and:
select(5, [3 4], [], [3 4], NULL) = 1 (in [4])
--- SIGIO (I/O possible) @ 0 (0) ---
read(4, "$vCont;t:p1989.198a#6f", 8192) = 22
write(4, "$OK#9a", 6) = 6
select(5, [3 4], [], [3 4], NULL <unfinished ...>
gdbserver sends the bogus "OK".
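For readers who want to check the packets in this trace by hand, here is a small sketch of the remote protocol framing as I understand it: "$", payload, "#", then two hex digits holding the sum of the payload bytes modulo 256. Thread ids in the multiprocess form are "p<pid>.<tid>" in hex. The helper names below are made up for this example and are not gdb or gdbserver functions.

```python
def rsp_checksum(payload: str) -> int:
    """Sum of the payload bytes, modulo 256."""
    return sum(payload.encode("ascii")) % 256


def parse_packet(raw: str) -> str:
    """Strip the $...#xx framing and verify the checksum; return the payload."""
    if len(raw) < 4 or not raw.startswith("$") or raw[-3] != "#":
        raise ValueError("not a framed packet: %r" % raw)
    payload = raw[1:-3]
    given = int(raw[-2:], 16)
    if rsp_checksum(payload) != given:
        raise ValueError("checksum mismatch in %r" % raw)
    return payload


def parse_thread_id(text: str):
    """Decode the multiprocess thread id 'pPID.TID' (both fields are hex)."""
    if not text.startswith("p") or "." not in text:
        raise ValueError("not a pPID.TID thread id: %r" % text)
    pid_hex, tid_hex = text[1:].split(".", 1)
    return int(pid_hex, 16), int(tid_hex, 16)


# The request and reply from the trace above.
request = parse_packet("$vCont;t:p1989.198a#6f")
action, thread = request[len("vCont;"):].split(":", 1)
print(action, parse_thread_id(thread))  # t (6537, 6538)
print(parse_packet("$OK#9a"))           # OK
```

The thread part decodes to 6538, the same id that wait4() returned in the first trace, so the "OK" really is an answer about a thread that had already been reaped.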
The bug is not "fatal": if I press ^C again, gdb sends T, gets the
correct "E01", and detects that the thread has exited. Still, this looks
like an obvious bug.
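The inconsistency between the two replies can be modelled in a few lines. This is a hypothetical sketch, not gdbserver's code: the class StubModel and its methods are invented for illustration, and the choice of "E01" for a stop request on a dead thread is only an assumption that mirrors what T already returns. The idea is that both handlers consult the same liveness table, which is updated when a thread is reaped.

```python
class StubModel:
    """Hypothetical model of a stub's thread bookkeeping (not gdbserver code)."""

    def __init__(self, live_threads):
        # Set of (pid, tid) pairs the stub still considers alive.
        self.live = set(live_threads)

    def reap(self, pid, tid):
        """Record that wait4() returned this thread's exit status."""
        self.live.discard((pid, tid))

    def _reply_if_alive(self, pid, tid):
        return "OK" if (pid, tid) in self.live else "E01"

    def handle_T(self, pid, tid):
        """'T' packet: is the thread alive?"""
        return self._reply_if_alive(pid, tid)

    def handle_vcont_stop(self, pid, tid):
        """'vCont;t' packet. Assumption: refuse a dead thread the way T does."""
        return self._reply_if_alive(pid, tid)


# Ids taken from the trace: process 6537 with sub-thread 6538.
stub = StubModel({(6537, 6537), (6537, 6538)})
stub.reap(6537, 6538)
print(stub.handle_vcont_stop(6537, 6538))  # E01
print(stub.handle_T(6537, 6537))           # OK
```

With a shared check like this, the first ^C would already tell gdb that the thread is gone, instead of requiring a second round trip with T.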
Oleg.