Q: multithreaded tracees && clone/exit

Oleg Nesterov oleg@redhat.com
Wed Jul 21 08:32:00 GMT 2010


On 07/20, Roland McGrath wrote:
>
> > Probably this is fine for gdb. But ugdb was started to prototype the
> > new general purpose API. Say, vAttach attaches the whole thread group,
> > there is no way to debug a single thread. Not good in general. The same
> > for D command and for W/X notifications from gdbserver.
>
> It seems fine and normal for whole process to be the granularity of
> attaching.  You need to be able to control the individual threads, of
> course.  But it doesn't really make a lot of sense to "debug" one thread
> and not another in the same process.

I disagree. But currently this is off-topic.

> > However, when this thread exits, gdbserver sends nothing and gdb
> > continues to wait. For what? Another (main) thread is TASK_TRACED,
> > it can do nothing unless it is SIGKILLED.
>
> Yes, it seems like gdb is confusing itself here.
> Perhaps it is not confused that way when in non-stop mode.

No, I did this testing in non-stop mode. With or without target-async.

Just in case, more info. So, gdb hangs when the sub-thread exits
(as a reminder, gdbserver sends nothing).

If I press ^C, gdb sends "vCont;t:pTGID.PID" and gdbserver replies
"OK". Now this looks like a bug in gdbserver: the thread no longer
exists, it was already reaped.

So, gdb hangs again after ^C waiting for gdbserver which does nothing.


This is what gdbserver does when the sub-thread exits:

	select(5, [3 4], [], [3 4], NULL)       = ? ERESTARTNOHAND (To be restarted)
	--- SIGCHLD (Child exited) @ 0 (0) ---

	(the tracee exits)

	read(3, 0x7fffc13431bf, 1)              = -1 EAGAIN (Resource temporarily unavailable)
	write(5, "+", 1)                        = 1
	rt_sigreturn(0x5)                       = -1 EINTR (Interrupted system call)
	select(5, [3 4], [], [3 4], NULL)       = 1 (in [3])
	read(3, "+", 1)                         = 1
	read(3, 0x7fffc13434bf, 1)              = -1 EAGAIN (Resource temporarily unavailable)
	rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
	wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
	wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|__WCLONE, NULL) = 6538

	(this means release_task(), this thread doesn't exist any longer)

	rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
	rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
	wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
	wait4(-1, 0x7fffc134356c, WNOHANG|__WCLONE, NULL) = -1 ECHILD (No child processes)
	rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
	select(5, [3 4], [], [3 4], NULL <unfinished ...>

So, it sends nothing to gdb. When I press ^C, gdb sends vCont and:

	select(5, [3 4], [], [3 4], NULL)       = 1 (in [4])
	--- SIGIO (I/O possible) @ 0 (0) ---
	read(4, "$vCont;t:p1989.198a#6f", 8192) = 22
	write(4, "$OK#9a", 6)                   = 6
	select(5, [3 4], [], [3 4], NULL <unfinished ...>

gdbserver sends the bogus "OK".
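
To make the trace above easier to read, here is a minimal sketch (not gdbserver code) that unframes the two packets and decodes the thread-id. It assumes the usual remote-protocol framing, "$payload#xx" with xx being the sum of the payload bytes modulo 256, and the multiprocess thread-id form "pPID.TID" with both numbers in hex. The helper names are made up for illustration.

```python
def rsp_checksum(payload: str) -> int:
    """Sum of the payload bytes modulo 256, as used in '$payload#xx'."""
    return sum(payload.encode("ascii")) % 256


def parse_packet(raw: str) -> str:
    """Strip the '$...#xx' framing, verify the checksum, return the payload."""
    if len(raw) < 4 or not raw.startswith("$") or raw[-3] != "#":
        raise ValueError("not a framed packet: %r" % raw)
    payload, expected = raw[1:-3], int(raw[-2:], 16)
    if rsp_checksum(payload) != expected:
        raise ValueError("bad checksum in %r" % raw)
    return payload


def parse_thread_id(thread_id: str) -> tuple:
    """Decode 'pPID.TID' (both hex) into a (pid, tid) pair of ints."""
    if not thread_id.startswith("p") or "." not in thread_id:
        raise ValueError("unsupported thread-id: %r" % thread_id)
    pid_hex, tid_hex = thread_id[1:].split(".", 1)
    return int(pid_hex, 16), int(tid_hex, 16)


# The request gdb sent after ^C, copied from the strace output.
payload = parse_packet("$vCont;t:p1989.198a#6f")
action, thread_id = payload[len("vCont;"):].split(":", 1)
pid, tid = parse_thread_id(thread_id)
print(action, pid, tid)            # t 6537 6538

# The reply gdbserver wrote back.
print(parse_packet("$OK#9a"))      # OK
```

The decoded TID, 6538, is the same pid that wait4(..., WNOHANG|__WCLONE, ...) returned earlier in the trace, so the "t" (stop) action is addressed to the thread that had already been reaped.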


The bug is not "fatal": if I press ^C again, gdb sends T, gets the
correct "E01", and detects that the thread has exited. Still, this looks
like an obvious bug.
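
For illustration only, a small sketch of the consistency this report is asking for. It is not how gdbserver is implemented: the ThreadTable class and its method names are hypothetical, and it assumes that an error reply is the desired answer to vCont;t for a thread that was already reaped, mirroring the "E01" that T returns.

```python
class ThreadTable:
    """Hypothetical bookkeeping for the threads a stub is tracing."""

    def __init__(self):
        self._live = set()

    def add(self, tid: int) -> None:
        """Record a newly attached or cloned thread."""
        self._live.add(tid)

    def reap(self, tid: int) -> None:
        """Forget a thread once wait4() has returned its exit status."""
        self._live.discard(tid)

    def handle_T(self, tid: int) -> str:
        """'T' (is thread alive?) query: OK for a live thread, E01 otherwise."""
        return "OK" if tid in self._live else "E01"

    def handle_vcont_stop(self, tid: int) -> str:
        """'vCont;t' request: assumed here to fail for a reaped thread."""
        return "OK" if tid in self._live else "E01"


threads = ThreadTable()
threads.add(6537)                        # main thread, still TASK_TRACED
threads.add(6538)                        # sub-thread
threads.reap(6538)                       # wait4(...) = 6538 in the trace

print(threads.handle_vcont_stop(6538))   # E01 rather than the bogus OK
print(threads.handle_T(6538))            # E01, matching what gdb sees today
print(threads.handle_T(6537))            # OK
```

With both handlers consulting the same table, the first ^C would already tell gdb that the thread is gone, instead of requiring a second ^C and a T query.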

Oleg.


