With test.sh -s running thread-unwindonsignal.exp on the native-gdbserver board, I get:
...
$ ( total=10; pass=0; for n in $(seq 1 $total); do ./test.sh -s; if [ $? -eq 0 ]; then pass=$(($pass + 1)); fi; done; echo "PASS: $pass/$total" )
PASS: 7/10
...
Created attachment 11744: Patch adding gdb.threads/thread-unwindonsignal-minimize.exp
Created attachment 11745: script to reproduce outside of make check once make check has run once
Created attachment 11746: gdbserver --debug log when hanging
Created attachment 11747: gdbserver --debug log when not hanging
Hmm, the executable exited:
...
 7280 pts/5    00:00:00 gdbserver
 7281 pts/5    00:00:00 gdb
 7287 pts/5    00:00:00 thread-unwindon <defunct>
...
but gdbserver is stuck in the event loop, in select:
...
(gdb) bt
#0  0x00007f66478b6ea7 in select () from /lib64/libc.so.6
#1  0x000000000042101e in wait_for_event () at /data/gdb_versions/devel/src/gdb/gdbserver/event-loop.c:468
#2  0x0000000000421239 in start_event_loop () at /data/gdb_versions/devel/src/gdb/gdbserver/event-loop.c:561
#3  0x0000000000436457 in captured_main (argc=5, argv=0x7ffcd5c4eea8) at /data/gdb_versions/devel/src/gdb/gdbserver/server.c:3873
#4  0x00000000004366af in main (argc=5, argv=0x7ffcd5c4eea8) at /data/gdb_versions/devel/src/gdb/gdbserver/server.c:3959
...
Using top -H -p 7287 shows two threads in zombie state:
...
  PID  PPID USER   PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ COMMAND
 7287  7280 vries  20   0     0    0    0 Z 0,000 0,000  0:00.00 thread-unwindon
 7294  7280 vries  20   0     0    0    0 Z 0,000 0,000  0:00.00 thread-unwindon
...
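For checking the per-thread state without top, here is a minimal C sketch that reads the state letter (the S column above) from /proc/PID/task/TID/stat on Linux. The helper names stat_line_state and lwp_state are made up for illustration; this is not claimed to be how gdbserver itself detects a zombie.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return the state letter (e.g. 'R', 'S', 'Z') from the contents of a
   /proc/PID/task/TID/stat line, or '\0' if the line cannot be parsed.
   Hypothetical helper, for illustration only.  */
static char
stat_line_state (const char *line)
{
  /* The second field (comm) is wrapped in parentheses and may itself
     contain spaces or ')', so look for the last ')' in the line.  */
  const char *p = strrchr (line, ')');

  if (p == NULL || p[1] != ' ' || p[2] == '\0')
    return '\0';
  return p[2];
}

/* Return the state letter of thread TID in process PID, or '\0' if the
   stat file cannot be read (for example because the thread has already
   been reaped).  Hypothetical helper, for illustration only.  */
static char
lwp_state (pid_t pid, pid_t tid)
{
  char path[64];
  char line[512];
  char state = '\0';
  FILE *f;

  snprintf (path, sizeof path, "/proc/%ld/task/%ld/stat",
            (long) pid, (long) tid);
  f = fopen (path, "r");
  if (f == NULL)
    return '\0';
  if (fgets (line, sizeof line, f) != NULL)
    state = stat_line_state (line);
  fclose (f);
  return state;
}
```

With the pids from the top output, lwp_state (7287, 7287) and lwp_state (7287, 7294) would both be expected to return 'Z' while the process is in this state.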
So, the program creates 4 threads which start running and then hang; the main thread waits for all 4 threads to start running, and then exits. Exiting calls exit_group, which exits all the threads.

In the non-hang case, waitpid (-1) first returns all the non-main threads, and finally the main thread, in a single linux_wait_1 call:
...
LWFE: waitpid(-1, ...) returned 6644, ERRNO-OK
LLW: waitpid 6644 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6643, ERRNO-OK
LLW: waitpid 6643 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6642, ERRNO-OK
LLW: waitpid 6642 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6641, ERRNO-OK
LLW: waitpid 6641 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6636, ERRNO-OK
LLW: waitpid 6636 received 0 (exited)
LWFE: waitpid(-1, ...) returned -1, No child processes
...

In the hang case, waitpid returns all but one of the non-main threads in a single linux_wait_1 call:
...
LWFE: waitpid(-1, ...) returned 6124, ERRNO-OK
LLW: waitpid 6124 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6122, ERRNO-OK
LLW: waitpid 6122 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6121, ERRNO-OK
LLW: waitpid 6121 received 0 (exited)
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
...
which then goes on to stop-resume the remaining non-main thread:
...
RSRL: resuming stopped-resumed LWP LWP 6116.6123 at 7ffff7bc689d: step=0
continue from pc 0x7ffff7bc689d
Resuming lwp 6123 (continue, signal 0, stop not expected)
...
and delete the zombie main thread:
...
leader_pid=6116, leader_lp!=NULL=1, num_lwps=2, zombie=1
CZL: Thread group leader 6116 zombie (it exited, or another thread execd).
deleting 6116
...
after which we get stuck here:
...
LLW: exit (no unwaited-for LWP)
linux_wait_1 ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
<<<< exiting ptid_t linux_wait_1(ptid_t, target_waitstatus*, int)
Writing resume reply for <null thread>:13
sigchld_handler
handling possible serial event
...