Bug 24465 - thread-unwindonsignal.exp fails 1/3 of the runs with native-gdbserver
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2019-04-18 15:10 UTC by Tom de Vries
Modified: 2019-04-19 11:10 UTC

Attachments
Patch adding gdb.threads/thread-unwindonsignal-minimize.exp (1.02 KB, patch)
2019-04-19 06:35 UTC, Tom de Vries
script to reproduce outside of make check once make check has run once (282 bytes, application/x-shellscript)
2019-04-19 06:45 UTC, Tom de Vries
gdbserver --debug log when hanging (2.43 KB, text/plain)
2019-04-19 06:47 UTC, Tom de Vries
gdbserver --debug log when not hanging (2.39 KB, text/plain)
2019-04-19 06:54 UTC, Tom de Vries

Description Tom de Vries 2019-04-18 15:10:58 UTC
With test.sh -s running thread-unwindonsignal.exp on the native-gdbserver board, I get:
...
$ ( total=10; pass=0; for n in $(seq 1 $total); do ./test.sh -s; if [ $? -eq 0 ]; then pass=$(($pass + 1)); fi; done; echo "PASS: $pass/$total" )
PASS: 7/10
...
Comment 1 Tom de Vries 2019-04-19 06:35:51 UTC
Created attachment 11744
Patch adding gdb.threads/thread-unwindonsignal-minimize.exp
Comment 2 Tom de Vries 2019-04-19 06:45:06 UTC
Created attachment 11745
script to reproduce outside of make check once make check has run once
Comment 3 Tom de Vries 2019-04-19 06:47:45 UTC
Created attachment 11746
gdbserver --debug log when hanging
Comment 4 Tom de Vries 2019-04-19 06:54:33 UTC
Created attachment 11747
gdbserver --debug log when not hanging
Comment 5 Tom de Vries 2019-04-19 07:44:52 UTC
Hmm, the executable exited:
...
 7280 pts/5    00:00:00 gdbserver
 7281 pts/5    00:00:00 gdb
 7287 pts/5    00:00:00 thread-unwindon <defunct>
...
but gdbserver is stuck in the event loop, in select:
...
(gdb) bt
#0  0x00007f66478b6ea7 in select () from /lib64/libc.so.6
#1  0x000000000042101e in wait_for_event () at /data/gdb_versions/devel/src/gdb/gdbserver/event-loop.c:468
#2  0x0000000000421239 in start_event_loop () at /data/gdb_versions/devel/src/gdb/gdbserver/event-loop.c:561
#3  0x0000000000436457 in captured_main (argc=5, argv=0x7ffcd5c4eea8)
    at /data/gdb_versions/devel/src/gdb/gdbserver/server.c:3873
#4  0x00000000004366af in main (argc=5, argv=0x7ffcd5c4eea8)
    at /data/gdb_versions/devel/src/gdb/gdbserver/server.c:3959
...

Using top -H -p 7287 shows two threads in zombie state:
...
  PID  PPID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                
 7287  7280 vries     20   0       0      0      0 Z 0,000 0,000   0:00.00 thread-unwindon                        
 7294  7280 vries     20   0       0      0      0 Z 0,000 0,000   0:00.00 thread-unwindon
...
Comment 6 Tom de Vries 2019-04-19 11:10:21 UTC
So, the program creates 4 threads, which start running and then hang. The main thread waits for all 4 threads to start running, and then exits.

Exiting calls exit_group, which exits all the threads.

In the non-hang case, waitpid (-1) first returns all the non-main threads, and finally the main thread, in a single linux_wait_1 call:
...
LWFE: waitpid(-1, ...) returned 6644, ERRNO-OK
LLW: waitpid 6644 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6643, ERRNO-OK
LLW: waitpid 6643 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6642, ERRNO-OK
LLW: waitpid 6642 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6641, ERRNO-OK
LLW: waitpid 6641 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6636, ERRNO-OK
LLW: waitpid 6636 received 0 (exited)
LWFE: waitpid(-1, ...) returned -1, No child processes
...

In the hang case, waitpid returns all but one of the non-main threads in a single linux_wait_1 call:
...
LWFE: waitpid(-1, ...) returned 6124, ERRNO-OK
LLW: waitpid 6124 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6122, ERRNO-OK
LLW: waitpid 6122 received 0 (exited)
LWFE: waitpid(-1, ...) returned 6121, ERRNO-OK
LLW: waitpid 6121 received 0 (exited)
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
...
which then goes on to stop-resume the remaining non-main thread:
...
RSRL: resuming stopped-resumed LWP LWP 6116.6123 at 7ffff7bc689d: step=0
  continue from pc 0x7ffff7bc689d
Resuming lwp 6123 (continue, signal 0, stop not expected)
...
and delete the zombie main thread:
...
leader_pid=6116, leader_lp!=NULL=1, num_lwps=2, zombie=1
CZL: Thread group leader 6116 zombie (it exited, or another thread execd).
deleting 6116
...
after which we get stuck here:
...
LLW: exit (no unwaited-for LWP)
linux_wait_1 ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
<<<< exiting ptid_t linux_wait_1(ptid_t, target_waitstatus*, int)
Writing resume reply for <null thread>:13
sigchld_handler
handling possible serial event
...