This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Bug testsuite/20600] New: parallel testsuite hang in [nd_]syscall.exp
- From: "dsmith at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sourceware dot org
- Date: Mon, 12 Sep 2016 14:57:56 +0000
- Subject: [Bug testsuite/20600] New: parallel testsuite hang in [nd_]syscall.exp
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=20600
Bug ID: 20600
Summary: parallel testsuite hang in [nd_]syscall.exp
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: testsuite
Assignee: systemtap at sourceware dot org
Reporter: dsmith at redhat dot com
Target Milestone: ---
When I run the testsuite in parallel mode with at least 3 concurrent jobs, I'm
getting a testsuite "hang". The testsuite will run to completion, except for
either the syscall.exp or nd_syscall.exp test case. That test case will hang in
one of the tests, typically in the execve or getrlimit subtest. The stapio
process for that test is in the defunct state:
====
# ps ax | fgrep stap
14534 pts/0 S+ 0:00 grep -F --color=auto stap
24933 ? Zl 0:10 [stapio] <defunct>
# tail testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.log
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getpriority.c
-lrt -lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptestgbSi0f/getpriority
(timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getpriority.c
-lrt -lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptestgbSi0f/getpriority
PASS: 64-bit getpriority nd_syscall
Testing 64-bit getrandom nd_syscall
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getrandom.c -lrt
-lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest9QHupy/getrandom
(timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getrandom.c -lrt
-lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest9QHupy/getrandom
PASS: 64-bit getrandom nd_syscall
Testing 64-bit getrlimit nd_syscall
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getrlimit.c -lrt
-lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest4a2xe9/getrlimit
(timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getrlimit.c -lrt
-lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest4a2xe9/getrlimit
# ll testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.log
-rwxr-xr-x. 1 root root 21289 Sep 10 01:19
testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.lo
====
So, for over 9 hours that test has just sat there. If I do a 'kill -9' on that
defunct stapio process, the [nd_]syscall.exp test will finish (and the full
testsuite will also finish).
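For reference, the "Zl" state in the ps output above combines "Z" (defunct) with "l" (multi-threaded). One possible reading, not confirmed in this report, is that the main stapio thread has exited while another thread in the process is still alive, which would explain why 'kill -9' still has an effect. A minimal sketch for spotting such processes in captured 'ps ax' output follows; the helper name find_defunct is hypothetical and is not part of the testsuite.

```python
def find_defunct(ps_output, name="stapio"):
    """Return (pid, stat) pairs for defunct processes whose command mentions `name`.

    Expects text in the `ps ax` layout: PID TTY STAT TIME COMMAND.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, _tty, stat, _time, command = fields
        # A STAT beginning with "Z" marks a defunct (zombie) process.
        if stat.startswith("Z") and name in command:
            hits.append((int(pid), stat))
    return hits


# The two lines from the ps output quoted in this report.
sample = """\
14534 pts/0 S+ 0:00 grep -F --color=auto stap
24933 ? Zl 0:10 [stapio] <defunct>
"""
print(find_defunct(sample))  # [(24933, 'Zl')]
```

Matching on "stapio" rather than "stap" keeps the grep process itself out of the result.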
Note that on the same system the full testsuite (and the [nd_]syscall.exp test
cases) will run to completion when run in non-parallel mode.
This "hang" is fairly repeatable, happening at least 50% of the time.
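To put a firmer number on the hang rate, repeated runs could be wrapped in a timeout and counted. The following is only a sketch: count_hangs is a hypothetical helper, the command to run is left to the caller, and the timeout would have to be chosen well above the normal testsuite run time. Note that subprocess.run only kills the direct child on timeout, so leftover stapio processes would still need separate cleanup.

```python
import subprocess
import sys


def count_hangs(command, runs, timeout_s):
    """Run `command` `runs` times and count the runs that exceed `timeout_s` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                command,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hangs += 1  # the direct child has been killed by subprocess.run
    return hangs


# Stand-in command that always outlives the timeout; substitute the real
# parallel testsuite invocation and a realistic timeout here.
stand_in = [sys.executable, "-c", "import time; time.sleep(5)"]
print(count_hangs(stand_in, runs=2, timeout_s=0.5))  # 2
```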
I'd guess that one of the other tests is interfering with the [nd_]syscall.exp
test case somehow, but I can't quite think of how.