This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
[RFA] Fix internal error in wait_lwp (interrupted system call)
- From: Ulrich Weigand <uweigand at de dot ibm dot com>
- To: gdb-patches at sources dot redhat dot com
- Date: Thu, 12 May 2005 21:06:38 +0200 (CEST)
- Subject: [RFA] Fix internal error in wait_lwp (interrupted system call)
Hello,
we've had reports from our JVM/JIT development group that for them,
gdb 6.3 frequently fails with internal errors like:
linux-nat.c:1152: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.
It turned out that this happens when a SIGCHLD arrives during
execution of the waitpid call. The signal handler is executed,
and the system call then returns -1 with errno set to EINTR, so
the returned pid does not match the LWP being waited for.
Now, looking through the linux-nat.c file, it would appear that this
type of problem has been addressed at various places in different
ways. In linux_handle_extended_wait, the waitpid call is wrapped
in an explicit do { } while (ret == -1 && errno == EINTR) loop.
In linux_test_for_tracefork, the same loop is abstracted into a
my_waitpid routine. In child_wait and linux_nat_wait, there are
larger loops that will handle this situation as well. Finally,
in lin_lwp_attach_lwp, SIGCHLD is actually blocked during the
execution of the waitpid call.
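
For illustration, the two narrow workarounds mentioned above could look
roughly like the sketch below. This is not the GDB code; the names
waitpid_retry, waitpid_sigchld_blocked, waiter_fn and exit_code_via are
made up for this example, and only the retry condition is taken from the
loop quoted above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: call waitpid again whenever a signal handler
   interrupted it, as in the do { } while loop described above.  */
pid_t
waitpid_retry (pid_t pid, int *status, int flags)
{
  pid_t ret;

  do
    ret = waitpid (pid, status, flags);
  while (ret == -1 && errno == EINTR);

  return ret;
}

/* Hypothetical helper: keep SIGCHLD blocked while waitpid runs, so its
   handler cannot interrupt the call.  Handlers for other signals could
   still cause EINTR.  */
pid_t
waitpid_sigchld_blocked (pid_t pid, int *status, int flags)
{
  sigset_t block, previous;
  pid_t ret;

  sigemptyset (&block);
  sigaddset (&block, SIGCHLD);
  sigprocmask (SIG_BLOCK, &block, &previous);
  ret = waitpid (pid, status, flags);
  sigprocmask (SIG_SETMASK, &previous, NULL);

  return ret;
}

typedef pid_t (*waiter_fn) (pid_t, int *, int);

/* Fork a child that exits with CODE, reap it through WAITER and return
   the exit code seen by the parent, or -1 on failure.  */
int
exit_code_via (waiter_fn waiter, int code)
{
  int status;
  pid_t child;

  assert (code >= 0 && code <= 255);

  child = fork ();
  if (child == -1)
    return -1;
  if (child == 0)
    _exit (code);

  if (waiter (child, &status, 0) != child)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Either helper has to be used at every waitpid call site, which is why
each new call site is another chance to forget it.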
However, there remain some places where waitpid is called without
any such precaution, and wait_lwp is one of these. When debugging
a process that makes very heavy use of threads, such as the JVM,
this can lead to the error shown above.
Now, as far as I can see, there is really *no* place where GDB
actually *wants* a system call to be interrupted by the SIGCHLD
signal handler. Thus, I'd propose to fix the problem at its
root by simply installing the handler with the SA_RESTART flag,
causing any interrupted system call to be automatically restarted.
The patch below does this, and fixes all problems for the JVM team.
It also passes regression testing on s390-ibm-linux and s390x-ibm-linux.
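
To illustrate the effect outside of GDB, here is a small standalone
sketch. It assumes Linux semantics, where waitpid is among the calls
restarted under SA_RESTART. The names demo_sigchld_handler,
spawn_sleeper, reap and wait_across_sigchld are invented for this
example. It waits for the later of two children while the SIGCHLD for
the earlier one arrives.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for GDB's sigchld_handler; it only needs to exist so that
   SIGCHLD is caught rather than ignored.  */
static void
demo_sigchld_handler (int signo)
{
  (void) signo;
}

/* Fork a child that sleeps for DELAY_MS milliseconds and then exits.  */
static pid_t
spawn_sleeper (long delay_ms)
{
  pid_t pid = fork ();

  if (pid == 0)
    {
      struct timespec delay;

      delay.tv_sec = delay_ms / 1000;
      delay.tv_nsec = (delay_ms % 1000) * 1000000L;
      nanosleep (&delay, NULL);
      _exit (0);
    }
  return pid;
}

/* Reap PID, retrying on EINTR so the cleanup works with either flag.  */
static void
reap (pid_t pid)
{
  while (waitpid (pid, NULL, 0) == -1 && errno == EINTR)
    ;
}

/* Install the handler with SA_FLAGS, then wait for the later of two
   children while the earlier one's SIGCHLD arrives.  Return 0 if
   waitpid returned the expected pid, otherwise the errno it failed
   with.  */
int
wait_across_sigchld (int sa_flags)
{
  struct sigaction action, old_action;
  pid_t first, second, ret;
  int result;

  memset (&action, 0, sizeof action);
  action.sa_handler = demo_sigchld_handler;
  sigemptyset (&action.sa_mask);
  action.sa_flags = sa_flags;
  sigaction (SIGCHLD, &action, &old_action);

  first = spawn_sleeper (100);
  second = spawn_sleeper (300);
  assert (first > 0 && second > 0);   /* demo only: fork must succeed */

  ret = waitpid (second, NULL, 0);
  result = (ret == second) ? 0 : errno;

  /* Clean up both children and restore the previous handler.  */
  if (ret != second)
    reap (second);
  reap (first);
  sigaction (SIGCHLD, &old_action, NULL);

  return result;
}
```

Called as wait_across_sigchld (SA_RESTART), the wait should complete
normally. Called with 0, it would typically fail with EINTR, which is
the situation wait_lwp runs into.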
OK to commit?
Bye,
Ulrich
ChangeLog:
* linux-nat.c (_initialize_linux_nat): Install SIGCHLD handler
using the SA_RESTART flag.
Index: gdb/linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.27
diff -c -p -r1.27 linux-nat.c
*** gdb/linux-nat.c 6 Mar 2005 16:42:20 -0000 1.27
--- gdb/linux-nat.c 12 May 2005 18:50:42 -0000
*************** Specify any of the following keywords fo
*** 3095,3101 ****
    action.sa_handler = sigchld_handler;
    sigemptyset (&action.sa_mask);
!   action.sa_flags = 0;
    sigaction (SIGCHLD, &action, NULL);
    /* Make sure we don't block SIGCHLD during a sigsuspend. */
--- 3095,3101 ----
    action.sa_handler = sigchld_handler;
    sigemptyset (&action.sa_mask);
!   action.sa_flags = SA_RESTART;
    sigaction (SIGCHLD, &action, NULL);
    /* Make sure we don't block SIGCHLD during a sigsuspend. */
--
Dr. Ulrich Weigand
Linux on zSeries Development
Ulrich.Weigand@de.ibm.com