This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.



[RFA] Fix internal error in wait_lwp (interrupted system call)


Hello,

We've had reports from our JVM/JIT development group that gdb 6.3
frequently fails for them with internal errors like:
linux-nat.c:1152: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.

It turned out that this happens when a SIGCHLD arrives during
execution of the waitpid call.  This causes the signal handler
to be executed, and subsequently the system call returns -1 with
errno equal to EINTR.  Since -1 is not the LWP id that wait_lwp
expects, the assertion fails.

Now, looking through the linux-nat.c file, it would appear that this
type of problem has been addressed at various places in different
ways.  In linux_handle_extended_wait, the waitpid call is wrapped
in an explicit do { } while (ret == -1 && errno == EINTR) loop.
In linux_test_for_tracefork, this very loop is abstracted into a
my_waitpid routine.  In child_wait and linux_nat_wait, there are
larger loops that will handle this situation as well.  Finally,
in lin_lwp_attach_lwp, SIGCHLD is actually blocked during the
execution of the waitpid call.
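
For reference, the retry idiom described above can be sketched roughly
as follows.  This is a standalone illustration, not the code from
linux-nat.c: retry_waitpid and reap_child are hypothetical names, and
the assert only mimics the shape of the check that fires in wait_lwp.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: call waitpid again whenever it fails only
   because a signal handler interrupted it (errno == EINTR).  */
static pid_t
retry_waitpid (pid_t pid, int *status, int flags)
{
  pid_t ret;

  do
    ret = waitpid (pid, status, flags);
  while (ret == -1 && errno == EINTR);

  return ret;
}

/* Hypothetical demo: fork a child that exits with CODE, reap it, and
   return the exit code it reported, or -1 on failure.  */
static int
reap_child (int code)
{
  int status;
  pid_t ret;
  pid_t child = fork ();

  if (child == -1)
    return -1;
  if (child == 0)
    _exit (code);

  ret = retry_waitpid (child, &status, 0);

  /* Same shape as the assertion in wait_lwp: after the retry loop the
     result is the pid we asked for, not a spurious -1/EINTR.  */
  assert (ret == child);

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The drawback of this approach is that every waitpid call site has to
remember to use the wrapper.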

However, there remain some places where waitpid is called without
any such precaution, and wait_lwp is one of these.  When debugging
a process that makes very heavy use of threads, such as the JVM,
this can lead to the error shown above.

Now, as far as I can see, there is really *no* place where GDB
actually *wants* a system call to be interrupted by the SIGCHLD
signal handler.  Thus, I'd propose to fix the problem at its 
root by simply installing the handler with the SA_RESTART flag,
causing any interrupted system call to be automatically restarted.
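
To illustrate the effect of the flag outside of GDB, here is a small
standalone sketch.  It is not part of the patch; all function names
are made up for the example, the 100 ms / 300 ms delays are arbitrary,
and it assumes Linux semantics for SIGCHLD and waitpid.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t sigchld_seen;

/* Trivial handler; it only records that SIGCHLD arrived.  */
static void
demo_sigchld_handler (int signo)
{
  (void) signo;
  sigchld_seen = 1;
}

static void
sleep_ms (long ms)
{
  struct timespec ts;

  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (ms % 1000) * 1000000L;
  nanosleep (&ts, NULL);
}

/* Reap PID, retrying if the call is interrupted.  */
static void
reap (pid_t pid)
{
  while (waitpid (pid, NULL, 0) == -1 && errno == EINTR)
    continue;
}

/* Install the SIGCHLD handler with SA_FLAGS, fork a short-lived and a
   longer-lived child, then wait for the longer-lived one.  Returns 1
   if that waitpid call failed with EINTR, 0 if it returned the
   expected pid, and -1 on any other failure.  */
static int
waitpid_was_interrupted (int sa_flags)
{
  struct sigaction action;
  pid_t quick, slow, ret;
  int status, saved_errno, result;

  memset (&action, 0, sizeof action);
  action.sa_handler = demo_sigchld_handler;
  sigemptyset (&action.sa_mask);
  action.sa_flags = sa_flags;
  if (sigaction (SIGCHLD, &action, NULL) == -1)
    return -1;

  quick = fork ();
  if (quick == -1)
    return -1;
  if (quick == 0)
    {
      sleep_ms (100);
      _exit (0);
    }

  slow = fork ();
  if (slow == -1)
    {
      reap (quick);
      return -1;
    }
  if (slow == 0)
    {
      sleep_ms (300);
      _exit (0);
    }

  /* The SIGCHLD for the quick child normally arrives while we are
     blocked here waiting for the slow one.  */
  ret = waitpid (slow, &status, 0);
  saved_errno = errno;
  assert (ret == slow || ret == -1);

  if (ret == slow)
    result = 0;
  else
    result = (saved_errno == EINTR) ? 1 : -1;

  /* Clean up both children so no zombies remain.  */
  reap (quick);
  reap (slow);
  return result;
}
```

Calling waitpid_was_interrupted (SA_RESTART) should return 0, since the
kernel restarts the interrupted waitpid.  With sa_flags set to 0 one
would expect 1 on Linux, provided the parent is already blocked in
waitpid when the first child exits.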

The patch below does this, and fixes all problems for the JVM team.
It also passes regression testing on s390-ibm-linux and s390x-ibm-linux.

OK to commit?

Bye,
Ulrich


ChangeLog:

	* linux-nat.c (_initialize_linux_nat): Install SIGCHLD handler
	using the SA_RESTART flag.

Index: gdb/linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.27
diff -c -p -r1.27 linux-nat.c
*** gdb/linux-nat.c	6 Mar 2005 16:42:20 -0000	1.27
--- gdb/linux-nat.c	12 May 2005 18:50:42 -0000
*************** Specify any of the following keywords fo
*** 3095,3101 ****
  
    action.sa_handler = sigchld_handler;
    sigemptyset (&action.sa_mask);
!   action.sa_flags = 0;
    sigaction (SIGCHLD, &action, NULL);
  
    /* Make sure we don't block SIGCHLD during a sigsuspend.  */
--- 3095,3101 ----
  
    action.sa_handler = sigchld_handler;
    sigemptyset (&action.sa_mask);
!   action.sa_flags = SA_RESTART;
    sigaction (SIGCHLD, &action, NULL);
  
    /* Make sure we don't block SIGCHLD during a sigsuspend.  */
-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  Ulrich.Weigand@de.ibm.com

