This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[PING] Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt

From: Don Breazeal <donb at codesourcery dot com>
To: <gdb-patches at sourceware dot org>, <palves at redhat dot com>
Date: Thu, 3 Mar 2016 10:19:46 -0800
Subject: [PING] Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
Authentication-results: sourceware.org; auth=none
References: <1455150484-12600-1-git-send-email-donb at codesourcery dot com> <56CF39CF dot 5080007 at codesourcery dot com>

Ping.
I checked, the patch still applies cleanly to mainline.
Thanks
--Don

On 2/25/2016 9:28 AM, Don Breazeal wrote:
> Ping
> Thanks,
> --Don
> 
> On 2/10/2016 4:28 PM, Don Breazeal wrote:
>> Hi Pedro,
>>
>> On 2/1/2016 11:38 AM, Pedro Alves wrote:
>>> On 01/28/2016 12:48 AM, Don Breazeal wrote:
>>>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>>>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>>>
>>>> The test program spawns ten threads, each of which do ten fork/waitpid
>>>> sequences.  The cause of the problem was that when one of the fork
>>>> children exited before the corresponding fork parent could initiate its
>>>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>>>> or waitpid in another thread.
>>
>> In fact, I think my diagnosis here was incorrect, or at least incorrect
>> in some cases.  I believe at least some of the interruptions are caused
>> by SIGSTOP, sent by GDB when stopping all the threads.  More below.
>>
>>>>
>>>> The fix was to wrap the system calls in a loop to retry the call if
>>>> it was interrupted, like:
>>>>
>>>> do
>>>>   {
>>>>     pid = fork ();
>>>>   }
>>>> while (pid == -1 && errno == EINTR);
>>>>
>>>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>>>>
>>>> Tested on Nios II Linux target with x86 Linux host.
>>>
>>> I'd prefer to avoid this if possible.  These loops potentially hide
>>> bugs like ERESTARTSYS escaping out of a syscall and mishandling of
>>> signals.  See bc9540e842eb5639ca59cb133adef211d252843c for example:
>>>    https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html
>>>
>>> How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART?
>>
>> I spent a couple of days trying to find an alternate solution, but
>> couldn't find one that was reliable.  Here is a snapshot of what I tried:
>>
>> 1) SIG_IGN: results in an ECHILD from waitpid.  The man page for waitpid
>>    says "This can happen for one's own child if the action for SIGCHLD is
>>    set to SIG_IGN."
>>
>> 2) SA_RESTART: While waitpid is listed as a system call that can be
>>    restarted by SA_RESTART, fork is not.  Even if I leave the "EINTR loop"
>>    in place for fork, using SA_RESTART I still see an interrupted system
>>    call for waitpid.  Possibly because the problem is SIGSTOP and not
>>    SIGCHLD.
>>
>> 3) pthread_sigblock: With this set for SIGCHLD in all the threads, I
>>    still saw an interrupted system call.  You can't block SIGSTOP.
>>
>> 4) pthread_sigblock with sigwait: using pthread_sigblock on all the
>>    blockable signals with a signal thread that called sigwait for all
>>    the signals in a loop, the signal thread would see a bunch of SIGCHLDs,
>>    but there would eventually be an interrupted system call.
>>
>> 5) bsd_signal: this function is supposed to automatically restart blocking
>>    system calls.  fork is not a blocking system call, but it doesn't help
>>    for waitpid either.
>>
>> I found this in the ptrace(2) man page: "Note that a suppressed signal
>> still causes system calls to return prematurely.  In this case, system
>> calls will be restarted: the tracer will observe the tracee to reexecute
>> the interrupted system call (or restart_syscall(2) system call for a few
>> system calls which use a different mechanism for restarting) if the tracer
>> uses PTRACE_SYSCALL.  Even system calls (such as poll(2)) which are not
>> restartable after signal are restarted after signal is suppressed; however,
>> kernel bugs exist which cause some system calls to fail with EINTR even
>> though no observable signal is injected to the tracee."
>>
>> The GDB manual mentions something similar about interrupted system calls.
>>
>> So, the bottom line is that I haven't changed the fix for the interrupted
>> system calls, because I can't find anything that works as well as the
>> original fix.  Perhaps this test puts enough stress on the kernel that the
>> kernel bugs mentioned above are exposed.
>>
>> One change I did make from the previous version was to increase the
>> timeout to 90 seconds, which was necessary to get more reliable results
>> on the Nios II target.
>>
>> Let me know what you think.
>> Thanks!
>> --Don
>>
>> ---
>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>
>> The test program spawns ten threads, each of which do ten fork/waitpid
>> sequences.  The cause of the problem was that when one of the fork
>> children exited before the corresponding fork parent could initiate its
>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>> or waitpid in another thread.
>>
>> The fix was to wrap the system calls in a loop to retry the call if
>> it was interrupted, like:
>>
>> do
>>   {
>>     pid = fork ();
>>   }
>> while (pid == -1 && errno == EINTR);
>>
>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>> I tried a number of alternative fixes using SIG_IGN, SA_RESTART,
>> pthread_sigblock, and bsd_signal, but none of these worked as well.
>>
>> Tested on Nios II Linux target with x86 Linux host.
>>
>> gdb/testsuite/ChangeLog:
>> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
>>
>> 	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
>> 	Retry fork and waitpid on interrupted system call errors.
>> 	* gdb.threads/forking-threads-plus-breakpoint.exp: (do_test):
>> 	Increase timeout to 90.
>>
>> ---
>>  .../gdb.threads/forking-threads-plus-breakpoint.c          | 14 ++++++++++++--
>>  .../gdb.threads/forking-threads-plus-breakpoint.exp        |  3 +++
>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>> index fc64d93..c169e18 100644
>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>> @@ -22,6 +22,7 @@
>>  #include <sys/types.h>
>>  #include <sys/wait.h>
>>  #include <stdlib.h>
>> +#include <errno.h>
>>  
>>  /* Number of threads.  Each thread continuously spawns a fork and wait
>>     for it.  If we have another thread continuously start a step over,
>> @@ -49,14 +50,23 @@ thread_forks (void *arg)
>>      {
>>        pid_t pid;
>>  
>> -      pid = fork ();
>> +      do
>> +	{
>> +	  pid = fork ();
>> +	}
>> +      while (pid == -1 && errno == EINTR);
>>  
>>        if (pid > 0)
>>  	{
>>  	  int status;
>>  
>>  	  /* Parent.  */
>> -	  pid = waitpid (pid, &status, 0);
>> +	  do
>> +	    {
>> +	      pid = waitpid (pid, &status, 0);
>> +	    }
>> +	  while (pid == -1 && errno == EINTR);
>> +
>>  	  if (pid == -1)
>>  	    {
>>  	      perror ("wait");
>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> index ff3ca9a..6889c2b 100644
>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>>      global linenum
>>      global is_remote_target
>>  
>> +    global timeout
>> +    set timeout 90
>> +
>>      set saved_gdbflags $GDBFLAGS
>>      set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
>>      clean_restart $binfile
>>
>

Follow-Ups:
- Re: [PING] Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  - From: Don Breazeal

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]