[PATCHv2] gdb: Don't drop SIGSTOP during stop_all_threads

Fri Jun 15 12:12:00 GMT 2018

On 06/14/2018 08:46 PM, Andrew Burgess wrote:
> This is an update based on Pedro's feedback:
> 

Thank you.

> So, in this version the preload library is a little more complex.  I
> now try to emulate the kernels sending of SIGCHLD after a short period
> of time in order to make it appear like the child thread status really
> has just arrived from the kernel later than it really did.  This works
> better than I had hoped, with this change in place not only does the
> test now pass on gdbserver, but with the preload library I can (in a
> very simple test) use GDB as normal.

Awesome.

I found a few more small nits, but this is OK with those fixed.

> +#
> +# GDB would then trigger a call to stop_all_threads which would
> +# continue to wait for all of the outstanding threads to stop, when
> +# the outstanding stop events finally arrived GDB would then
> +# (incorrectly) discard the stop event, resume the thread, and
> +# continue to wait for the thread to stop.... which it now never
> +# would.
> +#
> +# In order to try and expose this issue reliably, this test preloads a
> +# library that intercepts waitpid calls.  All waitpid calls targeting
> +# pid -1 with the WNOHANG flag are rate limited so that only 1 per
> +# second can complete.  Additional calls are forced to return 0
> +# indicating no event waiting.  This is enough to trigger the bug
> +# during the attach phase, however, it breaks the rest of GDB's
> +# behaviour, so we can't do more than attach with this library in
> +# place.

Does this last sentence ("breaks the rest of GDB") need updating?

> +   If the kernel is slow in either delivering the signal, or making the
> +   result available to the waitpid call then GDB will enter a sigsuspend
> +   call in order to wait for the inferior threads to change state, this is
> +   signalled to GDB with a SIGCHLD.
> +
> +   A bug in GDB meant that in some cases we would deadlock during this
> +   process.  This was rarely seen as the kernel is usually quick at
> +   delivering signals and making the results available to waitpid, so quick
> +   that GDB would gather the statuses from all inferior threads in the
> +   original pass.
> +
> +   The idea in this library is to rate limit calls to waitpid (where pid is
> +   -1 and the WNOHANG option is set) so that only 1 per second can return
> +   an answer.  Any additional calls will report that no threads are
> +   currently ready.  This should match the behaviour we see on a slow
> +   kernel.
> +
> +   However, given that usually when using this library, the kernel does
> +   have the waitpid result ready this means that the kernel will never send
> +   GDB a SIGCHLD.  This means that when GDB enters sigsuspend it will block
> +   forever.  Alternatively, if GDB enters it's polling loop the lack of

s/is's/its/

> +   SIGCHLD means that we will never see an event on the child threads.  To
> +   resolve these problems the library intercepts calls to sigsuspend and
> +   forces the call to exit if there is a pending waitpid result.  Also,
> +   when we know that there's a waitpid result that we've ignored, we create
> +   a new thread which, after a short delay, will send GDB a SIGCHLD.  */
> +

> +
> +/* Cache the result of a waitpid call that has not been reported back to
> +   GDB yet.  We only ever cache a single result.  Once we have a result
> +   cached then later calls to waitpid with the WNOHANG option will return a
> +   result of 0.  */
> +
> +struct
> +{
> +  /* Flag to indicate when we have a result cached.  */
> +  int cached_p;
> +
> +  /* The cached result fields from a waitpid call.  */
> +  pid_t pid;
> +  int wstatus;
> +} cached_wait_status;

"static" to avoid interposition issues.

> +
> +/* Lock to hold when modifying SIGNAL_THREAD_ACTIVE_P.  */
> +
> +pthread_mutex_t thread_creation_lock_obj = PTHREAD_MUTEX_INITIALIZER;

Ditto.

> +#define thread_creation_lock (&thread_creation_lock_obj)
> +
> +/* This flag is only modified while holding the THREAD_CREATION_LOCK mutex.
> +   When this flag is true then there is a signal thread alive that will be
> +   sending a SIGCHLD at some point in the future.  */
> +
> +int signal_thread_active_p;

Ditto.

> +
> +/* When we last allowed a waitpid to complete.  */
> +
> +static struct timeval last_waitpid_time = { 0, 0 };
> +
> +/* The number of seconds that must elapse between calls to waitpid where
> +   the pid is -1 and the WNOHANG option is set.  If calls occur faster than
> +   this then we force a result of 0 to be returned from waitpid.  */
> +
> +#define WAITPID_MIN_TIME (1)
> +
> +/* Return true (non-zero) if we should skip this call to waitpid, or false
> +   (zero) if this waitpid call should be handled with a call to the "real"
> +   waitpid library.  Allows 1 waitpid call per second.  */

s/waitpid library/waitpid function/  ?

> +
> +static int
> +should_skip_waitpid (void)
> +{
> +  struct timeval *tv = &last_waitpid_time;
> +  if (tv->tv_sec == 0)
> +    {
> +      if (gettimeofday (tv, NULL) < 0)
> +	error ("error: gettimeofday failed\n");
> +      return 0; /* Don't skip.  */
> +    }
> +  else
> +    {
> +      struct timeval new_tv;
> +
> +      if (gettimeofday (&new_tv, NULL) < 0)
> +	error ("error: gettimeofday failed\n");
> +
> +      if ((new_tv.tv_sec - tv->tv_sec) < WAITPID_MIN_TIME)
> +	return 1; /* Skip.  */
> +
> +      *tv = new_tv;
> +    }
> +
> +  /* Don't skip.  */
> +  return 0;
> +}
> +
> +/* Perform a real waitpid call.  */
> +
> +static pid_t
> +real_waitpid (pid_t pid, int *wstatus, int options)
> +{
> +  typedef pid_t (*fptr_t) (pid_t, int *, int);
> +  static fptr_t real_func = NULL;
> +
> +  if (real_func == NULL)
> +    {
> +      real_func = dlsym (RTLD_NEXT, "waitpid");
> +      if (real_func == NULL)
> +	error ("error: failed to find real waitpid\n");
> +    }
> +
> +  return (*real_func) (pid, wstatus, options);
> +}
> +
> +/* Thread worked created when we cached a waitpid result.  Delays for

s/worked/worker/
s/cached/cache/

> +   a short period of time and then sends SIGCHLD to the GDB process.  This
> +   should trigger GDB to call waitpid again, at which point we will make
> +   the cached waitpid result available.  */
> +
> +static void*
> +send_sigchld_thread (void *arg)
> +{
> +  /* Delay one second longer than WAITPID_MIN_TIME so that there can be no
> +     chance that a call to SHOULD_SKIP_WAITPID will return true one the

s/one/once/

Thanks,
Pedro Alves