This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.


On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
...
> I've tried to just remove the sleeps. This works fine with tst-cancel20.
> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
> called cleanups 124 => cleanup-handler in tf_body was not called.
>
> In the "good" case, SIGHUP is received while tf_body is blocking in read.
> In the "error" case, SIGHUP is received while tf_body is waiting in
> pthread_barrier_wait.
> (This occures on s390x in the same way as on e.g. x86_64; I've used gcc
> version 9.1.1)
...
> According to the .gcc_except_table, the region with our cleanup-handler
> starts - after pthread_barrier_wait - directly before the read call,
> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
...
> Is this behaviour intended?
>
> The difference between those calls is "nothrow":
> extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
> vs
> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>       __attribute__ ((__nothrow__))  __attribute__ ((__nonnull__ (1)));

This looks like GCC is assuming it can hoist a call to a nothrow
function out of a cleanup region that lexically contains it.  That's
not true when an exception can be thrown by a signal handler.  I can
see an argument that this is an incorrect optimization when
-fasynchronous-unwind-tables is in use, but the documentation makes it
sound like -fasynchronous-unwind-tables is only intended to enable
*stack tracing* from an arbitrary instruction, not exceptions.
(See the documentation for -fnon-call-exceptions as well as for
-fasynchronous-exception-tables.)

It is not entirely clear to me what this test is supposed to test, but
I think the intent is for the SIGHUP to be delivered *only* after we
are sure tf_body is blocked on read.  If it's delivered at any other
point, the test might not be testing the right thing.  Instead of your
change to use a second pipe, therefore, I suggest use of
pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
pthread_sigmask in do_test (this will inherit to all child threads);
then change tf_body to something like this:

static void __attribute__((noinline))
tf_body (void)
{
  char c;

  pthread_cleanup_push (cl, (void *) 3L);

  sigset_t unblock_SIGHUP;
  sigemptyset(&unblock_SIGHUP);

  struct pollfd pfds[1];
  pfds[0].fd = fd[0];
  pfds[0].events = POLLIN;

  // this call is expected to be interrupted by a SIGHUP
  // before anything is written to the pipe
  if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
    {
      puts ("read succeeded");
      exit (1);
    }

  // drain the pipe
  read (fd[0], &c, 1);

  pthread_cleanup_pop (0);
}

ppoll _atomically_ sets the signal mask before waiting, and restores
it afterward, so, this should ensure that the SIGHUP is delivered at
exactly the right point.

zw


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]