This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.


On 9/20/19 5:17 PM, Zack Weinberg wrote:
On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
...
I've tried to just remove the sleeps. This works fine with tst-cancel20.
But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
-fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
called cleanups 124 => cleanup-handler in tf_body was not called.

In the "good" case, SIGHUP is received while tf_body is blocking in read.
In the "error" case, SIGHUP is received while tf_body is waiting in
pthread_barrier_wait.
(This occures on s390x in the same way as on e.g. x86_64; I've used gcc
version 9.1.1)
...
According to the .gcc_except_table, the region with our cleanup-handler
starts - after pthread_barrier_wait - directly before the read call,
even though pthread_barrier_wait is within pthread_cleanup_push / pop:
...
Is this behaviour intended?

The difference between those calls is "nothrow":
extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
vs
extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
       __attribute__ ((__nothrow__))  __attribute__ ((__nonnull__ (1)));

This looks like GCC is assuming it can hoist a call to a nothrow
function out of a cleanup region that lexically contains it.  That's
not true when an exception can be thrown by a signal handler.  I can
see an argument that this is an incorrect optimization when
-fasynchronous-unwind-tables is in use, but the documentation makes it
sound like -fasynchronous-unwind-tables is only intended to enable
*stack tracing* from an arbitrary instruction, not exceptions.
(See the documentation for -fnon-call-exceptions as well as for
-fasynchronous-exception-tables.)

It is not entirely clear to me what this test is supposed to test, but
I think the intent is for the SIGHUP to be delivered *only* after we
are sure tf_body is blocked on read.  If it's delivered at any other
point, the test might not be testing the right thing.  Instead of your
change to use a second pipe, therefore, I suggest use of
pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
pthread_sigmask in do_test (this will inherit to all child threads);
then change tf_body to something like this:

static void __attribute__((noinline))
tf_body (void)
{
   char c;

   pthread_cleanup_push (cl, (void *) 3L);

   sigset_t unblock_SIGHUP;
   sigemptyset(&unblock_SIGHUP);

   struct pollfd pfds[1];
   pfds[0].fd = fd[0];
   pfds[0].events = POLLIN;

   // this call is expected to be interrupted by a SIGHUP
   // before anything is written to the pipe
   if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
     {
       puts ("read succeeded");
       exit (1);
     }

   // drain the pipe
   read (fd[0], &c, 1);

   pthread_cleanup_pop (0);
}

ppoll _atomically_ sets the signal mask before waiting, and restores
it afterward, so, this should ensure that the SIGHUP is delivered at
exactly the right point.

zw


Sorry for the long delay.

I've give it a try. The signal handler is now called while we are in the syscall in ppoll called in tf_body.

Before, the signal handler arrived while we were in the syscall in read called in tf_body.

In both cases (read or ppoll), we are in a syscall surrounded with __libc_enable_asynccancel and __libc_disable_asynccancel.

If that's okay, I'll proceed with the patch.

Bye,
Stefan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]