libc hang in mutex acquisition on exit in single-threaded process

Christian Grothoff grothoff@gnunet.org
Sat Feb 16 20:19:00 GMT 2019


Ah, thanks a lot. We should definitely call _exit(1) there.
(How on earth did you spot this so fast in our huge codebase!?!?).
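
For readers of the archive, here is a minimal sketch of what such a change
could look like. It is not the actual GNUnet patch: `shutdown_pipe` and
`install_shutdown_handlers` are hypothetical stand-ins for the
GNUNET_DISK pipe handle and the GNUNET_SIGNAL_handler_install calls quoted
below, and plain pipe()/write()/sigaction() are assumed in their place. The
handler calls only async-signal-safe functions (getpid, _exit, write).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the scheduler state; not GNUnet's definitions. */
static pid_t my_pid;
static int shutdown_pipe[2];    /* [0] = read end, [1] = write end */

/**
 * Signal handler called for signals that should cause us to shutdown.
 * Only async-signal-safe functions are called here.
 */
static void
sighandler_shutdown (int sig)
{
  static const char c = 0;
  int old_errno = errno;        /* backup errno */
  ssize_t ret;

  (void) sig;
  if (getpid () != my_pid)
    _exit (1);                  /* forked child: terminate without running
                                 * atexit handlers or flushing stdio */
  ret = write (shutdown_pipe[1], &c, sizeof (c));
  (void) ret;                   /* nothing safe to do on failure here */
  errno = old_errno;
}

/**
 * Create the pipe and install the handler for SIGINT and SIGTERM.
 *
 * @return 0 on success, -1 on error
 */
static int
install_shutdown_handlers (void)
{
  struct sigaction sa;

  my_pid = getpid ();
  if (0 != pipe (shutdown_pipe))
    return -1;
  memset (&sa, 0, sizeof (sa));
  sa.sa_handler = &sighandler_shutdown;
  sigemptyset (&sa.sa_mask);
  if (0 != sigaction (SIGINT, &sa, NULL))
    return -1;
  if (0 != sigaction (SIGTERM, &sa, NULL))
    return -1;
  return 0;
}
```

In this sketch the main loop would watch shutdown_pipe[0] and begin an
orderly shutdown once a byte arrives. A forked child that receives the
signal before it has exec'ed terminates immediately through _exit.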

On 2/16/19 9:16 PM, Florian Weimer wrote:
> * Christian Grothoff:
> 
>> I'm seeing some _very_ odd behavior with processes hanging on exit (?)
>> with GNU libc 2.28-6 on Debian (amd64 threadripper).  This seems to
>> happen at random (for random tests, with very low frequency!) in the
>> GNUnet (Git master) testsuite when a child process is about to exit.
> 
> It looks like you call exit from a signal handler, see
> src/util/scheduler.c:
> 
> /**
>  * Signal handler called for signals that should cause us to shutdown.
>  */
> static void
> sighandler_shutdown ()
> {
>   static char c;
>   int old_errno = errno;        /* backup errno */
> 
>   if (getpid () != my_pid)
>     exit (1);                   /* we have fork'ed since the signal handler was created,
>                                  * ignore the signal, see https://gnunet.org/vfork discussion */
>   GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle
>                           (shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE),
>                           &c, sizeof (c));
>   errno = old_errno;
> }
> 
> In general, this results in undefined behavior because exit (unlike
> _exit) is not an async-signal-safe function.
> 
> I suspect you either call the exit function while a fork is in progress,
> or, because you register this signal handler for several different
> signals:
> 
>   sh->shc_int = GNUNET_SIGNAL_handler_install (SIGINT,
>                                                &sighandler_shutdown);
>   sh->shc_term = GNUNET_SIGNAL_handler_install (SIGTERM,
>                                                 &sighandler_shutdown);
> 
> one call to exit might interrupt another call to exit if both signals
> are delivered to the process.
> 
> The deadlock you see was introduced in commit
> 27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork handlers"),
> first released in glibc 2.28.  The fork deadlock will be gone (in the
> single-threaded case) if Debian updates to the current
> release/2.28/master branch because we backported commit
> 60f80624257ef84eacfd9b400bda1b5a5e8e7816 ("nptl: Avoid fork handler lock
> for async-signal-safe fork [BZ #24161]") there.
> 
> But this will not help you.  Even without the deadlock, I expect you
> will still experience some random corruption during exit, but it's
> going to be difficult to spot.
> 
> Thanks,
> Florian
> 


