libc hang in mutex acquisition on exit in single-threaded process
Christian Grothoff
grothoff@gnunet.org
Sat Feb 16 20:19:00 GMT 2019
Ah, thanks a lot. We should definitely call _exit(1) there.
(How on earth did you spot this so fast in our huge codebase!?!?).
On 2/16/19 9:16 PM, Florian Weimer wrote:
> * Christian Grothoff:
>
>> I'm seeing some _very_ odd behavior with processes hanging on exit (?)
>> with GNU libc 2.28-6 on Debian (amd64 threadripper). This seems to
>> happen at random (for random tests, with very low frequency!) in the
>> GNUnet (Git master) testsuite when a child process is about to exit.
>
> It looks like you call exit from a signal handler, see
> src/util/scheduler.c:
>
> /**
>  * Signal handler called for signals that should cause us to shutdown.
>  */
> static void
> sighandler_shutdown ()
> {
>   static char c;
>   int old_errno = errno; /* backup errno */
>
>   if (getpid () != my_pid)
>     exit (1); /* we have fork'ed since the signal handler was created,
>                * ignore the signal, see https://gnunet.org/vfork discussion */
>   GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle
>                           (shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE),
>                           &c, sizeof (c));
>   errno = old_errno;
> }
>
> In general, this results in undefined behavior because exit (unlike
> _exit) is not an async-signal-safe function.
>
> I suspect you either call the exit function while a fork is in progress,
> or since you register this signal handler multiple times for different
> signals:
>
>   sh->shc_int = GNUNET_SIGNAL_handler_install (SIGINT,
>                                                &sighandler_shutdown);
>   sh->shc_term = GNUNET_SIGNAL_handler_install (SIGTERM,
>                                                 &sighandler_shutdown);
>
> one call to exit might interrupt another call to exit if both signals
> are delivered to the process.
>
> The deadlock you see was introduced in commit
> 27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork handlers"),
> first released in glibc 2.28. The fork deadlock will be gone (in the
> single-threaded case) if Debian updates to the current
> release/2.28/master branch because we backported commit
> 60f80624257ef84eacfd9b400bda1b5a5e8e7816 ("nptl: Avoid fork handler lock
> for async-signal-safe fork [BZ #24161]") there.
>
> But this will not help you. Even without the deadlock, I expect you
> still experience some random corruption during exit, but it's going to
> be difficult to spot.
>
> Thanks,
> Florian
>