This is the mail archive of the libc-hacker@sourceware.org mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



Re: Timing window in NPTL fork.c causes hangs.


On Mon, Feb 19, 2007 at 02:38:46PM -0600, Steven Munroe wrote:
> However the code in sysdeps/unix/sysv/linux/fork.c is exposed to signals
> interrupting its operation. If the thread calling fork is interrupted by
> a signal, after it has processed atfork prepare handlers but before it
> has processed the atfork parent handlers, and the signal handler blocks
> for any reason (sigsuspend or attempts IO) the process can hang. For
> example any other thread attempting to call malloc will wait for the
> atfork handlers to release the "list_lock" but the thread processing the
> fork is now blocked and cannot proceed. If the forking thread is
> dependent on one of the other threads to wake it (via signal) that
> thread may block on the list_lock first and now we have deadlock.
> 
> So is it OK for NPTL's fork implementation to not be atomic relative to
> signals?

If you have an async signal handler that can block the app indefinitely,
then that's to be expected.  How is that different from the same signal
handler interrupting, e.g., the middle of a malloc or stdio call?  Some
malloc or stdio lock can be held at that point, so if your async signal
handler waits until some other thread wakes it up and those other threads
need malloc or stdio, you hang in exactly the same way.

> From the POSIX spec we see statements like:
> 
> 13089 ... Since the fork() call can be considered as atomic
> 13090 from the application's perspective, the set would be initialized as empty and such signals would
> 13091 have arrived after the fork(); see also <signal.h>.

IMHO this only addresses whether a signal sent to the process is
delivered just to the parent or also to the child.  fork() as a whole
can't be considered atomic: you can, e.g., block indefinitely in one of
the atfork handlers using an async-signal-safe function.
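
To make the point concrete, here is a minimal sketch showing that
application code runs inside a single fork() call.  The names trace_fork,
note, on_prepare, on_parent and on_child are made up for this
illustration; only pthread_atfork, fork and waitpid are real interfaces.
Any of these handlers could block, which is why fork() as a whole can't
be treated as atomic.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical trace buffer: records the order in which code runs. */
static char trace[8];
static size_t trace_len;

static void note(char c)
{
    assert(trace_len < sizeof trace - 1);   /* keep room for the NUL */
    trace[trace_len++] = c;
}

static void on_prepare(void) { note('P'); } /* runs in the parent, before the fork */
static void on_parent(void)  { note('p'); } /* runs in the parent, after the fork  */
static void on_child(void)   { note('c'); } /* runs in the child, after the fork   */

/* Returns the parent's view of what ran between '[' and ']'. */
const char *trace_fork(void)
{
    static int registered;
    if (!registered) {
        pthread_atfork(on_prepare, on_parent, on_child);
        registered = 1;
    }

    memset(trace, 0, sizeof trace);
    trace_len = 0;

    note('[');
    pid_t pid = fork();          /* handlers execute inside this call */
    if (pid == 0)
        _exit(0);                /* child: nothing else to do */
    note(']');

    if (pid > 0)
        waitpid(pid, NULL, 0);   /* reap the child */
    return trace;
}
```

In the parent the trace reads "[Pp]": the prepare and parent handlers both
ran between entering and leaving fork().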

> So what should we do about this? One possible solution is to use the
> signal mask and disable async signals for the duration of __libc_fork().
> Or at least from just before atfork prepare processing to after atfork
> parent/child processing.

So you'd just break different apps (in addition to making fork()
considerably slower)?  Apps have every right to expect that the library
hasn't messed with the signal mask; they can very well, e.g., call
sigsuspend in an atfork handler and expect to be woken up.  If you block
all signals before running the atfork handlers, that would never happen.
Not to mention that the atfork handlers can call sigprocmask themselves.

> We have experimented with this in our application (masking signals
> before the fork call and restoring them after in the parent and child).
> And this does seem to eliminate the hang.

Then just do that in your application if you need it.

> But should we change the libc NPTL fork implementation to use signal masks to
> give the application the appearance that fork is atomic?

No.

	Jakub

