Bug 378

Summary: posix_spawn implementation, use vfork/execve rather than fork/execve for NPTL Linux.
Product: glibc Reporter: Dennis Bordin <dennis>
Component: nptl Assignee: Ulrich Drepper <drepper.fsp>
Status: RESOLVED FIXED    
Severity: enhancement CC: glibc-bugs
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description Dennis Bordin 2004-09-10 00:48:01 UTC
The pairing of fork/execve can still be extremely slow even
on modern Unix variants that use copy-on-write semantics.
If one manages to create a large process with an extremely
large number of memory mapped regions then fork/execve
performance degrades badly (many tens-of-times slower).
Copying the page table entries from parent to child can 
be an extremely expensive operation. That performance
penalty does not occur with vfork/execve.

The current generic glibc version of posix_spawn is based
around fork/execve. Hence, it will suffer from the performance
problem listed above.
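
For reference, a minimal sketch of the vfork/execve pairing being asked for.
The helper name run_with_vfork is hypothetical, and this is only an
illustration of the pattern, not the glibc implementation. The child borrows
the parent's address space until execve, so it must do nothing except exec
or _exit.

```c
#define _DEFAULT_SOURCE        /* make sure glibc declares vfork */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with `argv` via vfork/execve and wait.
   Returns the child's exit status, or -1 on failure. */
static int run_with_vfork(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;                 /* vfork itself failed */
    if (pid == 0) {
        /* Child: no page tables were copied; the parent is suspended
           until we exec or exit, so touch nothing else here. */
        execve(path, argv, environ);
        _exit(127);                /* only reached if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_vfork("/bin/sh", argv) with argv set to
{ "sh", "-c", "exit 3", NULL } should return the shell's exit status.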

Solaris 10 now comes with a posix_spawn implementation. Using
truss and speaking to Sun engineers it has been confirmed
that the new Solaris 10 posix_spawn uses vfork/execve.
Also the implementation they have carried out means their
new posix_spawn is thread safe (where their vfork is not).

When I raised this question at Red Hat bugzilla a friendly
Red Hat engineer responded with the following.

> It is impossible to use vfork in the linuxthreads libc, because 
> sigaction modifies global state, but most probably it should be 
> doable for NPTL libc, provided a few calls (e.g. *gid) are changed 
> into inline syscalls and adding code to run atfork registered 
> handlers before/after vfork in spawni.c.

So this enhancement request is for the glibc engineers to
investigate whether NPTL glibc posix_spawn can use vfork/execve
rather than fork/execve. If there are no obstacles then that
enhancement should be pushed through.

This will result in huge performance benefits for application
developers like us. The other benefit is that Solaris10 
and NPTL glibc posix_spawn would have the same performance
characteristics. Again, that will be extremely helpful.

Dennis.
Comment 1 Ulrich Drepper 2004-09-12 05:56:26 UTC
It is not possible to just use vfork.  The problem is the atfork handlers which
can be registered.  In the child process they can modify the address space. 
These changes then would be visible in the parent process.
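
A small sketch of the isolation that fork provides and vfork would not.
The names mark_child and atfork_write_is_private are hypothetical. The
child-side atfork handler writes to a global; with fork the write lands in
the child's private copy, so the parent never sees it. The sketch
deliberately uses fork only, since running such a handler in a vfork child
would write into the parent's memory.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int child_handler_ran = 0;

/* atfork "child" handler: modifies memory in the new process. */
static void mark_child(void) { child_handler_ran = 1; }

/* Returns 1 if the child saw the handler's write while the parent's
   copy stayed untouched, 0 otherwise, -1 on error. */
static int atfork_write_is_private(void)
{
    assert(child_handler_ran == 0);   /* nothing has run in this process */

    if (pthread_atfork(NULL, NULL, mark_child) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_handler_ran ? 42 : 0);   /* report what the child saw */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    int child_saw_write = WIFEXITED(status) && WEXITSTATUS(status) == 42;
    return child_saw_write && child_handler_ran == 0;
}
```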

The best one can do is to let the user select this behavior.  A new spawn
attribute (along with setter/getter functions) can be created.  If the flag is
set in the attribute, vfork is used instead of fork.  Then it is the programmer's
fault if something goes wrong because of the atfork handlers.  In fact, we
should just not run the atfork handlers if vfork is used.
Comment 2 Ulrich Drepper 2004-09-12 18:06:39 UTC
I've implemented a POSIX_SPAWN_USEVFORK flag.
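
A hedged sketch of how an application might request the flag through the
standard attribute interface. The flag name comes from the comment above;
the helper name spawn_usevfork is hypothetical, and the #ifdef guard is an
assumption to keep the code building where the header does not define the
flag.

```c
#define _GNU_SOURCE            /* POSIX_SPAWN_USEVFORK is a GNU extension */
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn `path`, asking for the vfork path if the
   flag exists. Returns the child's exit status, or -1 on failure. */
static int spawn_usevfork(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    posix_spawnattr_t attr;
    if (posix_spawnattr_init(&attr) != 0)
        return -1;

    short flags = 0;
#ifdef POSIX_SPAWN_USEVFORK
    flags |= POSIX_SPAWN_USEVFORK;
#endif
    int rc = posix_spawnattr_setflags(&attr, flags);

    pid_t pid;
    if (rc == 0)
        rc = posix_spawn(&pid, path, NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```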
Comment 3 Roland McGrath 2004-09-12 22:54:50 UTC
I don't think the option should be about the implementation detail.
It ought to be POSIX_SPAWN_NO_ATFORK, meaning that atfork handlers do not get run.
In practice, that means calling vfork instead of fork.  But the name and meaning
of the switch should be about the application experience, not the system's
implementation.
Comment 4 Ulrich Drepper 2004-09-13 01:22:05 UTC
> It ought to be POSIX_SPAWN_NO_ATFORK,

This is problematic.  The atfork handlers might not be the only reason why vfork
cannot be used.  They are the only reason for the nptl implementation, but not
for the LT (linuxthreads) code.

As is, vfork usage can be forced by the programmer if s/he knows it is OK.  If
we'd use POSIX_SPAWN_NO_ATFORK we would also need to add more flags for other
details which are problematic and the programmer would have to select them all
to get vfork used.  This is too much specialized knowledge required.

I prefer the "use vfork and do whatever necessary" flag.
Comment 5 Roland McGrath 2004-09-13 03:10:39 UTC
What you are saying is that behavior of the posix_spawn interface is not
intended to be well-specified.  I think that is a lousy choice.

If there are other differences than atfork handlers, they should be clearly
specified and explicit in the description of the flag.  I don't care what
linuxthreads does, it might as well just not support the flag at all.
Comment 6 Dennis Bordin 2004-09-13 04:18:21 UTC
Interesting debate going on.

It should be noted that the Solaris 10 posix_spawn (which uses
vfork only) most definitely does not run atfork handlers. It
is listed in their posix_spawn man page, quote

   The  fork  handlers  are  not  run  when  posix_spawn() or
   posix_spawnp() is called.

So a glibc NPTL posix_spawn vfork path should not run atfork handlers.
From what I have read, it also wouldn't make sense: pthread_atfork
is tied to the fork function call, so vfork should not care about
atfork handlers.

Any application developer that relies on the atfork handler 
posix_spawn side effect is going to be in some trouble when they 
use posix_spawn on Solaris 10. Personally speaking the side-effect
free vfork based posix_spawn seems more in line with the ideals
behind posix_spawn, that being to create a process from thin
air in a fast manner. Who knows one day it might make its way
closer to the kernel as a true system call (or a variant of clone).

I believe GNU Hurd has its own implementation of posix_spawn (not
based on fork/exec). Maybe the fact that atfork is handled in 
the current generic glibc posix_spawn may cause future problems?

As for the flag name, how about 'POSIX_SPAWN_NPTL_USEVFORK' or
'POSIX_SPAWN_NPTL_NO_ATFORK'? 

We are still keen on having a vfork posix_spawn fast path (with
no atfork handling similar to Solaris 10).

Dennis.
Comment 7 Jakub Jelinek 2004-09-13 06:35:09 UTC
But similarly any developer that relies on the atfork handlers not being called
is creating a non-portable program.
http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html
^[THR] [Option Start] It is implementation-defined whether the fork handlers are
run when posix_spawn() or posix_spawnp() is called. [Option End]
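
Since the behavior is implementation-defined, a program can probe it at run
time. The following is a sketch under assumptions: the names count_prepare
and spawn_runs_atfork_handlers are hypothetical, and /bin/sh is assumed to
exist. It registers a "prepare" handler, which runs in the parent before a
fork, and checks whether posix_spawn triggered it.

```c
#include <assert.h>
#include <pthread.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

static int prepare_calls = 0;

/* atfork "prepare" handler: runs in the parent before a fork. */
static void count_prepare(void) { prepare_calls++; }

/* Returns 1 if posix_spawn ran the prepare handler, 0 if it did not,
   -1 on error. */
static int spawn_runs_atfork_handlers(void)
{
    static int registered = 0;
    if (!registered) {
        if (pthread_atfork(count_prepare, NULL, NULL) != 0)
            return -1;
        registered = 1;
    }

    char *argv[] = { "sh", "-c", "exit 0", NULL };
    int before = prepare_calls;
    pid_t pid;
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    assert(prepare_calls >= before);  /* the counter only ever grows */
    return prepare_calls > before;
}
```

Either result is conforming; the point is only that a portable program
cannot assume one or the other.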
Comment 8 Roland McGrath 2004-09-13 07:46:43 UTC
The standard explicitly addresses the issue of pthread_atfork handlers being run
by posix_spawn, and specifies it to be implementation-defined.  What that term
means is that the implementation is obliged to document what its behavior is in
that regard.  Since glibc on GNU/Linux has heretofore always called
pthread_atfork handlers (because the code just calls `fork'), we have deemed
that "implementation-defined" for glibc on GNU/Linux here means that
pthread_atfork handlers do in fact get run.  We may have failed to clearly
document this behavior, but it would be a disservice to users to have it change now.
Comment 9 Dennis Bordin 2004-09-13 07:57:12 UTC
Well spotted.

I guess application developers will need to know how 
each posix_spawn implementation deals with atfork handlers.
The Solaris 10 man page indicates clearly what they do. 

If a change does occur in the NPTL glibc posix_spawn (fingers
crossed), again the man page will need to state clearly what happens
in each situation (fork/exec path and possible vfork/exec path).

It does appear that a vfork-path based posix_spawn which 
does not run atfork handlers will be safe, fast and 
standards compliant.
Comment 10 Dennis Bordin 2004-09-13 08:06:22 UTC
> this behavior, but it would be a disservice to users to have it change now.

Absolutely. The current generic fork/exec posix_spawn should remain
as is.

However, any new NPTL variant introduced (e.g. POSIX_SPAWN_USEVFORK/
POSIX_SPAWN_NO_ATFORK) can define how it deals with atfork handlers.
Comment 11 Chris Quenelle 2005-04-06 22:03:25 UTC
If you want to support applications that depend on specific
behavior, then you could add two new flags:

mumble_REQUIRE_ATFORK and mumble_REQUIRE_NOATFORK

Then ask consumers (over time) to add these flags when
it's required.  That effectively converts a dependence
on an implementation defined behavior into a dependence
on a platform-specific extension.  It moves the dependence
from being implicit to being explicit.

Eventually you could assume that any application that doesn't
say what it wants should be resilient to any variety of
implementation defined behavior.

On systems that can't support one of those styles, it also
allows the posix_spawn call to return an error when the user
explicitly asks for something which can't be supported.