Recent Solaris provides a way to delete all file descriptors
greater than a given integer, and provides a way to ask
posix_spawn to do so. I believe glibc should implement these extensions.
Both of the big programs I have worked on, xemacs and openjdk,
have written their own way to do this.
extern int posix_spawn_file_actions_addclosefrom_np(
extern void closefrom(int);
The functionality that has been added to glibc allowing FD_CLOSE_ON_EXEC
to be specified at time of creation of the fd does help (thank you)
but it is not sufficient for "open" programs like the JDK where
arbitrary third party native code may be concurrently opening file
descriptors while creating a subprocess.
nscd.c does this by hand in a Linux-specific way, and it is trivial to implement
in libc on Hurd. So this seems like a good addition.
No, it's a horrible idea. The assumption that a program knows all the open file
descriptors is simply invalid. The runtime (all kinds of libraries) can at any
point in time create additional file descriptors and indiscriminately calls for
trouble. The correct way is to name the individual file descriptors the program
knows about and let the creator of the other file descriptors worry about the rest.
The reason nscd can do it the way it does it is simple: all the code used is
controlled by libc. But that's a special case.
Aside from the Solaris 10 precedent, other OSes have adopted
closefrom, apparently with the same behavior.
To provide more motivation, the idea is that you are in a
large multithreaded app that is swimming in a sea of unknown
file descriptors that may or may not have FD_CLOEXEC set,
you fork(), frob some file descriptors you care about,
and then need to close the rest. You write your own buggy closefrom
or use the one provided by the system.
(In reply to comment #3)
> Here's OpenBSD:
> Here's NetBSD:
This is *anything* but an argument in favor.
> To provide more motivation, the idea is that you are in a
> large multithreaded app that is swimming in a sea of unknown
> file descriptors that may or may not have FD_CLOEXEC set,
So, fix the code. We have O_CLOEXEC support as well. There is no reason to
work around buggy code and this interface *actively* prevents innovations by
usurping file descriptors.
>> To provide more motivation, the idea is that you are in a
>> large multithreaded app that is swimming in a sea of unknown
>> file descriptors that may or may not have FD_CLOEXEC set,
>So, fix the code. We have O_CLOEXEC support as well. There is no reason to
>work around buggy code and this interface *actively* prevents innovations by
>usurping file descriptors.
For many applications, there is no way in practice to control all the
code running in the same address space. This is especially true for
"platforms" like java, where arbitrary user-created shared libraries
are loaded and executed at runtime.
The idea of permitting innovations that use file descriptors is
an interesting one, but one that in my opinion cannot succeed.
Too many people (like myself) are maintaining library code
that starts new subprocesses, and they will continue to
indiscriminately close unknown file descriptors,
with or without help from their libc.
While my library closes file descriptors unconditionally,
The python subprocess API makes closing fds an option.
"""If close_fds is true, all file descriptors except 0, 1 and 2 will be
closed before the child process is executed."""
Interestingly, python provides a related function
.. function:: closerange(fd_low, fd_high)
Close all file descriptors from *fd_low* (inclusive) to *fd_high* (exclusive),
ignoring errors. Availability: Unix, Windows. Equivalent to::
for fd in xrange(fd_low, fd_high):
which doesn't seem to support "infinity" for the second argument.
Can we reopen this please? It's ludicrous that Linux still does not have it, and so many programs need to re-implement this, poorly, see e.g.
_close_fds_by_brute_force(long start_fd, PyObject *py_fds_to_keep)
in Python 3:
A reasonable implementation for this needs kernel support. We can certainly provide a system call wrapper for this functionality once it becomes available.
To expand Florian answer, the main issue is that closefrom call is inherent racy when trying to implement on libc without kernel support.
The straightforward way to accomplish, and what python does, is to iterate over all open file descriptors (either by walking through /proc/<pid>/fds or sysconf(_SC_OPEN_MAX)) and issue a close on the file descriptor.
The problem with this approach does not guarantee on multithread environments that a file descriptor would not be created in the iteration phase. One option would be to serialize a file descriptor creation routine (such as open, openat) with close; but besides the fact it adds a large complexity and scalability issue, it does not solve the issue of file descriptors being created by bypassing the libc (by issuing the syscall instruction directly).
Both OpenBSD and Solaris implements it with syscalls, former with closefrom and later with fcntl(..., F_CLOSEFROM). Also, Solaris documents its MT-unsafe.
In fact, IMHO a closefrom syscall won't help much for posix_spawn. Current implementation for Linux uses a clone(CLONE_VM, CLONE_VFORK) which means that only the calling thread will be suspended while the helper thread issues execlpe or _exit. It means that a file action to issue a closefrom will also be susceptible to race conditions in multi-thread environments.
That's why posix_spawn_file_actions_addclosefrom_np does not make much sense unless you implement posix_spawn with fork+exec (which has its own scalability issues). The possible solutions are:
1. Ensure all file descriptors are opened with O_CLOEXEC.
2. Use a helper process to actually set up the required file descriptor close steps and issue the target binary. This is in fact what OpenJDK does (check src/java.base/unix/native/jspawnhelper/jspawnhelper.c and src/java.base/unix/native/libjava/childproc.c).
So IMHO closefrom would be just a performance optimization on Linux, where kernel will know exactly which file descriptor to close.
The cpython problem with its current implementation is bad performance on systems with large sysconf(_SC_OPEN_MAX) value (e.g. it's much bigger on FreeBSD than it's on Linux). We actually plan to do a cpython PR which would replace the loop with a call to closefrom() -- for systems where it's available.
(In reply to Dima Pasechnik from comment #9)
> The cpython problem with its current implementation is bad performance on
> systems with large sysconf(_SC_OPEN_MAX) value (e.g. it's much bigger on
> FreeBSD than it's on Linux). We actually plan to do a cpython PR which would
> replace the loop with a call to closefrom() -- for systems where it's
AFAIK unfortunately for this case, there is no much glibc can do without proper kernel support.
libbsd does provide closefrom on Linux-I have not looked at what it actually does...
(In reply to Dima Pasechnik from comment #11)
> libbsd does provide closefrom on Linux-I have not looked at what it actually
It does exactly what I described earlier: interact over /proc/self/fd or getconf .
(In reply to Adhemerval Zanella from comment #8)
> In fact, IMHO a closefrom syscall won't help much for posix_spawn. Current
> implementation for Linux uses a clone(CLONE_VM, CLONE_VFORK) which means
> that only the calling thread will be suspended while the helper thread
> issues execlpe or _exit. It means that a file action to issue a closefrom
> will also be susceptible to race conditions in multi-thread environments.
In fact, this is not true since we do not use CLONE_FILES for posix_spawn. So it is possible to implement posix_spawn_file_actions_addclosefrom_np, albeit it would require interacting over the open file descriptors (and thus might incur in performance issues).
We should implement closefrom and related functions from Solaris. A high-quality implementation (that cannot fail) will require kernel support.
I do not think there is a way to obtain the number of the highest file descriptor currently in use because RLIMIT_NOFILE is not enforced for existing descriptors. Opening /proc/self/fd may fail.
(In reply to Florian Weimer from comment #14)
> We should implement closefrom and related functions from Solaris. A
> high-quality implementation (that cannot fail) will require kernel support.
Agreed. I have a wip patch that I might send. Solaris11 also defines as non-portable interfaces:
We already implements posix_spawn_file_actions_addchdir_np. The posix_spawnattr_setsigignore_np/posix_spawnattr_getsigignore_np seems completary addition to posix_spawnattr_setsigdefault/posix_spawnattr_getsigdefault.
I am trying to figure out what posix_spawnattr_setvamask_np/posix_spawnattr_getvamask_np do, I couldn't find any information online.
> I do not think there is a way to obtain the number of the highest file
> descriptor currently in use because RLIMIT_NOFILE is not enforced for
> existing descriptors. Opening /proc/self/fd may fail.
I think an initial userland implementation need just to try opening /proc/self/fd and if it fails bail out with an error. Relying on RLIMIT_NOFILE, even as a fallback, might still leak file descriptors.
closefrom cannot fail. I'm happy to implement /proc/self/fd fallback, but only if we also have a proper system call that can be used to implement this correctly (without possible failure).
(In reply to Florian Weimer from comment #16)
> closefrom cannot fail. I'm happy to implement /proc/self/fd fallback, but
> only if we also have a proper system call that can be used to implement this
> correctly (without possible failure).
It would fail as other file actions, by not spawning the process and returning a generic error (we might also extend the posix_spawn to add a way to actually provide more information of what has failed exactly). I think it would be feasible to implement closefrom initially with/proc/self/fd and later if/when we get an actual syscall to move it as a fallback.
Right, for the posix_spawn extension, we do not need a fail-safe function.
(FWOW, I have just posted a patch for a public getdents64 system call wrapper, and I'm working on async-signal-safe directory stream parsing.)