Recent Solaris provides a way to delete all file descriptors greater than a given integer, and provides a way to ask posix_spawn to do so. I believe glibc should implement these extensions. Both of the big programs I have worked on, xemacs and openjdk, have written their own way to do this. extern int posix_spawn_file_actions_addclosefrom_np( posix_spawn_file_actions_t *file_actions, int lowfiledes); extern void closefrom(int); http://docs.sun.com/app/docs/doc/819-2243/posix-spawn-file-actions-addclosefrom-np-3c?l=ja&a=view http://docs.sun.com/app/docs/doc/819-2243/closefrom-3c?l=ja&a=view The functionality that has been added to glibc allowing FD_CLOSE_ON_EXEC to be specified at time of creation of the fd does help (thank you) but it is not sufficient for "open" programs like the JDK where arbitrary third party native code may be concurrently opening file descriptors while creating a subprocess.
nscd.c does this by hand in a Linux-specific way, and it is trivial to implement in libc on Hurd. So this seems like a good addition.
No, it's a horrible idea. The assumption that a program knows all the open file descriptors is simply invalid. The runtime (all kinds of libraries) can at any point in time create additional file descriptors and indiscriminately calls for trouble. The correct way is to name the individual file descriptors the program knows about and let the creator of the other file descriptors worry about the rest. The reason nscd can do it the way it does it is simple: all the code used is controlled by libc. But that's a special case.
Aside from the Solaris 10 precedent, other OSes have adopted closefrom, apparently with the same behavior. Here's OpenBSD: http://www.openbsd.org/cgi-bin/man.cgi?query=closefrom&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html Here's NetBSD: http://netbsd.gw.com/cgi-bin/man-cgi?closefrom++NetBSD-current To provide more motivation, the idea is that you are in a large multithreaded app that is swimming in a sea of unknown file descriptors that may or may not have FD_CLOEXEC set, you fork(), frob some file descriptors you care about, and then need to close the rest. You write your own buggy closefrom or use the one provided by the system.
(In reply to comment #3) > Here's OpenBSD: > [...] > Here's NetBSD: > [...] This is *anything* but an argument in favor. > To provide more motivation, the idea is that you are in a > large multithreaded app that is swimming in a sea of unknown > file descriptors that may or may not have FD_CLOEXEC set, So, fix the code. We have O_CLOEXEC support as well. There is no reason to work around buggy code and this interface *actively* prevents innovations by usurping file descriptors.
>> To provide more motivation, the idea is that you are in a >> large multithreaded app that is swimming in a sea of unknown >> file descriptors that may or may not have FD_CLOEXEC set, >So, fix the code. We have O_CLOEXEC support as well. There is no reason to >work around buggy code and this interface *actively* prevents innovations by >usurping file descriptors. For many applications, there is no way in practice to control all the code running in the same address space. This is especially true for "platforms" like java, where arbitrary user-created shared libraries are loaded and executed at runtime. The idea of permitting innovations that use file descriptors is an interesting one, but one that in my opinion cannot succeed. Too many people (like myself) are maintaining library code that starts new subprocesses, and they will continue to indiscriminately close unknown file descriptors, with or without help from their libc. While my library closes file descriptors unconditionally, The python subprocess API makes closing fds an option. """If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.""" Interestingly, python provides a related function .. function:: closerange(fd_low, fd_high) Close all file descriptors from *fd_low* (inclusive) to *fd_high* (exclusive), ignoring errors. Availability: Unix, Windows. Equivalent to:: for fd in xrange(fd_low, fd_high): try: os.close(fd) except OSError: pass which doesn't seem to support "infinity" for the second argument.
Can we reopen this please? It's ludicrous that Linux still does not have it, and so many programs need to re-implement this, poorly, see e.g. _close_fds_by_brute_force(long start_fd, PyObject *py_fds_to_keep) in Python 3: https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L220
A reasonable implementation for this needs kernel support. We can certainly provide a system call wrapper for this functionality once it becomes available.
To expand Florian answer, the main issue is that closefrom call is inherent racy when trying to implement on libc without kernel support. The straightforward way to accomplish, and what python does, is to iterate over all open file descriptors (either by walking through /proc/<pid>/fds or sysconf(_SC_OPEN_MAX)) and issue a close on the file descriptor. The problem with this approach does not guarantee on multithread environments that a file descriptor would not be created in the iteration phase. One option would be to serialize a file descriptor creation routine (such as open, openat) with close; but besides the fact it adds a large complexity and scalability issue, it does not solve the issue of file descriptors being created by bypassing the libc (by issuing the syscall instruction directly). Both OpenBSD and Solaris implements it with syscalls, former with closefrom and later with fcntl(..., F_CLOSEFROM). Also, Solaris documents its MT-unsafe. In fact, IMHO a closefrom syscall won't help much for posix_spawn. Current implementation for Linux uses a clone(CLONE_VM, CLONE_VFORK) which means that only the calling thread will be suspended while the helper thread issues execlpe or _exit. It means that a file action to issue a closefrom will also be susceptible to race conditions in multi-thread environments. That's why posix_spawn_file_actions_addclosefrom_np does not make much sense unless you implement posix_spawn with fork+exec (which has its own scalability issues). The possible solutions are: 1. Ensure all file descriptors are opened with O_CLOEXEC. 2. Use a helper process to actually set up the required file descriptor close steps and issue the target binary. This is in fact what OpenJDK does (check src/java.base/unix/native/jspawnhelper/jspawnhelper.c and src/java.base/unix/native/libjava/childproc.c). So IMHO closefrom would be just a performance optimization on Linux, where kernel will know exactly which file descriptor to close.
The cpython problem with its current implementation is bad performance on systems with large sysconf(_SC_OPEN_MAX) value (e.g. it's much bigger on FreeBSD than it's on Linux). We actually plan to do a cpython PR which would replace the loop with a call to closefrom() -- for systems where it's available.
(In reply to Dima Pasechnik from comment #9) > The cpython problem with its current implementation is bad performance on > systems with large sysconf(_SC_OPEN_MAX) value (e.g. it's much bigger on > FreeBSD than it's on Linux). We actually plan to do a cpython PR which would > replace the loop with a call to closefrom() -- for systems where it's > available. AFAIK unfortunately for this case, there is no much glibc can do without proper kernel support.
libbsd does provide closefrom on Linux-I have not looked at what it actually does... https://packages.debian.org/source/sid/libbsd
(In reply to Dima Pasechnik from comment #11) > libbsd does provide closefrom on Linux-I have not looked at what it actually > does... > > https://packages.debian.org/source/sid/ It does exactly what I described earlier: interact over /proc/self/fd or getconf [1]. [1] https://gitlab.freedesktop.org/libbsd/libbsd/blob/master/src/closefrom.c
(In reply to Adhemerval Zanella from comment #8) > In fact, IMHO a closefrom syscall won't help much for posix_spawn. Current > implementation for Linux uses a clone(CLONE_VM, CLONE_VFORK) which means > that only the calling thread will be suspended while the helper thread > issues execlpe or _exit. It means that a file action to issue a closefrom > will also be susceptible to race conditions in multi-thread environments. In fact, this is not true since we do not use CLONE_FILES for posix_spawn. So it is possible to implement posix_spawn_file_actions_addclosefrom_np, albeit it would require interacting over the open file descriptors (and thus might incur in performance issues).
We should implement closefrom and related functions from Solaris. A high-quality implementation (that cannot fail) will require kernel support. I do not think there is a way to obtain the number of the highest file descriptor currently in use because RLIMIT_NOFILE is not enforced for existing descriptors. Opening /proc/self/fd may fail.
(In reply to Florian Weimer from comment #14) > We should implement closefrom and related functions from Solaris. A > high-quality implementation (that cannot fail) will require kernel support. Agreed. I have a wip patch that I might send. Solaris11 also defines as non-portable interfaces: posix_spawn_file_actions_addchdir_np posix_spawn_file_actions_addclosefrom_np posix_spawnattr_setsigignore_np posix_spawnattr_getsigignore_np posix_spawnattr_setvamask_np posix_spawnattr_getvamask_np We already implements posix_spawn_file_actions_addchdir_np. The posix_spawnattr_setsigignore_np/posix_spawnattr_getsigignore_np seems completary addition to posix_spawnattr_setsigdefault/posix_spawnattr_getsigdefault. I am trying to figure out what posix_spawnattr_setvamask_np/posix_spawnattr_getvamask_np do, I couldn't find any information online. > > I do not think there is a way to obtain the number of the highest file > descriptor currently in use because RLIMIT_NOFILE is not enforced for > existing descriptors. Opening /proc/self/fd may fail. I think an initial userland implementation need just to try opening /proc/self/fd and if it fails bail out with an error. Relying on RLIMIT_NOFILE, even as a fallback, might still leak file descriptors.
closefrom cannot fail. I'm happy to implement /proc/self/fd fallback, but only if we also have a proper system call that can be used to implement this correctly (without possible failure).
(In reply to Florian Weimer from comment #16) > closefrom cannot fail. I'm happy to implement /proc/self/fd fallback, but > only if we also have a proper system call that can be used to implement this > correctly (without possible failure). It would fail as other file actions, by not spawning the process and returning a generic error (we might also extend the posix_spawn to add a way to actually provide more information of what has failed exactly). I think it would be feasible to implement closefrom initially with/proc/self/fd and later if/when we get an actual syscall to move it as a fallback.
Right, for the posix_spawn extension, we do not need a fail-safe function. (FWOW, I have just posted a patch for a public getdents64 system call wrapper, and I'm working on async-signal-safe directory stream parsing.)
Hi. What do you think about this closefrom() to only set FD_CLOEXEC rather than to do the actual close()? Seems more consistent to me. - compatible with popen(), posix_spawn(), system() and co as you can just do it before fork() - does exactly what we need, and in a less invasive way - don't we need to close files only at exec()? - clearly underlines the fact that O_CLOEXEC should have been on by default since the creation of posix, but its not. So it fixes exactly that rather than working around that by doing explicit close(). Just an idea.
(In reply to Stas Sergeev from comment #19) > What do you think about this closefrom() to only set > FD_CLOEXEC rather than to do the actual close()? > Seems more consistent to me. > - compatible with popen(), posix_spawn(), system() and co > as you can just do it before fork() > - does exactly what we need, and in a less invasive way - > don't we need to close files only at exec()? You can't set CLOEXEC safely before the fork, it would have to come after. There isn't much difference between closing and setting the flag directly.
The linux kernel now has a close_range system call for this purpose. It would be cool if glibc can provide both a syscall wrapper for close_range *AND* closefrom implemented on top close_range in the linux target.
(In reply to Stas Sergeev from comment #19) > Hi. > > What do you think about this closefrom() to only set > FD_CLOEXEC rather than to do the actual close()? > Seems more consistent to me. > - compatible with popen(), posix_spawn(), system() and co > as you can just do it before fork() > - does exactly what we need, and in a less invasive way - > don't we need to close files only at exec()? > - clearly underlines the fact that O_CLOEXEC should have > been on by default since the creation of posix, but its not. > So it fixes exactly that rather than working around that by > doing explicit close(). > > Just an idea. What you request is now implemented with close_range(2) flag CLOSE_RANGE_CLOEXEC
(In reply to Cristian Rodríguez from comment #21) > The linux kernel now has a close_range system call for this purpose. It > would be cool if glibc can provide both a syscall wrapper for close_range > *AND* closefrom implemented on top close_range in the linux target. The close_range seems a sensible addition to added as a syscall wrapper for Linux, without any extra fallback, and returning ENOSYS on older kernel (as for copy_file_range). From Florian message, he should be ok with a fallback to /proc/self/fd if the syscall is not available; but this will make function not fail-safe and add possible issues (as it was usual for multiple syscall fallbacks we added previosuly). I also agree that closefrom shouldbe added as a GNU extension, multiple system have added it. For posix_spawn extension, I think we can add a /proc/self/fd fallback that sets FD_CLOEXEC for each; the spawn will fail if it can not be executed and it is allowed by the interface.
Fixed on 2.35 with the addition of close_range (Linux), closefrom, and posix_spawn_file_actions_addclosefrom_np.