Bug 10353

Summary: Methods for deleting all file descriptors greater than given integer (closefrom)
Product: glibc Reporter: Martin Buchholz <martinrb>
Component: libcAssignee: Adhemerval Zanella <adhemerval.zanella>
Status: RESOLVED FIXED    
Severity: normal CC: adhemerval.zanella, cristian, deller, dimpase, fweimer, glibc-bugs, stsp, yann
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: 2.34   
Host: Target:
Build: Last reconfirmed:

Description Martin Buchholz 2009-06-30 23:31:09 UTC
Recent Solaris provides a way to delete all file descriptors
greater than a given integer, and provides a way to ask
posix_spawn to do so.  I believe glibc should implement these extensions.

Both of the big programs I have worked on, xemacs and openjdk,
have written their own way to do this.

extern int posix_spawn_file_actions_addclosefrom_np(
	posix_spawn_file_actions_t *file_actions,
	int lowfiledes);

extern void closefrom(int);

http://docs.sun.com/app/docs/doc/819-2243/posix-spawn-file-actions-addclosefrom-np-3c?l=ja&a=view

http://docs.sun.com/app/docs/doc/819-2243/closefrom-3c?l=ja&a=view

The functionality that has been added to glibc allowing FD_CLOSE_ON_EXEC
to be specified at time of creation of the fd does help (thank you)
but it is not sufficient for "open" programs like the JDK where 
arbitrary third party native code may be concurrently opening file 
descriptors while creating a subprocess.
Comment 1 Roland McGrath 2009-06-30 23:52:42 UTC
nscd.c does this by hand in a Linux-specific way, and it is trivial to implement
in libc on Hurd.  So this seems like a good addition.
Comment 2 Ulrich Drepper 2009-07-01 05:57:37 UTC
No, it's a horrible idea.  The assumption that a program knows all the open file
descriptors is simply invalid.  The runtime (all kinds of libraries) can at any
point in time create additional file descriptors and indiscriminately calls for
trouble.  The correct way is to name the individual file descriptors the program
knows about and let the creator of the other file descriptors worry about the rest.

The reason nscd can do it the way it does it is simple: all the code used is
controlled by libc.  But that's a special case.
Comment 3 Martin Buchholz 2009-07-01 22:22:36 UTC
Aside from the Solaris 10 precedent, other OSes have adopted
closefrom, apparently with the same behavior.

Here's OpenBSD:

http://www.openbsd.org/cgi-bin/man.cgi?query=closefrom&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html

Here's NetBSD:

http://netbsd.gw.com/cgi-bin/man-cgi?closefrom++NetBSD-current

To provide more motivation, the idea is that you are in a
large multithreaded app that is swimming in a sea of unknown
file descriptors that may or may not have FD_CLOEXEC set,
you fork(), frob some file descriptors you care about,
and then need to close the rest.  You write your own buggy closefrom
or use the one provided by the system.
Comment 4 Ulrich Drepper 2009-07-02 06:27:45 UTC
(In reply to comment #3)
> Here's OpenBSD:
> [...]
> Here's NetBSD:
> [...]

This is *anything* but an argument in favor.


> To provide more motivation, the idea is that you are in a
> large multithreaded app that is swimming in a sea of unknown
> file descriptors that may or may not have FD_CLOEXEC set,

So, fix the code.  We have O_CLOEXEC support as well.  There is no reason to
work around buggy code and this interface *actively* prevents innovations by
usurping file descriptors.
Comment 5 Martin Buchholz 2009-07-02 16:14:46 UTC
>> To provide more motivation, the idea is that you are in a
>> large multithreaded app that is swimming in a sea of unknown
>> file descriptors that may or may not have FD_CLOEXEC set,

>So, fix the code.  We have O_CLOEXEC support as well.  There is no reason to
>work around buggy code and this interface *actively* prevents innovations by
>usurping file descriptors.

For many applications, there is no way in practice to control all the 
code running in the same address space.  This is especially true for
"platforms" like java, where arbitrary user-created shared libraries
are loaded and executed at runtime.

The idea of permitting innovations that use file descriptors is
an interesting one, but one that in my opinion cannot succeed.
Too many people (like myself) are maintaining library code
that starts new subprocesses, and they will continue to
indiscriminately close unknown file descriptors, 
with or without help from their libc.

While my library closes file descriptors unconditionally,
The python subprocess API makes closing fds an option.

"""If close_fds is true, all file descriptors except 0, 1 and 2 will be
closed before the child process is executed."""

Interestingly, python provides a related function

.. function:: closerange(fd_low, fd_high)

   Close all file descriptors from *fd_low* (inclusive) to *fd_high* (exclusive),
   ignoring errors. Availability: Unix, Windows. Equivalent to::

      for fd in xrange(fd_low, fd_high):
          try:
              os.close(fd)
          except OSError:
              pass

which doesn't seem to support "infinity" for the second argument.
Comment 6 Dima Pasechnik 2019-04-12 08:39:50 UTC
Can we reopen this please? It's ludicrous that Linux still does not have it, and so many programs need to re-implement this, poorly, see e.g. 
_close_fds_by_brute_force(long start_fd, PyObject *py_fds_to_keep)

in Python 3:

https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L220
Comment 7 Florian Weimer 2019-04-12 11:29:11 UTC
A reasonable implementation for this needs kernel support.  We can certainly provide a system call wrapper for this functionality once it becomes available.
Comment 8 Adhemerval Zanella 2019-04-12 14:46:24 UTC
To expand Florian answer, the main issue is that closefrom call is inherent racy when trying to implement on libc without kernel support. 

The straightforward way to accomplish, and what python does, is to iterate over all open file descriptors (either by walking through /proc/<pid>/fds or sysconf(_SC_OPEN_MAX)) and issue a close on the file descriptor.

The problem with this approach does not guarantee on multithread environments that a file descriptor would not be created in the iteration phase.  One option would be to serialize a file descriptor creation routine (such as open, openat) with close; but besides the fact it adds a large complexity and scalability issue, it does not solve the issue of file descriptors being created by bypassing the libc (by issuing the syscall instruction directly).

Both OpenBSD and Solaris implements it with syscalls, former with closefrom and later with fcntl(..., F_CLOSEFROM). Also, Solaris documents its MT-unsafe. 

In fact, IMHO a closefrom syscall won't help much for posix_spawn. Current implementation for Linux uses a clone(CLONE_VM, CLONE_VFORK) which means that only the calling thread will be suspended while the helper thread issues execlpe or _exit.  It means that a file action to issue a closefrom will also be susceptible to race conditions in multi-thread environments.

That's why posix_spawn_file_actions_addclosefrom_np does not make much sense unless you implement posix_spawn with fork+exec (which has its own scalability issues).  The possible solutions are:

 1. Ensure all file descriptors are opened with O_CLOEXEC.

 2. Use a helper process to actually set up the required file descriptor close steps and issue the target binary. This is in fact what OpenJDK does (check src/java.base/unix/native/jspawnhelper/jspawnhelper.c and src/java.base/unix/native/libjava/childproc.c).

So IMHO closefrom would be just a performance optimization on Linux, where kernel will know exactly which file descriptor to close.
Comment 9 Dima Pasechnik 2019-04-12 21:35:32 UTC
The cpython problem with its current implementation is bad performance on systems with large sysconf(_SC_OPEN_MAX) value (e.g. it's much bigger on FreeBSD than it's on Linux). We actually plan to do a cpython PR which would replace the loop with a call to closefrom() -- for systems where it's available.
Comment 10 Adhemerval Zanella 2019-04-12 22:10:51 UTC
(In reply to Dima Pasechnik from comment #9)
> The cpython problem with its current implementation is bad performance on
> systems with large sysconf(_SC_OPEN_MAX) value (e.g. it's much bigger on
> FreeBSD than it's on Linux). We actually plan to do a cpython PR which would
> replace the loop with a call to closefrom() -- for systems where it's
> available.

AFAIK unfortunately for this case, there is no much glibc can do without proper kernel support.
Comment 11 Dima Pasechnik 2019-04-14 12:48:04 UTC
libbsd does provide closefrom on Linux-I have not looked at what it actually does...

https://packages.debian.org/source/sid/libbsd
Comment 12 Adhemerval Zanella 2019-04-14 14:44:42 UTC
(In reply to Dima Pasechnik from comment #11)
> libbsd does provide closefrom on Linux-I have not looked at what it actually
> does...
> 
> https://packages.debian.org/source/sid/

It does exactly what I described earlier: interact over /proc/self/fd or getconf [1].

[1] https://gitlab.freedesktop.org/libbsd/libbsd/blob/master/src/closefrom.c
Comment 13 Adhemerval Zanella 2019-04-22 19:18:27 UTC
(In reply to Adhemerval Zanella from comment #8)
 
> In fact, IMHO a closefrom syscall won't help much for posix_spawn. Current
> implementation for Linux uses a clone(CLONE_VM, CLONE_VFORK) which means
> that only the calling thread will be suspended while the helper thread
> issues execlpe or _exit.  It means that a file action to issue a closefrom
> will also be susceptible to race conditions in multi-thread environments.

In fact, this is not true since we do not use CLONE_FILES for posix_spawn. So it is possible to implement posix_spawn_file_actions_addclosefrom_np, albeit it would require interacting over the open file descriptors (and thus might incur in performance issues).
Comment 14 Florian Weimer 2019-05-15 11:25:26 UTC
We should implement closefrom and related functions from Solaris.  A high-quality implementation (that cannot fail) will require kernel support.

I do not think there is a way to obtain the number of the highest file descriptor currently in use because RLIMIT_NOFILE is not enforced for existing descriptors.  Opening /proc/self/fd may fail.
Comment 15 Adhemerval Zanella 2019-05-15 12:10:43 UTC
(In reply to Florian Weimer from comment #14)
> We should implement closefrom and related functions from Solaris.  A
> high-quality implementation (that cannot fail) will require kernel support.

Agreed. I have a wip patch that I might send. Solaris11 also defines as non-portable interfaces:

posix_spawn_file_actions_addchdir_np
posix_spawn_file_actions_addclosefrom_np
posix_spawnattr_setsigignore_np
posix_spawnattr_getsigignore_np
posix_spawnattr_setvamask_np
posix_spawnattr_getvamask_np

We already implements posix_spawn_file_actions_addchdir_np.  The posix_spawnattr_setsigignore_np/posix_spawnattr_getsigignore_np seems completary addition to posix_spawnattr_setsigdefault/posix_spawnattr_getsigdefault.

I am trying to figure out what posix_spawnattr_setvamask_np/posix_spawnattr_getvamask_np do, I couldn't find any information online. 

> 
> I do not think there is a way to obtain the number of the highest file
> descriptor currently in use because RLIMIT_NOFILE is not enforced for
> existing descriptors.  Opening /proc/self/fd may fail.

I think an initial userland implementation need just to try opening /proc/self/fd and if it fails bail out with an error. Relying on RLIMIT_NOFILE, even as a fallback, might still leak file descriptors.
Comment 16 Florian Weimer 2019-05-15 12:17:16 UTC
closefrom cannot fail.  I'm happy to implement /proc/self/fd fallback, but only if we also have a proper system call that can be used to implement this correctly (without possible failure).
Comment 17 Adhemerval Zanella 2019-05-15 12:35:40 UTC
(In reply to Florian Weimer from comment #16)
> closefrom cannot fail.  I'm happy to implement /proc/self/fd fallback, but
> only if we also have a proper system call that can be used to implement this
> correctly (without possible failure).

It would fail as other file actions, by not spawning the process and returning a generic error (we might also extend the posix_spawn to add a way to actually provide more information of what has failed exactly).  I think it would be feasible to implement closefrom initially with/proc/self/fd and later if/when we get an actual syscall to move it as a fallback.
Comment 18 Florian Weimer 2019-05-15 15:24:55 UTC
Right, for the posix_spawn extension, we do not need a fail-safe function.

(FWOW, I have just posted a patch for a public getdents64 system call wrapper, and I'm working on async-signal-safe directory stream parsing.)
Comment 19 Stas Sergeev 2020-11-03 09:34:52 UTC
Hi.

What do you think about this closefrom() to only set
FD_CLOEXEC rather than to do the actual close()?
Seems more consistent to me.
- compatible with popen(), posix_spawn(), system() and co
as you can just do it before fork()
- does exactly what we need, and in a less invasive way -
don't we need to close files only at exec()?
- clearly underlines the fact that O_CLOEXEC should have
been on by default since the creation of posix, but its not.
So it fixes exactly that rather than working around that by
doing explicit close().

Just an idea.
Comment 20 Florian Weimer 2020-11-03 09:45:44 UTC
(In reply to Stas Sergeev from comment #19)

> What do you think about this closefrom() to only set
> FD_CLOEXEC rather than to do the actual close()?
> Seems more consistent to me.
> - compatible with popen(), posix_spawn(), system() and co
> as you can just do it before fork()
> - does exactly what we need, and in a less invasive way -
> don't we need to close files only at exec()?

You can't set CLOEXEC safely before the fork, it would have to come after.  There isn't much difference between closing and setting the flag directly.
Comment 21 Cristian Rodríguez 2020-12-16 16:39:27 UTC
The linux kernel now has a close_range system call for this purpose. It would be cool if glibc can provide both a syscall wrapper for close_range *AND* closefrom implemented on top close_range in the linux target.
Comment 22 Cristian Rodríguez 2020-12-16 16:40:48 UTC
(In reply to Stas Sergeev from comment #19)
> Hi.
> 
> What do you think about this closefrom() to only set
> FD_CLOEXEC rather than to do the actual close()?
> Seems more consistent to me.
> - compatible with popen(), posix_spawn(), system() and co
> as you can just do it before fork()
> - does exactly what we need, and in a less invasive way -
> don't we need to close files only at exec()?
> - clearly underlines the fact that O_CLOEXEC should have
> been on by default since the creation of posix, but its not.
> So it fixes exactly that rather than working around that by
> doing explicit close().
> 
> Just an idea.


What you request is now implemented with close_range(2) flag CLOSE_RANGE_CLOEXEC
Comment 23 Adhemerval Zanella 2020-12-16 17:07:25 UTC
(In reply to Cristian Rodríguez from comment #21)
> The linux kernel now has a close_range system call for this purpose. It
> would be cool if glibc can provide both a syscall wrapper for close_range
> *AND* closefrom implemented on top close_range in the linux target.

The close_range seems a sensible addition to added as a syscall wrapper for Linux, without any extra fallback, and returning ENOSYS on older kernel (as for copy_file_range). From Florian message, he should be ok with a fallback to /proc/self/fd if the syscall is not available; but this will make function not fail-safe and add possible issues (as it was usual for multiple syscall fallbacks we added previosuly).

I also agree that closefrom shouldbe added as a GNU extension, multiple system have added it.

For posix_spawn extension, I think we can add a /proc/self/fd fallback that sets FD_CLOEXEC for each; the spawn will fail if it can not be executed and it is allowed by the interface.
Comment 24 Adhemerval Zanella 2021-07-08 17:21:23 UTC
Fixed on 2.35 with the addition of close_range (Linux), closefrom, and posix_spawn_file_actions_addclosefrom_np.