This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Manual should discourage mixing TEMP_FAILURE_RETRY and close


On 12/06/2016 11:42 PM, Zack Weinberg wrote:
> On 12/06/2016 10:51 AM, Rich Felker wrote:
>>> Also, I took a look at the illumos code base, which presumably does
>>> not differ greatly from historical Solaris on this point. It looks to
>>> me (closef(), closeandsetf()) that also on that system there are paths
>>> in close() where errors are returned but the FD is nevertheless
>>> closed. So it really does seem that POSIX.1 was specifying (and plans
>>> to again specify) behavior that is inconsistent with actual
>>> implementations.
>>
>> I think you're assuming POSIX is imposing something opposite of what
>> it's doing. The new text has filedes left open _only_ in the case of
>> EINTR. For all other errors it's closed and available for reassignment
>> on subsequent open, etc.
> 
> But that's still wrong!  `fildes` MUST be closed even when EINTR is
> returned.  That's what all existing implementations do (if I understood
> Michael correctly),

I believe it is "most (not all) implementations.

> and even if I've misunderstood that part, because
> *some* implementations do it, portable code has to assume that it's
> unsafe to retry the operation.  So the only sane specification for POSIX
> to make is that the descriptor is closed.

That's my opinion.

> (No, new programs cannot assume that it's safe to retry, because new
> programs get run on old operating systems.)

Exactly. This is the sense in which I'd argue that the POSIX.1
proposal (to forbid an implementation from returning EINTR
and close the FD) potentially worsens portability.

> (Yes, this means delayed failures can get lost.  Code that cares
> should've called `fsync` anyway.  Also, the whole thing is moot if you
> use SA_RESTART on all your signals, which is the Right Thing for tons of
> other reasons.)
> 
> ----
> 
> Regardless of what the specification does or doesn't say or will be
> corrected to say, what the *glibc manual* should say is clear, IMNSHO:
> don't retry close().  Proposed new text follows - it's not as emphatic
> as what Michael wrote for the manpages but I think it's sufficient.

Yes, I still think the glibc manual needs fixing. But, by now, I've
tried to be more nuanced in a further revision to the close(2) man
page, which by now reads:

       A careful programmer will check the return value of close(), since
       it  is quite possible that errors on a previous write(2) operation
       are reported only on the final close() that releases the open file
       description.   Failing  to  check  the return value when closing a
       file may lead to silent loss of  data.   This  can  especially  be
       observed with NFS and with disk quota.

       Note, however, that a failure return should be used only for diag‐
       nostic purposes (i.e., a warning to the application that there may
       still  be  I/O pending or there may have been failed I/O) or reme‐
       dial purposes (e.g., writing the file  once  more  or  creating  a
       backup).

       Retrying  the close() after a failure return is the wrong thing to
       do, since this may cause a reused  file  descriptor  from  another
       thread  to  be  closed.   This  can occur because the Linux kernel
       always releases the file descriptor early in the close  operation,
       freeing  it for reuse; the steps that may return an error, such as
       flushing data to the filesystem or device, occur only later in the
       close operation.

       Many   other  implementations  similarly  always  close  the  file
       descriptor (except in the case of EBADF,  meaning  that  the  file
       descriptor  was invalid) even if they subsequently report an error
       on return from close().   POSIX.1  is  currently  silent  on  this
       point,  but  there  are plans to mandate this behavior in the next
       major release of the standard

       A careful programmer who wants to know about I/O errors  may  pre‐
       cede close() with a call to fsync(2).

       The  EINTR  error is a somewhat special case.  Regarding the EINTR
       error, POSIX.1-2013 says:

              If close() is interrupted by a signal that is to be caught,
              it shall return -1 with errno set to EINTR and the state of
              fildes is unspecified.

       This permits the behavior that occurs  on  Linux  and  many  other
       implementations,  where, as with other errors that may be reported
       by close(), the file descriptor is guaranteed to be closed.   How‐
       ever, it also permits another possibility: that the implementation
       returns an  EINTR  error  and  keeps  the  file  descriptor  open.
       (According  to its documentation, HP-UX's close() does this.)  The
       caller must then once more use close() to close the file  descrip‐
       tor, to avoid file descriptor leaks.  This divergence in implemen‐
       tation behaviors provides a difficult hurdle for portable applica‐
       tions,  since  on many implementations, close() must not be called
       again after an EINTR error, and on at least one, close()  must  be
       called  again.   There are plans to address this conundrum for the
       next major release of the POSIX.1 standard.

(Excepting the first paragraph, all of the text there was added by me,
and can be considered in the public domain if any of it is useful for
inclusion in the glibc manual.)

> @comment unistd.h
> @comment POSIX.1
> @deftypefun int close (int @var{filedes})
> @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}}
> The function @code{close} closes the file descriptor @var{filedes}.
> Closing a file has the following consequences:
> 
> @itemize @bullet
> @item
> The file descriptor is deallocated.
> 
> @item
> Any record locks owned by the process on the file are unlocked.
> 
> @item
> When all file descriptors associated with a pipe or FIFO have been closed,
> any unread data is discarded.
> @end itemize
> 
> Closing a file does @emph{not} guarantee that all data has been
> successfully written to disk.  If you need to make sure of this, you
> must call @code{fsync} before @code{close} (@pxref{Synchronizing I/O
> operations}).
> 
> @code{close} never @emph{fails}; after any call to @code{close}
> returns, the file descriptor @var{filedes} has been closed and must
> not be used again.  However, it may still report an error.  It returns
> 0 if there was no error, and @math{-1} if an error occurred.  The
> following @code{errno} error conditions are defined for this function:
> 
> @table @code
> @item ENOSPC
> @itemx EIO
> @itemx EDQUOT
> These codes indicate an I/O error in an earlier call to @code{write}
> that appeared to succeed.  @xref{I/O Primitives}, for details on their
> meaning.  Because of these ``delayed failures,'' it is important to
> check for errors in @code{close}.
> 
> @item EINTR
> The @code{close} call was interrupted by a signal.  The only
> consequence of this is that a ``delayed failure'' may have been
> lost---the file descriptor has still been closed.  You should
> @emph{not} retry the system call.
> 
> @item EBADF
> The @var{filedes} argument was not an open file descriptor to begin
> with (and still isn't).
> @end table
> 
> This function is a cancellation point in multi-threaded programs.  This
> is a problem if the thread allocates some resources (like memory, file
> descriptors, semaphores or whatever) at the time @code{close} is
> called.  If the thread gets canceled these resources stay allocated
> until the program ends.  To avoid this, calls to @code{close} should be
> protected using cancellation handlers.
> @c ref pthread_cleanup_push / pthread_cleanup_pop
> 
> @code{close} can close any type of file descriptor, no matter how it
> was opened.  Therefore, there is no @code{close64}, as it is
> unnecessary.  However, @code{FILE} objects must be closed using
> @code{fclose} instead (@pxref{Closing Streams}).
> @end deftypefun

While the above is accurate for glibc on Linux, and is also
true for many other systems (AFAICT), I can see why there
may be resistance to accepting it for the glibc manual.
Maybe, some more nuanced text would be helpful. In any case
the current manual text should still be changed, IMO.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]