This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/14942] New: File corruption bug in AIO with close()


http://sourceware.org/bugzilla/show_bug.cgi?id=14942

             Bug #: 14942
           Summary: File corruption bug in AIO with close()
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: unassigned@sourceware.org
        ReportedBy: bugdal@aerifal.cx
                CC: drepper.fsp@gmail.com
    Classification: Unclassified


Created attachment 6778
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6778
demonstration of the bug

Per POSIX, close() is valid on a file descriptor with pending AIO operations:

"When there is an outstanding cancelable asynchronous I/O operation against
fildes when close() is called, that I/O operation may be canceled. An I/O
operation that is not canceled completes as if the close() operation had not
yet occurred. All operations that are not canceled shall complete as if the
close() blocked until the operations completed. The close() operation itself
need not block awaiting such I/O completion. Whether any I/O operation is
canceled, and which I/O operation may be canceled upon close(), is
implementation-defined."

My reading of this text is that you cannot assume anything about the integrity
of data pending for write on a given file descriptor if you close that file
descriptor, but that the behavior of calling close in this situation is not
undefined, and certainly is not permitted to corrupt other files.

However, as the attached test program shows, glibc's AIO implementation DOES
corrupt other files when close is called on a file descriptor with pending AIO
operations and the file descriptor number gets reused. I've used pipes to
control the timing in this example (and sometimes it still requires a few tries
to hit the bug), but it could happen just as well with regular files.

As long as AIO is being implemented with threads on top of regular POSIX file
operations, rather than via direct kernel support, I believe one of the
following two solutions must be used:

1. Modify close() to attempt to cancel any pending AIO requests and block until
they have all successfully completed or cancelled. This is very difficult,
since close() is required to be async-signal-safe.

2. Have the AIO implementation duplicate any file descriptor it's going to work
with, using fcntl with F_DUPFD_CLOEXEC, and always use the duplicate. In this
case, close() must still be responsible for dissociating the file descriptor
number from its AIO work queue so that AIO requests on a new file descriptor
don't get appended to the old work queue but instead result in a new one. This
still sounds difficult to do in a way that's async-signal-safe, however.


