This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.
Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.
I have just checked in some changes to libio that I think address all of the concerns recently raised here. There are two changes to the mmap plan that are actually independent.

First, the decision to mmap is no longer done at the open, but instead at the time of the first read. The benefits of this are:

* Save the overhead of stat+mmap if you fopen the file and never read it.
* Don't perceive the lag if you fopen, wait while the file changes, and then read (not that you could ever have relied on it before, but now it behaves the same with mmap or no).
* No spurious atime update (on Linux, via mmap call) at open time.
* The atime is set on first read under the full range of conforming mmap behaviors without any extra read syscall (since the mmap is not done until first read, when it's immediately followed by the actual access).

I implemented this in what seemed the clean way, which was to have a separate "maybe_mmap" jump table that is used when a possibly-mmap'able file is first opened. The functions in this jump table used for reading perform the stat and mmap attempt, switch the jump tables to either the mmap ones or the plain file ones depending on whether mmap could be used, and then punt to the chosen flavor's real read routines.

The second set of changes is to the behavior on fflush and reaching EOF. I made it quite generous, but with what I think is a small amount of overhead. Basically, any time you try to read more than it thinks is there, it does a stat to re-check the file size and remap the file if necessary. After an fflush, it does that check on the next read attempt. If the file grows too big, or if mmap stops working for some reason, it will quit using mmap and switch to regular file methods (though it will never switch back after that, should the file get smaller or whatever).
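The "maybe_mmap" dispatch described above can be illustrated with a minimal sketch. This is not the libio code: `struct stream`, `struct jump_table`, `stream_open` and the three read functions are hypothetical names invented for the illustration, and the table has a single read entry rather than the full set of stream methods.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct stream;

/* Hypothetical one-entry "jump table": only a read method. */
struct jump_table {
    const char *name;
    ssize_t (*read)(struct stream *s, void *buf, size_t n);
};

struct stream {
    const struct jump_table *jumps; /* current method table */
    int fd;
    char *map;   /* mapping base, or NULL when mmap is not in use */
    size_t size; /* mapped length */
    size_t pos;  /* read offset within the mapping */
};

/* Plain file flavor: an ordinary read syscall. */
static ssize_t file_read(struct stream *s, void *buf, size_t n) {
    return read(s->fd, buf, n);
}

/* mmap flavor: copy out of the mapping, no syscall. */
static ssize_t mmap_read(struct stream *s, void *buf, size_t n) {
    size_t left = s->size - s->pos;
    if (n > left)
        n = left;
    memcpy(buf, s->map + s->pos, n);
    s->pos += n;
    return (ssize_t)n;
}

static const struct jump_table file_jumps = { "file", file_read };
static const struct jump_table mmap_jumps = { "mmap", mmap_read };

/* First read: do the stat and mmap attempt, switch tables, then punt
   to the chosen flavor's real read routine. */
static ssize_t maybe_mmap_read(struct stream *s, void *buf, size_t n) {
    struct stat st;
    s->jumps = &file_jumps; /* fall back to plain reads by default */
    if (fstat(s->fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
                       s->fd, 0);
        if (p != MAP_FAILED) {
            s->map = p;
            s->size = (size_t)st.st_size;
            s->jumps = &mmap_jumps;
        }
    }
    return s->jumps->read(s, buf, n);
}

static const struct jump_table maybe_mmap_jumps = { "maybe_mmap",
                                                    maybe_mmap_read };

/* Opening costs only the open call: no stat, no mmap, no atime update. */
static struct stream stream_open(const char *path) {
    struct stream s = { &maybe_mmap_jumps, open(path, O_RDONLY), NULL, 0, 0 };
    return s;
}
```

A caller always goes through `s.jumps->read(&s, buf, n)`. The first call lands in `maybe_mmap_read`, and every later call goes straight to the flavor chosen then, so the decision costs nothing after the first read.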
Given the POSIX.1 8.2.3 rules for synchronizing the file position, and the "underlying functions" clause with its monumentally vague phrase "certain traits", and a strong urge to fly, one can make the tenuous leap to analogs of the read vs write guarantees of what's visible when after appropriate stdio synchronization; i.e., that once you're guaranteed the file positions are synchronized, you're guaranteed that the next stdio read will behave as read does vis a vis a prior write.

Now, I am not going to try to argue that this is what the standard requires. But it definitely describes the behavior all stdios heretofore have always had. My implementation is even a bit more generous than that, in that the 8.2.3 rules mention an implicit synchronization point when feof() is true while I also provide one when you have read exactly all of the file and not incurred a stdio EOF condition. This faithfully replicates the observed behavior of doing just that when mmap is not used (i.e. reading exactly the full contents but not hitting an EOF condition, then having someone else extend the file, then continuing the read without fflush or anything else first).

I have added a few test programs that exercise some of these cases. I don't claim these programs strictly conform to POSIX, but they do demonstrate the assumptions that the reasoning above leads to. I think it is important to realize when thinking about these cases that the modifications done through the separate file descriptor could just as well be done by some unrelated process on the machine and the results, especially in the cases using fflush or having hit EOF, show what people certainly expect to happen when they try to read random files on the system.

I am still concerned by cases involving truncation (or IO error, but that is sufficiently rare to ignore), such as what happens in the tst-mmap2-eofsync program when you remove the final fflush call.
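The "read exactly all of the file, then someone else extends it" case discussed above can be sketched as follows. This is not one of the test programs that were checked in. The function name `read_after_external_append` is hypothetical, and the second descriptor stands in for an unrelated process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads exactly the whole file through stdio, lets a second descriptor
   append one byte, then reads again with no fflush in between.
   Returns the next character, EOF, or -2 on a setup failure. */
static int read_after_external_append(const char *path) {
    char buf[5];
    FILE *fp;
    int fd, c;

    /* Create the file with exactly five bytes of content. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -2;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -2;
    }
    close(fd);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -2;

    /* Ask for exactly the file's contents, so the stream never tries
       to read past the end. */
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf) {
        fclose(fp);
        return -2;
    }

    /* "Someone else" extends the file through a separate descriptor. */
    fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0) {
        fclose(fp);
        return -2;
    }
    if (write(fd, "X", 1) != 1) {
        close(fd);
        fclose(fp);
        return -2;
    }
    close(fd);

    /* Continue reading without fflush or anything else first. */
    c = fgetc(fp);
    fclose(fp);
    return c;
}
```

With plain read-based buffering the final `fgetc` is expected to return the appended `'X'`, which is the traditional behavior the message describes. The sketch only reports what the local stdio does and does not claim any standard requires it.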
The tst-mmap2-eofsync case without the final fflush is equivalent to a simple reader of some file when another user might possibly come along and overwrite it. The behavior without mmap is to return stale data instead of EOF. The behavior now is to crash in fgetc with SIGBUS. The former state of affairs is what everyone always presumed when they used stdio to read files without outside synchronization: no guarantees about synchronization, but you will get either data that was once at that offset in the file or you will get EOF and there are no other outcomes.

The new state of affairs might be rather distressing. Should cat or less or whatever program that uses stdio to read a file have the possibility to crash if it happens to be trying to read the wrong part at the same time another user is truncating the file? In the absence of MAP_COPY (or a non-POSIX filesystem that is only written with atomic-supersede semantics), there is no way to avoid the possibility of this fault signal.

Myself, I would be perfectly happy to have libc check the signal for faults in its mapped files, turn them into a C++-style exception, and have every data access (including the getc macro) prepared to handle the exception. But this is not something we can make happen today, and methods of fault handling other than C++-style exception annotations are too costly (since they have overhead every time instead of just the once in a blue moon when a fault actually happens). Moreover, old binaries using the getc macro will never have fault handling for its buffer accesses.

Incidentally, it occurs to me that we should probably tune the heuristic with some performance tests. I imagine that for files smaller than a page or two, doing stat + mmap + munmap might be worse than the normal case where it's a single read call with a small amount of data copying (vs MMU twiddling overhead) and you're done.
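The truncation fault discussed above can be reproduced outside stdio with a bare mapping. The sketch below is not the libio code path. It catches SIGBUS with sigsetjmp/siglongjmp purely so that the demonstration can report the fault instead of dying, which is exactly the kind of per-access handling the message says libc cannot afford. The name `access_after_truncate` is hypothetical, and the result assumes a system (such as Linux) that raises SIGBUS for access to mapped pages beyond the end of the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Jump back out of the faulting access instead of terminating. */
static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Maps one page of a file, truncates the file to zero length, then
   touches the mapping. Returns 1 if the access raised SIGBUS, 0 if a
   byte was read, -1 on a setup failure. */
static int access_after_truncate(void) {
    char path[] = "/tmp/mmap_trunc_XXXXXX";
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile char *p;
    int fd, result;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the open descriptor keeps the file alive */

    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    if (sigsetjmp(fault_env, 1) == 0) {
        /* Stands in for another user truncating the file. */
        if (ftruncate(fd, 0) != 0) {
            result = -1;
        } else {
            char c = p[0]; /* the access a getc would perform */
            (void)c;
            result = 0;
        }
    } else {
        result = 1; /* arrived here from the SIGBUS handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)p, page);
    close(fd);
    return result;
}
```

A reader using read instead of the mapping would get a zero-length read (EOF) or previously buffered data at this point, which is the "no other outcomes" guarantee that the mapping gives up.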