Commit | Line | Data |
---|---|---|
28f540f4 | 1 | @node Low-Level I/O, File System Interface, I/O on Streams, Top |
7a68c94a | 2 | @c %MENU% Low-level, less portable I/O |
28f540f4 RM |
3 | @chapter Low-Level Input/Output |
4 | ||
5 | This chapter describes functions for performing low-level input/output | |
6 | operations on file descriptors. These functions include the primitives | |
7 | for the higher-level I/O functions described in @ref{I/O on Streams}, as | |
8 | well as functions for performing low-level control operations for which | |
9 | there are no equivalents on streams. | |
10 | ||
11 | Stream-level I/O is more flexible and usually more convenient; | |
12 | therefore, programmers generally use the descriptor-level functions only | |
13 | when necessary. These are some of the usual reasons: | |
14 | ||
15 | @itemize @bullet | |
16 | @item | |
17 | For reading binary files in large chunks. | |
18 | ||
19 | @item | |
20 | For reading an entire file into core before parsing it. | |
21 | ||
22 | @item | |
23 | To perform operations other than data transfer, which can only be done | |
24 | with a descriptor. (You can use @code{fileno} to get the descriptor | |
25 | corresponding to a stream.) | |
26 | ||
27 | @item | |
28 | To pass descriptors to a child process. (The child can create its own | |
29 | stream to use a descriptor that it inherits, but cannot inherit a stream | |
30 | directly.) | |
31 | @end itemize | |
32 | ||
33 | @menu | |
34 | * Opening and Closing Files:: How to open and close file | |
2c6fe0bd | 35 | descriptors. |
28f540f4 RM |
36 | * I/O Primitives:: Reading and writing data. |
37 | * File Position Primitive:: Setting a descriptor's file | |
2c6fe0bd | 38 | position. |
28f540f4 RM |
39 | * Descriptors and Streams:: Converting descriptor to stream |
40 | or vice-versa. | |
41 | * Stream/Descriptor Precautions:: Precautions needed if you use both | |
42 | descriptors and streams. | |
49c091e5 | 43 | * Scatter-Gather:: Fast I/O to discontinuous buffers. |
bad7a0c8 | 44 | * Copying File Data:: Copying data between files. |
07435eb4 | 45 | * Memory-mapped I/O:: Using files like memory. |
28f540f4 RM |
46 | * Waiting for I/O:: How to check for input or output |
47 | on multiple file descriptors. | |
dfd2257a | 48 | * Synchronizing I/O:: Making sure all I/O actions completed. |
b07d03e0 | 49 | * Asynchronous I/O:: Perform I/O in parallel. |
28f540f4 RM |
50 | * Control Operations:: Various other operations on file |
51 | descriptors. | |
52 | * Duplicating Descriptors:: Fcntl commands for duplicating | |
53 | file descriptors. | |
54 | * Descriptor Flags:: Fcntl commands for manipulating | |
55 | flags associated with file | |
2c6fe0bd | 56 | descriptors. |
28f540f4 RM |
57 | * File Status Flags:: Fcntl commands for manipulating |
58 | flags associated with open files. | |
59 | * File Locks:: Fcntl commands for implementing | |
60 | file locking. | |
0961f7e1 JL |
61 | * Open File Description Locks:: Fcntl commands for implementing |
62 | open file description locking. | |
63 | * Open File Description Locks Example:: An example of open file description lock | |
64 | usage | |
28f540f4 RM |
65 | * Interrupt Input:: Getting an asynchronous signal when |
66 | input arrives. | |
07435eb4 | 67 | * IOCTLs:: Generic I/O Control operations. |
6c0be743 | 68 | * Other Low-Level I/O APIs:: Other low-level-I/O-related functions. |
28f540f4 RM |
69 | @end menu |
70 | ||
71 | ||
72 | @node Opening and Closing Files | |
73 | @section Opening and Closing Files | |
74 | ||
75 | @cindex opening a file descriptor | |
76 | @cindex closing a file descriptor | |
77 | This section describes the primitives for opening and closing files | |
78 | using file descriptors. The @code{open} and @code{creat} functions are | |
79 | declared in the header file @file{fcntl.h}, while @code{close} is | |
80 | declared in @file{unistd.h}. | |
81 | @pindex unistd.h | |
82 | @pindex fcntl.h | |
83 | ||
28f540f4 | 84 | @deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) |
d08a7e4c | 85 | @standards{POSIX.1, fcntl.h} |
2cc3615c | 86 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
624254b1 SC |
87 | The @code{open} function creates and returns a new file descriptor for |
88 | the file named by @var{filename}. Initially, the file position | |
28f540f4 | 89 | indicator for the file is at the beginning of the file. The argument |
624254b1 SC |
90 | @var{mode} (@pxref{Permission Bits}) is used only when a file is |
91 | created, but it doesn't hurt to supply the argument in any case. | |
28f540f4 RM |
92 | |
93 | The @var{flags} argument controls how the file is to be opened. This is | |
94 | a bit mask; you create the value by the bitwise OR of the appropriate | |
95 | parameters (using the @samp{|} operator in C). | |
96 | @xref{File Status Flags}, for the parameters available. | |
97 | ||
98 | The normal return value from @code{open} is a non-negative integer file | |
07435eb4 | 99 | descriptor. In the case of an error, a value of @math{-1} is returned |
28f540f4 RM |
100 | instead. In addition to the usual file name errors (@pxref{File |
101 | Name Errors}), the following @code{errno} error conditions are defined | |
102 | for this function: | |
103 | ||
104 | @table @code | |
105 | @item EACCES | |
19e4c7dd | 106 | The file exists but is not readable/writable as requested by the @var{flags} |
9739d2d5 | 107 | argument, or the file does not exist and the directory is unwritable so |
28f540f4 RM |
108 | it cannot be created. |
109 | ||
110 | @item EEXIST | |
111 | Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already | |
112 | exists. | |
113 | ||
114 | @item EINTR | |
115 | The @code{open} operation was interrupted by a signal. | |
116 | @xref{Interrupted Primitives}. | |
117 | ||
118 | @item EISDIR | |
119 | The @var{flags} argument specified write access, and the file is a directory. | |
120 | ||
121 | @item EMFILE | |
122 | The process has too many files open. | |
123 | The maximum number of file descriptors is controlled by the | |
124 | @code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}. | |
125 | ||
126 | @item ENFILE | |
127 | The entire system, or perhaps the file system which contains the | |
128 | directory, cannot support any additional open files at the moment. | |
a7a93d50 | 129 | (This problem cannot happen on @gnuhurdsystems{}.) |
28f540f4 RM |
130 | |
131 | @item ENOENT | |
132 | The named file does not exist, and @code{O_CREAT} is not specified. | |
133 | ||
134 | @item ENOSPC | |
135 | The directory or file system that would contain the new file cannot be | |
136 | extended, because there is no disk space left. | |
137 | ||
138 | @item ENXIO | |
139 | @code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags} | |
140 | argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and | |
141 | FIFOs}), and no process has the file open for reading. | |
142 | ||
143 | @item EROFS | |
144 | The file resides on a read-only file system and any of @w{@code{O_WRONLY}}, | |
145 | @code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument, | |
146 | or @code{O_CREAT} is set and the file does not already exist. | |
147 | @end table | |
148 | ||
149 | @c !!! umask | |
150 | ||
04b9968b | 151 | If on a 32 bit machine the sources are translated with |
b07d03e0 UD |
152 | @code{_FILE_OFFSET_BITS == 64} the function @code{open} returns a file |
153 | descriptor opened in the large file mode which enables the file handling | |
9ceeb279 OB |
154 | functions to use files up to @twoexp{63} bytes in size and offset from |
155 | @minus{}@twoexp{63} to @twoexp{63}. This happens transparently for the user | |
9739d2d5 | 156 | since all of the low-level file handling functions are equally replaced. |
b07d03e0 | 157 | |
04b9968b | 158 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
159 | is a problem if the thread allocates some resources (like memory, file |
160 | descriptors, semaphores or whatever) at the time @code{open} is | |
19e4c7dd | 161 | called. If the thread gets canceled these resources stay allocated |
dfd2257a | 162 | until the program ends. To avoid this, calls to @code{open} should be |
04b9968b | 163 | protected using cancellation handlers. |
dfd2257a UD |
164 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
165 | ||
28f540f4 RM |
166 | The @code{open} function is the underlying primitive for the @code{fopen} |
167 | and @code{freopen} functions, which create streams. | |
168 | @end deftypefun | |
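
The interplay of @var{flags}, @var{mode}, and the @code{EEXIST} error can be sketched as follows. This is a minimal illustration, not part of the library: the helper name @code{create_new_file} and the choice of permission bits are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create FILENAME for writing, failing if it
   already exists.  Returns a descriptor, or -1 with errno set.  */
int
create_new_file (const char *filename)
{
  assert (filename != NULL);

  /* With O_CREAT | O_EXCL, open fails with EEXIST instead of
     reusing an existing file.  The mode argument matters only
     because O_CREAT is present.  */
  int fd = open (filename, O_WRONLY | O_CREAT | O_EXCL,
                 S_IRUSR | S_IWUSR);
  if (fd == -1)
    {
      /* Save errno, since fprintf may change it.  */
      int saved_errno = errno;
      if (saved_errno == EEXIST)
        fprintf (stderr, "%s: already exists\n", filename);
      else
        fprintf (stderr, "%s: %s\n", filename, strerror (saved_errno));
      errno = saved_errno;
    }
  return fd;
}
```

A caller would typically test the result against @math{-1}, write through the descriptor, and release it with @code{close} when done.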
169 | ||
b07d03e0 | 170 | @deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) |
d08a7e4c | 171 | @standards{Unix98, fcntl.h} |
2cc3615c | 172 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
b07d03e0 UD |
173 | This function is similar to @code{open}. It returns a file descriptor |
174 | which can be used to access the file named by @var{filename}. The only | |
04b9968b | 175 | difference is that on 32 bit systems the file is opened in the |
b07d03e0 UD |
176 | large file mode. I.e., file length and file offsets can exceed 31 bits. |
177 | ||
b07d03e0 UD |
178 | When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this |
179 | function is actually available under the name @code{open}. I.e., the | |
180 | new, extended API using 64 bit file sizes and offsets transparently | |
181 | replaces the old API. | |
182 | @end deftypefun | |
183 | ||
28f540f4 | 184 | @deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode}) |
d08a7e4c | 185 | @standards{POSIX.1, fcntl.h} |
2cc3615c | 186 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
28f540f4 RM |
187 | This function is obsolete. The call: |
188 | ||
189 | @smallexample | |
190 | creat (@var{filename}, @var{mode}) | |
191 | @end smallexample | |
192 | ||
193 | @noindent | |
194 | is equivalent to: | |
195 | ||
196 | @smallexample | |
197 | open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode}) | |
198 | @end smallexample | |
b07d03e0 | 199 | |
04b9968b | 200 | If on a 32 bit machine the sources are translated with |
b07d03e0 UD |
201 | @code{_FILE_OFFSET_BITS == 64} the function @code{creat} returns a file |
202 | descriptor opened in the large file mode which enables the file handling | |
9ceeb279 OB |
203 | functions to use files up to @twoexp{63} bytes in size and offset from |
204 | @minus{}@twoexp{63} to @twoexp{63}. This happens transparently for the user | |
9739d2d5 | 205 | since all of the low-level file handling functions are equally replaced. |
b07d03e0 UD |
206 | @end deftypefn |
207 | ||
b07d03e0 | 208 | @deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode}) |
d08a7e4c | 209 | @standards{Unix98, fcntl.h} |
2cc3615c | 210 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
b07d03e0 UD |
211 | This function is similar to @code{creat}. It returns a file descriptor |
212 | which can be used to access the file named by @var{filename}. The only | |
9739d2d5 | 213 | difference is that on 32 bit systems the file is opened in the |
b07d03e0 UD |
214 | large file mode. I.e., file length and file offsets can exceed 31 bits. |
215 | ||
216 | Most operations on this file descriptor, such as @code{read} and | |
217 | @code{write}, work unchanged; only functions that take or return a file | |
217 | offset need their @code{*64} counterparts, e.g., @code{lseek64}. | |
218 | ||
219 | When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this | |
220 | function is actually available under the name @code{creat}. I.e., the | |
221 | new, extended API using 64 bit file sizes and offsets transparently | |
222 | replaces the old API. | |
28f540f4 RM |
223 | @end deftypefn |
224 | ||
28f540f4 | 225 | @deftypefun int close (int @var{filedes}) |
d08a7e4c | 226 | @standards{POSIX.1, unistd.h} |
2cc3615c | 227 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
28f540f4 RM |
228 | The function @code{close} closes the file descriptor @var{filedes}. |
229 | Closing a file has the following consequences: | |
230 | ||
231 | @itemize @bullet | |
2c6fe0bd | 232 | @item |
28f540f4 RM |
233 | The file descriptor is deallocated. |
234 | ||
235 | @item | |
236 | Any record locks owned by the process on the file are unlocked. | |
237 | ||
238 | @item | |
239 | When all file descriptors associated with a pipe or FIFO have been closed, | |
240 | any unread data is discarded. | |
241 | @end itemize | |
242 | ||
04b9968b | 243 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
244 | is a problem if the thread allocates some resources (like memory, file |
245 | descriptors, semaphores or whatever) at the time @code{close} is | |
19e4c7dd | 246 | called. If the thread gets canceled these resources stay allocated |
04b9968b UD |
247 | until the program ends. To avoid this, calls to @code{close} should be |
248 | protected using cancellation handlers. | |
dfd2257a UD |
249 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
250 | ||
07435eb4 | 251 | The normal return value from @code{close} is @math{0}; a value of @math{-1} |
28f540f4 RM |
252 | is returned in case of failure. The following @code{errno} error |
253 | conditions are defined for this function: | |
254 | ||
255 | @table @code | |
256 | @item EBADF | |
257 | The @var{filedes} argument is not a valid file descriptor. | |
258 | ||
259 | @item EINTR | |
260 | The @code{close} call was interrupted by a signal. | |
261 | @xref{Interrupted Primitives}. | |
262 | Here is an example of how to handle @code{EINTR} properly: | |
263 | ||
264 | @smallexample | |
265 | TEMP_FAILURE_RETRY (close (desc)); | |
266 | @end smallexample | |
267 | ||
268 | @item ENOSPC | |
269 | @itemx EIO | |
270 | @itemx EDQUOT | |
2c6fe0bd | 271 | When the file is accessed by NFS, these errors from @code{write} can sometimes |
28f540f4 RM |
272 | not be detected until @code{close}. @xref{I/O Primitives}, for details |
273 | on their meaning. | |
274 | @end table | |
b07d03e0 UD |
275 | |
276 | Please note that there is @emph{no} separate @code{close64} function. | |
277 | This is not necessary since this function does not determine nor depend | |
fed8f7f7 | 278 | on the mode of the file. The kernel which performs the @code{close} |
04b9968b | 279 | operation knows which mode the descriptor is used for and can handle |
b07d03e0 | 280 | this situation. |
28f540f4 RM |
281 | @end deftypefun |
282 | ||
283 | To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead | |
284 | of trying to close its underlying file descriptor with @code{close}. | |
285 | This flushes any buffered output and updates the stream object to | |
286 | indicate that it is closed. | |
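
The advice above about protecting cancellation points with cancellation handlers can be sketched as follows. This is only an illustration under stated assumptions: the names @code{close_descriptor} and @code{read_head} are hypothetical, and the sketch assumes the default deferred cancellation type.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Cleanup handler: ARG points to the descriptor to release.  */
static void
close_descriptor (void *arg)
{
  int fd = *(int *) arg;
  if (fd != -1)
    close (fd);
}

/* Hypothetical helper: read up to SIZE bytes from the start of
   FILENAME into BUFFER.  Returns the byte count, or -1 on error.  */
ssize_t
read_head (const char *filename, void *buffer, size_t size)
{
  assert (filename != NULL && buffer != NULL);

  int fd = open (filename, O_RDONLY);
  if (fd == -1)
    return -1;

  ssize_t n;
  /* From here on, a cancellation acted upon inside read runs
     close_descriptor, so the descriptor does not leak.  */
  pthread_cleanup_push (close_descriptor, &fd);
  n = read (fd, buffer, size);
  /* A nonzero argument runs the handler now, which closes the
     descriptor on the normal path as well.  */
  pthread_cleanup_pop (1);
  return n;
}
```

Because @code{pthread_cleanup_push} and @code{pthread_cleanup_pop} must appear as a pair in the same lexical scope, the result variable is declared before the pair and returned after it.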
287 | ||
28628628 AZ |
288 | @deftypefun int close_range (unsigned int @var{lowfd}, unsigned int @var{maxfd}, int @var{flags}) |
289 | @standards{Linux, unistd.h} | |
290 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} | |
291 | @c This is a syscall for Linux v5.9. There is no fallback emulation for | |
292 | @c older kernels. | |
293 | ||
294 | The function @code{close_range} closes the file descriptors from @var{lowfd} | |
295 | to @var{maxfd} (inclusive). This function is similar to calling @code{close} on | |
296 | each file descriptor in the specified range, as modified by the @var{flags}. | |
297 | ||
298 | This function is only supported on recent Linux versions and @theglibc{} | |
299 | does not provide any fallback (the application will need to handle possible | |
300 | @code{ENOSYS}). | |
301 | ||
302 | The @var{flags} argument modifies how the files are closed. Linux currently | |
303 | supports: | |
304 | ||
305 | @vtable @code | |
306 | @item CLOSE_RANGE_UNSHARE | |
307 | Unshare the file descriptor table before closing file descriptors. | |
308 | ||
309 | @item CLOSE_RANGE_CLOEXEC | |
310 | Set the @code{FD_CLOEXEC} bit instead of closing the file descriptor. | |
311 | @end vtable | |
312 | ||
313 | The normal return value from @code{close_range} is @math{0}; a value | |
314 | of @math{-1} is returned in case of failure. The following @code{errno} error | |
315 | conditions are defined for this function: | |
316 | ||
317 | @table @code | |
318 | @item EINVAL | |
319 | The @var{lowfd} value is larger than @var{maxfd} or an unsupported @var{flags} | |
320 | is used. | |
321 | ||
322 | @item ENOMEM | |
323 | Either there is not enough memory for the operation, or the process is | |
64d9ebae | 324 | out of address space. This can only happen when the @code{CLOSE_RANGE_UNSHARE} |
28628628 AZ |
325 | flag is used. |
326 | ||
327 | @item EMFILE | |
328 | The process has too many files open. This can only happen when the | |
329 | @code{CLOSE_RANGE_UNSHARE} flag is used. | |
330 | The maximum number of file descriptors is controlled by the | |
331 | @code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}. | |
332 | ||
333 | @item ENOSYS | |
334 | The kernel does not implement the required functionality. | |
335 | @end table | |
336 | @end deftypefun | |
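
Since there is no fallback for @code{ENOSYS}, an application may want to supply its own. The following is a rough sketch of one way to do so, not the library's behavior: the helper name @code{close_fd_range} is hypothetical, and treating @code{EPERM} like @code{ENOSYS} is an assumption about sandboxes that filter the system call, not an error listed above.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* Needed for the close_range declaration.  */
#endif
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor from LOWFD to MAXFD
   (inclusive).  Returns 0 on success, -1 with errno set on error.  */
int
close_fd_range (unsigned int lowfd, unsigned int maxfd)
{
  assert (lowfd <= maxfd);

  if (close_range (lowfd, maxfd, 0) == 0)
    return 0;

  /* ENOSYS means the kernel lacks the system call.  EPERM is
     handled the same way on the assumption that a sandbox has
     filtered the call.  */
  if (errno != ENOSYS && errno != EPERM)
    return -1;

  /* Fallback: close each descriptor individually.  Descriptors
     that are not open fail with EBADF, which is ignored here.
     This is only practical for small ranges.  */
  for (unsigned int fd = lowfd;; fd++)
    {
      close ((int) fd);
      if (fd == maxfd)
        break;
    }
  return 0;
}
```

The fallback loop visits every number in the range, so callers should pass a bounded @var{maxfd} instead of the largest possible value.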
337 | ||
60744950 AZ |
338 | @deftypefun void closefrom (int @var{lowfd}) |
339 | @standards{GNU, unistd.h} | |
340 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} | |
341 | ||
5aa359d3 MK |
342 | The function @code{closefrom} closes all file descriptors greater than or equal |
343 | to @var{lowfd}. This function is similar to calling | |
60744950 AZ |
344 | @code{close} for all open file descriptors not less than @var{lowfd}. |
345 | ||
346 | Already closed file descriptors are ignored. | |
347 | @end deftypefun | |
28628628 | 348 | |
28f540f4 RM |
349 | @node I/O Primitives |
350 | @section Input and Output Primitives | |
351 | ||
352 | This section describes the functions for performing primitive input and | |
353 | output operations on file descriptors: @code{read}, @code{write}, and | |
354 | @code{lseek}. These functions are declared in the header file | |
355 | @file{unistd.h}. | |
356 | @pindex unistd.h | |
357 | ||
28f540f4 | 358 | @deftp {Data Type} ssize_t |
d08a7e4c | 359 | @standards{POSIX.1, unistd.h} |
28f540f4 RM |
360 | This data type is used to represent the sizes of blocks that can be |
361 | read or written in a single operation. It is similar to @code{size_t}, | |
362 | but must be a signed type. | |
363 | @end deftp | |
364 | ||
365 | @cindex reading from a file descriptor | |
28f540f4 | 366 | @deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size}) |
d08a7e4c | 367 | @standards{POSIX.1, unistd.h} |
2cc3615c | 368 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
369 | The @code{read} function reads up to @var{size} bytes from the file |
370 | with descriptor @var{filedes}, storing the results in the @var{buffer}. | |
04b9968b UD |
371 | (This is not necessarily a character string, and no terminating null |
372 | character is added.) | |
28f540f4 RM |
373 | |
374 | @cindex end-of-file, on a file descriptor | |
375 | The return value is the number of bytes actually read. This might be | |
376 | less than @var{size}; for example, if there aren't that many bytes left | |
377 | in the file or if there aren't that many bytes immediately available. | |
378 | The exact behavior depends on what kind of file it is. Note that | |
379 | reading less than @var{size} bytes is not an error. | |
380 | ||
381 | A value of zero indicates end-of-file (except if the value of the | |
382 | @var{size} argument is also zero). This is not considered an error. | |
383 | If you keep calling @code{read} while at end-of-file, it will keep | |
384 | returning zero and doing nothing else. | |
385 | ||
386 | If @code{read} returns at least one character, there is no way you can | |
387 | tell whether end-of-file was reached. But if you did reach the end, the | |
388 | next read will return zero. | |
389 | ||
07435eb4 | 390 | In case of an error, @code{read} returns @math{-1}. The following |
28f540f4 RM |
391 | @code{errno} error conditions are defined for this function: |
392 | ||
393 | @table @code | |
394 | @item EAGAIN | |
395 | Normally, when no input is immediately available, @code{read} waits for | |
396 | some input. But if the @code{O_NONBLOCK} flag is set for the file | |
397 | (@pxref{File Status Flags}), @code{read} returns immediately without | |
398 | reading any data, and reports this error. | |
399 | ||
400 | @strong{Compatibility Note:} Most versions of BSD Unix use a different | |
1f77f049 | 401 | error code for this: @code{EWOULDBLOCK}. In @theglibc{}, |
28f540f4 RM |
402 | @code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter |
403 | which name you use. | |
404 | ||
405 | On some systems, reading a large amount of data from a character special | |
406 | file can also fail with @code{EAGAIN} if the kernel cannot find enough | |
407 | physical memory to lock down the user's pages. This is limited to | |
408 | devices that transfer with direct memory access into the user's memory, | |
409 | which means it does not include terminals, since they always use | |
a7a93d50 JM |
410 | separate buffers inside the kernel. This problem never happens on |
411 | @gnuhurdsystems{}. | |
28f540f4 RM |
412 | |
413 | Any condition that could result in @code{EAGAIN} can instead result in a | |
414 | successful @code{read} which returns fewer bytes than requested. | |
415 | Calling @code{read} again immediately would result in @code{EAGAIN}. | |
416 | ||
417 | @item EBADF | |
418 | The @var{filedes} argument is not a valid file descriptor, | |
419 | or is not open for reading. | |
420 | ||
421 | @item EINTR | |
422 | @code{read} was interrupted by a signal while it was waiting for input. | |
9739d2d5 | 423 | @xref{Interrupted Primitives}. A signal will not necessarily cause |
28f540f4 RM |
424 | @code{read} to return @code{EINTR}; it may instead result in a |
425 | successful @code{read} which returns fewer bytes than requested. | |
426 | ||
427 | @item EIO | |
428 | For many devices, and for disk files, this error code indicates | |
429 | a hardware error. | |
430 | ||
431 | @code{EIO} also occurs when a background process tries to read from the | |
432 | controlling terminal, and the normal action of stopping the process by | |
433 | sending it a @code{SIGTTIN} signal isn't working. This might happen if | |
04b9968b | 434 | the signal is being blocked or ignored, or because the process group is |
28f540f4 RM |
435 | orphaned. @xref{Job Control}, for more information about job control, |
436 | and @ref{Signal Handling}, for information about signals. | |
7e583a52 RM |
437 | |
438 | @item EINVAL | |
439 | In some systems, when reading from a character or block device, position | |
440 | and size offsets must be aligned to a particular block size. This error | |
441 | indicates that the offsets were not properly aligned. | |
28f540f4 RM |
442 | @end table |
443 | ||
b07d03e0 UD |
444 | Please note that there is no function named @code{read64}. This is not |
445 | necessary since this function does not directly modify or handle the | |
446 | possibly wide file offset. Since the kernel handles this state | |
04b9968b | 447 | internally, the @code{read} function can be used for all cases. |
b07d03e0 | 448 | |
04b9968b | 449 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
450 | is a problem if the thread allocates some resources (like memory, file |
451 | descriptors, semaphores or whatever) at the time @code{read} is | |
19e4c7dd | 452 | called. If the thread gets canceled these resources stay allocated |
04b9968b UD |
453 | until the program ends. To avoid this, calls to @code{read} should be |
454 | protected using cancellation handlers. | |
dfd2257a UD |
455 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
456 | ||
28f540f4 RM |
457 | The @code{read} function is the underlying primitive for all of the |
458 | functions that read from streams, such as @code{fgetc}. | |
459 | @end deftypefun | |
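
Because @code{read} may return fewer bytes than requested and may fail with @code{EINTR}, callers that need a full buffer usually loop. Here is a minimal sketch of such a loop; the helper name @code{read_full} is hypothetical, and the sketch assumes a blocking descriptor (with @code{O_NONBLOCK} set, @code{EAGAIN} is reported to the caller as an error).

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read until SIZE bytes have been stored in
   BUFFER or end-of-file is reached.  Returns the number of bytes
   stored, or -1 with errno set on error.  */
ssize_t
read_full (int filedes, void *buffer, size_t size)
{
  assert (buffer != NULL || size == 0);

  char *p = buffer;
  size_t total = 0;

  while (total < size)
    {
      ssize_t n = read (filedes, p + total, size - total);
      if (n == 0)
        break;                  /* End-of-file.  */
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;
        }
      total += (size_t) n;      /* A short read is not an error.  */
    }
  return (ssize_t) total;
}
```

A return value smaller than @var{size} then means that end-of-file was reached, and a subsequent call returns zero.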
460 | ||
a5a0310d | 461 | @deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset}) |
d08a7e4c | 462 | @standards{Unix98, unistd.h} |
2cc3615c AO |
463 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
464 | @c This is usually a safe syscall. The sysdeps/posix fallback emulation | |
465 | @c is not MT-Safe because it uses lseek, read and lseek back, but is it | |
466 | @c used anywhere? | |
a5a0310d | 467 | The @code{pread} function is similar to the @code{read} function. The |
04b9968b UD |
468 | first three arguments are identical, and the return values and error |
469 | codes also correspond. | |
a5a0310d UD |
470 | |
471 | The difference is the fourth argument and its handling. The data block | |
472 | is not read from the current position of the file descriptor | |
473 | @code{filedes}. Instead the data is read from the file starting at | |
474 | position @var{offset}. The position of the file descriptor itself is | |
04b9968b | 475 | not affected by the operation. The value is the same as before the call. |
a5a0310d | 476 | |
b07d03e0 UD |
477 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
478 | @code{pread} function is in fact @code{pread64} and the type | |
04b9968b | 479 | @code{off_t} has 64 bits, which makes it possible to handle files up to |
9ceeb279 | 480 | @twoexp{63} bytes in length. |
b07d03e0 | 481 | |
a5a0310d UD |
482 | The return value of @code{pread} describes the number of bytes read. |
483 | In the error case it returns @math{-1} like @code{read} does and the | |
04b9968b UD |
484 | error codes are also the same, with these additions: |
485 | ||
a5a0310d UD |
486 | @table @code |
487 | @item EINVAL | |
488 | The value given for @var{offset} is negative and therefore illegal. | |
489 | ||
490 | @item ESPIPE | |
9739d2d5 | 491 | The file descriptor @var{filedes} is associated with a pipe or a FIFO and |
a5a0310d UD |
492 | this device does not allow positioning of the file pointer. |
493 | @end table | |
494 | ||
495 | The function is an extension defined in the Single Unix Specification | |
496 | version 2. | |
497 | @end deftypefun | |
498 | ||
b07d03e0 | 499 | @deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) |
d08a7e4c | 500 | @standards{Unix98, unistd.h} |
2cc3615c AO |
501 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
502 | @c This is usually a safe syscall. The sysdeps/posix fallback emulation | |
503 | @c is not MT-Safe because it uses lseek64, read and lseek64 back, but is | |
504 | @c it used anywhere? | |
b07d03e0 UD |
505 | This function is similar to the @code{pread} function. The difference |
506 | is that the @var{offset} parameter is of type @code{off64_t} instead of | |
04b9968b | 507 | @code{off_t} which makes it possible on 32 bit machines to address |
9ceeb279 | 508 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
b07d03e0 UD |
509 | file descriptor @code{filedes} must be opened using @code{open64} since |
510 | otherwise the large offsets possible with @code{off64_t} will lead to | |
511 | errors with a descriptor in small file mode. | |
512 | ||
c756c71c | 513 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a |
04b9968b UD |
514 | 32 bit machine this function is actually available under the name |
515 | @code{pread} and so transparently replaces the 32 bit interface. | |
b07d03e0 UD |
516 | @end deftypefun |
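
As an illustration of positioned reads, the following sketch fetches a fixed-size record by index without disturbing the descriptor's file position. The helper names @code{read_record} and @code{read_record_from_file} are hypothetical, and the sketch assumes that @var{index} times @var{record_size} does not overflow @code{off_t}.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read record number INDEX, RECORD_SIZE bytes
   long, from FILEDES.  Returns what pread returns: the number of
   bytes read, 0 at end-of-file, or -1 with errno set.  */
ssize_t
read_record (int filedes, void *record, size_t record_size, off_t index)
{
  assert (record != NULL && index >= 0);

  /* The offset is absolute; the descriptor's own file position is
     neither used nor changed.  */
  off_t offset = index * (off_t) record_size;
  return pread (filedes, record, record_size, offset);
}

/* The same operation on a file given by name.  */
ssize_t
read_record_from_file (const char *filename, void *record,
                       size_t record_size, off_t index)
{
  int fd = open (filename, O_RDONLY);
  if (fd == -1)
    return -1;

  ssize_t n = read_record (fd, record, record_size, index);
  int saved_errno = errno;      /* close may overwrite errno.  */
  close (fd);
  errno = saved_errno;
  return n;
}
```

Since no file position is shared, several threads can call @code{read_record} on the same descriptor without coordinating seeks.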
517 | ||
28f540f4 | 518 | @cindex writing to a file descriptor |
28f540f4 | 519 | @deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size}) |
d08a7e4c | 520 | @standards{POSIX.1, unistd.h} |
2cc3615c | 521 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
0c6891a0 CD |
522 | @c Some say write is thread-unsafe on Linux without O_APPEND. In the VFS layer |
523 | @c the vfs_write() does no locking around the acquisition of a file offset and | |
524 | @c therefore multiple threads / kernel tasks may race and get the same offset | |
525 | @c resulting in data loss. | |
526 | @c | |
527 | @c See: | |
528 | @c http://thread.gmane.org/gmane.linux.kernel/397980 | |
529 | @c http://lwn.net/Articles/180387/ | |
530 | @c | |
531 | @c The counter argument is that POSIX only says that the write starts at the | |
532 | @c file position and that the file position is updated *before* the function | |
533 | @c returns. What that really means is that any expectation of atomic writes is | |
534 | @c strictly an invention of the interpretation of the reader. Data loss could | |
535 | @c happen if two threads start the write at the same time. Only writes that | |
536 | @c come after the return of another write are guaranteed to follow the other | |
537 | @c write. | |
538 | @c | |
539 | @c The other side of the coin is that POSIX goes on further to say in | |
540 | @c "2.9.7 Thread Interactions with Regular File Operations" that threads | |
541 | @c should never see interleaving sets of file operations, but it is insane | |
542 | @c to do anything like that because it kills performance, so you don't get | |
543 | @c those guarantees in Linux. | |
544 | @c | |
545 | @c So we mark it thread safe, it doesn't blow up, but you might lose | |
546 | @c data, and we don't strictly meet the POSIX requirements. | |
a2887bdb CD |
547 | @c |
548 | @c The fix for file offsets racing was merged in 3.14, the commits were: | |
549 | @c 9c225f2655e36a470c4f58dbbc99244c5fc7f2d4, and | |
550 | @c d7a15f8d0777955986a2ab00ab181795cab14b01. Therefore after Linux 3.14 you | |
551 | @c should get mostly MT-safe writes. | |
28f540f4 RM |
552 | The @code{write} function writes up to @var{size} bytes from |
553 | @var{buffer} to the file with descriptor @var{filedes}. The data in | |
554 | @var{buffer} is not necessarily a character string and a null character is | |
555 | output like any other character. | |
556 | ||
557 | The return value is the number of bytes actually written. This may be | |
558 | @var{size}, but can always be smaller. Your program should always call | |
559 | @code{write} in a loop, iterating until all the data is written. | |
560 | ||
561 | Once @code{write} returns, the data is enqueued to be written and can be | |
562 | read back right away, but it is not necessarily written out to permanent | |
563 | storage immediately. You can use @code{fsync} when you need to be sure | |
564 | your data has been permanently stored before continuing. (It is more | |
565 | efficient for the system to batch up consecutive writes and do them all | |
566 | at once when convenient. Normally they will always be written to disk | |
a5a0310d UD |
567 | within a minute or less.) Modern systems provide another function |
568 | @code{fdatasync} which guarantees integrity only for the file data and | |
569 | is therefore faster. | |
570 | @c !!! xref fsync, fdatasync | |
2c6fe0bd | 571 | You can use the @code{O_FSYNC} open mode to make @code{write} always |
28f540f4 RM |
572 | store the data to disk before returning; @pxref{Operating Modes}. |
573 | ||
07435eb4 | 574 | In the case of an error, @code{write} returns @math{-1}. The following |
28f540f4 RM |
575 | @code{errno} error conditions are defined for this function: |
576 | ||
577 | @table @code | |
578 | @item EAGAIN | |
579 | Normally, @code{write} blocks until the write operation is complete. | |
580 | But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control | |
04b9968b | 581 | Operations}), it returns immediately without writing any data and |
28f540f4 RM |
582 | reports this error. An example of a situation that might cause the |
583 | process to block on output is writing to a terminal device that supports | |
584 | flow control, where output has been suspended by receipt of a STOP | |
585 | character. | |
586 | ||
587 | @strong{Compatibility Note:} Most versions of BSD Unix use a different | |
1f77f049 | 588 | error code for this: @code{EWOULDBLOCK}. In @theglibc{}, |
28f540f4 RM |
589 | @code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter |
590 | which name you use. | |
591 | ||
592 | On some systems, writing a large amount of data to a character special | |
593 | file can also fail with @code{EAGAIN} if the kernel cannot find enough | |
594 | physical memory to lock down the user's pages. This is limited to | |
595 | devices that transfer with direct memory access into the user's memory, | |
596 | which means it does not include terminals, since they always use | |
a7a93d50 JM |
597 | separate buffers inside the kernel. This problem does not arise on |
598 | @gnuhurdsystems{}. | |
28f540f4 RM |
599 | |
600 | @item EBADF | |
601 | The @var{filedes} argument is not a valid file descriptor, | |
602 | or is not open for writing. | |
603 | ||
604 | @item EFBIG | |
605 | The size of the file would become larger than the implementation can support. | |
606 | ||
607 | @item EINTR | |
608 | The @code{write} operation was interrupted by a signal while it was | |
04b9968b | 609 | blocked waiting for completion. A signal will not necessarily cause |
28f540f4 RM |
610 | @code{write} to return @code{EINTR}; it may instead result in a |
611 | successful @code{write} which writes fewer bytes than requested. | |
612 | @xref{Interrupted Primitives}. | |
613 | ||
614 | @item EIO | |
615 | For many devices, and for disk files, this error code indicates | |
616 | a hardware error. | |
617 | ||
618 | @item ENOSPC | |
619 | The device containing the file is full. | |
620 | ||
621 | @item EPIPE | |
622 | This error is returned when you try to write to a pipe or FIFO that | |
623 | isn't open for reading by any process. When this happens, a @code{SIGPIPE} | |
624 | signal is also sent to the process; see @ref{Signal Handling}. | |
7e583a52 RM |
625 | |
626 | @item EINVAL | |
627 | In some systems, when writing to a character or block device, position | |
628 | and size offsets must be aligned to a particular block size. This error | |
629 | indicates that the offsets were not properly aligned. | |
28f540f4 RM |
630 | @end table |
631 | ||
632 | Unless you have arranged to prevent @code{EINTR} failures, you should | |
633 | check @code{errno} after each failing call to @code{write}, and if the | |
634 | error was @code{EINTR}, you should simply repeat the call. | |
635 | @xref{Interrupted Primitives}. The easy way to do this is with the | |
636 | macro @code{TEMP_FAILURE_RETRY}, as follows: | |
637 | ||
638 | @smallexample | |
639 | nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count)); | |
640 | @end smallexample | |
641 | ||
b07d03e0 UD |
642 | Please note that there is no function named @code{write64}. This is not |
643 | necessary since this function does not directly modify or handle the | |
644 | possibly wide file offset. Since the kernel handles this state | |
645 | internally the @code{write} function can be used for all cases. | |
646 | ||
04b9968b | 647 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
648 | is a problem if the thread allocates some resources (like memory, file |
649 | descriptors, semaphores or whatever) at the time @code{write} is | |
19e4c7dd | 650 | called. If the thread gets canceled these resources stay allocated |
04b9968b UD |
651 | until the program ends. To avoid this, calls to @code{write} should be |
652 | protected using cancellation handlers. | |
dfd2257a UD |
653 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
654 | ||
28f540f4 RM |
655 | The @code{write} function is the underlying primitive for all of the |
656 | functions that write to streams, such as @code{fputc}. | |
657 | @end deftypefun | |
658 | ||
a5a0310d | 659 | @deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset}) |
d08a7e4c | 660 | @standards{Unix98, unistd.h} |
2cc3615c AO |
661 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
662 | @c This is usually a safe syscall. The sysdeps/posix fallback emulation | |
663 | @c is not MT-Safe because it uses lseek, write and lseek back, but is it | |
664 | @c used anywhere? | |
a5a0310d | 665 | The @code{pwrite} function is similar to the @code{write} function. The |
04b9968b UD |
666 | first three arguments are identical, and the return values and error codes |
667 | also correspond. | |
a5a0310d UD |
668 | |
669 | The difference is the fourth argument and its handling. The data block | |
670 | is not written to the current position of the file descriptor | |
671 | @var{filedes}. Instead the data is written to the file starting at | |
672 | position @var{offset}. The position of the file descriptor itself is | |
04b9968b | 673 | not affected by the operation. The value is the same as before the call. |
a5a0310d | 674 | |
717da4b3 AZ |
675 | However, on Linux, if a file is opened with @code{O_APPEND}, @code{pwrite} |
676 | appends data to the end of the file, regardless of the value of | |
677 | @var{offset}. | |
678 | ||
b07d03e0 UD |
679 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
680 | @code{pwrite} function is in fact @code{pwrite64} and the type | |
04b9968b | 681 | @code{off_t} has 64 bits, which makes it possible to handle files up to |
9ceeb279 | 682 | @twoexp{63} bytes in length. |
b07d03e0 | 683 | |
a5a0310d UD |
684 | The return value of @code{pwrite} is the number of bytes written. | |
685 | In the error case it returns @math{-1} like @code{write} does and the | |
04b9968b UD |
686 | error codes are also the same, with these additions: |
687 | ||
a5a0310d UD |
688 | @table @code |
689 | @item EINVAL | |
690 | The value given for @var{offset} is negative and therefore invalid. | |
691 | ||
692 | @item ESPIPE | |
04b9968b | 693 | The file descriptor @var{filedes} is associated with a pipe or a FIFO and |
a5a0310d UD |
694 | this device does not allow positioning of the file pointer. |
695 | @end table | |
696 | ||
697 | The function is an extension defined in the Single Unix Specification | |
698 | version 2. | |
699 | @end deftypefun | |
700 | ||
b07d03e0 | 701 | @deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) |
d08a7e4c | 702 | @standards{Unix98, unistd.h} |
2cc3615c AO |
703 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
704 | @c This is usually a safe syscall. The sysdeps/posix fallback emulation | |
705 | @c is not MT-Safe because it uses lseek64, write and lseek64 back, but | |
706 | @c is it used anywhere? | |
b07d03e0 UD |
707 | This function is similar to the @code{pwrite} function. The difference |
708 | is that the @var{offset} parameter is of type @code{off64_t} instead of | |
04b9968b | 709 | @code{off_t} which makes it possible on 32 bit machines to address |
9ceeb279 | 710 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
b07d03e0 UD |
711 | file descriptor @var{filedes} must be opened using @code{open64} since | |
712 | otherwise the large offsets possible with @code{off64_t} will lead to | |
713 | errors with a descriptor in small file mode. | |
714 | ||
c756c71c | 715 | When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a |
04b9968b UD |
716 | 32 bit machine this function is actually available under the name |
717 | @code{pwrite} and so transparently replaces the 32 bit interface. | |
b07d03e0 | 718 | @end deftypefun |
717da4b3 | 719 | |
28f540f4 RM |
720 | @node File Position Primitive |
721 | @section Setting the File Position of a Descriptor | |
722 | ||
723 | Just as you can set the file position of a stream with @code{fseek}, you | |
724 | can set the file position of a descriptor with @code{lseek}. This | |
725 | specifies the position in the file for the next @code{read} or | |
726 | @code{write} operation. @xref{File Positioning}, for more information | |
727 | on the file position and what it means. | |
728 | ||
729 | To read the current file position value from a descriptor, use | |
730 | @code{lseek (@var{desc}, 0, SEEK_CUR)}. | |
731 | ||
732 | @cindex file positioning on a file descriptor | |
733 | @cindex positioning a file descriptor | |
734 | @cindex seeking on a file descriptor | |
28f540f4 | 735 | @deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence}) |
d08a7e4c | 736 | @standards{POSIX.1, unistd.h} |
2cc3615c | 737 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
738 | The @code{lseek} function is used to change the file position of the |
739 | file with descriptor @var{filedes}. | |
740 | ||
741 | The @var{whence} argument specifies how the @var{offset} should be | |
04b9968b UD |
742 | interpreted, in the same way as for the @code{fseek} function, and it must |
743 | be one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or | |
28f540f4 RM |
744 | @code{SEEK_END}. |
745 | ||
2fe82ca6 | 746 | @vtable @code |
28f540f4 | 747 | @item SEEK_SET |
4dad7bab | 748 | Specifies that @var{offset} is a count of characters from the beginning |
28f540f4 RM |
749 | of the file. |
750 | ||
751 | @item SEEK_CUR | |
4dad7bab | 752 | Specifies that @var{offset} is a count of characters from the current |
28f540f4 RM |
753 | file position. This count may be positive or negative. |
754 | ||
755 | @item SEEK_END | |
4dad7bab | 756 | Specifies that @var{offset} is a count of characters from the end of |
28f540f4 RM |
757 | the file. A negative count specifies a position within the current |
758 | extent of the file; a positive count specifies a position past the | |
2c6fe0bd | 759 | current end. If you set the position past the current end, and |
28f540f4 | 760 | actually write data, you will extend the file with zeros up to that |
336dfb2d | 761 | position. |
2fe82ca6 | 762 | @end vtable |
28f540f4 RM |
763 | |
764 | The return value from @code{lseek} is normally the resulting file | |
765 | position, measured in bytes from the beginning of the file. | |
766 | You can use this feature together with @code{SEEK_CUR} to read the | |
767 | current file position. | |
768 | ||
769 | If you want to append to the file, setting the file position to the | |
770 | current end of file with @code{SEEK_END} is not sufficient. Another | |
771 | process may write more data after you seek but before you write, | |
772 | extending the file so the position you write onto clobbers their data. | |
773 | Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}. | |
774 | ||
775 | You can set the file position past the current end of the file. This | |
776 | does not by itself make the file longer; @code{lseek} never changes the | |
777 | file. But subsequent output at that position will extend the file. | |
778 | Characters between the previous end of file and the new position are | |
779 | filled with zeros. Extending the file in this way can create a | |
780 | ``hole'': the blocks of zeros are not actually allocated on disk, so the | |
78759725 | 781 | file takes up less space than it appears to; it is then called a |
28f540f4 RM |
782 | ``sparse file''. |
783 | @cindex sparse files | |
784 | @cindex holes in files | |
785 | ||
786 | If the file position cannot be changed, or the operation is in some way | |
07435eb4 | 787 | invalid, @code{lseek} returns a value of @math{-1}. The following |
28f540f4 RM |
788 | @code{errno} error conditions are defined for this function: |
789 | ||
790 | @table @code | |
791 | @item EBADF | |
792 | The @var{filedes} is not a valid file descriptor. | |
793 | ||
794 | @item EINVAL | |
795 | The @var{whence} argument value is not valid, or the resulting | |
796 | file offset is not valid (for example, it would be negative). | |
797 | ||
798 | @item ESPIPE | |
799 | The @var{filedes} corresponds to an object that cannot be positioned, | |
800 | such as a pipe, FIFO or terminal device. (POSIX.1 specifies this error | |
a7a93d50 | 801 | only for pipes and FIFOs, but on @gnusystems{}, you always get |
28f540f4 RM |
802 | @code{ESPIPE} if the object is not seekable.) |
803 | @end table | |
804 | ||
b07d03e0 UD |
805 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
806 | @code{lseek} function is in fact @code{lseek64} and the type | |
807 | @code{off_t} has 64 bits which makes it possible to handle files up to | |
9ceeb279 | 808 | @twoexp{63} bytes in length. |
b07d03e0 | 809 | |
04b9968b | 810 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
811 | is a problem if the thread allocates some resources (like memory, file |
812 | descriptors, semaphores or whatever) at the time @code{lseek} is | |
19e4c7dd | 813 | called. If the thread gets canceled these resources stay allocated |
dfd2257a | 814 | until the program ends. To avoid this, calls to @code{lseek} should be |
04b9968b | 815 | protected using cancellation handlers. |
dfd2257a UD |
816 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
817 | ||
28f540f4 | 818 | The @code{lseek} function is the underlying primitive for the |
dfd2257a UD |
819 | @code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and |
820 | @code{rewind} functions, which operate on streams instead of file | |
821 | descriptors. | |
28f540f4 RM |
822 | @end deftypefun |
823 | ||
b07d03e0 | 824 | @deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence}) |
d08a7e4c | 825 | @standards{Unix98, unistd.h} |
2cc3615c | 826 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b07d03e0 UD |
827 | This function is similar to the @code{lseek} function. The difference |
828 | is that the @var{offset} parameter is of type @code{off64_t} instead of | |
04b9968b | 829 | @code{off_t} which makes it possible on 32 bit machines to address |
9ceeb279 | 830 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
b07d03e0 UD |
831 | file descriptor @var{filedes} must be opened using @code{open64} since | |
832 | otherwise the large offsets possible with @code{off64_t} will lead to | |
833 | errors with a descriptor in small file mode. | |
834 | ||
c756c71c | 835 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a |
b07d03e0 | 836 | 32 bit machine this function is actually available under the name |
04b9968b | 837 | @code{lseek} and so transparently replaces the 32 bit interface. |
b07d03e0 UD |
838 | @end deftypefun |
839 | ||
28f540f4 | 840 | You can have multiple descriptors for the same file if you open the file |
2c6fe0bd | 841 | more than once, or if you duplicate a descriptor with @code{dup}. |
28f540f4 RM |
842 | Descriptors that come from separate calls to @code{open} have independent |
843 | file positions; using @code{lseek} on one descriptor has no effect on the | |
2c6fe0bd | 844 | other. For example, |
28f540f4 RM |
845 | |
846 | @smallexample | |
847 | @group | |
848 | @{ | |
849 | int d1, d2; | |
850 | char buf[4]; | |
851 | d1 = open ("foo", O_RDONLY); | |
852 | d2 = open ("foo", O_RDONLY); | |
853 | lseek (d1, 1024, SEEK_SET); | |
854 | read (d2, buf, 4); | |
855 | @} | |
856 | @end group | |
857 | @end smallexample | |
858 | ||
859 | @noindent | |
860 | will read the first four characters of the file @file{foo}. (The | |
861 | error-checking code necessary for a real program has been omitted here | |
862 | for brevity.) | |
863 | ||
864 | By contrast, descriptors made by duplication share a common file | |
865 | position with the original descriptor that was duplicated. Anything | |
866 | which alters the file position of one of the duplicates, including | |
867 | reading or writing data, affects all of them alike. Thus, for example, | |
868 | ||
869 | @smallexample | |
870 | @{ | |
871 | int d1, d2, d3; | |
872 | char buf1[4], buf2[4]; | |
873 | d1 = open ("foo", O_RDONLY); | |
874 | d2 = dup (d1); | |
875 | d3 = dup (d2); | |
876 | lseek (d3, 1024, SEEK_SET); | |
877 | read (d1, buf1, 4); | |
878 | read (d2, buf2, 4); | |
879 | @} | |
880 | @end smallexample | |
881 | ||
882 | @noindent | |
883 | will read four characters starting at offset 1024 of | |
884 | @file{foo}, and then four more characters starting at offset | |
885 | 1028. | |
886 | ||
28f540f4 | 887 | @deftp {Data Type} off_t |
d08a7e4c | 888 | @standards{POSIX.1, sys/types.h} |
07e12bb3 JM |
889 | This is a signed integer type used to represent file sizes. In |
890 | @theglibc{}, this type is no narrower than @code{int}. | |
a3a4a74e UD |
891 | |
892 | If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type | |
893 | is transparently replaced by @code{off64_t}. | |
28f540f4 RM |
894 | @end deftp |
895 | ||
b07d03e0 | 896 | @deftp {Data Type} off64_t |
d08a7e4c | 897 | @standards{Unix98, sys/types.h} |
b07d03e0 | 898 | This type is used similarly to @code{off_t}. The difference is that even |
04b9968b | 899 | on 32 bit machines, where the @code{off_t} type would have 32 bits, |
b07d03e0 | 900 | @code{off64_t} has 64 bits and so is able to address files up to |
9ceeb279 | 901 | @twoexp{63} bytes in length. |
a3a4a74e UD |
902 | |
903 | When compiling with @code{_FILE_OFFSET_BITS == 64} this type is | |
904 | available under the name @code{off_t}. | |
b07d03e0 UD |
905 | @end deftp |
906 | ||
28f540f4 RM |
907 | These aliases for the @samp{SEEK_@dots{}} constants exist for the sake |
908 | of compatibility with older BSD systems. They are defined in two | |
909 | different header files: @file{fcntl.h} and @file{sys/file.h}. | |
910 | ||
2fe82ca6 | 911 | @vtable @code |
28f540f4 RM |
912 | @item L_SET |
913 | An alias for @code{SEEK_SET}. | |
914 | ||
915 | @item L_INCR | |
916 | An alias for @code{SEEK_CUR}. | |
917 | ||
918 | @item L_XTND | |
919 | An alias for @code{SEEK_END}. | |
2fe82ca6 | 920 | @end vtable |
28f540f4 RM |
921 | |
922 | @node Descriptors and Streams | |
923 | @section Descriptors and Streams | |
924 | @cindex streams, and file descriptors | |
925 | @cindex converting file descriptor to stream | |
926 | @cindex extracting file descriptor from stream | |
927 | ||
928 | Given an open file descriptor, you can create a stream for it with the | |
929 | @code{fdopen} function. You can get the underlying file descriptor for | |
930 | an existing stream with the @code{fileno} function. These functions are | |
931 | declared in the header file @file{stdio.h}. | |
932 | @pindex stdio.h | |
933 | ||
28f540f4 | 934 | @deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype}) |
d08a7e4c | 935 | @standards{POSIX.1, stdio.h} |
2cc3615c | 936 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @asulock{}}@acunsafe{@acsmem{} @aculock{}}} |
28f540f4 RM |
937 | The @code{fdopen} function returns a new stream for the file descriptor |
938 | @var{filedes}. | |
939 | ||
940 | The @var{opentype} argument is interpreted in the same way as for the | |
941 | @code{fopen} function (@pxref{Opening Streams}), except that | |
a7a93d50 | 942 | the @samp{b} option is not permitted; this is because @gnusystems{} make no |
28f540f4 | 943 | distinction between text and binary files. Also, @code{"w"} and |
04b9968b | 944 | @code{"w+"} do not cause truncation of the file; these have an effect only |
28f540f4 RM |
945 | when opening a file, and in this case the file has already been opened. |
946 | You must make sure that the @var{opentype} argument matches the actual | |
947 | mode of the open file descriptor. | |
948 | ||
949 | The return value is the new stream. If the stream cannot be created | |
950 | (for example, if the modes for the file indicated by the file descriptor | |
951 | do not permit the access specified by the @var{opentype} argument), a | |
952 | null pointer is returned instead. | |
953 | ||
954 | In some other systems, @code{fdopen} may fail to detect that the modes | |
9739d2d5 | 955 | for file descriptors do not permit the access specified by |
1f77f049 | 956 | @var{opentype}. @Theglibc{} always checks for this. |
28f540f4 RM |
957 | @end deftypefun |
958 | ||
959 | For an example showing the use of the @code{fdopen} function, | |
960 | see @ref{Creating a Pipe}. | |
961 | ||
28f540f4 | 962 | @deftypefun int fileno (FILE *@var{stream}) |
d08a7e4c | 963 | @standards{POSIX.1, stdio.h} |
2cc3615c | 964 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
965 | This function returns the file descriptor associated with the stream |
966 | @var{stream}. If an error is detected (for example, if the @var{stream} | |
967 | is not valid) or if @var{stream} does not do I/O to a file, | |
07435eb4 | 968 | @code{fileno} returns @math{-1}. |
28f540f4 RM |
969 | @end deftypefun |
970 | ||
7b4161bb | 971 | @deftypefun int fileno_unlocked (FILE *@var{stream}) |
d08a7e4c | 972 | @standards{GNU, stdio.h} |
2cc3615c | 973 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
7b4161bb UD |
974 | The @code{fileno_unlocked} function is equivalent to the @code{fileno} |
975 | function except that it does not implicitly lock the stream if the state | |
976 | is @code{FSETLOCKING_INTERNAL}. | |
977 | ||
978 | This function is a GNU extension. | |
979 | @end deftypefun | |
980 | ||
28f540f4 RM |
981 | @cindex standard file descriptors |
982 | @cindex file descriptors, standard | |
983 | There are also symbolic constants defined in @file{unistd.h} for the | |
984 | file descriptors belonging to the standard streams @code{stdin}, | |
985 | @code{stdout}, and @code{stderr}; see @ref{Standard Streams}. | |
986 | @pindex unistd.h | |
987 | ||
2fe82ca6 | 988 | @vtable @code |
28f540f4 | 989 | @item STDIN_FILENO |
d08a7e4c | 990 | @standards{POSIX.1, unistd.h} |
28f540f4 RM |
991 | This macro has value @code{0}, which is the file descriptor for |
992 | standard input. | |
993 | @cindex standard input file descriptor | |
994 | ||
28f540f4 | 995 | @item STDOUT_FILENO |
d08a7e4c | 996 | @standards{POSIX.1, unistd.h} |
28f540f4 RM |
997 | This macro has value @code{1}, which is the file descriptor for |
998 | standard output. | |
999 | @cindex standard output file descriptor | |
1000 | ||
28f540f4 | 1001 | @item STDERR_FILENO |
d08a7e4c | 1002 | @standards{POSIX.1, unistd.h} |
28f540f4 RM |
1003 | This macro has value @code{2}, which is the file descriptor for |
1004 | standard error output. | |
2fe82ca6 | 1005 | @end vtable |
28f540f4 RM |
1006 | @cindex standard error file descriptor |
1007 | ||
1008 | @node Stream/Descriptor Precautions | |
1009 | @section Dangers of Mixing Streams and Descriptors | |
1010 | @cindex channels | |
1011 | @cindex streams and descriptors | |
1012 | @cindex descriptors and streams | |
1013 | @cindex mixing descriptors and streams | |
1014 | ||
1015 | You can have multiple file descriptors and streams (let's call both | |
1016 | streams and descriptors ``channels'' for short) connected to the same | |
1017 | file, but you must take care to avoid confusion between channels. There | |
1018 | are two cases to consider: @dfn{linked} channels that share a single | |
1019 | file position value, and @dfn{independent} channels that have their own | |
1020 | file positions. | |
1021 | ||
1022 | It's best to use just one channel in your program for actual data | |
1023 | transfer to any given file, except when all the access is for input. | |
1024 | For example, if you open a pipe (something you can only do at the file | |
1025 | descriptor level), either do all I/O with the descriptor, or construct a | |
1026 | stream from the descriptor with @code{fdopen} and then do all I/O with | |
1027 | the stream. | |
1028 | ||
1029 | @menu | |
1030 | * Linked Channels:: Dealing with channels sharing a file position. | |
1031 | * Independent Channels:: Dealing with separately opened, unlinked channels. | |
2c6fe0bd | 1032 | * Cleaning Streams:: Cleaning a stream makes it safe to use |
28f540f4 RM |
1033 | another channel. |
1034 | @end menu | |
1035 | ||
1036 | @node Linked Channels | |
1037 | @subsection Linked Channels | |
1038 | @cindex linked channels | |
1039 | ||
1040 | Channels that come from a single opening share the same file position; | |
1041 | we call them @dfn{linked} channels. Linked channels result when you | |
1042 | make a stream from a descriptor using @code{fdopen}, when you get a | |
1043 | descriptor from a stream with @code{fileno}, when you copy a descriptor | |
1044 | with @code{dup} or @code{dup2}, and when descriptors are inherited | |
1045 | during @code{fork}. For files that don't support random access, such as | |
1046 | terminals and pipes, @emph{all} channels are effectively linked. On | |
1047 | random-access files, all append-type output streams are effectively | |
1048 | linked to each other. | |
1049 | ||
1050 | @cindex cleaning up a stream | |
0295d266 UD |
1051 | If you have been using a stream for I/O (or have just opened the stream), |
1052 | and you want to do I/O using | |
28f540f4 RM |
1053 | another channel (either a stream or a descriptor) that is linked to it, |
1054 | you must first @dfn{clean up} the stream that you have been using. | |
1055 | @xref{Cleaning Streams}. | |
1056 | ||
1057 | Terminating a process, or executing a new program in the process, | |
1058 | destroys all the streams in the process. If descriptors linked to these | |
1059 | streams persist in other processes, their file positions become | |
1060 | undefined as a result. To prevent this, you must clean up the streams | |
1061 | before destroying them. | |
1062 | ||
1063 | @node Independent Channels | |
1064 | @subsection Independent Channels | |
1065 | @cindex independent channels | |
1066 | ||
1067 | When you open channels (streams or descriptors) separately on a seekable | |
1068 | file, each channel has its own file position. These are called | |
1069 | @dfn{independent channels}. | |
1070 | ||
1071 | The system handles each channel independently. Most of the time, this | |
1072 | is quite predictable and natural (especially for input): each channel | |
1073 | can read or write sequentially at its own place in the file. However, | |
1074 | if some of the channels are streams, you must take these precautions: | |
1075 | ||
1076 | @itemize @bullet | |
1077 | @item | |
1078 | You should clean an output stream after use, before doing anything else | |
1079 | that might read or write from the same part of the file. | |
1080 | ||
1081 | @item | |
1082 | You should clean an input stream before reading data that may have been | |
1083 | modified using an independent channel. Otherwise, you might read | |
1084 | obsolete data that had been in the stream's buffer. | |
1085 | @end itemize | |
1086 | ||
1087 | If you do output to one channel at the end of the file, this will | |
1088 | certainly leave the other independent channels positioned somewhere | |
1089 | before the new end. You cannot reliably set their file positions to the | |
1090 | new end of file before writing, because the file can always be extended | |
1091 | by another process between when you set the file position and when you | |
1092 | write the data. Instead, use an append-type descriptor or stream; they | |
1093 | always output at the current end of the file. In order to make the | |
1094 | end-of-file position accurate, you must clean the output channel you | |
1095 | were using, if it is a stream. | |
1096 | ||
1097 | It's impossible for two channels to have separate file pointers for a | |
1098 | file that doesn't support random access. Thus, channels for reading or | |
1099 | writing such files are always linked, never independent. Append-type | |
1100 | channels are also always linked. For these channels, follow the rules | |
1101 | for linked channels; see @ref{Linked Channels}. | |
1102 | ||
1103 | @node Cleaning Streams | |
1104 | @subsection Cleaning Streams | |
1105 | ||
6664049b | 1106 | You can use @code{fflush} to clean a stream in most |
28f540f4 RM |
1107 | cases. |
1108 | ||
6664049b | 1109 | You can skip the @code{fflush} if you know the stream |
28f540f4 RM |
1110 | is already clean. A stream is clean whenever its buffer is empty. For |
1111 | example, an unbuffered stream is always clean. An input stream that is | |
1112 | at end-of-file is clean. A line-buffered stream is clean when the last | |
0295d266 UD |
1113 | character output was a newline. However, a just-opened input stream |
1114 | might not be clean, as its input buffer might not be empty. | |
28f540f4 RM |
1115 | |
1116 | There is one case in which cleaning a stream is impossible on most | |
1117 | systems. This is when the stream is doing input from a file that is not | |
1118 | random-access. Such streams typically read ahead, and when the file is | |
1119 | not random access, there is no way to give back the excess data already | |
1120 | read. When an input stream reads from a random-access file, | |
1121 | @code{fflush} does clean the stream, but leaves the file pointer at an | |
1122 | unpredictable place; you must set the file pointer before doing any | |
6664049b | 1123 | further I/O. |
28f540f4 RM |
1124 | |
1125 | Closing an output-only stream also does @code{fflush}, so this is a | |
6664049b | 1126 | valid way of cleaning an output stream. |
28f540f4 RM |
1127 | |
1128 | You need not clean a stream before using its descriptor for control | |
1129 | operations such as setting terminal modes; these operations don't affect | |
1130 | the file position and are not affected by it. You can use any | |
1131 | descriptor for these operations, and all channels are affected | |
1132 | simultaneously. However, text already ``output'' to a stream but still | |
1133 | buffered by the stream will be subject to the new terminal modes when | |
1134 | subsequently flushed. To make sure ``past'' output is covered by the | |
1135 | terminal settings that were in effect at the time, flush the output | |
1136 | streams for that terminal before setting the modes. @xref{Terminal | |
1137 | Modes}. | |
1138 | ||
07435eb4 UD |
1139 | @node Scatter-Gather |
1140 | @section Fast Scatter-Gather I/O | |
1141 | @cindex scatter-gather | |
1142 | ||
1143 | Some applications may need to read or write data to multiple buffers, | |
04b9968b | 1144 | which are separated in memory. Although this can be done easily enough |
19e4c7dd | 1145 | with multiple calls to @code{read} and @code{write}, it is inefficient |
07435eb4 UD |
1146 | because there is overhead associated with each kernel call. |
1147 | ||
1148 | Instead, many platforms provide special high-speed primitives to perform | |
1f77f049 JM |
1149 | these @dfn{scatter-gather} operations in a single kernel call. @Theglibc{} |
1150 | will provide an emulation on any system that lacks these | |
07435eb4 UD |
1151 | primitives, so they are not a portability threat. They are defined in |
1152 | @file{sys/uio.h}. | |
1153 | ||
1154 | These functions are controlled with arrays of @code{iovec} structures, | |
1155 | which describe the location and size of each buffer. | |
1156 | ||
1157 | @deftp {Data Type} {struct iovec} | |
d08a7e4c | 1158 | @standards{BSD, sys/uio.h} |
07435eb4 | 1159 | |
cf822e3c | 1160 | The @code{iovec} structure describes a buffer. It contains two fields: |
07435eb4 UD |
1161 | |
1162 | @table @code | |
1163 | ||
1164 | @item void *iov_base | |
1165 | Contains the address of a buffer. | |
1166 | ||
1167 | @item size_t iov_len | |
1168 | Contains the length of the buffer. | |
1169 | ||
1170 | @end table | |
1171 | @end deftp | |
1172 | ||
1173 | @deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) | |
d08a7e4c | 1174 | @standards{BSD, sys/uio.h} |
2cc3615c AO |
1175 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
1176 | @c The fallback sysdeps/posix implementation, used even on GNU/Linux | |
1177 | @c with old kernels that lack a full readv/writev implementation, may | |
1178 | @c malloc the buffer into which data is read, if the total read size is | |
1179 | @c too large for alloca. | |
07435eb4 UD |
1180 | |
1181 | The @code{readv} function reads data from @var{filedes} and scatters it | |
1182 | into the buffers described in @var{vector}, which is taken to be | |
1183 | @var{count} structures long. As each buffer is filled, data is sent to the | |
1184 | next. | |
1185 | ||
1186 | Note that @code{readv} is not guaranteed to fill all the buffers. | |
1187 | It may stop at any point, for the same reasons @code{read} would. | |
1188 | ||
1189 | The return value is a count of bytes (@emph{not} buffers) read, @math{0} | |
1190 | indicating end-of-file, or @math{-1} indicating an error. The possible | |
1191 | errors are the same as in @code{read}. | |
1192 | ||
1193 | @end deftypefun | |
1194 | ||
1195 | @deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) | |
d08a7e4c | 1196 | @standards{BSD, sys/uio.h} |
2cc3615c AO |
1197 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
1198 | @c The fallback sysdeps/posix implementation, used even on GNU/Linux | |
1199 | @c with old kernels that lack a full readv/writev implementation, may | |
1200 | @c malloc the buffer from which data is written, if the total write size | |
1201 | @c is too large for alloca. | |
07435eb4 UD |
1202 | |
1203 | The @code{writev} function gathers data from the buffers described in | |
1204 | @var{vector}, which is taken to be @var{count} structures long, and writes | |
1205 | them to @var{filedes}. As each buffer is written, it moves on to the |
1206 | next. | |
1207 | ||
1208 | Like @code{readv}, @code{writev} may stop midstream under the same | |
1209 | conditions @code{write} would. | |
1210 | ||
1211 | The return value is a count of bytes written, or @math{-1} indicating an | |
1212 | error. The possible errors are the same as in @code{write}. | |
1213 | ||
1214 | @end deftypefun | |
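As a minimal sketch of the two functions above, the following helpers send a header and a body with a single @code{writev} call and receive them into two separate buffers with a single @code{readv} call. The names @code{write_record} and @code{read_record} are hypothetical and exist only for this illustration; a real caller would also have to handle short reads and writes.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather HEADER and BODY into one writev call.  Returns the byte
   count reported by writev, or -1 on error.  */
ssize_t
write_record (int fd, const char *header, const char *body)
{
  struct iovec vec[2];

  /* iov_base is not const-qualified, so a cast is needed even though
     writev only reads from these buffers.  */
  vec[0].iov_base = (void *) header;
  vec[0].iov_len = strlen (header);
  vec[1].iov_base = (void *) body;
  vec[1].iov_len = strlen (body);

  return writev (fd, vec, 2);
}

/* Scatter incoming data into FIRST and then SECOND with one readv
   call.  Returns the total byte count, 0 at end of file, or -1.  */
ssize_t
read_record (int fd, char *first, size_t first_len,
             char *second, size_t second_len)
{
  struct iovec vec[2];

  vec[0].iov_base = first;
  vec[0].iov_len = first_len;
  vec[1].iov_base = second;
  vec[1].iov_len = second_len;

  return readv (fd, vec, 2);
}
```

Because the return value counts bytes rather than buffers, the caller has to compare it with the sum of the @code{iov_len} fields to detect a partial transfer.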
1215 | ||
f6e965ee FW |
1216 | @deftypefun ssize_t preadv (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}) |
1217 | @standards{BSD, sys/uio.h} | |
1218 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1219 | @c This is a syscall for Linux 3.2 for all architectures but microblaze | |
1220 | @c (which was added on 3.15). The sysdeps/posix fallback emulation | |
1221 | @c is also MT-Safe since it calls pread, and it is now a syscall on all | |
1222 | @c targets. | |
1223 | ||
1224 | This function is similar to the @code{readv} function, except that |
1225 | it takes an extra @var{offset} parameter of type @code{off_t}, like |
b156c5f0 | 1226 | @code{pread}. The data is read from the file starting at position |
f6e965ee FW |
1227 | @var{offset}. The position of the file descriptor itself is not affected |
1228 | by the operation. The value is the same as before the call. | |
1229 | ||
1230 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the | |
1231 | @code{preadv} function is in fact @code{preadv64} and the type | |
1232 | @code{off_t} has 64 bits, which makes it possible to handle files up to | |
1233 | @twoexp{63} bytes in length. | |
1234 | ||
1235 | The return value is a count of bytes (@emph{not} buffers) read, @math{0} | |
1236 | indicating end-of-file, or @math{-1} indicating an error. The possible | |
1237 | errors are the same as in @code{readv} and @code{pread}. | |
1238 | @end deftypefun | |
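The sketch below illustrates the point that @code{preadv} leaves the file position alone: it reads a header and a payload from an explicit offset in one call. The helper name @code{read_at} is an assumption made for this example, and the feature test macro is defined defensively in case the default compilation mode does not expose the declaration.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read HEADER_LEN bytes into HEADER and the following PAYLOAD_LEN
   bytes into PAYLOAD, starting at OFFSET in FD.  The file position
   of FD is not changed.  Returns the byte count or -1 on error.  */
ssize_t
read_at (int fd, off_t offset, void *header, size_t header_len,
         void *payload, size_t payload_len)
{
  struct iovec vec[2] = {
    { .iov_base = header, .iov_len = header_len },
    { .iov_base = payload, .iov_len = payload_len }
  };

  return preadv (fd, vec, 2, offset);
}
```

Since no shared file position is involved, several threads can call such a helper on the same descriptor without coordinating seeks.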
1239 | ||
1240 | @deftypefun ssize_t preadv64 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}) | |
1241 | @standards{BSD, unistd.h} | |
1242 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1243 | @c This is a syscall for Linux 3.2 for all architectures but microblaze | |
1244 | @c (which was added on 3.15). The sysdeps/posix fallback emulation | |
1245 | @c is also MT-Safe since it calls pread64, and it is now a syscall on all | |
1246 | @c targets. | |
1247 | ||
1248 | This function is similar to the @code{preadv} function, except that |
1249 | the @var{offset} parameter is of type @code{off64_t} instead of |
1250 | @code{off_t}. It makes it possible on 32 bit machines to address |
1251 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
1252 | file descriptor @var{fd} must be opened using @code{open64} since |
1253 | otherwise the large offsets possible with @code{off64_t} will lead to | |
1254 | errors with a descriptor in small file mode. | |
1255 | ||
1256 | When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a | |
1257 | 32 bit machine this function is actually available under the name | |
1258 | @code{preadv} and so transparently replaces the 32 bit interface. | |
1259 | @end deftypefun | |
1260 | ||
1261 | @deftypefun ssize_t pwritev (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}) | |
1262 | @standards{BSD, sys/uio.h} | |
1263 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1264 | @c This is a syscall for Linux 3.2 for all architectures but microblaze | |
1265 | @c (which was added on 3.15). The sysdeps/posix fallback emulation | |
1266 | @c is also MT-Safe since it calls pwrite, and it is now a syscall on all | |
1267 | @c targets. | |
1268 | ||
1269 | This function is similar to the @code{writev} function, except that |
1270 | it takes an extra @var{offset} parameter of type @code{off_t}, like |
1271 | @code{pwrite}. The data is written to the file starting at position | |
1272 | @var{offset}. The position of the file descriptor itself is not affected | |
1273 | by the operation. The value is the same as before the call. | |
1274 | ||
1275 | However, on Linux, if a file is opened with @code{O_APPEND}, @code{pwritev} |
1276 | appends data to the end of the file, regardless of the value of |
1277 | @var{offset}. |
1278 | ||
1279 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the | |
1280 | @code{pwritev} function is in fact @code{pwritev64} and the type | |
1281 | @code{off_t} has 64 bits, which makes it possible to handle files up to | |
1282 | @twoexp{63} bytes in length. | |
1283 | ||
1284 | The return value is a count of bytes (@emph{not} buffers) written, or |
1285 | @math{-1} indicating an error. The possible |
1286 | errors are the same as in @code{writev} and @code{pwrite}. | |
1287 | @end deftypefun | |
1288 | ||
1289 | @deftypefun ssize_t pwritev64 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}) | |
1290 | @standards{BSD, unistd.h} | |
1291 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1292 | @c This is a syscall for Linux 3.2 for all architectures but microblaze | |
1293 | @c (which was added on 3.15). The sysdeps/posix fallback emulation | |
1294 | @c is also MT-Safe since it calls pwrite64, and it is now a syscall on all | |
1295 | @c targets. | |
1296 | ||
1297 | This function is similar to the @code{pwritev} function, except that |
1298 | the @var{offset} parameter is of type @code{off64_t} instead of |
1299 | @code{off_t}. It makes it possible on 32 bit machines to address |
1300 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
1301 | file descriptor @var{fd} must be opened using @code{open64} since |
1302 | otherwise the large offsets possible with @code{off64_t} will lead to | |
1303 | errors with a descriptor in small file mode. | |
1304 | ||
1305 | When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a | |
1306 | 32 bit machine this function is actually available under the name | |
1307 | @code{pwritev} and so transparently replaces the 32 bit interface. | |
1308 | @end deftypefun | |
1309 | ||
1310 | @deftypefun ssize_t preadv2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}, int @var{flags}) | |
1311 | @standards{GNU, sys/uio.h} | |
1312 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1313 | @c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation | |
1314 | @c is also MT-Safe since it calls preadv. | |
1315 | ||
d4b4a00a FW |
1316 | This function is similar to the @code{preadv} function, except that |
1317 | it takes an extra @var{flags} parameter of type @code{int}. |
1318 | Additionally, if @var{offset} is @math{-1}, the current file position | |
1319 | is used and updated (like the @code{readv} function). | |
1320 | ||
1321 | The supported @var{flags} depend on the underlying system. Linux |
1322 | supports the following: |
f6e965ee FW |
1323 | |
1324 | @vtable @code | |
1325 | @item RWF_HIPRI | |
1326 | High priority request. This adds a flag that tells the file system that | |
1327 | this is a high priority request for which it is worth polling the hardware. |
1328 | The flag is purely advisory and can be ignored if not supported. The | |
1329 | @var{fd} must be opened using @code{O_DIRECT}. | |
1330 | ||
1331 | @item RWF_DSYNC | |
1332 | Per-IO synchronization as if the file was opened with @code{O_DSYNC} flag. | |
1333 | ||
1334 | @item RWF_SYNC | |
1335 | Per-IO synchronization as if the file was opened with @code{O_SYNC} flag. | |
1336 | ||
1337 | @item RWF_NOWAIT | |
1338 | Use nonblocking mode for this operation; that is, this call to @code{preadv2} | |
1339 | will fail and set @code{errno} to @code{EAGAIN} if the operation would block. | |
f2652643 L |
1340 | |
1341 | @item RWF_APPEND | |
1342 | Per-IO append, as if the file was opened with the @code{O_APPEND} flag. |
3db9d208 SH |
1343 | |
1344 | @item RWF_NOAPPEND | |
1345 | This flag allows an offset to be honored, even if the file was opened with | |
1346 | @code{O_APPEND} flag. | |
f6e965ee FW |
1347 | @end vtable |
1348 | ||
1349 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the | |
1350 | @code{preadv2} function is in fact @code{preadv64v2} and the type | |
1351 | @code{off_t} has 64 bits, which makes it possible to handle files up to | |
1352 | @twoexp{63} bytes in length. | |
1353 | ||
1354 | The return value is a count of bytes (@emph{not} buffers) read, @math{0} | |
1355 | indicating end-of-file, or @math{-1} indicating an error. The possible | |
1356 | errors are the same as in @code{preadv} with the addition of: | |
1357 | ||
1358 | @table @code | |
1359 | ||
1360 | @item EOPNOTSUPP | |
1361 | ||
1362 | @c The default sysdeps/posix code will return it for any flags value | |
1363 | @c different than 0. | |
1364 | An unsupported @var{flags} was used. | |
1365 | ||
1366 | @end table | |
1367 | ||
1368 | @end deftypefun | |
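The following sketch shows one possible way to call @code{preadv2} with a set of flags and retry without them when the system reports @code{EOPNOTSUPP}. The helper name @code{read_with_flags} is hypothetical. Silently dropping the flags like this is only reasonable for purely advisory flags such as @code{RWF_HIPRI}; it would not be appropriate for a flag such as @code{RWF_NOWAIT} whose semantics the caller relies on.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read up to LEN bytes into BUF using preadv2 with FLAGS.  If OFFSET
   is -1, the current file position is used and updated; otherwise
   the read starts at OFFSET and the file position is left alone.
   When the flags are rejected with EOPNOTSUPP, retry without them.  */
ssize_t
read_with_flags (int fd, void *buf, size_t len, off_t offset, int flags)
{
  struct iovec vec = { .iov_base = buf, .iov_len = len };
  ssize_t n = preadv2 (fd, &vec, 1, offset, flags);

  if (n == -1 && errno == EOPNOTSUPP && flags != 0)
    /* Assumption for this sketch: the flags were only a hint, so a
       plain read is an acceptable substitute.  */
    n = preadv2 (fd, &vec, 1, offset, 0);

  return n;
}
```

With @var{flags} set to zero the helper behaves like @code{readv} when @var{offset} is @math{-1} and like @code{preadv} otherwise.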
1369 | ||
1370 | @deftypefun ssize_t preadv64v2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}, int @var{flags}) | |
1371 | @standards{GNU, unistd.h} | |
1372 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1373 | @c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation | |
1374 | @c is also MT-Safe since it calls preadv. | |
07435eb4 | 1375 | |
f6e965ee FW |
1376 | This function is similar to the @code{preadv2} function, except that |
1377 | the @var{offset} parameter is of type @code{off64_t} instead of |
1378 | @code{off_t}. It makes it possible on 32 bit machines to address |
1379 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
1380 | file descriptor @var{fd} must be opened using @code{open64} since |
1381 | otherwise the large offsets possible with @code{off64_t} will lead to | |
1382 | errors with a descriptor in small file mode. | |
1383 | ||
1384 | When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a | |
1385 | 32 bit machine this function is actually available under the name | |
1386 | @code{preadv2} and so transparently replaces the 32 bit interface. | |
1387 | @end deftypefun | |
1388 | ||
1389 | ||
1390 | @deftypefun ssize_t pwritev2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}, int @var{flags}) | |
1391 | @standards{GNU, sys/uio.h} | |
1392 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1393 | @c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation | |
1394 | @c is also MT-Safe since it calls pwritev. | |
1395 | ||
d4b4a00a FW |
1396 | This function is similar to the @code{pwritev} function, except that |
1397 | it takes an extra @var{flags} parameter of type @code{int}. |
1398 | Additionally, if @var{offset} is @math{-1}, the current file position |
1399 | is used and updated (like the @code{writev} function). |
1400 | ||
1401 | The supported @var{flags} depend on the underlying system. For |
1402 | Linux, the supported flags are the same as those for @code{preadv2}. | |
f6e965ee FW |
1403 | |
1404 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the | |
1405 | @code{pwritev2} function is in fact @code{pwritev64v2} and the type | |
1406 | @code{off_t} has 64 bits, which makes it possible to handle files up to | |
1407 | @twoexp{63} bytes in length. | |
1408 | ||
1409 | The return value is a count of bytes (@emph{not} buffers) written, or |
1410 | @math{-1} indicating an error. The possible |
1411 | errors are the same as in @code{preadv2}. | |
1412 | @end deftypefun | |
1413 | ||
1414 | @deftypefun ssize_t pwritev64v2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}, int @var{flags}) | |
1415 | @standards{GNU, unistd.h} | |
1416 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1417 | @c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation | |
1418 | @c is also MT-Safe since it calls pwritev. | |
1419 | ||
1420 | This function is similar to the @code{pwritev2} function, except that |
1421 | the @var{offset} parameter is of type @code{off64_t} instead of |
1422 | @code{off_t}. It makes it possible on 32 bit machines to address |
1423 | files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The |
1424 | file descriptor @var{fd} must be opened using @code{open64} since |
1425 | otherwise the large offsets possible with @code{off64_t} will lead to | |
1426 | errors with a descriptor in small file mode. | |
1427 | ||
1428 | When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a | |
1429 | 32 bit machine this function is actually available under the name | |
1430 | @code{pwritev2} and so transparently replaces the 32 bit interface. | |
1431 | @end deftypefun | |
07435eb4 | 1432 | |
bad7a0c8 FW |
1433 | @node Copying File Data |
1434 | @section Copying data between two files | |
1435 | @cindex copying files | |
1436 | @cindex file copy | |
1437 | ||
1438 | A special function is provided to copy data between two files on the | |
1439 | same file system. The system can optimize such copy operations. This | |
1440 | is particularly important on network file systems, where the data would | |
1441 | otherwise have to be transferred twice over the network. | |
1442 | ||
1443 | Note that this function only copies file data, but not metadata such as | |
1444 | file permissions or extended attributes. | |
1445 | ||
1446 | @deftypefun ssize_t copy_file_range (int @var{inputfd}, off64_t *@var{inputpos}, int @var{outputfd}, off64_t *@var{outputpos}, size_t @var{length}, unsigned int @var{flags}) |
1447 | @standards{GNU, unistd.h} | |
1448 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1449 | ||
1450 | This function copies up to @var{length} bytes from the file descriptor | |
1451 | @var{inputfd} to the file descriptor @var{outputfd}. | |
1452 | ||
1453 | The function can operate on both the current file position (like | |
1454 | @code{read} and @code{write}) and an explicit offset (like @code{pread} | |
1455 | and @code{pwrite}). If the @var{inputpos} pointer is null, the file | |
1456 | position of @var{inputfd} is used as the starting point of the copy | |
1457 | operation, and the file position is advanced during it. If | |
1458 | @var{inputpos} is not null, then @code{*@var{inputpos}} is used as the | |
1459 | starting point of the copy operation, and @code{*@var{inputpos}} is | |
1460 | incremented by the number of copied bytes, but the file position remains | |
1461 | unchanged. Similar rules apply to @var{outputfd} and @var{outputpos} | |
1462 | for the output file position. | |
1463 | ||
1464 | The @var{flags} argument is currently reserved and must be zero. | |
1465 | ||
1466 | The @code{copy_file_range} function returns the number of bytes copied. | |
1467 | This can be less than the specified @var{length} in case the input file | |
1468 | contains fewer remaining bytes than @var{length}, or if a read or write | |
1469 | failure occurs. The return value is zero if the end of the input file | |
1470 | is encountered immediately. | |
1471 | ||
1472 | If an error prevents any bytes from being copied, @code{copy_file_range} |
5a659ccc FW |
1473 | returns the value @math{-1} and sets @code{errno}. The table below |
1474 | lists some of the error conditions for this function. | |
bad7a0c8 FW |
1475 | |
1476 | @table @code | |
5a659ccc FW |
1477 | @item ENOSYS |
1478 | The kernel does not implement the required functionality. | |
1479 | ||
bad7a0c8 FW |
1480 | @item EISDIR |
1481 | At least one of the descriptors @var{inputfd} or @var{outputfd} refers | |
1482 | to a directory. | |
1483 | ||
1484 | @item EINVAL | |
1485 | At least one of the descriptors @var{inputfd} or @var{outputfd} refers | |
1486 | to a non-regular, non-directory file (such as a socket or a FIFO). | |
1487 | ||
1488 | The input or output positions before or after the copy operations are |
1489 | outside of an implementation-defined limit. | |
1490 | ||
1491 | The @var{flags} argument is not zero. | |
1492 | ||
1493 | @item EFBIG | |
1494 | The new file size would exceed the process file size limit. | |
1495 | @xref{Limits on Resources}. | |
1496 | ||
1497 | The input or output positions before or after the copy operations are |
1498 | outside of an implementation-defined limit. This can happen if the file | |
1499 | was not opened with large file support (LFS) on 32-bit machines, and the | |
1500 | copy operation would create a file which is larger than what | |
1501 | @code{off_t} could represent. | |
1502 | ||
1503 | @item EBADF | |
1504 | The argument @var{inputfd} is not a valid file descriptor open for | |
1505 | reading. | |
1506 | ||
1507 | The argument @var{outputfd} is not a valid file descriptor open for | |
1508 | writing, or @var{outputfd} has been opened with @code{O_APPEND}. | |
bad7a0c8 FW |
1509 | @end table |
1510 | ||
1511 | In addition, @code{copy_file_range} can fail with the error codes | |
1512 | which are used by @code{read}, @code{pread}, @code{write}, and | |
1513 | @code{pwrite}. | |
1514 | ||
1515 | The @code{copy_file_range} function is a cancellation point. In case of | |
1516 | cancellation, the input location (the file position or the value at | |
1517 | @code{*@var{inputpos}}) is indeterminate. | |
1518 | @end deftypefun | |
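Because @code{copy_file_range} may copy fewer bytes than requested, callers usually invoke it in a loop. The following is a minimal sketch of such a loop, using the current file positions of both descriptors. The names @code{copy_bytes} and @code{copy_with_read_write} are hypothetical. The fallback to @code{read} and @code{write}, and the set of error codes that triggers it (including @code{EXDEV}, which is not listed in the table above), are assumptions of this example rather than documented behavior.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy at most one buffer of data with read and write.  Returns the
   number of bytes copied, 0 at end of input, or -1 on error.  */
static ssize_t
copy_with_read_write (int in, int out, size_t length)
{
  char buf[4096];
  size_t want = length < sizeof buf ? length : sizeof buf;
  ssize_t n = read (in, buf, want);
  ssize_t done = 0;

  if (n <= 0)
    return n;
  while (done < n)
    {
      /* write may transfer less than requested, so keep going.  */
      ssize_t w = write (out, buf + done, n - done);
      if (w == -1)
        return -1;
      done += w;
    }
  return n;
}

/* Copy up to LENGTH bytes from IN to OUT at their current file
   positions.  Returns the number of bytes copied, which is less
   than LENGTH if the input ends early, or -1 on error.  */
ssize_t
copy_bytes (int in, int out, size_t length)
{
  size_t total = 0;
  int use_fallback = 0;

  while (total < length)
    {
      ssize_t n;

      if (use_fallback)
        n = copy_with_read_write (in, out, length - total);
      else
        {
          n = copy_file_range (in, NULL, out, NULL, length - total, 0);
          if (n == -1 && (errno == ENOSYS || errno == EXDEV
                          || errno == EINVAL || errno == EOPNOTSUPP))
            {
              /* Assumed to mean "not usable for these files".  */
              use_fallback = 1;
              continue;
            }
        }
      if (n == -1)
        return -1;
      if (n == 0)
        break;                  /* End of the input file.  */
      total += n;
    }
  return total;
}
```

As noted above, only file data is copied; permissions and extended attributes have to be transferred separately if they matter.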
1519 | ||
07435eb4 UD |
1520 | @node Memory-mapped I/O |
1521 | @section Memory-mapped I/O | |
1522 | ||
1523 | On modern operating systems, it is possible to @dfn{mmap} (pronounced | |
1524 | ``em-map'') a file to a region of memory. When this is done, the file can | |
1525 | be accessed just like an array in the program. | |
1526 | ||
19e4c7dd | 1527 | This is more efficient than @code{read} or @code{write}, as only the regions |
04b9968b | 1528 | of the file that a program actually accesses are loaded. Accesses to |
07435eb4 UD |
1529 | not-yet-loaded parts of the mmapped region are handled in the same way as |
1530 | swapped out pages. | |
1531 | ||
b642f101 UD |
1532 | Since mmapped pages can be stored back to their file when physical |
1533 | memory is low, it is possible to mmap files orders of magnitude larger | |
1534 | than both the physical memory @emph{and} swap space. The only limit is | |
1535 | address space. The theoretical limit is 4GB on a 32-bit machine; |
1536 | however, the actual limit will be smaller since some areas will be | |
1537 | reserved for other purposes. If the LFS interface is used the file size | |
1538 | on 32-bit systems is not limited to 2GB (offsets are signed which | |
1539 | reduces the addressable area of 4GB by half); the full 64 bits are |
1540 | available. | |
07435eb4 UD |
1541 | |
1542 | Memory mapping only works on entire pages of memory. Thus, addresses | |
1543 | for mapping must be page-aligned, and length values will be rounded up. | |
a465b89e | 1544 | To determine the default page size of the machine, use: |
07435eb4 | 1545 | |
b642f101 | 1546 | @vindex _SC_PAGESIZE |
07435eb4 UD |
1547 | @smallexample |
1548 | size_t page_size = (size_t) sysconf (_SC_PAGESIZE); | |
1549 | @end smallexample | |
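Since mappings cover whole pages, it is often convenient to compute page-aligned values up front. The helpers below are a small sketch of that arithmetic; the names @code{round_up_to_page} and @code{page_start} are hypothetical, and the code assumes the default page size reported by @code{sysconf}, not a larger page size that a particular mapping might use.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Round LENGTH up to a multiple of the default page size.  */
size_t
round_up_to_page (size_t length)
{
  size_t page_size = (size_t) sysconf (_SC_PAGESIZE);

  return (length + page_size - 1) / page_size * page_size;
}

/* Return the offset of the start of the page that contains OFFSET.
   OFFSET is assumed to be nonnegative.  */
off_t
page_start (off_t offset)
{
  off_t page_size = (off_t) sysconf (_SC_PAGESIZE);

  return offset - offset % page_size;
}
```

A caller that wants to map an arbitrary byte range of a file can pass @code{page_start (offset)} as the mapping offset and then add the difference to the returned address.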
1550 | ||
a465b89e FW |
1551 | On some systems, mappings can use larger page sizes |
1552 | for certain files, and applications can request larger page sizes for | |
1553 | anonymous mappings as well (see the @code{MAP_HUGETLB} flag below). | |
1554 | ||
1555 | The following functions are declared in @file{sys/mman.h}: | |
07435eb4 | 1556 | |
cc6e48bc | 1557 | @deftypefun {void *} mmap (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset}) |
d08a7e4c | 1558 | @standards{POSIX, sys/mman.h} |
2cc3615c | 1559 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
07435eb4 UD |
1560 | |
1561 | The @code{mmap} function creates a new mapping, connected to bytes | |
b73147d0 | 1562 | (@var{offset}) to (@var{offset} + @var{length} - 1) in the file open on |
b61345a1 UD |
1563 | @var{filedes}. A new reference for the file specified by @var{filedes} |
1564 | is created, which is not removed by closing the file. | |
07435eb4 UD |
1565 | |
1566 | @var{address} gives a preferred starting address for the mapping. | |
cf822e3c OB |
1567 | @code{NULL} expresses no preference. Any previous mapping at that |
1568 | address is automatically removed. The address you give may still be | |
07435eb4 UD |
1569 | changed, unless you use the @code{MAP_FIXED} flag. |
1570 | ||
07435eb4 UD |
1571 | @var{protect} contains flags that control what kind of access is |
1572 | permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and | |
0f74bbf5 FW |
1573 | @code{PROT_EXEC}. The special flag @code{PROT_NONE} reserves a region |
1574 | of address space for future use. The @code{mprotect} function can be | |
1575 | used to change the protection flags. @xref{Memory Protection}. | |
07435eb4 | 1576 | |
dce754b1 DD |
1577 | The @var{flags} parameter contains flags that control the nature of |
1578 | the map. One of @code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or | |
1579 | @code{MAP_PRIVATE} must be specified. Additional flags may be bitwise | |
1580 | OR'd to further define the mapping. | |
07435eb4 | 1581 | |
dce754b1 DD |
1582 | Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not |
1583 | all flags are supported on all versions of all operating systems. | |
1584 | Consult the kernel-specific documentation for details. The flags | |
1585 | include: | |
07435eb4 UD |
1586 | |
1587 | @vtable @code | |
1588 | @item MAP_PRIVATE | |
1589 | This specifies that writes to the region should never be written back | |
1590 | to the attached file. Instead, a copy is made for the process, and the | |
1591 | region will be swapped normally if memory runs low. No other process will | |
1592 | see the changes. | |
1593 | ||
1594 | Since private mappings effectively revert to ordinary memory | |
1595 | when written to, you must have enough virtual memory for a copy of | |
1596 | the entire mmapped region if you use this mode with @code{PROT_WRITE}. | |
1597 | ||
1598 | @item MAP_SHARED | |
1599 | This specifies that writes to the region will be written back to the | |
1600 | file. Changes made will be shared immediately with other processes | |
1601 | mmapping the same file. |
1602 | ||
1603 | Note that actual writing may take place at any time. You need to use | |
1604 | @code{msync}, described below, if it is important that other processes | |
1605 | using conventional I/O get a consistent view of the file. | |
1606 | ||
dce754b1 DD |
1607 | @item MAP_SHARED_VALIDATE |
1608 | Similar to @code{MAP_SHARED} except that additional flags will be | |
1609 | validated by the kernel, and the call will fail if an unrecognized | |
1610 | flag is provided. With @code{MAP_SHARED} using a flag on a kernel | |
1611 | that doesn't support it causes the flag to be ignored. | |
1612 | @code{MAP_SHARED_VALIDATE} should be used when the behavior of all | |
1613 | flags is required. | |
1614 | ||
07435eb4 UD |
1615 | @item MAP_FIXED |
1616 | This forces the system to use the exact mapping address specified in | |
dce754b1 DD |
1617 | @var{address} and fail if it can't. Note that if the new mapping |
1618 | would overlap an existing mapping, the overlapping portion of the | |
1619 | existing map is unmapped. | |
07435eb4 UD |
1620 | |
1621 | @c One of these is official - the other is obviously an obsolete synonym | |
1622 | @c Which is which? | |
1623 | @item MAP_ANONYMOUS | |
1624 | @itemx MAP_ANON | |
1625 | This flag tells the system to create an anonymous mapping, not connected | |
9739d2d5 | 1626 | to a file. @var{filedes} and @var{offset} are ignored, and the region is |
07435eb4 UD |
1627 | initialized with zeros. |
1628 | ||
1629 | Anonymous maps are used as the basic primitive to extend the heap on some | |
1630 | systems. They are also useful to share data between multiple tasks | |
1631 | without creating a file. | |
1632 | ||
49c091e5 | 1633 | On some systems using private anonymous mmaps is more efficient than using |
1f77f049 | 1634 | @code{malloc} for large blocks. This is not an issue with @theglibc{}, |
07435eb4 UD |
1635 | as the included @code{malloc} automatically uses @code{mmap} where appropriate. |
1636 | ||
a465b89e FW |
1637 | @item MAP_HUGETLB |
1638 | @standards{Linux, sys/mman.h} | |
1639 | This requests that the system uses an alternative page size which is | |
1640 | larger than the default page size for the mapping. For some workloads, | |
1641 | increasing the page size for large mappings improves performance because | |
1642 | the system needs to handle far fewer pages. For other workloads which | |
1643 | require frequent transfer of pages between storage or different nodes, | |
1644 | the decreased page granularity may cause performance problems due to the | |
1645 | increased page size and larger transfers. | |
1646 | ||
1647 | In order to create the mapping, the system needs physically contiguous | |
1648 | memory of the size of the increased page size. As a result, | |
1649 | @code{MAP_HUGETLB} mappings are affected by memory fragmentation, and | |
1650 | their creation can fail even if plenty of memory is available in the | |
1651 | system. | |
1652 | ||
1653 | Not all file systems support mappings with an increased page size. | |
1654 | ||
1655 | The @code{MAP_HUGETLB} flag is specific to Linux. | |
1656 | ||
1657 | @c There is a mechanism to select different hugepage sizes; see | |
1658 | @c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources. | |
1659 | ||
dce754b1 DD |
1660 | @item MAP_32BIT |
1661 | Require addresses that can be accessed with a signed 32 bit pointer, | |
1662 | i.e., within the first 2 GiB. Ignored if @code{MAP_FIXED} is specified. |
1663 | ||
1664 | @item MAP_DENYWRITE | |
1665 | @itemx MAP_EXECUTABLE | |
1666 | @itemx MAP_FILE | |
1667 | ||
1668 | Provided for compatibility. Ignored by the Linux kernel. | |
1669 | ||
1670 | @item MAP_FIXED_NOREPLACE | |
1671 | Similar to @code{MAP_FIXED} except the call will fail with | |
1672 | @code{EEXIST} if the new mapping would overwrite an existing mapping. | |
1673 | To test for support for this flag, specify @code{MAP_FIXED_NOREPLACE} without |
1674 | @code{MAP_FIXED}, and (if the call was successful) check the actual address |
1675 | returned. If it does not match the address passed, then this flag is | |
1676 | not supported. | |
1677 | ||
1678 | @item MAP_GROWSDOWN | |
1679 | This flag is used to make stacks, and is typically only needed inside | |
1680 | the program loader to set up the main stack for the running process. | |
1681 | The mapping is created according to the other flags, except an | |
1682 | additional page just prior to the mapping is marked as a ``guard | |
1683 | page''. If a write is attempted inside this guard page, that page is | |
1684 | mapped, the mapping is extended, and a new guard page is created. | |
1685 | Thus, the mapping continues to grow towards lower addresses until it | |
1686 | encounters some other mapping. | |
1687 | ||
1688 | Note that accessing memory beyond the guard page will not trigger this | |
1689 | feature. In gcc, use @code{-fstack-clash-protection} to ensure the | |
1690 | guard page is always touched. | |
1691 | ||
1692 | @item MAP_LOCKED | |
1693 | A hint that requests that mapped pages are locked in memory (i.e. not | |
1694 | paged out). Note that this is a request and not a requirement; use | |
1695 | @code{mlock} if locking is required. | |
1696 | ||
1697 | @item MAP_POPULATE | |
1698 | @itemx MAP_NONBLOCK | |
1699 | @code{MAP_POPULATE} is a hint that requests that the kernel read-ahead | |
1700 | a file-backed mapping, causing pages to be mapped before they're | |
1701 | needed. @code{MAP_NONBLOCK} is a hint that requests that the kernel | |
1702 | @emph{not} attempt this except for pages that are already in memory. Note |
1703 | that neither of these hints affects future paging activity; use |
1704 | @code{mlock} if such needs to be controlled. | |
1705 | ||
1706 | @item MAP_NORESERVE | |
1707 | Asks the kernel to not reserve physical backing (i.e. space in a swap | |
1708 | device) for a mapping. This would be useful for, for example, a very | |
1709 | large but sparsely used mapping which need not be limited in total | |
1710 | length by available RAM, but with very few mapped pages. Note that | |
1711 | writes to such a mapping may cause a @code{SIGSEGV} if the system is | |
1712 | unable to map a page due to lack of resources. | |
1713 | ||
1714 | On Linux, this flag's behavior may be overridden by |
1715 | @file{/proc/sys/vm/overcommit_memory} as documented in the proc(5) man | |
1716 | page. | |
1717 | ||
1718 | @item MAP_STACK | |
1719 | Ensures that the resulting mapping is suitable for use as a program | |
1720 | stack. For example, the use of huge pages might be precluded. | |
1721 | ||
1722 | @item MAP_SYNC | |
1723 | This is a special flag for DAX devices, which tells the kernel to | |
1724 | write dirty metadata out whenever dirty data is written out. Unlike | |
1725 | most other flags, this one will fail unless @code{MAP_SHARED_VALIDATE} | |
1726 | is also given. | |
07435eb4 UD |
1727 | |
1728 | @end vtable | |
1729 | ||
52e6d801 FB |
1730 | @code{mmap} returns the address of the new mapping, or |
1731 | @code{MAP_FAILED} for an error. | |
07435eb4 UD |
1732 | |
1733 | Possible errors include: | |
1734 | ||
1735 | @table @code | |
1736 | ||
dce754b1 DD |
1737 | @item EACCES |
1738 | ||
1739 | @var{filedes} was not open for the type of access specified in @var{protect}. | |
1740 | ||
1741 | @item EAGAIN | |
1742 | ||
1743 | The system has temporarily run out of resources. | |
1744 | ||
1745 | @item EBADF | |
1746 | ||
1747 | The @var{filedes} passed is invalid, and a valid file descriptor is |
1748 | required (i.e. @code{MAP_ANONYMOUS} was not specified). |
1749 | ||
1750 | @item EEXIST | |
1751 | ||
1752 | @code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was | |
1753 | found overlapping the requested address range. | |
1754 | ||
07435eb4 UD |
1755 | @item EINVAL |
1756 | ||
a465b89e FW |
1757 | Either @var{address} was unusable (because it is not a multiple of the |
1758 | applicable page size), or inconsistent @var{flags} were given. | |
1759 | ||
1760 | If @code{MAP_HUGETLB} was specified, the file or system does not support | |
1761 | large page sizes. | |
07435eb4 | 1762 | |
dce754b1 | 1763 | @item ENODEV |
07435eb4 | 1764 | |
dce754b1 DD |
1765 | This file is of a type that doesn't support mapping, the process has |
1766 | exceeded its data space limit, or the map request would exceed the | |
1767 | process's virtual address space. | |
07435eb4 UD |
1768 | |
1769 | @item ENOMEM | |
1770 | ||
dce754b1 DD |
1771 | There is not enough memory for the operation, the process is out of |
1772 | address space, or there are too many mappings. On Linux, the maximum | |
1773 | number of mappings can be controlled via | |
1774 | @file{/proc/sys/vm/max_map_count} or, if your OS supports it, via | |
1775 | the @code{vm.max_map_count} @code{sysctl} setting. | |
07435eb4 UD |
1776 | |
1777 | @item ENOEXEC | |
1778 | ||
1779 | The file is on a filesystem that doesn't support mapping. | |
1780 | ||
dce754b1 DD |
1781 | @item EPERM |
1782 | ||
1783 | @code{PROT_EXEC} was requested but the file is on a filesystem that | |
1784 | was mounted with execution denied, a file seal prevented the mapping, | |
1785 | or the caller set @code{MAP_HUGETLB} but does not have the required | |
1786 | privileges. | |
1787 | ||
1788 | @item EOVERFLOW | |
1789 | ||
1790 | Either the offset into the file plus the length of the mapping causes | |
1791 | internal page counts to overflow, or the offset requested exceeds the | |
1792 | length of the file. | |
1793 | ||
07435eb4 UD |
1794 | @c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock. |
1795 | @c However mandatory locks are not discussed in this manual. | |
1796 | @c | |
1797 | @c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented | |
1798 | @c here) is used and the file is already open for writing. | |
1799 | ||
1800 | @end table | |
1801 | ||
1802 | @end deftypefun | |
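
As an illustration of the return value and error handling described
above, here is a minimal sketch that maps a whole file read-only.  The
helper name @code{map_file_readonly} is hypothetical and not part of
@theglibc{}; the sketch assumes that the file is a regular file whose
size fits in @code{size_t}.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole of PATH read-only.  On success, store the length in
   *LENGTH and return the address of the mapping.  On failure, return
   MAP_FAILED with errno set.  (Hypothetical helper, for illustration.)  */
void *
map_file_readonly (const char *path, size_t *length)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return MAP_FAILED;

  struct stat st;
  void *addr = MAP_FAILED;
  if (fstat (fd, &st) == 0)
    {
      if (st.st_size > 0)
        addr = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE,
                     fd, 0);
      else
        errno = EINVAL;         /* mmap rejects a zero length.  */
    }

  /* The mapping stays valid after the descriptor is closed.  Preserve
     errno so that the caller sees the original failure reason.  */
  int saved_errno = errno;
  close (fd);
  errno = saved_errno;

  if (addr != MAP_FAILED)
    *length = (size_t) st.st_size;
  return addr;
}
```

The caller compares the result against @code{MAP_FAILED}, not against a
null pointer, and later releases the region with @code{munmap}.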
1803 | ||
cc6e48bc | 1804 | @deftypefun {void *} mmap64 (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off64_t @var{offset}) |
d08a7e4c | 1805 | @standards{LFS, sys/mman.h} |
2cc3615c AO |
1806 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
1807 | @c The page_shift auto detection when MMAP2_PAGE_SHIFT is -1 (it never | |
1808 | @c is) would be thread-unsafe. | |
b642f101 UD |
1809 | The @code{mmap64} function is equivalent to the @code{mmap} function but |
1810 | the @var{offset} parameter is of type @code{off64_t}. On 32-bit systems | |
1811 | this allows the file associated with the @var{filedes} descriptor to be | |
1812 | larger than 2GB. @var{filedes} must be a descriptor returned from a | |
1813 | call to @code{open64} or @code{fopen64} and @code{freopen64} where the | |
1814 | descriptor is retrieved with @code{fileno}. | |
1815 | ||
1816 | When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this | |
1817 | function is actually available under the name @code{mmap}. I.e., the | |
1818 | new, extended API using 64 bit file sizes and offsets transparently | |
1819 | replaces the old API. | |
1820 | @end deftypefun | |
1821 | ||
07435eb4 | 1822 | @deftypefun int munmap (void *@var{addr}, size_t @var{length}) |
d08a7e4c | 1823 | @standards{POSIX, sys/mman.h} |
2cc3615c | 1824 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
07435eb4 UD |
1825 | |
1826 | @code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} + | |
1827 | @var{length}). @var{length} should be the length of the mapping. | |
1828 | ||
04b9968b | 1829 | It is safe to unmap multiple mappings in one call, or include unmapped |
07435eb4 | 1830 | space in the range. It is also possible to unmap only part of an existing |
04b9968b | 1831 | mapping. However, only entire pages can be removed. If @var{length} is not |
07435eb4 UD |
1832 | a whole number of pages, it will be rounded up. | |
1833 | ||
1834 | It returns @math{0} for success and @math{-1} for an error. | |
1835 | ||
1836 | One error is possible: | |
1837 | ||
1838 | @table @code | |
1839 | ||
1840 | @item EINVAL | |
04b9968b | 1841 | The memory range given was outside the user mmap range or wasn't page |
07435eb4 UD |
1842 | aligned. |
1843 | ||
1844 | @end table | |
1845 | ||
1846 | @end deftypefun | |
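
The following sketch illustrates partial unmapping and length rounding
as described above.  The function name @code{unmap_middle_pages} is
hypothetical, and the sketch assumes a system that provides
@code{MAP_ANONYMOUS}.

```c
#define _DEFAULT_SOURCE         /* For MAP_ANONYMOUS.  */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map four anonymous pages, then unmap only the middle two.
   Return 0 on success and -1 on failure.  (Hypothetical helper.)  */
int
unmap_middle_pages (void)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  char *base = mmap (NULL, 4 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return -1;

  /* The address is page aligned.  The length, one byte more than a
     page, is rounded up to two whole pages.  */
  if (munmap (base + page, page + 1) != 0)
    {
      munmap (base, 4 * page);
      return -1;
    }

  /* The first and last pages are still mapped and usable.  */
  base[0] = 'a';
  base[3 * page] = 'z';

  /* A range that includes the already unmapped hole is accepted.  */
  return munmap (base, 4 * page);
}
```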
1847 | ||
1848 | @deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags}) | |
d08a7e4c | 1849 | @standards{POSIX, sys/mman.h} |
2cc3615c | 1850 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
07435eb4 UD |
1851 | |
1852 | When using shared mappings, the kernel can write the file at any time | |
1853 | before the mapping is removed. To be certain data has actually been | |
49c091e5 UD |
1854 | written to the file and will be accessible to non-memory-mapped I/O, it |
1855 | is necessary to use this function. | |
07435eb4 UD |
1856 | |
1857 | It operates on the region @var{address} to (@var{address} + @var{length}). | |
1858 | It may be used on part of a mapping or multiple mappings; however, the | |
1859 | region given should not contain any unmapped space. | |
1860 | ||
1861 | @var{flags} can contain some options: | |
1862 | ||
1863 | @vtable @code | |
1864 | ||
1865 | @item MS_SYNC | |
1866 | ||
1867 | This flag makes sure the data is actually written @emph{to disk}. | |
1868 | Normally @code{msync} only makes sure that accesses to a file with | |
1869 | conventional I/O reflect the recent changes. | |
1870 | ||
1871 | @item MS_ASYNC | |
1872 | ||
1873 | This tells @code{msync} to begin the synchronization, but not to wait for | |
1874 | it to complete. | |
1875 | ||
1876 | @c Linux also has MS_INVALIDATE, which I don't understand. | |
1877 | ||
1878 | @end vtable | |
1879 | ||
1880 | @code{msync} returns @math{0} for success and @math{-1} for | |
1881 | error. Errors include: | |
1882 | ||
1883 | @table @code | |
1884 | ||
1885 | @item EINVAL | |
1886 | An invalid region was given, or the @var{flags} were invalid. | |
1887 | ||
1888 | @item EFAULT | |
1889 | There is no existing mapping in at least part of the given region. | |
1890 | ||
1891 | @end table | |
1892 | ||
1893 | @end deftypefun | |
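
As a rough sketch of how @code{msync} fits together with a shared
mapping, the hypothetical helper @code{store_via_mapping} below replaces
the contents of a file through a mapping and waits for the data to be
written.  It assumes that @var{text} is not empty.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Replace the contents of PATH with the non-empty string TEXT by
   writing through a shared mapping, and wait until the data has been
   written out.  Return 0 on success and -1 on failure.
   (Hypothetical helper, for illustration.)  */
int
store_via_mapping (const char *path, const char *text)
{
  size_t len = strlen (text);
  int fd = open (path, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return -1;

  /* The file must be at least as long as the mapping; touching pages
     beyond the end of the file would raise SIGBUS.  */
  if (ftruncate (fd, (off_t) len) != 0)
    {
      close (fd);
      return -1;
    }

  char *map = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);
  if (map == MAP_FAILED)
    return -1;

  memcpy (map, text, len);

  /* MS_SYNC: do not return before the write has completed.  */
  int result = msync (map, len, MS_SYNC);
  munmap (map, len);
  return result;
}
```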
1894 | ||
1895 | @deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag}) | |
d08a7e4c | 1896 | @standards{GNU, sys/mman.h} |
2cc3615c | 1897 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
07435eb4 UD |
1898 | |
1899 | This function can be used to change the size of an existing memory | |
1900 | area. @var{address} and @var{length} must cover a region entirely mapped | |
cf822e3c | 1901 | in the same @code{mmap} call. A new mapping with the same |
04b9968b | 1902 | characteristics will be returned with the length @var{new_length}. |
07435eb4 | 1903 | |
cf822e3c | 1904 | One option is possible, @code{MREMAP_MAYMOVE}. If it is given in |
07435eb4 UD |
1905 | @var{flag}, the system may remove the existing mapping and create a new | |
1906 | one of the desired length in another location. | |
1907 | ||
cf822e3c | 1908 | The address of the resulting mapping is returned, or @code{MAP_FAILED} on error. Possible |
07435eb4 UD |
1909 | error codes include: |
1910 | ||
07435eb4 UD |
1911 | @table @code |
1912 | ||
1913 | @item EFAULT | |
1914 | There is no existing mapping in at least part of the original region, or | |
1915 | the region covers two or more distinct mappings. | |
1916 | ||
1917 | @item EINVAL | |
1918 | The address given is misaligned or inappropriate. | |
1919 | ||
1920 | @item EAGAIN | |
1921 | The region has pages locked, and if extended it would exceed the | |
1922 | process's resource limit for locked pages. @xref{Limits on Resources}. | |
1923 | ||
1924 | @item ENOMEM | |
19e4c7dd | 1925 | The region is private writable, and insufficient virtual memory is |
07435eb4 UD |
1926 | available to extend it. Also, this error will occur if |
1927 | @code{MREMAP_MAYMOVE} is not given and the extension would collide with | |
1928 | another mapped region. | |
1929 | ||
1930 | @end table | |
1931 | @end deftypefun | |
1932 | ||
04b9968b UD |
1933 | This function is only available on a few systems. Except for performing |
1934 | optional optimizations one should not rely on this function. | |
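
Where @code{mremap} is available, a call might look like the following
sketch.  The wrapper name @code{resize_mapping} is hypothetical; the
sketch assumes a GNU/Linux system, where @code{_GNU_SOURCE} must be
defined before any header is included.

```c
#define _GNU_SOURCE             /* For mremap and MREMAP_MAYMOVE.  */
#include <stddef.h>
#include <sys/mman.h>

/* Resize the mapping at *ADDR from OLD_LEN to NEW_LEN bytes, allowing
   the kernel to move it.  On success, update *ADDR and return 0.  On
   failure, leave *ADDR and the mapping untouched and return -1.
   (Hypothetical helper, for illustration.)  */
int
resize_mapping (void **addr, size_t old_len, size_t new_len)
{
  void *moved = mremap (*addr, old_len, new_len, MREMAP_MAYMOVE);
  if (moved == MAP_FAILED)
    return -1;
  *addr = moved;
  return 0;
}
```

Because the mapping may move, any pointers into the old region must be
recomputed from the new address after a successful call.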
1935 | ||
07435eb4 UD |
1936 | Not all file descriptors may be mapped. Sockets, pipes, and most devices |
1937 | only allow sequential access and do not fit into the mapping abstraction. | |
1938 | In addition, some regular files may not be mmapable, and older kernels may | |
1939 | not support mapping at all. Thus, programs using @code{mmap} should | |
1940 | have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU | |
1941 | Coding Standards}. | |
1942 | ||
0bc93a2f | 1943 | @deftypefun int madvise (void *@var{addr}, size_t @var{length}, int @var{advice}) |
d08a7e4c | 1944 | @standards{POSIX, sys/mman.h} |
2cc3615c | 1945 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
0bc93a2f AJ |
1946 | |
1947 | This function can be used to provide the system with @var{advice} about | |
1948 | the intended usage patterns of the memory region starting at @var{addr} | |
1949 | and extending @var{length} bytes. | |
1950 | ||
1951 | The valid BSD values for @var{advice} are: | |
1952 | ||
2fe82ca6 | 1953 | @vtable @code |
0bc93a2f AJ |
1954 | |
1955 | @item MADV_NORMAL | |
1956 | The region should receive no further special treatment. | |
1957 | ||
1958 | @item MADV_RANDOM | |
cf822e3c | 1959 | The region will be accessed via random page references. The kernel |
0bc93a2f AJ |
1960 | should page-in the minimal number of pages for each page fault. |
1961 | ||
1962 | @item MADV_SEQUENTIAL | |
cf822e3c | 1963 | The region will be accessed via sequential page references. This |
0bc93a2f AJ |
1964 | may cause the kernel to aggressively read-ahead, expecting further |
1965 | sequential references after any page fault within this region. | |
1966 | ||
1967 | @item MADV_WILLNEED | |
1968 | The region will be needed. The pages within this region may | |
1969 | be pre-faulted in by the kernel. | |
1970 | ||
1971 | @item MADV_DONTNEED | |
1972 | The region is no longer needed. The kernel may free these pages, | |
1973 | causing any changes to the pages to be lost, as well as swapped | |
1974 | out pages to be discarded. | |
1975 | ||
a465b89e FW |
1976 | @item MADV_HUGEPAGE |
1977 | @standards{Linux, sys/mman.h} | |
1978 | Indicate that it is beneficial to increase the page size for this | |
1979 | mapping. This can improve performance for larger mappings because the | |
1980 | system needs to handle far fewer pages. However, if parts of the | |
1981 | mapping are frequently transferred between storage or different nodes, | |
1982 | performance may suffer because individual transfers can become | |
1983 | substantially larger due to the increased page size. | |
1984 | ||
1985 | This flag is specific to Linux. | |
1986 | ||
1987 | @item MADV_NOHUGEPAGE | |
1988 | Undo the effect of a previous @code{MADV_HUGEPAGE} advice. This flag | |
1989 | is specific to Linux. | |
1990 | ||
2fe82ca6 | 1991 | @end vtable |
0bc93a2f AJ |
1992 | |
1993 | The POSIX names are slightly different, but with the same meanings: | |
1994 | ||
2fe82ca6 | 1995 | @vtable @code |
0bc93a2f AJ |
1996 | |
1997 | @item POSIX_MADV_NORMAL | |
1998 | This corresponds with BSD's @code{MADV_NORMAL}. | |
1999 | ||
2000 | @item POSIX_MADV_RANDOM | |
2001 | This corresponds with BSD's @code{MADV_RANDOM}. | |
2002 | ||
2003 | @item POSIX_MADV_SEQUENTIAL | |
2004 | This corresponds with BSD's @code{MADV_SEQUENTIAL}. | |
2005 | ||
2006 | @item POSIX_MADV_WILLNEED | |
2007 | This corresponds with BSD's @code{MADV_WILLNEED}. | |
2008 | ||
2009 | @item POSIX_MADV_DONTNEED | |
2010 | This corresponds with BSD's @code{MADV_DONTNEED}. | |
2011 | ||
2fe82ca6 | 2012 | @end vtable |
0bc93a2f | 2013 | |
bb4e6db2 | 2014 | @code{madvise} returns @math{0} for success and @math{-1} for |
0bc93a2f AJ |
2015 | error. Errors include: |
2016 | @table @code | |
2017 | ||
2018 | @item EINVAL | |
2019 | An invalid region was given, or the @var{advice} was invalid. | |
2020 | ||
2021 | @item EFAULT | |
2022 | There is no existing mapping in at least part of the given region. | |
2023 | ||
2024 | @end table | |
2025 | @end deftypefun | |
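
A minimal sketch of giving advice before a linear scan follows.  The
function name @code{sum_sequentially} is hypothetical, and the sketch
assumes that @var{map} is the page-aligned start of an existing mapping.

```c
#define _DEFAULT_SOURCE         /* For madvise and the MADV_* constants.  */
#include <stddef.h>
#include <sys/mman.h>

/* Sum all bytes of the LEN-byte mapping starting at MAP, after telling
   the system that the region will be read sequentially.  The advice is
   only a hint, so a failing madvise call is deliberately ignored.
   (Hypothetical helper, for illustration.)  */
unsigned long
sum_sequentially (const unsigned char *map, size_t len)
{
  (void) madvise ((void *) map, len, MADV_SEQUENTIAL);

  unsigned long sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += map[i];
  return sum;
}
```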
07435eb4 | 2026 | |
416e0145 | 2027 | @deftypefn Function int shm_open (const char *@var{name}, int @var{oflag}, mode_t @var{mode}) |
d08a7e4c | 2028 | @standards{POSIX, sys/mman.h} |
2cc3615c AO |
2029 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asuinit{} @ascuheap{} @asulock{}}@acunsafe{@aculock{} @acsmem{} @acsfd{}}} |
2030 | @c shm_open @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd | |
2031 | @c libc_once(where_is_shmfs) @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd | |
2032 | @c where_is_shmfs @mtslocale @ascuheap @asulock @aculock @acsmem @acsfd | |
2033 | @c statfs dup ok | |
2034 | @c setmntent dup @ascuheap @asulock @acsmem @acsfd @aculock | |
2035 | @c getmntent_r dup @mtslocale @ascuheap @aculock @acsmem [no @asucorrupt @acucorrupt; exclusive stream] | |
2036 | @c strcmp dup ok | |
2037 | @c strlen dup ok | |
2038 | @c malloc dup @ascuheap @acsmem | |
2039 | @c mempcpy dup ok | |
2040 | @c endmntent dup @ascuheap @asulock @aculock @acsmem @acsfd | |
2041 | @c strlen dup ok | |
2042 | @c strchr dup ok | |
2043 | @c mempcpy dup ok | |
2044 | @c open dup @acsfd | |
2045 | @c fcntl dup ok | |
2046 | @c close dup @acsfd | |
416e0145 OB |
2047 | |
2048 | This function returns a file descriptor that can be used to allocate shared | |
cf822e3c | 2049 | memory via @code{mmap}. Unrelated processes can use the same @var{name} to create or |
416e0145 OB |
2050 | open existing shared memory objects. |
2051 | ||
2052 | The @var{name} argument specifies the shared memory object to be opened. | |
2053 | In @theglibc{} it must be a string smaller than @code{NAME_MAX} bytes starting | |
2054 | with an optional slash but containing no other slashes. | |
2055 | ||
2056 | The semantics of the @var{oflag} and @var{mode} arguments are the same as in @code{open}. | |
2057 | ||
2058 | @code{shm_open} returns the file descriptor on success or @math{-1} on error. | |
2059 | On failure @code{errno} is set. | |
2060 | @end deftypefn | |
2061 | ||
2062 | @deftypefn Function int shm_unlink (const char *@var{name}) | |
2cc3615c AO |
2063 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asuinit{} @ascuheap{} @asulock{}}@acunsafe{@aculock{} @acsmem{} @acsfd{}}} |
2064 | @c shm_unlink @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd | |
2065 | @c libc_once(where_is_shmfs) dup @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd | |
2066 | @c strlen dup ok | |
2067 | @c strchr dup ok | |
2068 | @c mempcpy dup ok | |
2069 | @c unlink dup ok | |
416e0145 | 2070 | |
9739d2d5 | 2071 | This function is the inverse of @code{shm_open} and removes the object with |
416e0145 OB |
2072 | the given @var{name} previously created by @code{shm_open}. |
2073 | ||
2074 | @code{shm_unlink} returns @math{0} on success or @math{-1} on error. | |
2075 | On failure @code{errno} is set. | |
2076 | @end deftypefn | |
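
The typical sequence of @code{shm_open}, @code{ftruncate} and
@code{mmap} can be sketched as follows.  The helper name
@code{open_shared_region} is hypothetical.  On older versions of
@theglibc{}, programs using @code{shm_open} may also need to be linked
with @file{-lrt}; treat that as an assumption to check on your system.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create or open the shared memory object NAME, make it LEN bytes
   long, and map it shared.  Return the address of the mapping, or
   MAP_FAILED on failure.  (Hypothetical helper, for illustration.)  */
void *
open_shared_region (const char *name, size_t len)
{
  int fd = shm_open (name, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return MAP_FAILED;

  void *addr = MAP_FAILED;
  /* A newly created object has length zero, so size it first.  */
  if (ftruncate (fd, (off_t) len) == 0)
    addr = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

  /* The mapping remains valid after the descriptor is closed.  */
  close (fd);
  return addr;
}
```

Two processes, or two calls in one process, that pass the same
@var{name} see the same memory.  Once the object is no longer needed,
one of them removes the name with @code{shm_unlink}.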
2077 | ||
59d2cbb1 FW |
2078 | @deftypefun int memfd_create (const char *@var{name}, unsigned int @var{flags}) |
2079 | @standards{Linux, sys/mman.h} | |
2080 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} | |
2081 | The @code{memfd_create} function returns a file descriptor which can be | |
2082 | used to create memory mappings using the @code{mmap} function. It is | |
2083 | similar to the @code{shm_open} function in the sense that these mappings | |
2084 | are not backed by actual files. However, the descriptor returned by | |
2085 | @code{memfd_create} does not correspond to a named object; the | |
2086 | @var{name} argument is used for debugging purposes only (e.g., will | |
2087 | appear in @file{/proc}), and separate invocations of @code{memfd_create} | |
2088 | with the same @var{name} will not return descriptors for the same region | |
2089 | of memory. The descriptor can also be used to create alias mappings | |
2090 | within the same process. | |
2091 | ||
2092 | The descriptor initially refers to a zero-length file. Before mappings | |
2093 | can be created which are backed by memory, the file size needs to be | |
2094 | increased with the @code{ftruncate} function. @xref{File Size}. | |
2095 | ||
2096 | The @var{flags} argument can be a combination of the following flags: | |
2097 | ||
2098 | @vtable @code | |
2099 | @item MFD_CLOEXEC | |
2100 | @standards{Linux, sys/mman.h} | |
2101 | The descriptor is created with the @code{O_CLOEXEC} flag. | |
2102 | ||
2103 | @item MFD_ALLOW_SEALING | |
2104 | @standards{Linux, sys/mman.h} | |
2105 | The descriptor supports the addition of seals using the @code{fcntl} | |
2106 | function. | |
2107 | ||
2108 | @item MFD_HUGETLB | |
2109 | @standards{Linux, sys/mman.h} | |
2110 | This requests that mappings created using the returned file descriptor | |
2111 | use a larger page size. See @code{MAP_HUGETLB} above for details. | |
2112 | ||
2113 | This flag is incompatible with @code{MFD_ALLOW_SEALING}. | |
2114 | @end vtable | |
2115 | ||
2116 | @code{memfd_create} returns a file descriptor on success, and @math{-1} | |
2117 | on failure. | |
2118 | ||
2119 | The following @code{errno} error conditions are defined for this | |
2120 | function: | |
2121 | ||
2122 | @table @code | |
2123 | @item EINVAL | |
2124 | An invalid combination is specified in @var{flags}, or @var{name} is | |
2125 | too long. | |
2126 | ||
2127 | @item EFAULT | |
2128 | The @var{name} argument does not point to a string. | |
2129 | ||
2130 | @item EMFILE | |
2131 | The operation would exceed the file descriptor limit for this process. | |
2132 | ||
2133 | @item ENFILE | |
2134 | The operation would exceed the system-wide file descriptor limit. | |
2135 | ||
2136 | @item ENOMEM | |
2137 | There is not enough memory for the operation. | |
2138 | @end table | |
2139 | @end deftypefun | |
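
The alias mappings mentioned above can be sketched like this.  The
function name @code{make_alias_mappings} and the debugging name
@code{"alias-example"} are hypothetical; the sketch assumes a Linux
system on which @code{memfd_create} is available.

```c
#define _GNU_SOURCE             /* For memfd_create and MFD_CLOEXEC.  */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create two mappings of the same LEN bytes of memory and store their
   addresses in *FIRST and *SECOND.  Return 0 on success and -1 on
   failure.  (Hypothetical helper, for illustration.)  */
int
make_alias_mappings (size_t len, void **first, void **second)
{
  int fd = memfd_create ("alias-example", MFD_CLOEXEC);
  if (fd < 0)
    return -1;

  void *a = MAP_FAILED;
  void *b = MAP_FAILED;
  /* The descriptor starts out referring to a zero-length file.  */
  if (ftruncate (fd, (off_t) len) == 0)
    {
      a = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (a != MAP_FAILED)
        b = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
  close (fd);

  if (b == MAP_FAILED)
    {
      if (a != MAP_FAILED)
        munmap (a, len);
      return -1;
    }
  *first = a;
  *second = b;
  return 0;
}
```

A store through one of the two addresses is visible through the other,
because both mappings are @code{MAP_SHARED} views of the same
descriptor.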
2140 | ||
28f540f4 RM |
2141 | @node Waiting for I/O |
2142 | @section Waiting for Input or Output | |
2143 | @cindex waiting for input or output | |
2144 | @cindex multiplexing input | |
2145 | @cindex input from multiple files | |
2146 | ||
2147 | Sometimes a program needs to accept input on multiple input channels | |
2148 | whenever input arrives. For example, some workstations may have devices | |
2149 | such as a digitizing tablet, function button box, or dial box that are | |
2150 | connected via normal asynchronous serial interfaces; good user interface | |
2151 | style requires responding immediately to input on any device. Another | |
2152 | example is a program that acts as a server to several other processes | |
2153 | via pipes or sockets. | |
2154 | ||
2155 | You cannot normally use @code{read} for this purpose, because this | |
2156 | blocks the program until input is available on one particular file | |
2157 | descriptor; input on other channels won't wake it up. You could set | |
2158 | nonblocking mode and poll each file descriptor in turn, but this is very | |
2159 | inefficient. | |
2160 | ||
2161 | A better solution is to use the @code{select} function. This blocks the | |
2162 | program until input or output is ready on a specified set of file | |
2163 | descriptors, or until a timer expires, whichever comes first. This | |
2164 | facility is declared in the header file @file{sys/types.h}. | |
2165 | @pindex sys/types.h | |
2166 | ||
2167 | In the case of a server socket (@pxref{Listening}), we say that | |
2168 | ``input'' is available when there are pending connections that could be | |
2169 | accepted (@pxref{Accepting Connections}). @code{accept} for server | |
2170 | sockets blocks and interacts with @code{select} just as @code{read} does | |
2171 | for normal input. | |
2172 | ||
2173 | @cindex file descriptor sets, for @code{select} | |
2174 | The file descriptor sets for the @code{select} function are specified | |
2175 | as @code{fd_set} objects. Here is the description of the data type | |
2176 | and some macros for manipulating these objects. | |
2177 | ||
28f540f4 | 2178 | @deftp {Data Type} fd_set |
d08a7e4c | 2179 | @standards{BSD, sys/types.h} |
28f540f4 RM |
2180 | The @code{fd_set} data type represents file descriptor sets for the |
2181 | @code{select} function. It is actually a bit array. | |
2182 | @end deftp | |
2183 | ||
28f540f4 | 2184 | @deftypevr Macro int FD_SETSIZE |
d08a7e4c | 2185 | @standards{BSD, sys/types.h} |
28f540f4 RM |
2186 | The value of this macro is the maximum number of file descriptors that a |
2187 | @code{fd_set} object can hold information about. On systems with a | |
2188 | fixed maximum number, @code{FD_SETSIZE} is at least that number. On | |
2189 | some systems, including GNU, there is no absolute limit on the number of | |
2190 | descriptors open, but this macro still has a constant value which | |
2191 | controls the number of bits in an @code{fd_set}; if you get a file | |
2192 | descriptor with a value as high as @code{FD_SETSIZE}, you cannot put | |
2193 | that descriptor into an @code{fd_set}. | |
2194 | @end deftypevr | |
2195 | ||
28f540f4 | 2196 | @deftypefn Macro void FD_ZERO (fd_set *@var{set}) |
d08a7e4c | 2197 | @standards{BSD, sys/types.h} |
2cc3615c | 2198 | @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
28f540f4 RM |
2199 | This macro initializes the file descriptor set @var{set} to be the |
2200 | empty set. | |
2201 | @end deftypefn | |
2202 | ||
28f540f4 | 2203 | @deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set}) |
d08a7e4c | 2204 | @standards{BSD, sys/types.h} |
2cc3615c AO |
2205 | @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
2206 | @c Setting a bit isn't necessarily atomic, so there's a potential race | |
2207 | @c here if set is not used exclusively. | |
28f540f4 | 2208 | This macro adds @var{filedes} to the file descriptor set @var{set}. |
d9997a45 UD |
2209 | |
2210 | The @var{filedes} parameter must not have side effects since it is | |
2211 | evaluated more than once. | |
28f540f4 RM |
2212 | @end deftypefn |
2213 | ||
28f540f4 | 2214 | @deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set}) |
d08a7e4c | 2215 | @standards{BSD, sys/types.h} |
2cc3615c AO |
2216 | @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
2217 | @c Setting a bit isn't necessarily atomic, so there's a potential race | |
2218 | @c here if set is not used exclusively. | |
28f540f4 | 2219 | This macro removes @var{filedes} from the file descriptor set @var{set}. |
d9997a45 UD |
2220 | |
2221 | The @var{filedes} parameter must not have side effects since it is | |
2222 | evaluated more than once. | |
28f540f4 RM |
2223 | @end deftypefn |
2224 | ||
d9997a45 | 2225 | @deftypefn Macro int FD_ISSET (int @var{filedes}, const fd_set *@var{set}) |
d08a7e4c | 2226 | @standards{BSD, sys/types.h} |
2cc3615c | 2227 | @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
28f540f4 | 2228 | This macro returns a nonzero value (true) if @var{filedes} is a member |
3081378b | 2229 | of the file descriptor set @var{set}, and zero (false) otherwise. |
d9997a45 UD |
2230 | |
2231 | The @var{filedes} parameter must not have side effects since it is | |
2232 | evaluated more than once. | |
28f540f4 RM |
2233 | @end deftypefn |
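
Because a descriptor whose value is @code{FD_SETSIZE} or larger cannot
be stored in an @code{fd_set}, it can be useful to check the range
before calling @code{FD_SET}.  The following sketch uses a hypothetical
wrapper named @code{fd_set_add}; it includes @file{sys/select.h}, which
is assumed here to declare the same facilities.

```c
#include <sys/select.h>

/* Add FD to SET if it fits.  Return 0 on success, or -1 if FD is
   negative or too large for an fd_set.  FD is a plain variable, so
   evaluating it more than once inside FD_SET is harmless.
   (Hypothetical helper, for illustration.)  */
int
fd_set_add (int fd, fd_set *set)
{
  if (fd < 0 || fd >= FD_SETSIZE)
    return -1;
  FD_SET (fd, set);
  return 0;
}
```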
2234 | ||
2235 | Next, here is the description of the @code{select} function itself. | |
2236 | ||
28f540f4 | 2237 | @deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout}) |
d08a7e4c | 2238 | @standards{BSD, sys/types.h} |
2cc3615c AO |
2239 | @safety{@prelim{}@mtsafe{@mtsrace{:read-fds} @mtsrace{:write-fds} @mtsrace{:except-fds}}@assafe{}@acsafe{}} |
2240 | @c The select syscall is preferred, but pselect6 may be used instead, | |
2241 | @c which requires converting timeout to a timespec and back. The | |
2242 | @c conversions are not atomic. | |
28f540f4 RM |
2243 | The @code{select} function blocks the calling process until there is |
2244 | activity on any of the specified sets of file descriptors, or until the | |
2245 | timeout period has expired. | |
2246 | ||
2247 | The file descriptors specified by the @var{read-fds} argument are | |
2248 | checked to see if they are ready for reading; the @var{write-fds} file | |
2249 | descriptors are checked to see if they are ready for writing; and the | |
2250 | @var{except-fds} file descriptors are checked for exceptional | |
2251 | conditions. You can pass a null pointer for any of these arguments if | |
2252 | you are not interested in checking for that kind of condition. | |
2253 | ||
76de2021 UD |
2254 | A file descriptor is considered ready for reading if a @code{read} |
2255 | call will not block. This usually includes the cases where the read offset is at | |
2256 | the end of the file or there is an error to report. A server socket | |
2257 | is considered ready for reading if there is a pending connection which | |
2258 | can be accepted with @code{accept}; @pxref{Accepting Connections}. A | |
2259 | client socket is ready for writing when its connection is fully | |
2260 | established; @pxref{Connecting}. | |
28f540f4 RM |
2261 | |
2262 | ``Exceptional conditions'' does not mean errors---errors are reported | |
2263 | immediately when an erroneous system call is executed, and do not | |
2264 | constitute a state of the descriptor. Rather, they include conditions | |
2265 | such as the presence of an urgent message on a socket. (@xref{Sockets}, | |
2266 | for information on urgent messages.) | |
2267 | ||
2268 | The @code{select} function checks only the first @var{nfds} file | |
2269 | descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value | |
2270 | of this argument. | |
2271 | ||
2272 | The @var{timeout} specifies the maximum time to wait. If you pass a | |
62193c4a ZW |
2273 | null pointer for this argument, it means to block indefinitely until |
2274 | one of the file descriptors is ready. Otherwise, you should provide | |
2275 | the time in @code{struct timeval} format; see @ref{Time Types}. | |
2276 | Specify zero as the time (a @code{struct timeval} containing all | |
2277 | zeros) if you want to find out which descriptors are ready without | |
28f540f4 RM |
2278 | waiting if none are ready. |
2279 | ||
2280 | The normal return value from @code{select} is the total number of ready file | |
2281 | descriptors in all of the sets. Each of the argument sets is overwritten | |
2282 | with information about the descriptors that are ready for the corresponding | |
2283 | operation. Thus, to see if a particular descriptor @var{desc} has input, | |
2284 | use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns. | |
2285 | ||
2286 | If @code{select} returns because the timeout period expires, it returns | |
2287 | a value of zero. | |
2288 | ||
2289 | Any signal will cause @code{select} to return immediately. So if your | |
2290 | program uses signals, you can't rely on @code{select} to keep waiting | |
2291 | for the full time specified. If you want to be sure of waiting for a | |
2292 | particular amount of time, you must check for @code{EINTR} and repeat | |
2293 | the @code{select} with a newly calculated timeout based on the current | |
2294 | time. See the example below. See also @ref{Interrupted Primitives}. | |
2295 | ||
2296 | If an error occurs, @code{select} returns @code{-1} and does not modify | |
2c6fe0bd | 2297 | the argument file descriptor sets. The following @code{errno} error |
28f540f4 RM |
2298 | conditions are defined for this function: |
2299 | ||
2300 | @table @code | |
2301 | @item EBADF | |
2302 | One of the file descriptor sets specified an invalid file descriptor. | |
2303 | ||
2304 | @item EINTR | |
2305 | The operation was interrupted by a signal. @xref{Interrupted Primitives}. | |
2306 | ||
2307 | @item EINVAL | |
2308 | The @var{timeout} argument is invalid; one of the components is negative | |
2309 | or too large. | |
2310 | @end table | |
2311 | @end deftypefun | |
2312 | ||
2313 | @strong{Portability Note:} The @code{select} function is a BSD Unix | |
2314 | feature. | |
2315 | ||
2316 | Here is an example showing how you can use @code{select} to establish a | |
2317 | timeout period for reading from a file descriptor. The @code{input_timeout} | |
2318 | function blocks the calling process until input is available on the | |
2319 | file descriptor, or until the timeout period expires. | |
2320 | ||
2321 | @smallexample | |
2322 | @include select.c.texi | |
2323 | @end smallexample | |
2324 | ||
2325 | There is another example showing the use of @code{select} to multiplex | |
2326 | input from multiple sockets in @ref{Server Example}. | |
2327 | ||
6c0be743 DD |
2328 | For an alternate interface to this functionality, see @code{poll} |
2329 | (@pxref{Other Low-Level I/O APIs}). | |
28f540f4 | 2330 | |
dfd2257a UD |
2331 | @node Synchronizing I/O |
2332 | @section Synchronizing I/O operations | |
2333 | ||
2334 | @cindex synchronizing | |
19e4c7dd | 2335 | In most modern operating systems, the normal I/O operations are not |
dfd2257a | 2336 | executed synchronously. I.e., even if a @code{write} system call |
19e4c7dd | 2337 | returns, this does not mean the data is actually written to the media, |
dfd2257a UD |
2338 | e.g., the disk. |
2339 | ||
19e4c7dd | 2340 | In situations where synchronization points are necessary, you can use |
04b9968b | 2341 | special functions which ensure that all operations finish before |
dfd2257a UD |
2342 | they return. |
2343 | ||
8ded91fb | 2344 | @deftypefun void sync (void) |
d08a7e4c | 2345 | @standards{X/Open, unistd.h} |
2cc3615c | 2346 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
dfd2257a | 2347 | A call to this function will not return as long as there is data which |
04b9968b | 2348 | has not been written to the device. All dirty buffers in the kernel will |
dfd2257a UD |
2349 | be written and so an overall consistent system can be achieved (if no |
2350 | other process in parallel writes data). | |
2351 | ||
2352 | A prototype for @code{sync} can be found in @file{unistd.h}. | |
dfd2257a UD |
2353 | @end deftypefun |
2354 | ||
04b9968b UD |
2355 | Programs more often want to ensure that data written to a given file is |
2356 | committed, rather than all data in the system. For this, @code{sync} is overkill. | |
2357 | ||
dfd2257a | 2358 | |
dfd2257a | 2359 | @deftypefun int fsync (int @var{fildes}) |
d08a7e4c | 2360 | @standards{POSIX, unistd.h} |
2cc3615c | 2361 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
19e4c7dd AJ |
2362 | The @code{fsync} function can be used to make sure all data associated with |
2363 | the open file @var{fildes} is written to the device associated with the | |
dfd2257a UD |
2364 | descriptor. The function call does not return unless all actions have |
2365 | finished. | |
2366 | ||
2367 | A prototype for @code{fsync} can be found in @file{unistd.h}. | |
2368 | ||
04b9968b | 2369 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
2370 | is a problem if the thread allocates some resources (like memory, file |
2371 | descriptors, semaphores or whatever) at the time @code{fsync} is | |
19e4c7dd | 2372 | called. If the thread gets canceled these resources stay allocated |
04b9968b UD |
2373 | until the program ends. To avoid this, calls to @code{fsync} should be |
2374 | protected using cancellation handlers. | |
dfd2257a UD |
2375 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
2376 | ||
49c091e5 | 2377 | The return value of the function is zero if no error occurred. Otherwise |
010fe231 | 2378 | it is @math{-1} and the global variable @code{errno} is set to one of the |
dfd2257a UD |
2379 | following values: |
2380 | @table @code | |
2381 | @item EBADF | |
2382 | The descriptor @var{fildes} is not valid. | |
2383 | ||
2384 | @item EINVAL | |
2385 | No synchronization is possible since the system does not implement this. | |
2386 | @end table | |
2387 | @end deftypefun | |
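
A minimal sketch of writing a file and then forcing it to the device
might look like this.  The helper name @code{save_durably} is
hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the string TEXT to PATH, replacing any previous contents, and
   return 0 only after fsync has reported success.  Return -1 on any
   failure.  (Hypothetical helper, for illustration.)  */
int
save_durably (const char *path, const char *text)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  size_t left = strlen (text);
  while (left > 0)
    {
      ssize_t n = write (fd, text, left);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted by a signal: retry.  */
          close (fd);
          return -1;
        }
      text += n;
      left -= (size_t) n;
    }

  /* A successful write only means the kernel accepted the data;
     fsync waits until it has been written to the device.  */
  int result = fsync (fd);
  if (close (fd) != 0)
    result = -1;
  return result;
}
```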
2388 | ||
2389 | Sometimes it is not even necessary to write all data associated with a | |
2390 | file descriptor. E.g., in database files which do not change in size it | |
2391 | is enough to write all the file content data to the device. | |
19e4c7dd | 2392 | Meta-information, such as the modification time, is not that important |
dfd2257a | 2393 | and leaving such information uncommitted does not prevent a successful |
9739d2d5 | 2394 | recovery of the file in case of a problem. |
dfd2257a | 2395 | |
dfd2257a | 2396 | @deftypefun int fdatasync (int @var{fildes}) |
d08a7e4c | 2397 | @standards{POSIX, unistd.h} |
2cc3615c | 2398 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
04b9968b | 2399 | When a call to the @code{fdatasync} function returns, it is ensured |
dfd2257a | 2400 | that all of the file data is written to the device. For all pending I/O |
04b9968b | 2401 | operations, the parts guaranteeing data integrity have finished. |
dfd2257a UD |
2402 | |
2403 | Not all systems implement the @code{fdatasync} operation. On systems | |
2404 | missing this functionality @code{fdatasync} is emulated by a call to | |
2405 | @code{fsync} since the performed actions are a superset of those | |
19e4c7dd | 2406 | required by @code{fdatasync}. |
dfd2257a UD |
2407 | |
2408 | The prototype for @code{fdatasync} is in @file{unistd.h}. | |
2409 | ||
49c091e5 | 2410 | The return value of the function is zero if no error occurred. Otherwise |
010fe231 | 2411 | it is @math{-1} and the global variable @code{errno} is set to the |
dfd2257a UD |
2412 | one of the following values: |
2413 | @table @code | |
2414 | @item EBADF | |
2415 | The descriptor @var{fildes} is not valid. | |
2416 | ||
2417 | @item EINVAL | |
2418 | No synchronization is possible since the system does not implement this. | |
2419 | @end table | |
2420 | @end deftypefun | |
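
The database case mentioned above might look roughly like the following sketch: a record inside an existing file is overwritten in place and only the file data is flushed. The helper name @code{update_record} is hypothetical, and @code{O_CREAT} is passed only so that the sketch also runs on a file that does not exist yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite LEN bytes at OFFSET in PATH and flush
   only the file data.  O_CREAT is an assumption of this sketch so that
   it also works on a file that does not exist yet.  */
int
update_record (const char *path, off_t offset, const void *rec, size_t len)
{
  int fd = open (path, O_WRONLY | O_CREAT, 0644);
  if (fd == -1)
    return -1;

  /* pwrite writes at OFFSET without moving the file position.  */
  ssize_t n = pwrite (fd, rec, len, offset);
  if (n == -1 || (size_t) n != len)
    {
      close (fd);
      return -1;
    }

  /* Metadata such as the modification time may remain uncommitted.  */
  if (fdatasync (fd) == -1)
    {
      close (fd);
      return -1;
    }
  return close (fd);
}
```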
2421 | ||
2422 | ||
b07d03e0 UD |
2423 | @node Asynchronous I/O |
2424 | @section Perform I/O Operations in Parallel | |
2425 | ||
2426 | The POSIX.1b standard defines a new set of I/O operations which can | |
9739d2d5 | 2427 | significantly reduce the time an application spends waiting for I/O. The |
b07d03e0 | 2428 | new functions allow a program to initiate one or more I/O operations and |
04b9968b UD |
2429 | then immediately resume normal work while the I/O operations are |
2430 | executed in parallel. This functionality is available if the | |
a3a4a74e | 2431 | @file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}. |
b07d03e0 UD |
2432 | |
2433 | These functions are part of the library with realtime functions named | |
2434 | @file{librt}. They are not actually part of the @file{libc} binary. | |
2435 | The implementation of these functions can be done using support in the | |
c756c71c UD |
2436 | kernel (if available) or using an implementation based on threads at |
2437 | userlevel. In the latter case it might be necessary to link applications | |
fed8f7f7 | 2438 | with the thread library @file{libpthread} in addition to @file{librt}. |
b07d03e0 | 2439 | |
c756c71c | 2440 | All AIO operations operate on files which were opened previously. There |
04b9968b | 2441 | might be arbitrarily many operations running for one file. The |
b07d03e0 UD |
2442 | asynchronous I/O operations are controlled using a data structure named |
2443 | @code{struct aiocb} (@dfn{AIO control block}). It is defined in | |
2444 | @file{aio.h} as follows. | |
2445 | ||
b07d03e0 | 2446 | @deftp {Data Type} {struct aiocb} |
d08a7e4c | 2447 | @standards{POSIX.1b, aio.h} |
b07d03e0 UD |
2448 | The POSIX.1b standard mandates that the @code{struct aiocb} structure |
2449 | contains at least the members described in the following table. There | |
04b9968b | 2450 | might be more elements which are used by the implementation, but |
19e4c7dd | 2451 | depending upon these elements is not portable and is strongly discouraged. |
b07d03e0 UD |
2452 | |
2453 | @table @code | |
2454 | @item int aio_fildes | |
19e4c7dd AJ |
2455 | This element specifies the file descriptor to be used for the |
2456 | operation. It must be a legal descriptor, otherwise the operation will | |
2457 | fail. | |
b07d03e0 UD |
2458 | |
2459 | The device on which the file is opened must allow the seek operation. | |
2460 | I.e., it is not possible to use any of the AIO operations on devices | |
2461 | like terminals where an @code{lseek} call would lead to an error. | |
2462 | ||
2463 | @item off_t aio_offset | |
19e4c7dd | 2464 | This element specifies the offset in the file at which the operation (input |
fed8f7f7 | 2465 | or output) is performed. Since the operations are carried out in arbitrary |
b07d03e0 UD |
2466 | order and more than one operation for one file descriptor can be |
2467 | started, one cannot expect a current read/write position of the file | |
2468 | descriptor. | |
2469 | ||
2470 | @item volatile void *aio_buf | |
2471 | This is a pointer to the buffer with the data to be written or the place | |
c756c71c | 2472 | where the read data is stored. |
b07d03e0 UD |
2473 | |
2474 | @item size_t aio_nbytes | |
2475 | This element specifies the length of the buffer pointed to by @code{aio_buf}. | |
2476 | ||
2477 | @item int aio_reqprio | |
c756c71c | 2478 | If the platform has defined @code{_POSIX_PRIORITIZED_IO} and |
19e4c7dd | 2479 | @code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are |
b07d03e0 UD |
2480 | processed based on the current scheduling priority. The |
2481 | @code{aio_reqprio} element can then be used to lower the priority of the | |
2482 | AIO operation. | |
2483 | ||
2484 | @item struct sigevent aio_sigevent | |
2485 | This element specifies how the calling process is notified once the | |
fed8f7f7 | 2486 | operation terminates. If the @code{sigev_notify} element is |
19e4c7dd AJ |
2487 | @code{SIGEV_NONE}, no notification is sent. If it is @code{SIGEV_SIGNAL}, |
2488 | the signal determined by @code{sigev_signo} is sent. Otherwise, | |
2489 | @code{sigev_notify} must be @code{SIGEV_THREAD}. In this case, a thread | |
c756c71c | 2490 | is created which starts executing the function pointed to by |
b07d03e0 UD |
2491 | @code{sigev_notify_function}. |
2492 | ||
2493 | @item int aio_lio_opcode | |
2494 | This element is only used by the @code{lio_listio} and | |
04b9968b UD |
2495 | @code{lio_listio64} functions. Since these functions allow an |
2496 | arbitrary number of operations to start at once, and each operation can be | |
2497 | input or output (or nothing), the information must be stored in the | |
b07d03e0 UD |
2498 | control block. The possible values are: |
2499 | ||
2500 | @vtable @code | |
2501 | @item LIO_READ | |
2502 | Start a read operation. Read from the file at position | |
2503 | @code{aio_offset} and store the next @code{aio_nbytes} bytes in the | |
2504 | buffer pointed to by @code{aio_buf}. | |
2505 | ||
2506 | @item LIO_WRITE | |
2507 | Start a write operation. Write @code{aio_nbytes} bytes starting at | |
2508 | @code{aio_buf} into the file starting at position @code{aio_offset}. | |
2509 | ||
2510 | @item LIO_NOP | |
2511 | Do nothing for this control block. This value is useful sometimes when | |
2512 | an array of @code{struct aiocb} values contains holes, i.e., some of the | |
fed8f7f7 | 2513 | values must not be handled although the whole array is presented to the |
b07d03e0 UD |
2514 | @code{lio_listio} function. |
2515 | @end vtable | |
2516 | @end table | |
a3a4a74e | 2517 | |
fed8f7f7 | 2518 | When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a |
19e4c7dd | 2519 | 32 bit machine, this type is in fact @code{struct aiocb64}, since the LFS |
a3a4a74e UD |
2520 | interface transparently replaces the @code{struct aiocb} definition. |
2521 | @end deftp | |
2522 | ||
19e4c7dd | 2523 | For use with the AIO functions defined in the LFS, there is a similar type |
a3a4a74e | 2524 | defined which replaces the types of the appropriate members with larger |
04b9968b | 2525 | types but otherwise is equivalent to @code{struct aiocb}. Particularly, |
a3a4a74e UD |
2526 | all member names are the same. |
2527 | ||
a3a4a74e | 2528 | @deftp {Data Type} {struct aiocb64} |
d08a7e4c | 2529 | @standards{POSIX.1b, aio.h} |
a3a4a74e UD |
2530 | @table @code |
2531 | @item int aio_fildes | |
2532 | This element specifies the file descriptor which is used for the | |
2533 | operation. It must be a legal descriptor, otherwise the operation will |
2534 | fail. |
2535 | ||
2536 | The device on which the file is opened must allow the seek operation. | |
2537 | I.e., it is not possible to use any of the AIO operations on devices | |
2538 | like terminals where an @code{lseek} call would lead to an error. | |
2539 | ||
2540 | @item off64_t aio_offset | |
04b9968b | 2541 | This element specifies at which offset in the file the operation (input |
a3a4a74e UD |
2542 | or output) is performed. Since the operations are carried out in arbitrary |
2543 | order and more than one operation for one file descriptor can be | |
2544 | started, one cannot expect a current read/write position of the file | |
2545 | descriptor. | |
2546 | ||
2547 | @item volatile void *aio_buf | |
2548 | This is a pointer to the buffer with the data to be written or the place | |
19e4c7dd | 2549 | where the read data is stored. |
a3a4a74e UD |
2550 | |
2551 | @item size_t aio_nbytes | |
2552 | This element specifies the length of the buffer pointed to by @code{aio_buf}. | |
2553 | ||
2554 | @item int aio_reqprio | |
2555 | If the platform has defined @code{_POSIX_PRIORITIZED_IO} and |
04b9968b | 2556 | @code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are |
a3a4a74e UD |
2557 | processed based on the current scheduling priority. The |
2558 | @code{aio_reqprio} element can then be used to lower the priority of the | |
2559 | AIO operation. | |
2560 | ||
2561 | @item struct sigevent aio_sigevent | |
2562 | This element specifies how the calling process is notified once the | |
9739d2d5 | 2563 | operation terminates. If the @code{sigev_notify} element is |
19e4c7dd AJ |
2564 | @code{SIGEV_NONE}, no notification is sent. If it is @code{SIGEV_SIGNAL}, |
2565 | the signal determined by @code{sigev_signo} is sent. Otherwise, |
a3a4a74e | 2566 | @code{sigev_notify} must be @code{SIGEV_THREAD}, in which case a thread |
9739d2d5 | 2567 | is created which starts executing the function pointed to by |
a3a4a74e UD |
2568 | @code{sigev_notify_function}. |
2569 | ||
2570 | @item int aio_lio_opcode | |
2571 | This element is only used by the @code{lio_listio} and | |
9739d2d5 | 2572 | @code{lio_listio64} functions. Since these functions allow an |
04b9968b UD |
2573 | arbitrary number of operations to start at once, and since each operation can be |
2574 | input or output (or nothing), the information must be stored in the | |
a3a4a74e UD |
2575 | control block. See the description of @code{struct aiocb} for a description |
2576 | of the possible values. | |
2577 | @end table | |
2578 | ||
2579 | When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a | |
19e4c7dd AJ |
2580 | 32 bit machine, this type is available under the name @code{struct |
2581 | aiocb64}, since the LFS transparently replaces the old interface. | |
b07d03e0 UD |
2582 | @end deftp |
2583 | ||
2584 | @menu | |
a3a4a74e UD |
2585 | * Asynchronous Reads/Writes:: Asynchronous Read and Write Operations. |
2586 | * Status of AIO Operations:: Getting the Status of AIO Operations. | |
2587 | * Synchronizing AIO Operations:: Getting into a consistent state. | |
04b9968b | 2588 | * Cancel AIO Operations:: Cancellation of AIO Operations. |
a3a4a74e | 2589 | * Configuration of AIO:: How to optimize the AIO implementation. |
b07d03e0 UD |
2590 | @end menu |
2591 | ||
a3a4a74e UD |
2592 | @node Asynchronous Reads/Writes |
2593 | @subsection Asynchronous Read and Write Operations | |
b07d03e0 | 2594 | |
b07d03e0 | 2595 | @deftypefun int aio_read (struct aiocb *@var{aiocbp}) |
d08a7e4c | 2596 | @standards{POSIX.1b, aio.h} |
2cc3615c AO |
2597 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
2598 | @c Calls aio_enqueue_request. | |
2599 | @c aio_enqueue_request @asulock @ascuheap @aculock @acsmem | |
2600 | @c pthread_self ok | |
2601 | @c pthread_getschedparam @asulock @aculock | |
2602 | @c lll_lock (pthread descriptor's lock) @asulock @aculock | |
2603 | @c sched_getparam ok | |
2604 | @c sched_getscheduler ok | |
2605 | @c lll_unlock @aculock | |
2606 | @c pthread_mutex_lock (aio_requests_mutex) @asulock @aculock | |
2607 | @c get_elem @ascuheap @acsmem [@asucorrupt @acucorrupt] | |
2608 | @c realloc @ascuheap @acsmem | |
2609 | @c calloc @ascuheap @acsmem | |
2610 | @c aio_create_helper_thread @asulock @ascuheap @aculock @acsmem | |
2611 | @c pthread_attr_init ok | |
2612 | @c pthread_attr_setdetachstate ok | |
2613 | @c pthread_get_minstack ok | |
2614 | @c pthread_attr_setstacksize ok | |
2615 | @c sigfillset ok | |
2616 | @c memset ok | |
2617 | @c sigdelset ok | |
2618 | @c SYSCALL rt_sigprocmask ok | |
2619 | @c pthread_create @asulock @ascuheap @aculock @acsmem | |
2620 | @c lll_lock (default_pthread_attr_lock) @asulock @aculock | |
2621 | @c alloca/malloc @ascuheap @acsmem | |
2622 | @c lll_unlock @aculock | |
2623 | @c allocate_stack @asulock @ascuheap @aculock @acsmem | |
2624 | @c getpagesize dup | |
2625 | @c lll_lock (default_pthread_attr_lock) @asulock @aculock | |
2626 | @c lll_unlock @aculock | |
2627 | @c _dl_allocate_tls @ascuheap @acsmem | |
2628 | @c _dl_allocate_tls_storage @ascuheap @acsmem | |
2629 | @c memalign @ascuheap @acsmem | |
2630 | @c memset ok | |
2631 | @c allocate_dtv dup | |
2632 | @c free @ascuheap @acsmem | |
2633 | @c allocate_dtv @ascuheap @acsmem | |
2634 | @c calloc @ascuheap @acsmem | |
2635 | @c INSTALL_DTV ok | |
2636 | @c list_add dup | |
2637 | @c get_cached_stack | |
2638 | @c lll_lock (stack_cache_lock) @asulock @aculock | |
2639 | @c list_for_each ok | |
2640 | @c list_entry dup | |
2641 | @c FREE_P dup | |
2642 | @c stack_list_del dup | |
2643 | @c stack_list_add dup | |
2644 | @c lll_unlock @aculock | |
2645 | @c _dl_allocate_tls_init ok | |
2646 | @c GET_DTV ok | |
2647 | @c mmap ok | |
d1babeb3 | 2648 | @c atomic_fetch_add_relaxed ok |
2cc3615c AO |
2649 | @c munmap ok |
2650 | @c change_stack_perm ok | |
2651 | @c mprotect ok | |
2652 | @c mprotect ok | |
2653 | @c stack_list_del dup | |
2654 | @c _dl_deallocate_tls dup | |
2655 | @c munmap ok | |
2656 | @c THREAD_COPY_STACK_GUARD ok | |
2657 | @c THREAD_COPY_POINTER_GUARD ok | |
22f4ab2d | 2658 | @c atomic_exchange_acquire ok |
2cc3615c AO |
2659 | @c lll_futex_wake ok |
2660 | @c deallocate_stack @asulock @ascuheap @aculock @acsmem | |
2661 | @c lll_lock (state_cache_lock) @asulock @aculock | |
2662 | @c stack_list_del ok | |
2663 | @c atomic_write_barrier ok | |
2664 | @c list_del ok | |
2665 | @c atomic_write_barrier ok | |
2666 | @c queue_stack @ascuheap @acsmem | |
2667 | @c stack_list_add ok | |
2668 | @c atomic_write_barrier ok | |
2669 | @c list_add ok | |
2670 | @c atomic_write_barrier ok | |
2671 | @c free_stacks @ascuheap @acsmem | |
2672 | @c list_for_each_prev_safe ok | |
2673 | @c list_entry ok | |
2674 | @c FREE_P ok | |
2675 | @c stack_list_del dup | |
2676 | @c _dl_deallocate_tls dup | |
2677 | @c munmap ok | |
2678 | @c _dl_deallocate_tls @ascuheap @acsmem | |
2679 | @c free @ascuheap @acsmem | |
2680 | @c lll_unlock @aculock | |
2681 | @c create_thread @asulock @ascuheap @aculock @acsmem | |
2682 | @c td_eventword | |
2683 | @c td_eventmask | |
2684 | @c do_clone @asulock @ascuheap @aculock @acsmem | |
2685 | @c PREPARE_CREATE ok | |
2686 | @c lll_lock (pd->lock) @asulock @aculock | |
d1babeb3 | 2687 | @c atomic_fetch_add_relaxed ok |
2cc3615c | 2688 | @c clone ok |
a364a3a7 | 2689 | @c atomic_fetch_add_relaxed ok |
22f4ab2d | 2690 | @c atomic_exchange_acquire ok |
2cc3615c AO |
2691 | @c lll_futex_wake ok |
2692 | @c deallocate_stack dup | |
2693 | @c sched_setaffinity ok | |
2694 | @c tgkill ok | |
2695 | @c sched_setscheduler ok | |
2696 | @c atomic_compare_and_exchange_bool_acq ok | |
2697 | @c nptl_create_event ok | |
2698 | @c lll_unlock (pd->lock) @aculock | |
2699 | @c free @ascuheap @acsmem | |
2700 | @c pthread_attr_destroy ok (cpuset won't be set, so free isn't called) | |
2701 | @c add_request_to_runlist ok | |
2702 | @c pthread_cond_signal ok | |
2703 | @c aio_free_request ok | |
2704 | @c pthread_mutex_unlock @aculock | |
2705 | ||
2706 | @c (in the new thread, initiated with clone) | |
2707 | @c start_thread ok | |
2708 | @c HP_TIMING_NOW ok | |
2709 | @c ctype_init @mtslocale | |
22f4ab2d | 2710 | @c atomic_exchange_acquire ok |
2cc3615c AO |
2711 | @c lll_futex_wake ok |
2712 | @c sigemptyset ok | |
2713 | @c sigaddset ok | |
2714 | @c setjmp ok | |
ce0b7961 | 2715 | @c LIBC_CANCEL_ASYNC -> __pthread_enable_asynccancel ok |
2cc3615c AO |
2716 | @c do_cancel ok |
2717 | @c pthread_unwind ok | |
2718 | @c Unwind_ForcedUnwind or longjmp ok [@ascuheap @acsmem?] | |
2719 | @c lll_lock @asulock @aculock | |
2720 | @c lll_unlock @asulock @aculock | |
ce0b7961 | 2721 | @c LIBC_CANCEL_RESET -> __pthread_disable_asynccancel ok |
2cc3615c AO |
2722 | @c lll_futex_wait ok |
2723 | @c ->start_routine ok ----- | |
2724 | @c call_tls_dtors @asulock @ascuheap @aculock @acsmem | |
2725 | @c user-supplied dtor | |
2726 | @c rtld_lock_lock_recursive (dl_load_lock) @asulock @aculock | |
2727 | @c rtld_lock_unlock_recursive @aculock | |
2728 | @c free @ascuheap @acsmem | |
2729 | @c nptl_deallocate_tsd @ascuheap @acsmem | |
2730 | @c tsd user-supplied dtors ok | |
2731 | @c free @ascuheap @acsmem | |
2732 | @c libc_thread_freeres | |
2733 | @c libc_thread_subfreeres ok | |
4a07fbb6 | 2734 | @c atomic_fetch_add_relaxed ok |
2cc3615c AO |
2735 | @c td_eventword ok |
2736 | @c td_eventmask ok | |
2737 | @c atomic_compare_exchange_bool_acq ok | |
2738 | @c nptl_death_event ok | |
2739 | @c lll_robust_dead ok | |
2740 | @c getpagesize ok | |
2741 | @c madvise ok | |
2742 | @c free_tcb @asulock @ascuheap @aculock @acsmem | |
2743 | @c free @ascuheap @acsmem | |
2744 | @c deallocate_stack @asulock @ascuheap @aculock @acsmem | |
2745 | @c lll_futex_wait ok | |
2746 | @c exit_thread_inline ok | |
2747 | @c syscall(exit) ok | |
2748 | ||
04b9968b UD |
2749 | This function initiates an asynchronous read operation. It |
2750 | returns immediately after the operation has been enqueued or as soon as an |
fed8f7f7 | 2751 | error is encountered. |
b07d03e0 | 2752 | |
a3a4a74e | 2753 | Up to @code{aiocbp->aio_nbytes} bytes of the file for which |
c756c71c UD |
2754 | @code{aiocbp->aio_fildes} is a descriptor are read into the buffer |
2755 | starting at @code{aiocbp->aio_buf}. Reading starts at the absolute | |
2756 | position @code{aiocbp->aio_offset} in the file. | |
b07d03e0 UD |
2757 | |
2758 | If prioritized I/O is supported by the platform, the |
2759 | @code{aiocbp->aio_reqprio} value is used to adjust the priority before | |
2760 | the request is actually enqueued. | |
2761 | ||
2762 | The calling process is notified about the termination of the read | |
2763 | request according to the @code{aiocbp->aio_sigevent} value. | |
2764 | ||
04b9968b | 2765 | When @code{aio_read} returns, the return value is zero if no error |
b07d03e0 | 2766 | occurred that can be found before the request is enqueued. If such an |
04b9968b UD |
2767 | early error is found, the function returns @math{-1} and sets |
2768 | @code{errno} to one of the following values: | |
b07d03e0 UD |
2769 | |
2770 | @table @code | |
2771 | @item EAGAIN | |
2772 | The request was not enqueued due to (temporarily) exceeded resource | |
2773 | limitations. | |
2774 | @item ENOSYS | |
2775 | The @code{aio_read} function is not implemented. | |
2776 | @item EBADF | |
2777 | The @code{aiocbp->aio_fildes} descriptor is not valid. This condition | |
04b9968b | 2778 | need not be recognized before enqueueing the request and so this error |
fed8f7f7 | 2779 | might also be signaled asynchronously. |
b07d03e0 UD |
2780 | @item EINVAL |
2781 | The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is |
2782 | invalid. This condition need not be recognized before enqueueing the | |
49c091e5 | 2783 | request and so this error might also be signaled asynchronously. |
b07d03e0 UD |
2784 | @end table |
2785 | ||
04b9968b UD |
2786 | If @code{aio_read} returns zero, the current status of the request |
2787 | can be queried using the @code{aio_error} and @code{aio_return} functions. |
2788 | As long as the value returned by @code{aio_error} is @code{EINPROGRESS} | |
2789 | the operation has not yet completed. If @code{aio_error} returns zero, | |
78759725 UD |
2790 | the operation terminated successfully; otherwise the value is to be |
2791 | interpreted as an error code. Once the operation has terminated, the result of |
2792 | the operation can be obtained using a call to @code{aio_return}. The | |
2793 | returned value is the same as an equivalent call to @code{read} would | |
04b9968b | 2794 | have returned. Possible error codes returned by @code{aio_error} are: |
b07d03e0 UD |
2795 | |
2796 | @table @code | |
2797 | @item EBADF | |
2798 | The @code{aiocbp->aio_fildes} descriptor is not valid. | |
2799 | @item ECANCELED | |
19e4c7dd | 2800 | The operation was canceled before it finished |
b07d03e0 UD |
2801 | (@pxref{Cancel AIO Operations}). |
2802 | @item EINVAL | |
2803 | The @code{aiocbp->aio_offset} value is invalid. | |
2804 | @end table | |
a3a4a74e UD |
2805 | |
2806 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2807 | function is in fact @code{aio_read64} since the LFS interface transparently | |
2808 | replaces the normal implementation. | |
b07d03e0 UD |
2809 | @end deftypefun |
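
The sequence described for @code{aio_read} (fill in the control block, enqueue the request, watch @code{aio_error} until it no longer reports @code{EINPROGRESS}, then collect the result with @code{aio_return}) can be sketched as follows. The helper name @code{read_async} is hypothetical, and polling with @code{sched_yield} merely stands in for the useful work a real program would do while the request is in flight. On older systems the program may have to be linked with @file{librt}.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read up to LEN bytes from PATH at OFFSET with
   aio_read.  Returns the byte count, or -1 with errno set.  */
ssize_t
read_async (const char *path, off_t offset, void *buf, size_t len)
{
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);   /* Also clears implementation members.  */
  cb.aio_fildes = fd;
  cb.aio_offset = offset;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* No notification.  */

  if (aio_read (&cb) == -1)
    {
      int saved = errno;        /* Early error, request not enqueued.  */
      close (fd);
      errno = saved;
      return -1;
    }

  /* Poll for completion; a real program would do useful work here
     instead of just yielding the processor.  */
  int err;
  while ((err = aio_error (&cb)) == EINPROGRESS)
    sched_yield ();

  /* aio_return must be called exactly once per completed request.  */
  ssize_t n = aio_return (&cb);
  close (fd);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}
```

Note that the control block and the buffer must stay valid until the request has completed, which is why this sketch waits before returning.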
2810 | ||
8ded91fb | 2811 | @deftypefun int aio_read64 (struct aiocb64 *@var{aiocbp}) |
d08a7e4c | 2812 | @standards{Unix98, aio.h} |
2cc3615c | 2813 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
b07d03e0 | 2814 | This function is similar to the @code{aio_read} function. The only |
19e4c7dd AJ |
2815 | difference is that on @w{32 bit} machines, the file descriptor should |
2816 | be opened in the large file mode. Internally, @code{aio_read64} uses | |
a3a4a74e UD |
2817 | functionality equivalent to @code{lseek64} (@pxref{File Position |
2818 | Primitive}) to position the file descriptor correctly for the reading, | |
9739d2d5 | 2819 | as opposed to the @code{lseek} functionality used in @code{aio_read}. |
a3a4a74e | 2820 | |
19e4c7dd | 2821 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e | 2822 | function is available under the name @code{aio_read} and so transparently |
04b9968b | 2823 | replaces the interface for small files on 32 bit machines. |
b07d03e0 UD |
2824 | @end deftypefun |
2825 | ||
19e4c7dd | 2826 | To write data asynchronously to a file, there exists an equivalent pair |
a3a4a74e UD |
2827 | of functions with a very similar interface. |
2828 | ||
a3a4a74e | 2829 | @deftypefun int aio_write (struct aiocb *@var{aiocbp}) |
d08a7e4c | 2830 | @standards{POSIX.1b, aio.h} |
2cc3615c | 2831 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
a3a4a74e UD |
2832 | This function initiates an asynchronous write operation. The function |
2833 | call returns immediately after the operation has been enqueued or as soon |
fed8f7f7 | 2834 | as an error is encountered. |
a3a4a74e UD |
2835 | |
2836 | The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at | |
2837 | @code{aiocbp->aio_buf} are written to the file for which | |
9dcc8f11 | 2838 | @code{aiocbp->aio_fildes} is a descriptor, starting at the absolute |
a3a4a74e UD |
2839 | position @code{aiocbp->aio_offset} in the file. |
2840 | ||
19e4c7dd | 2841 | If prioritized I/O is supported by the platform, the |
a3a4a74e UD |
2842 | @code{aiocbp->aio_reqprio} value is used to adjust the priority before |
2843 | the request is actually enqueued. | |
2844 | ||
2845 | The calling process is notified about the termination of the write |
2846 | request according to the @code{aiocbp->aio_sigevent} value. | |
2847 | ||
19e4c7dd | 2848 | When @code{aio_write} returns, the return value is zero if no error |
a3a4a74e UD |
2849 | occurred that can be found before the request is enqueued. If such an |
2850 | early error is found, the function returns @math{-1} and sets |
2851 | @code{errno} to one of the following values: |
2852 | ||
2853 | @table @code | |
2854 | @item EAGAIN | |
2855 | The request was not enqueued due to (temporarily) exceeded resource | |
2856 | limitations. | |
2857 | @item ENOSYS | |
2858 | The @code{aio_write} function is not implemented. | |
2859 | @item EBADF | |
2860 | The @code{aiocbp->aio_fildes} descriptor is not valid. This condition | |
19e4c7dd | 2861 | may not be recognized before enqueueing the request, and so this error |
fed8f7f7 | 2862 | might also be signaled asynchronously. |
a3a4a74e | 2863 | @item EINVAL |
19e4c7dd AJ |
2864 | The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is |
2865 | invalid. This condition may not be recognized before enqueueing the | |
fed8f7f7 | 2866 | request and so this error might also be signaled asynchronously. |
a3a4a74e UD |
2867 | @end table |
2868 | ||
19e4c7dd | 2869 | In the case @code{aio_write} returns zero, the current status of the |
9739d2d5 | 2870 | request can be queried using the @code{aio_error} and @code{aio_return} |
c756c71c | 2871 | functions. As long as the value returned by @code{aio_error} is |
a3a4a74e | 2872 | @code{EINPROGRESS} the operation has not yet completed. If |
19e4c7dd | 2873 | @code{aio_error} returns zero, the operation successfully terminated, |
a3a4a74e | 2874 | otherwise the value is to be interpreted as an error code. Once the |
9739d2d5 | 2875 | operation has terminated, the result of the operation can be obtained using a call |
a3a4a74e | 2876 | to @code{aio_return}. The returned value is the same as an equivalent |
19e4c7dd | 2877 | call to @code{write} would have returned. Possible error codes returned |
a3a4a74e UD |
2878 | by @code{aio_error} are: |
2879 | ||
2880 | @table @code | |
2881 | @item EBADF | |
2882 | The @code{aiocbp->aio_fildes} descriptor is not valid. | |
2883 | @item ECANCELED | |
19e4c7dd | 2884 | The operation was canceled before it finished |
a3a4a74e UD |
2885 | (@pxref{Cancel AIO Operations}). |
2886 | @item EINVAL | |
2887 | The @code{aiocbp->aio_offset} value is invalid. | |
2888 | @end table | |
2889 | ||
19e4c7dd | 2890 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e UD |
2891 | function is in fact @code{aio_write64} since the LFS interface transparently |
2892 | replaces the normal implementation. | |
2893 | @end deftypefun | |
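
A write request follows the same life cycle. The sketch below differs from plain polling in that it blocks in @code{aio_suspend} until the request has left the @code{EINPROGRESS} state. The helper name @code{write_async} and the open flags are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write LEN bytes from BUF to PATH at OFFSET with
   aio_write.  Returns the byte count, or -1 with errno set.  */
ssize_t
write_async (const char *path, off_t offset, const void *buf, size_t len)
{
  int fd = open (path, O_WRONLY | O_CREAT, 0644);
  if (fd == -1)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_offset = offset;
  cb.aio_buf = (void *) buf;    /* aio_buf is not const-qualified.  */
  cb.aio_nbytes = len;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_write (&cb) == -1)
    {
      int saved = errno;        /* Early error, request not enqueued.  */
      close (fd);
      errno = saved;
      return -1;
    }

  /* Sleep until the request is done; aio_suspend may return early
     (for example on a signal), so re-check the status in a loop.  */
  const struct aiocb *const list[] = { &cb };
  int err;
  while ((err = aio_error (&cb)) == EINPROGRESS)
    aio_suspend (list, 1, NULL);

  ssize_t n = aio_return (&cb); /* Same value write would have returned.  */
  close (fd);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}
```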
2894 | ||
8ded91fb | 2895 | @deftypefun int aio_write64 (struct aiocb64 *@var{aiocbp}) |
d08a7e4c | 2896 | @standards{Unix98, aio.h} |
2cc3615c | 2897 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
a3a4a74e | 2898 | This function is similar to the @code{aio_write} function. The only |
04b9968b | 2899 | difference is that on @w{32 bit} machines the file descriptor should |
a3a4a74e UD |
2900 | be opened in the large file mode. Internally @code{aio_write64} uses |
2901 | functionality equivalent to @code{lseek64} (@pxref{File Position | |
2902 | Primitive}) to position the file descriptor correctly for the writing, | |
9739d2d5 | 2903 | as opposed to the @code{lseek} functionality used in @code{aio_write}. |
a3a4a74e | 2904 | |
19e4c7dd | 2905 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e | 2906 | function is available under the name @code{aio_write} and so transparently |
04b9968b | 2907 | replaces the interface for small files on 32 bit machines. |
a3a4a74e UD |
2908 | @end deftypefun |
2909 | ||
19e4c7dd AJ |
2910 | Besides these functions with the more or less traditional interface, |
2911 | POSIX.1b also defines a function which can initiate more than one | |
2912 | operation at a time, and which can handle freely mixed read and write | |
2913 | operations. It is therefore similar to a combination of @code{readv} and | |
a3a4a74e UD |
2914 | @code{writev}. |
2915 | ||
a3a4a74e | 2916 | @deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) |
d08a7e4c | 2917 | @standards{POSIX.1b, aio.h} |
2cc3615c AO |
2918 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
2919 | @c Call lio_listio_internal, that takes the aio_requests_mutex lock and | |
2920 | @c enqueues each request. Then, it waits for notification or prepares | |
2921 | @c for it before releasing the lock. Even though it performs memory | |
2922 | @c allocation and locking of its own, it doesn't add any classes of | |
2923 | @c safety issues that aren't already covered by aio_enqueue_request. | |
a3a4a74e UD |
2924 | The @code{lio_listio} function can be used to enqueue an arbitrary |
2925 | number of read and write requests at one time. The requests can all be | |
2926 | meant for the same file, all for different files, or any combination in |
2927 | between. |
2928 | ||
2929 | @code{lio_listio} gets the @var{nent} requests from the array pointed to | |
19e4c7dd | 2930 | by @var{list}. The operation to be performed is determined by the |
a3a4a74e | 2931 | @code{aio_lio_opcode} member in each element of @var{list}. If this |
19e4c7dd | 2932 | field is @code{LIO_READ} a read operation is enqueued, similar to a call |
a3a4a74e UD |
2933 | of @code{aio_read} for this element of the array (except that the way |
2934 | the termination is signaled is different, as we will see below). If |
19e4c7dd | 2935 | the @code{aio_lio_opcode} member is @code{LIO_WRITE} a write operation |
a3a4a74e UD |
2936 | is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP} |
2937 | in which case this element of @var{list} is simply ignored. This | |
2938 | ``operation'' is useful in situations where one has a fixed array of | |
2939 | @code{struct aiocb} elements from which only a few need to be handled at | |
2940 | a time. Another situation is where the @code{lio_listio} call was | |
19e4c7dd | 2941 | canceled before all requests are processed (@pxref{Cancel AIO |
a3a4a74e UD |
2942 | Operations}) and the remaining requests have to be reissued. |
2943 | ||
fed8f7f7 | 2944 | The other members of each element of the array pointed to by |
a3a4a74e UD |
2945 | @var{list} must have values suitable for the operation as described in |
2946 | the documentation for @code{aio_read} and @code{aio_write} above. | |
2947 | ||
2948 | The @var{mode} argument determines how @code{lio_listio} behaves after | |
2949 | having enqueued all the requests. If @var{mode} is @code{LIO_WAIT} it | |
2950 | waits until all requests have terminated. Otherwise @var{mode} must be |
fed8f7f7 | 2951 | @code{LIO_NOWAIT} and in this case the function returns immediately after |
a3a4a74e UD |
2952 | having enqueued all the requests. In this case the caller gets a |
2953 | notification of the termination of all requests according to the | |
2954 | @var{sig} parameter. If @var{sig} is @code{NULL} no notification is | |
9739d2d5 | 2955 | sent. Otherwise a signal is sent or a thread is started, just as |
a3a4a74e UD |
2956 | described in the description for @code{aio_read} or @code{aio_write}. |
2957 | ||
19e4c7dd | 2958 | If @var{mode} is @code{LIO_WAIT}, the return value of @code{lio_listio} |
a3a4a74e | 2959 | is @math{0} when all requests completed successfully. Otherwise the |
9739d2d5 | 2960 | function returns @math{-1} and @code{errno} is set accordingly. To find |
a3a4a74e UD |
2961 | out which request or requests failed one has to use the @code{aio_error} |
2962 | function on all the elements of the array @var{list}. | |
2963 | ||
19e4c7dd | 2964 | In case @var{mode} is @code{LIO_NOWAIT}, the function returns @math{0} if |
a3a4a74e UD |
2965 | all requests were enqueued correctly. The current state of the requests |
2966 | can be found using @code{aio_error} and @code{aio_return} as described | |
19e4c7dd | 2967 | above. If @code{lio_listio} returns @math{-1} in this mode, the |
a3a4a74e | 2968 | global variable @code{errno} is set accordingly. If a request did not |
19e4c7dd AJ |
2969 | yet terminate, a call to @code{aio_error} returns @code{EINPROGRESS}. If |
2970 | the value is different, the request is finished and the error value (or | |
a3a4a74e UD |
2971 | @math{0}) is returned and the result of the operation can be retrieved |
2972 | using @code{aio_return}. | |
2973 | ||
2974 | Possible values for @code{errno} are: | |
2975 | ||
2976 | @table @code | |
2977 | @item EAGAIN | |
19e4c7dd | 2978 | The resources necessary to queue all the requests are not available at |
a3a4a74e | 2979 | the moment. The error status for each element of @var{list} must be |
19e4c7dd | 2980 | checked to determine which request failed. |
a3a4a74e | 2981 | |
fed8f7f7 | 2982 | Another reason could be that the system wide limit of AIO requests is |
a7a93d50 | 2983 | exceeded. This cannot be the case for the implementation on @gnusystems{} |
a3a4a74e UD |
2984 | since no arbitrary limits exist. |
2985 | @item EINVAL | |
2986 | The @var{mode} parameter is invalid or @var{nent} is larger than | |
2987 | @code{AIO_LISTIO_MAX}. | |
2988 | @item EIO | |
2989 | One or more of the request's I/O operations failed. The error status of | |
19e4c7dd | 2990 | each request should be checked to determine which one failed. |
a3a4a74e UD |
2991 | @item ENOSYS |
2992 | The @code{lio_listio} function is not supported. | |
2993 | @end table | |
2994 | ||
2995 | If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels | |
19e4c7dd | 2996 | a request, the error status for this request returned by |
a3a4a74e UD |
2997 | @code{aio_error} is @code{ECANCELED}. |
2998 | ||
19e4c7dd | 2999 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e UD |
3000 | function is in fact @code{lio_listio64} since the LFS interface |
3001 | transparently replaces the normal implementation. | |
3002 | @end deftypefun | |
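The following is a minimal sketch of a blocking @code{lio_listio} call, assuming a descriptor that is open for writing. The helper name @code{write_two_chunks} is hypothetical and not part of any library. It queues two writes, waits for both with @code{LIO_WAIT}, and then checks each request with @code{aio_error} and @code{aio_return} as described above. Errors other than @code{EIO} are simply treated as fatal here; a real program may want to inspect the individual requests in that case as well.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write buffer A at offset 0 and buffer B right
   after it, using a single lio_listio call.  Returns 0 if both writes
   completed in full, -1 otherwise.  */
int
write_two_chunks (int fd, const char *a, size_t len_a,
                  const char *b, size_t len_b)
{
  struct aiocb cb[2];
  struct aiocb *list[2] = { &cb[0], &cb[1] };
  int result = 0;

  assert (fd >= 0);
  memset (cb, 0, sizeof cb);

  cb[0].aio_fildes = fd;
  cb[0].aio_buf = (void *) a;
  cb[0].aio_nbytes = len_a;
  cb[0].aio_offset = 0;
  cb[0].aio_lio_opcode = LIO_WRITE;
  /* No per-request notification; we wait in lio_listio instead.  */
  cb[0].aio_sigevent.sigev_notify = SIGEV_NONE;

  cb[1] = cb[0];
  cb[1].aio_buf = (void *) b;
  cb[1].aio_nbytes = len_b;
  cb[1].aio_offset = (off_t) len_a;

  /* LIO_WAIT blocks until every queued request has terminated.  EIO
     means at least one request failed, so fall through and look at
     each one.  Any other error is treated as fatal in this sketch.  */
  if (lio_listio (LIO_WAIT, list, 2, NULL) == -1 && errno != EIO)
    return -1;

  for (int i = 0; i < 2; i++)
    {
      /* aio_return is only called for requests that succeeded.  */
      if (aio_error (&cb[i]) != 0
          || aio_return (&cb[i]) != (ssize_t) cb[i].aio_nbytes)
        result = -1;
    }
  return result;
}
```

Because the control blocks live on the stack, this pattern only makes sense with @code{LIO_WAIT}; with @code{LIO_NOWAIT} they would have to outlive the function.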
3003 | ||
8ded91fb | 3004 | @deftypefun int lio_listio64 (int @var{mode}, struct aiocb64 *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) |
d08a7e4c | 3005 | @standards{Unix98, aio.h} |
2cc3615c | 3006 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
19e4c7dd AJ |
3007 | This function is similar to the @code{lio_listio} function. The only |
3008 | difference is that on @w{32 bit} machines, the file descriptor should | |
3009 | be opened in the large file mode. Internally, @code{lio_listio64} uses | |
a3a4a74e UD |
3010 | functionality equivalent to @code{lseek64} (@pxref{File Position |
3011 | Primitive}) to position the file descriptor correctly for the reading or | |
9739d2d5 | 3012 | writing, as opposed to the @code{lseek} functionality used in |
a3a4a74e UD |
3013 | @code{lio_listio}. |
3014 | ||
19e4c7dd | 3015 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e | 3016 | function is available under the name @code{lio_listio} and so |
04b9968b | 3017 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
3018 | machines. |
3019 | @end deftypefun | |
3020 | ||
3021 | @node Status of AIO Operations | |
3022 | @subsection Getting the Status of AIO Operations | |
3023 | ||
fed8f7f7 | 3024 | As already described in the documentation of the functions in the last |
04b9968b UD |
3025 | section, it must be possible to get information about the status of an I/O |
3026 | request. When the operation is performed truly asynchronously (as with | |
19e4c7dd AJ |
3027 | @code{aio_read} and @code{aio_write} and with @code{lio_listio} when the |
3028 | mode is @code{LIO_NOWAIT}), one sometimes needs to know whether a | |
3029 | specific request already terminated and if so, what the result was. | |
04b9968b | 3030 | The following two functions allow you to get this kind of information. |
a3a4a74e | 3031 | |
a3a4a74e | 3032 | @deftypefun int aio_error (const struct aiocb *@var{aiocbp}) |
d08a7e4c | 3033 | @standards{POSIX.1b, aio.h} |
2cc3615c | 3034 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
a3a4a74e | 3035 | This function determines the error state of the request described by the |
fed8f7f7 | 3036 | @code{struct aiocb} variable pointed to by @var{aiocbp}. If the |
a3a4a74e UD |
3037 | request has not yet terminated the value returned is always |
3038 | @code{EINPROGRESS}. Once the request has terminated the value | |
3039 | @code{aio_error} returns is either @math{0} if the request completed | |
fed8f7f7 | 3040 | successfully or it returns the value which would be stored in the |
a3a4a74e UD |
3041 | @code{errno} variable if the request would have been done using |
3042 | @code{read}, @code{write}, or @code{fsync}. | |
3043 | ||
3044 | The function can return @code{ENOSYS} if it is not implemented. It | |
3045 | could also return @code{EINVAL} if the @var{aiocbp} parameter does not | |
3046 | refer to an asynchronous operation whose return status is not yet known. | |
3047 | ||
3048 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3049 | function is in fact @code{aio_error64} since the LFS interface | |
3050 | transparently replaces the normal implementation. | |
3051 | @end deftypefun | |
3052 | ||
a3a4a74e | 3053 | @deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp}) |
d08a7e4c | 3054 | @standards{Unix98, aio.h} |
2cc3615c | 3055 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
a3a4a74e UD |
3056 | This function is similar to @code{aio_error} with the only difference |
3057 | that the argument is a reference to a variable of type @code{struct | |
3058 | aiocb64}. | |
3059 | ||
3060 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3061 | function is available under the name @code{aio_error} and so | |
04b9968b | 3062 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
3063 | machines. |
3064 | @end deftypefun | |
3065 | ||
8ded91fb | 3066 | @deftypefun ssize_t aio_return (struct aiocb *@var{aiocbp}) |
d08a7e4c | 3067 | @standards{POSIX.1b, aio.h} |
2cc3615c | 3068 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
a3a4a74e UD |
3069 | This function can be used to retrieve the return status of the operation |
3070 | carried out by the request described in the variable pointed to by | |
3071 | @var{aiocbp}. As long as the error status of this request as returned | |
9739d2d5 | 3072 | by @code{aio_error} is @code{EINPROGRESS} the return value of this function is |
a3a4a74e UD |
3073 | undefined. |
3074 | ||
fed8f7f7 UD |
3075 | Once the request is finished this function can be used exactly once to |
3076 | retrieve the return value. Following calls might lead to undefined | |
19e4c7dd | 3077 | behavior. The return value itself is the value which would have been |
a3a4a74e UD |
3078 | returned by the @code{read}, @code{write}, or @code{fsync} call. |
3079 | ||
3080 | The function can return @code{ENOSYS} if it is not implemented. It | |
3081 | could also return @code{EINVAL} if the @var{aiocbp} parameter does not | |
3082 | refer to an asynchronous operation whose return status is not yet known. | |
3083 | ||
3084 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3085 | function is in fact @code{aio_return64} since the LFS interface | |
3086 | transparently replaces the normal implementation. | |
3087 | @end deftypefun | |
3088 | ||
8ded91fb | 3089 | @deftypefun ssize_t aio_return64 (struct aiocb64 *@var{aiocbp}) |
d08a7e4c | 3090 | @standards{Unix98, aio.h} |
2cc3615c | 3091 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
a3a4a74e UD |
3092 | This function is similar to @code{aio_return} with the only difference |
3093 | that the argument is a reference to a variable of type @code{struct | |
3094 | aiocb64}. | |
3095 | ||
3096 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3097 | function is available under the name @code{aio_return} and so | |
04b9968b | 3098 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
3099 | machines. |
3100 | @end deftypefun | |
3101 | ||
3102 | @node Synchronizing AIO Operations | |
3103 | @subsection Getting into a Consistent State | |
3104 | ||
3105 | When dealing with asynchronous operations it is sometimes necessary to | |
fed8f7f7 | 3106 | get into a consistent state. This would mean for AIO that one wants to |
9739d2d5 | 3107 | know whether a certain request or a group of requests were processed. |
a3a4a74e | 3108 | This could be done by waiting for the notification sent by the system |
04b9968b | 3109 | after the operation terminated, but this sometimes would mean wasting |
a3a4a74e UD |
3110 | resources (mainly computation time). Instead POSIX.1b defines two |
3111 | functions which will help with most kinds of consistency. | |
3112 | ||
3113 | The @code{aio_fsync} and @code{aio_fsync64} functions are only available | |
19e4c7dd | 3114 | if the symbol @code{_POSIX_SYNCHRONIZED_IO} is defined in @file{unistd.h}. |
a3a4a74e UD |
3115 | |
3116 | @cindex synchronizing | |
a3a4a74e | 3117 | @deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp}) |
d08a7e4c | 3118 | @standards{POSIX.1b, aio.h} |
2cc3615c AO |
3119 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
3120 | @c After fcntl to check that the FD is open, it calls | |
3121 | @c aio_enqueue_request. | |
9739d2d5 | 3122 | Calling this function forces all I/O operations queued at the |
fed8f7f7 | 3123 | time of the function call operating on the file descriptor |
a3a4a74e | 3124 | @code{aiocbp->aio_fildes} into the synchronized I/O completion state |
04b9968b | 3125 | (@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns |
a3a4a74e UD |
3126 | immediately but the notification through the method described in |
3127 | @code{aiocbp->aio_sigevent} will happen only after all requests for this | |
04b9968b | 3128 | file descriptor have terminated and the file is synchronized. This also |
a3a4a74e | 3129 | means that requests for this very same file descriptor which are queued |
04b9968b | 3130 | after the synchronization request are not affected. |
a3a4a74e UD |
3131 | |
3132 | If @var{op} is @code{O_DSYNC} the synchronization happens as with a call | |
3133 | to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and | |
fed8f7f7 | 3134 | the synchronization happens as with @code{fsync}. |
a3a4a74e | 3135 | |
19e4c7dd | 3136 | As long as the synchronization has not happened, a call to |
a3a4a74e | 3137 | @code{aio_error} with the reference to the object pointed to by |
fed8f7f7 UD |
3138 | @var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is |
3139 | done @code{aio_error} returns @math{0} if the synchronization was | |
a3a4a74e UD |
3140 | successful. Otherwise the value returned is the value to which the |
3141 | @code{fsync} or @code{fdatasync} function would have set the | |
3142 | @code{errno} variable. In this case nothing can be assumed about the | |
9739d2d5 | 3143 | consistency of the data written to this file descriptor. |
a3a4a74e UD |
3144 | |
3145 | The return value of this function is @math{0} if the request was | |
19e4c7dd | 3146 | successfully enqueued. Otherwise the return value is @math{-1} and |
a3a4a74e UD |
3147 | @code{errno} is set to one of the following values: |
3148 | ||
3149 | @table @code | |
3150 | @item EAGAIN | |
fed8f7f7 | 3151 | The request could not be enqueued due to temporary lack of resources. |
a3a4a74e | 3152 | @item EBADF |
47792506 | 3153 | The file descriptor @code{@var{aiocbp}->aio_fildes} is not valid. |
a3a4a74e UD |
3154 | @item EINVAL |
3155 | The implementation does not support I/O synchronization or the @var{op} | |
3156 | parameter is other than @code{O_DSYNC} and @code{O_SYNC}. | |
3157 | @item ENOSYS | |
3158 | This function is not implemented. | |
3159 | @end table | |
3160 | ||
3161 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
37de3d55 | 3162 | function is in fact @code{aio_fsync64} since the LFS interface |
a3a4a74e UD |
3163 | transparently replaces the normal implementation. |
3164 | @end deftypefun | |
3165 | ||
a3a4a74e | 3166 | @deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp}) |
d08a7e4c | 3167 | @standards{Unix98, aio.h} |
2cc3615c | 3168 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
a3a4a74e UD |
3169 | This function is similar to @code{aio_fsync} with the only difference |
3170 | that the argument is a reference to a variable of type @code{struct | |
3171 | aiocb64}. | |
3172 | ||
3173 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3174 | function is available under the name @code{aio_fsync} and so | |
04b9968b | 3175 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
3176 | machines. |
3177 | @end deftypefun | |
3178 | ||
fed8f7f7 | 3179 | Another method of synchronization is to wait until one or more requests of a |
a3a4a74e UD |
3180 | specific set have terminated. This could be achieved by the @code{aio_*} | |
3181 | functions to notify the initiating process about the termination but in | |
3182 | some situations this is not the ideal solution. In a program which | |
3183 | constantly updates clients somehow connected to the server it is not | |
3184 | always the best solution to go round robin since some connections might | |
9739d2d5 | 3185 | be slow. On the other hand letting the @code{aio_*} functions notify the |
a3a4a74e | 3186 | caller might also not be the best solution since whenever the process |
9739d2d5 | 3187 | works on preparing data for a client it makes no sense to be |
a3a4a74e UD |
3188 | interrupted by a notification since the new client will not be handled |
3189 | before the current client is served. For situations like this | |
3190 | @code{aio_suspend} should be used. | |
3191 | ||
a3a4a74e | 3192 | @deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) |
d08a7e4c | 3193 | @standards{POSIX.1b, aio.h} |
2cc3615c AO |
3194 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} |
3195 | @c Take aio_requests_mutex, set up waitlist and requestlist, wait | |
3196 | @c for completion or timeout, and release the mutex. | |
19e4c7dd | 3197 | When calling this function, the calling thread is suspended until at |
a3a4a74e | 3198 | least one of the requests pointed to by the @var{nent} elements of the |
19e4c7dd AJ |
3199 | array @var{list} has completed. If any of the requests has already |
3200 | completed at the time @code{aio_suspend} is called, the function returns | |
3201 | immediately. Whether a request has terminated or not is determined by | |
a3a4a74e | 3202 | comparing the error status of the request with @code{EINPROGRESS}. If |
19e4c7dd | 3203 | an element of @var{list} is @code{NULL}, the entry is simply ignored. |
a3a4a74e | 3204 | |
19e4c7dd AJ |
3205 | If no request has finished, the calling process is suspended. If |
3206 | @var{timeout} is @code{NULL}, the process is not woken until a request | |
3207 | has finished. If @var{timeout} is not @code{NULL}, the process remains | |
3208 | suspended at most as long as specified in @var{timeout}. If that time |
a3a4a74e UD |
3209 | expires first, @code{aio_suspend} returns with an error. |
3210 | ||
fed8f7f7 | 3211 | The return value of the function is @math{0} if one or more requests |
a3a4a74e UD |
3212 | from the @var{list} have terminated. Otherwise the function returns |
3213 | @math{-1} and @code{errno} is set to one of the following values: | |
3214 | ||
3215 | @table @code | |
3216 | @item EAGAIN | |
3217 | None of the requests from the @var{list} completed in the time specified | |
3218 | by @var{timeout}. | |
3219 | @item EINTR | |
3220 | A signal interrupted the @code{aio_suspend} function. This signal might | |
3221 | also be sent by the AIO implementation while signalling the termination | |
3222 | of one of the requests. | |
3223 | @item ENOSYS | |
3224 | The @code{aio_suspend} function is not implemented. | |
3225 | @end table | |
3226 | ||
3227 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3228 | function is in fact @code{aio_suspend64} since the LFS interface | |
3229 | transparently replaces the normal implementation. | |
3230 | @end deftypefun | |
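As an illustration of how @code{aio_suspend} fits together with @code{aio_error} and @code{aio_return}, here is a small sketch that starts one asynchronous read and then sleeps in @code{aio_suspend} until it terminates. The helper name @code{read_with_suspend} and the one-second timeout are assumptions made for the example only. A timeout (@code{EAGAIN}) or an interrupting signal (@code{EINTR}) just causes another pass through the loop, because the loop condition rechecks the error status.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read up to LEN bytes from FD at OFFSET with
   aio_read, sleeping in aio_suspend until the request terminates.
   Returns the number of bytes read, or -1 with errno set.  */
ssize_t
read_with_suspend (int fd, void *buf, size_t len, off_t offset)
{
  struct aiocb cb;
  const struct aiocb *const list[1] = { &cb };
  const struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
  int err;
  ssize_t n;

  assert (buf != NULL);
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_read (&cb) == -1)
    return -1;

  /* aio_suspend may return early with EAGAIN (timeout) or EINTR;
     in both cases the request is simply checked again.  */
  while ((err = aio_error (&cb)) == EINPROGRESS)
    (void) aio_suspend (list, 1, &timeout);

  /* The request has terminated, so aio_return may be called once.  */
  n = aio_return (&cb);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}
```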
3231 | ||
a3a4a74e | 3232 | @deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) |
d08a7e4c | 3233 | @standards{Unix98, aio.h} |
2cc3615c | 3234 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} |
a3a4a74e UD |
3235 | This function is similar to @code{aio_suspend} with the only difference |
3236 | that the argument is a reference to a variable of type @code{struct | |
3237 | aiocb64}. | |
3238 | ||
3239 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
3240 | function is available under the name @code{aio_suspend} and so | |
04b9968b | 3241 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
3242 | machines. |
3243 | @end deftypefun | |
b07d03e0 UD |
3244 | |
3245 | @node Cancel AIO Operations | |
04b9968b | 3246 | @subsection Cancellation of AIO Operations |
b07d03e0 | 3247 | |
19e4c7dd | 3248 | When one or more requests are asynchronously processed, it might be |
a3a4a74e | 3249 | useful in some situations to cancel a selected operation, e.g., if it |
19e4c7dd AJ |
3250 | becomes obvious that the written data is no longer accurate and would |
3251 | have to be overwritten soon. As an example, assume an application that | |
a3a4a74e UD |
3252 | writes data in files in a situation where new incoming data would have |
3253 | to be written in a file which will be updated by an enqueued request. | |
19e4c7dd AJ |
3254 | The POSIX AIO implementation provides such a function, but this function |
3255 | is not capable of forcing the cancellation of the request. It is up to the | |
a3a4a74e UD |
3256 | implementation to decide whether it is possible to cancel the operation |
3257 | or not. Therefore using this function is merely a hint. | |
3258 | ||
a3a4a74e | 3259 | @deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp}) |
d08a7e4c | 3260 | @standards{POSIX.1b, aio.h} |
2cc3615c AO |
3261 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
3262 | @c After fcntl to check the fd is open, hold aio_requests_mutex, call | |
3263 | @c aio_find_req_fd, aio_remove_request, then aio_notify and | |
3264 | @c aio_free_request each request before releasing the lock. | |
3265 | @c aio_notify calls aio_notify_only and free, besides cond signal or | |
3266 | @c similar. aio_notify_only calls pthread_attr_init, | |
3267 | @c pthread_attr_setdetachstate, malloc, pthread_create, | |
3268 | @c notify_func_wrapper, aio_sigqueue, getpid, raise. | |
3269 | @c notify_func_wrapper calls aio_start_notify_thread, free and then the | |
3270 | @c notifier function. | |
a3a4a74e | 3271 | The @code{aio_cancel} function can be used to cancel one or more |
19e4c7dd AJ |
3272 | outstanding requests. If the @var{aiocbp} parameter is @code{NULL}, the |
3273 | function tries to cancel all of the outstanding requests which would process | |
3274 | the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member | |
3275 | is @var{fildes}). If @var{aiocbp} is not @code{NULL}, @code{aio_cancel} | |
3276 | attempts to cancel the specific request pointed to by @var{aiocbp}. | |
a3a4a74e | 3277 | |
19e4c7dd | 3278 | For requests which were successfully canceled, the normal notification |
a3a4a74e UD |
3279 | about the termination of the request should take place. I.e., depending |
3280 | on the @code{struct sigevent} object which controls this, nothing | |
3281 | happens, a signal is sent or a thread is started. If the request cannot | |
19e4c7dd | 3282 | be canceled, it terminates the usual way after performing the operation. |
a3a4a74e | 3283 | |
19e4c7dd | 3284 | After a request is successfully canceled, a call to @code{aio_error} with |
a3a4a74e UD |
3285 | a reference to this request as the parameter will return |
3286 | @code{ECANCELED} and a call to @code{aio_return} will return @math{-1}. | |
19e4c7dd | 3287 | If the request wasn't canceled and is still running the error status is |
a3a4a74e UD |
3288 | still @code{EINPROGRESS}. |
3289 | ||
3290 | The return value of the function is @code{AIO_CANCELED} if there were | |
19e4c7dd AJ |
3291 | requests which haven't terminated and which were successfully canceled. |
3292 | If there are one or more requests left which couldn't be canceled, the | |
a3a4a74e | 3293 | return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error} |
9739d2d5 | 3294 | must be used to find out which of the, perhaps multiple, requests (if |
19e4c7dd | 3295 | @var{aiocbp} is @code{NULL}) weren't successfully canceled. If all |
a3a4a74e UD |
3296 | requests already terminated at the time @code{aio_cancel} is called the |
3297 | return value is @code{AIO_ALLDONE}. | |
3298 | ||
3299 | If an error occurred during the execution of @code{aio_cancel} the | |
3300 | function returns @math{-1} and sets @code{errno} to one of the following | |
3301 | values. | |
3302 | ||
3303 | @table @code | |
3304 | @item EBADF | |
3305 | The file descriptor @var{fildes} is not valid. | |
3306 | @item ENOSYS | |
3307 | @code{aio_cancel} is not implemented. | |
3308 | @end table | |
3309 | ||
19e4c7dd | 3310 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e UD |
3311 | function is in fact @code{aio_cancel64} since the LFS interface |
3312 | transparently replaces the normal implementation. | |
3313 | @end deftypefun | |
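The three non-error return values of @code{aio_cancel} are easy to confuse, so the following sketch maps them onto a simpler result. The helper name @code{cancel_all} is hypothetical. It asks for cancellation of every request queued on a descriptor and reports whether any of them may still be running afterwards.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to cancel everything queued on FD.
   Returns 0 if no request for FD is left running afterwards, 1 if
   at least one request could not be canceled, and -1 on error
   (errno is set by aio_cancel).  */
int
cancel_all (int fd)
{
  switch (aio_cancel (fd, NULL))
    {
    case AIO_CANCELED:          /* outstanding requests were canceled */
    case AIO_ALLDONE:           /* nothing was outstanding anymore */
      return 0;
    case AIO_NOTCANCELED:       /* check each request with aio_error */
      return 1;
    default:
      return -1;
    }
}
```

When the result is 1, the caller still has to examine each control block with @code{aio_error} to learn which requests are in progress (@code{EINPROGRESS}) and which were canceled (@code{ECANCELED}).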
3314 | ||
19e4c7dd | 3315 | @deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp}) |
d08a7e4c | 3316 | @standards{Unix98, aio.h} |
2cc3615c | 3317 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
a3a4a74e UD |
3318 | This function is similar to @code{aio_cancel} with the only difference |
3319 | that the argument is a reference to a variable of type @code{struct | |
3320 | aiocb64}. | |
3321 | ||
19e4c7dd | 3322 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
a3a4a74e | 3323 | function is available under the name @code{aio_cancel} and so |
04b9968b | 3324 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
3325 | machines. |
3326 | @end deftypefun | |
3327 | ||
3328 | @node Configuration of AIO | |
3329 | @subsection How to optimize the AIO implementation | |
3330 | ||
3331 | The POSIX standard does not specify how the AIO functions are | |
19e4c7dd | 3332 | implemented. They could be system calls, but it is also possible to |
a3a4a74e UD |
3333 | emulate them at user level. |
3334 | ||
9739d2d5 | 3335 | At the time of writing, the available implementation is a user-level |
19e4c7dd AJ |
3336 | implementation which uses threads for handling the enqueued requests. |
3337 | While this implementation requires making some decisions about | |
9739d2d5 | 3338 | limitations, hard limitations are something best avoided |
1f77f049 | 3339 | in @theglibc{}. Therefore, @theglibc{} provides a means |
19e4c7dd | 3340 | for tuning the AIO implementation according to the individual use. |
a3a4a74e | 3341 | |
a3a4a74e | 3342 | @deftp {Data Type} {struct aioinit} |
d08a7e4c | 3343 | @standards{GNU, aio.h} |
a3a4a74e UD |
3344 | This data type is used to pass the configuration or tunable parameters |
3345 | to the implementation. The program has to initialize the members of | |
3346 | this struct and pass it to the implementation using the @code{aio_init} | |
3347 | function. | |
3348 | ||
3349 | @table @code | |
3350 | @item int aio_threads | |
19e4c7dd | 3351 | This member specifies the maximal number of threads which may be used |
a3a4a74e UD |
3352 | at any one time. |
3353 | @item int aio_num | |
c756c71c | 3354 | This number provides an estimate on the maximal number of simultaneously |
a3a4a74e UD |
3355 | enqueued requests. |
3356 | @item int aio_locks | |
19e4c7dd | 3357 | Unused. |
a3a4a74e | 3358 | @item int aio_usedba |
19e4c7dd | 3359 | Unused. |
a3a4a74e | 3360 | @item int aio_debug |
19e4c7dd | 3361 | Unused. |
a3a4a74e | 3362 | @item int aio_numusers |
19e4c7dd | 3363 | Unused. |
a3a4a74e | 3364 | @item int aio_reserved[2] |
19e4c7dd | 3365 | Unused. |
a3a4a74e UD |
3366 | @end table |
3367 | @end deftp | |
3368 | ||
a3a4a74e | 3369 | @deftypefun void aio_init (const struct aioinit *@var{init}) |
d08a7e4c | 3370 | @standards{GNU, aio.h} |
2cc3615c AO |
3371 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} |
3372 | @c All changes to global objects are guarded by aio_requests_mutex. | |
a3a4a74e | 3373 | This function must be called before any other AIO function. Calling it |
19e4c7dd AJ |
3374 | is completely voluntary, as it is only meant to help the AIO |
3375 | implementation perform better. | |
a3a4a74e | 3376 | |
9739d2d5 | 3377 | Before calling @code{aio_init}, the members of a variable of |
a3a4a74e UD |
3378 | type @code{struct aioinit} must be initialized. Then a reference to |
3379 | this variable is passed as the parameter to @code{aio_init} which itself | |
3380 | may or may not pay attention to the hints. | |
3381 | ||
c756c71c | 3382 | The function has no return value and no error cases are defined. It is |
9739d2d5 | 3383 | an extension which follows a proposal from the SGI implementation in |
c756c71c | 3384 | @w{Irix 6}. It is not covered by POSIX.1b or Unix98. |
a3a4a74e | 3385 | @end deftypefun |
b07d03e0 | 3386 | |
28f540f4 RM |
3387 | @node Control Operations |
3388 | @section Control Operations on Files | |
3389 | ||
3390 | @cindex control operations on files | |
3391 | @cindex @code{fcntl} function | |
3392 | This section describes how you can perform various other operations on | |
3393 | file descriptors, such as inquiring about or setting flags describing | |
3394 | the status of the file descriptor, manipulating record locks, and the | |
3395 | like. All of these operations are performed by the function @code{fcntl}. | |
3396 | ||
3397 | The second argument to the @code{fcntl} function is a command that | |
3398 | specifies which operation to perform. The function and macros that name | |
3399 | various flags that are used with it are declared in the header file | |
3400 | @file{fcntl.h}. Many of these flags are also used by the @code{open} | |
3401 | function; see @ref{Opening and Closing Files}. | |
3402 | @pindex fcntl.h | |
3403 | ||
28f540f4 | 3404 | @deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{}) |
d08a7e4c | 3405 | @standards{POSIX.1, fcntl.h} |
2cc3615c | 3406 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
3407 | The @code{fcntl} function performs the operation specified by |
3408 | @var{command} on the file descriptor @var{filedes}. Some commands | |
3409 | require additional arguments to be supplied. These additional arguments | |
3410 | and the return value and error conditions are given in the detailed | |
3411 | descriptions of the individual commands. | |
3412 | ||
6c0be743 DD |
3413 | Briefly, here is a list of what the various commands are. For an |
3414 | exhaustive list of kernel-specific options, please see @ref{System | |
3415 | Calls}. | |
28f540f4 | 3416 | |
2fe82ca6 | 3417 | @vtable @code |
28f540f4 RM |
3418 | @item F_DUPFD |
3419 | Duplicate the file descriptor (return another file descriptor pointing | |
3420 | to the same open file). @xref{Duplicating Descriptors}. | |
3421 | ||
3422 | @item F_GETFD | |
3423 | Get flags associated with the file descriptor. @xref{Descriptor Flags}. | |
3424 | ||
3425 | @item F_SETFD | |
3426 | Set flags associated with the file descriptor. @xref{Descriptor Flags}. | |
3427 | ||
3428 | @item F_GETFL | |
3429 | Get flags associated with the open file. @xref{File Status Flags}. | |
3430 | ||
3431 | @item F_SETFL | |
3432 | Set flags associated with the open file. @xref{File Status Flags}. | |
3433 | ||
3434 | @item F_GETLK | |
0961f7e1 | 3435 | Test a file lock. @xref{File Locks}. |
28f540f4 RM |
3436 | |
3437 | @item F_SETLK | |
3438 | Set or clear a file lock. @xref{File Locks}. | |
3439 | ||
3440 | @item F_SETLKW | |
3441 | Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. | |
3442 | ||
0961f7e1 JL |
3443 | @item F_OFD_GETLK |
3444 | Test an open file description lock. @xref{Open File Description Locks}. | |
3445 | Specific to Linux. | |
3446 | ||
3447 | @item F_OFD_SETLK | |
3448 | Set or clear an open file description lock. @xref{Open File Description Locks}. | |
3449 | Specific to Linux. | |
3450 | ||
3451 | @item F_OFD_SETLKW | |
3452 | Like @code{F_OFD_SETLK}, but block until lock is acquired. | |
3453 | @xref{Open File Description Locks}. Specific to Linux. | |
3454 | ||
28f540f4 RM |
3455 | @item F_GETOWN |
3456 | Get process or process group ID to receive @code{SIGIO} signals. | |
3457 | @xref{Interrupt Input}. | |
3458 | ||
3459 | @item F_SETOWN | |
3460 | Set process or process group ID to receive @code{SIGIO} signals. | |
3461 | @xref{Interrupt Input}. | |
2fe82ca6 | 3462 | @end vtable |
dfd2257a | 3463 | |
06ab719d AZ |
3464 | This function is a cancellation point in multi-threaded programs for the |
3465 | commands @code{F_SETLKW} (and the LFS analogous @code{F_SETLKW64}) and | |
0b11b649 | 3466 | @code{F_OFD_SETLKW}. This is a problem if the thread allocates some |
06ab719d AZ |
3467 | resources (like memory, file descriptors, semaphores or whatever) at the time |
3468 | @code{fcntl} is called. If the thread gets canceled these resources stay | |
3469 | allocated until the program ends. To avoid this, calls to @code{fcntl} should | |
3470 | be protected using cancellation handlers. | |
dfd2257a | 3471 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
28f540f4 RM |
3472 | @end deftypefun |
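One possible way to protect a blocking @code{fcntl} call with a cancellation handler is sketched below, using @code{pthread_cleanup_push} and @code{pthread_cleanup_pop}. The helper names @code{close_fd} and @code{lock_whole_file} are hypothetical. The only resource held across the call is the descriptor itself, which the handler closes if the thread is canceled while waiting for the lock.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Cleanup handler: close the descriptor that ARG points to.  */
static void
close_fd (void *arg)
{
  close (*(int *) arg);
}

/* Hypothetical helper: open PATH and wait for a write lock on the
   whole file.  Returns the locked descriptor, or -1 on error.  If
   the thread is canceled while blocked in fcntl, the handler closes
   the descriptor so that it does not leak.  */
int
lock_whole_file (const char *path)
{
  struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                      .l_start = 0, .l_len = 0 };
  int fd = open (path, O_RDWR | O_CREAT, 0600);
  int rc;

  if (fd == -1)
    return -1;

  pthread_cleanup_push (close_fd, &fd);
  rc = fcntl (fd, F_SETLKW, &fl);       /* cancellation point */
  /* Run the handler ourselves only if taking the lock failed.  */
  pthread_cleanup_pop (rc == -1);

  return rc == -1 ? -1 : fd;
}
```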
3473 | ||
3474 | ||
3475 | @node Duplicating Descriptors | |
3476 | @section Duplicating Descriptors | |
3477 | ||
3478 | @cindex duplicating file descriptors | |
3479 | @cindex redirecting input and output | |
3480 | ||
3481 | You can @dfn{duplicate} a file descriptor, or allocate another file | |
3482 | descriptor that refers to the same open file as the original. Duplicate | |
3483 | descriptors share one file position and one set of file status flags | |
3484 | (@pxref{File Status Flags}), but each has its own set of file descriptor | |
3485 | flags (@pxref{Descriptor Flags}). | |
3486 | ||
3487 | The major use of duplicating a file descriptor is to implement | |
3488 | @dfn{redirection} of input or output: that is, to change the | |
3489 | file or pipe that a particular file descriptor corresponds to. | |
3490 | ||
3491 | You can perform this operation using the @code{fcntl} function with the | |
3492 | @code{F_DUPFD} command, but there are also convenient functions | |
3493 | @code{dup} and @code{dup2} for duplicating descriptors. | |
3494 | ||
3495 | @pindex unistd.h | |
3496 | @pindex fcntl.h | |
3497 | The @code{fcntl} function and flags are declared in @file{fcntl.h}, | |
3498 | while prototypes for @code{dup} and @code{dup2} are in the header file | |
3499 | @file{unistd.h}. | |
3500 | ||
28f540f4 | 3501 | @deftypefun int dup (int @var{old}) |
d08a7e4c | 3502 | @standards{POSIX.1, unistd.h} |
2cc3615c | 3503 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
3504 | This function copies descriptor @var{old} to the first available |
3505 | descriptor number (the first number not currently open). It is | |
3506 | equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}. | |
3507 | @end deftypefun | |
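The claim that duplicate descriptors share one file position can be checked with a small experiment. The following is only a sketch: the function name @code{dup_shares_position} and the scratch file template under @file{/tmp} are assumptions made for this illustration, not part of the library.

```c
#include <stdlib.h>
#include <unistd.h>

/* Return 1 if a descriptor made with dup shares its file position
   with the original, 0 if it does not, and -1 on error.
   (Hypothetical helper, for illustration only.)  */
int
dup_shares_position (void)
{
  char name[] = "/tmp/dupdemoXXXXXX";   /* assumed scratch location */
  int fd = mkstemp (name);
  if (fd < 0)
    return -1;
  unlink (name);          /* the open descriptor keeps the file alive */

  int copy = dup (fd);
  if (copy < 0)
    {
      close (fd);
      return -1;
    }

  int result = -1;
  /* Advance the file position through the original descriptor only.  */
  if (write (fd, "hello", 5) == 5)
    /* Ask the duplicate where it is; it reports the same offset.  */
    result = lseek (copy, 0, SEEK_CUR) == 5;

  close (copy);
  close (fd);
  return result;
}
```

Writing through one descriptor moves the position seen through the other, because both refer to the same open file.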
3508 | ||
28f540f4 | 3509 | @deftypefun int dup2 (int @var{old}, int @var{new}) |
d08a7e4c | 3510 | @standards{POSIX.1, unistd.h} |
2cc3615c | 3511 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
3512 | This function copies the descriptor @var{old} to descriptor number |
3513 | @var{new}. | |
3514 | ||
3515 | If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it | |
3516 | does not close @var{new}. Otherwise, the new duplicate of @var{old} | |
3517 | replaces any previous meaning of descriptor @var{new}, as if @var{new} | |
3518 | were closed first. | |
3519 | ||
3520 | If @var{old} and @var{new} are different numbers, and @var{old} is a | |
3521 | valid descriptor number, then @code{dup2} is equivalent to: | |
3522 | ||
3523 | @smallexample | |
3524 | close (@var{new}); | |
3525 | fcntl (@var{old}, F_DUPFD, @var{new}) | |
3526 | @end smallexample | |
3527 | ||
3528 | However, @code{dup2} does this atomically; there is no instant in the | |
3529 | middle of calling @code{dup2} at which @var{new} is closed and not yet a | |
3530 | duplicate of @var{old}. | |
3531 | @end deftypefun | |
3532 | ||
a07e000e DD |
3533 | @deftypefun int dup3 (int @var{old}, int @var{new}, int @var{flags}) |
3534 | @standards{Linux, unistd.h} | |
3535 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
3536 | This function is the same as @code{dup2} but creates the new | |
3537 | descriptor as if it had been opened with flags @var{flags}. The only | |
3538 | allowed flag is @code{O_CLOEXEC}. | |
3539 | @end deftypefun | |
3540 | ||
28f540f4 | 3541 | @deftypevr Macro int F_DUPFD |
d08a7e4c | 3542 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3543 | This macro is used as the @var{command} argument to @code{fcntl}, to |
3544 | copy the file descriptor given as the first argument. | |
3545 | ||
3546 | The form of the call in this case is: | |
3547 | ||
3548 | @smallexample | |
3549 | fcntl (@var{old}, F_DUPFD, @var{next-filedes}) | |
3550 | @end smallexample | |
3551 | ||
3552 | The @var{next-filedes} argument is of type @code{int} and specifies that | |
3553 | the file descriptor returned should be the next available one greater | |
3554 | than or equal to this value. | |
3555 | ||
3556 | The return value from @code{fcntl} with this command is normally the value | |
07435eb4 | 3557 | of the new file descriptor. A return value of @math{-1} indicates an |
28f540f4 RM |
3558 | error. The following @code{errno} error conditions are defined for |
3559 | this command: | |
3560 | ||
3561 | @table @code | |
3562 | @item EBADF | |
3563 | The @var{old} argument is invalid. | |
3564 | ||
3565 | @item EINVAL | |
3566 | The @var{next-filedes} argument is invalid. | |
3567 | ||
3568 | @item EMFILE | |
3569 | There are no more file descriptors available---your program is already | |
3570 | using the maximum. In BSD and GNU, the maximum is controlled by a | |
3571 | resource limit that can be changed; @pxref{Limits on Resources}, for | |
3572 | more information about the @code{RLIMIT_NOFILE} limit. | |
3573 | @end table | |
3574 | ||
3575 | @code{ENFILE} is not a possible error code for @code{dup2} because | |
3576 | @code{dup2} does not create a new opening of a file; duplicate | |
3577 | descriptors do not count toward the limit which @code{ENFILE} | |
3578 | indicates. @code{EMFILE} is possible because it refers to the limit on | |
3579 | distinct descriptor numbers in use in one process. | |
3580 | @end deftypevr | |
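One common use of the @var{next-filedes} argument is to move a descriptor out of the range used by the standard streams. The sketch below assumes a hypothetical helper name, @code{dup_above_stdio}; it simply wraps the @code{F_DUPFD} call described above.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return a duplicate of OLD whose number is greater than
   STDERR_FILENO, so it cannot be mistaken for standard input,
   output, or error.  Return -1 with errno set on failure.
   (Hypothetical helper, for illustration only.)  */
int
dup_above_stdio (int old)
{
  /* F_DUPFD picks the lowest free descriptor number that is
     greater than or equal to the third argument.  */
  return fcntl (old, F_DUPFD, STDERR_FILENO + 1);
}
```

The caller still owns @var{old} afterwards and should close it if only the duplicate is wanted.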
3581 | ||
3582 | Here is an example showing how to use @code{dup2} to do redirection. | |
3583 | Typically, redirection of the standard streams (like @code{stdin}) is | |
3584 | done by a shell or shell-like program before calling one of the | |
3585 | @code{exec} functions (@pxref{Executing a File}) to execute a new | |
3586 | program in a child process. When the new program is executed, it | |
3587 | creates and initializes the standard streams to point to the | |
3588 | corresponding file descriptors, before its @code{main} function is | |
3589 | invoked. | |
3590 | ||
3591 | So, to redirect standard input to a file, the shell could do something | |
3592 | like: | |
3593 | ||
3594 | @smallexample | |
3595 | pid = fork (); | |
3596 | if (pid == 0) | |
3597 | @{ | |
3598 | char *filename; | |
3599 | char *program; | |
3600 | int file; | |
3601 | @dots{} | |
3602 | file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY)); | |
3603 | dup2 (file, STDIN_FILENO); | |
3604 | TEMP_FAILURE_RETRY (close (file)); | |
3605 | execv (program, (char *[]) @{ program, NULL @}); | |
3606 | @} | |
3607 | @end smallexample | |
3608 | ||
3609 | There is also a more detailed example showing how to implement redirection | |
3610 | in the context of a pipeline of processes in @ref{Launching Jobs}. | |
3611 | ||
3612 | ||
3613 | @node Descriptor Flags | |
3614 | @section File Descriptor Flags | |
3615 | @cindex file descriptor flags | |
3616 | ||
3617 | @dfn{File descriptor flags} are miscellaneous attributes of a file | |
3618 | descriptor. These flags are associated with particular file | |
3619 | descriptors, so that if you have created duplicate file descriptors | |
3620 | from a single opening of a file, each descriptor has its own set of flags. | |
3621 | ||
3622 | Currently there is just one file descriptor flag: @code{FD_CLOEXEC}, | |
3623 | which causes the descriptor to be closed if you use any of the | |
3624 | @code{exec@dots{}} functions (@pxref{Executing a File}). | |
3625 | ||
3626 | The symbols in this section are defined in the header file | |
3627 | @file{fcntl.h}. | |
3628 | @pindex fcntl.h | |
3629 | ||
28f540f4 | 3630 | @deftypevr Macro int F_GETFD |
d08a7e4c | 3631 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3632 | This macro is used as the @var{command} argument to @code{fcntl}, to |
3633 | specify that it should return the file descriptor flags associated | |
2c6fe0bd | 3634 | with the @var{filedes} argument. |
28f540f4 RM |
3635 | |
3636 | The normal return value from @code{fcntl} with this command is a | |
3637 | nonnegative number which can be interpreted as the bitwise OR of the | |
3638 | individual flags (except that currently there is only one flag to use). | |
3639 | ||
07435eb4 | 3640 | In case of an error, @code{fcntl} returns @math{-1}. The following |
28f540f4 RM |
3641 | @code{errno} error conditions are defined for this command: |
3642 | ||
3643 | @table @code | |
3644 | @item EBADF | |
3645 | The @var{filedes} argument is invalid. | |
3646 | @end table | |
3647 | @end deftypevr | |
3648 | ||
3649 | ||
28f540f4 | 3650 | @deftypevr Macro int F_SETFD |
d08a7e4c | 3651 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3652 | This macro is used as the @var{command} argument to @code{fcntl}, to |
3653 | specify that it should set the file descriptor flags associated with the | |
3654 | @var{filedes} argument. This requires a third @code{int} argument to | |
3655 | specify the new flags, so the form of the call is: | |
3656 | ||
3657 | @smallexample | |
3658 | fcntl (@var{filedes}, F_SETFD, @var{new-flags}) | |
3659 | @end smallexample | |
3660 | ||
3661 | The normal return value from @code{fcntl} with this command is an | |
07435eb4 | 3662 | unspecified value other than @math{-1}, which indicates an error. |
28f540f4 RM |
3663 | The flags and error conditions are the same as for the @code{F_GETFD} |
3664 | command. | |
3665 | @end deftypevr | |
3666 | ||
3667 | The following macro is defined for use as a file descriptor flag with | |
3668 | the @code{fcntl} function. The value is an integer constant usable | |
3669 | as a bit mask value. | |
3670 | ||
28f540f4 | 3671 | @deftypevr Macro int FD_CLOEXEC |
d08a7e4c | 3672 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3673 | @cindex close-on-exec (file descriptor flag) |
3674 | This flag specifies that the file descriptor should be closed when | |
3675 | an @code{exec} function is invoked; see @ref{Executing a File}. When | |
3676 | a file descriptor is allocated (as with @code{open} or @code{dup}), | |
3677 | this bit is initially cleared on the new file descriptor, meaning that | |
3678 | descriptor will survive into the new program after @code{exec}. | |
3679 | @end deftypevr | |
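Because each descriptor has its own descriptor flags, setting @code{FD_CLOEXEC} on one descriptor does not affect a duplicate made later with @code{dup}. The following sketch checks this on a pipe; the function name @code{dup_clears_cloexec} is an assumption made for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if a duplicate made with dup starts with FD_CLOEXEC clear
   even though the original has it set, 0 otherwise, and -1 on error.
   (Hypothetical helper, for illustration only.)  */
int
dup_clears_cloexec (void)
{
  int fds[2];
  if (pipe (fds) == -1)
    return -1;

  int result = -1;
  int flags = fcntl (fds[0], F_GETFD);
  /* Set close-on-exec on the original descriptor only.  */
  if (flags != -1 && fcntl (fds[0], F_SETFD, flags | FD_CLOEXEC) != -1)
    {
      int copy = dup (fds[0]);
      if (copy != -1)
        {
          int orig_flags = fcntl (fds[0], F_GETFD);
          int copy_flags = fcntl (copy, F_GETFD);
          if (orig_flags != -1 && copy_flags != -1)
            result = ((orig_flags & FD_CLOEXEC) != 0
                      && (copy_flags & FD_CLOEXEC) == 0);
          close (copy);
        }
    }

  close (fds[0]);
  close (fds[1]);
  return result;
}
```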
3680 | ||
3681 | If you want to modify the file descriptor flags, you should get the | |
3682 | current flags with @code{F_GETFD} and modify the value. Don't assume | |
3683 | that the flags listed here are the only ones that are implemented; your | |
3684 | program may be run years from now and more flags may exist then. For | |
3685 | example, here is a function to set or clear the flag @code{FD_CLOEXEC} | |
3686 | without altering any other flags: | |
3687 | ||
3688 | @smallexample | |
3689 | /* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,} | |
3690 | @r{or clear the flag if @var{value} is 0.} | |
2c6fe0bd | 3691 | @r{Return 0 on success, or -1 on error with @code{errno} set.} */ |
28f540f4 RM |
3692 | |
3693 | int | |
3694 | set_cloexec_flag (int desc, int value) | |
3695 | @{ | |
3696 | int oldflags = fcntl (desc, F_GETFD, 0); | |
8e96ae1a | 3697 | /* @r{If reading the flags failed, return error indication now.} */ |
28f540f4 RM |
3698 | if (oldflags < 0) |
3699 | return oldflags; | |
3700 | /* @r{Set just the flag we want to set.} */ | |
3701 | if (value != 0) | |
3702 | oldflags |= FD_CLOEXEC; | |
3703 | else | |
3704 | oldflags &= ~FD_CLOEXEC; | |
3705 | /* @r{Store modified flag word in the descriptor.} */ | |
3706 | return fcntl (desc, F_SETFD, oldflags); | |
3707 | @} | |
3708 | @end smallexample | |
3709 | ||
3710 | @node File Status Flags | |
3711 | @section File Status Flags | |
3712 | @cindex file status flags | |
3713 | ||
3714 | @dfn{File status flags} are used to specify attributes of the opening of a | |
3715 | file. Unlike the file descriptor flags discussed in @ref{Descriptor | |
3716 | Flags}, the file status flags are shared by duplicated file descriptors | |
3717 | resulting from a single opening of the file. The file status flags are | |
3718 | specified with the @var{flags} argument to @code{open}; | |
3719 | @pxref{Opening and Closing Files}. | |
3720 | ||
3721 | File status flags fall into three categories, which are described in the | |
3722 | following sections. | |
3723 | ||
3724 | @itemize @bullet | |
3725 | @item | |
3726 | @ref{Access Modes}, specify what type of access is allowed to the | |
3727 | file: reading, writing, or both. They are set by @code{open} and are | |
3728 | returned by @code{fcntl}, but cannot be changed. | |
3729 | ||
3730 | @item | |
3731 | @ref{Open-time Flags}, control details of what @code{open} will do. | |
3732 | These flags are not preserved after the @code{open} call. | |
3733 | ||
3734 | @item | |
3735 | @ref{Operating Modes}, affect how operations such as @code{read} and | |
3736 | @code{write} are done. They are set by @code{open}, and can be fetched or | |
3737 | changed with @code{fcntl}. | |
3738 | @end itemize | |
3739 | ||
3740 | The symbols in this section are defined in the header file | |
3741 | @file{fcntl.h}. | |
3742 | @pindex fcntl.h | |
3743 | ||
3744 | @menu | |
3745 | * Access Modes:: Whether the descriptor can read or write. | |
3746 | * Open-time Flags:: Details of @code{open}. | |
3747 | * Operating Modes:: Special modes to control I/O operations. | |
3748 | * Getting File Status Flags:: Fetching and changing these flags. | |
3749 | @end menu | |
3750 | ||
3751 | @node Access Modes | |
3752 | @subsection File Access Modes | |
3753 | ||
e960d831 FW |
3754 | The file access mode allows a file descriptor to be used for reading, |
3755 | writing, both, or neither. The access mode is determined when the file | |
3756 | is opened, and never changes. | |
28f540f4 | 3757 | |
28f540f4 | 3758 | @deftypevr Macro int O_RDONLY |
d08a7e4c | 3759 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3760 | Open the file for read access. |
3761 | @end deftypevr | |
3762 | ||
28f540f4 | 3763 | @deftypevr Macro int O_WRONLY |
d08a7e4c | 3764 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3765 | Open the file for write access. |
3766 | @end deftypevr | |
3767 | ||
28f540f4 | 3768 | @deftypevr Macro int O_RDWR |
d08a7e4c | 3769 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3770 | Open the file for both reading and writing. |
3771 | @end deftypevr | |
3772 | ||
e960d831 FW |
3773 | @deftypevr Macro int O_PATH |
3774 | @standards{Linux, fcntl.h} | |
3775 | Obtain a file descriptor for the file, but do not open the file for | |
3776 | reading or writing. Permission checks for the file itself are skipped | |
3777 | when the file is opened (but permission to access the directory that | |
3778 | contains it is still needed), and permissions are checked when the | |
3779 | descriptor is used later on. | |
3780 | ||
3781 | For example, such descriptors can be used with the @code{fexecve} | |
3782 | function (@pxref{Executing a File}). | |
3783 | ||
3784 | This access mode is specific to Linux. On @gnuhurdsystems{}, it is | |
3785 | possible to use @code{O_EXEC} explicitly, or specify no access modes | |
3786 | at all (see below). | |
3787 | @end deftypevr | |
3788 | ||
3789 | The portable file access modes @code{O_RDONLY}, @code{O_WRONLY}, and | |
3790 | @code{O_RDWR} may not correspond to individual bits. To determine the | |
3791 | file access mode with @code{fcntl}, you must extract the access mode | |
3792 | bits from the retrieved file status flags, using the @code{O_ACCMODE} | |
3793 | mask. | |
3794 | ||
3795 | @deftypevr Macro int O_ACCMODE | |
3796 | @standards{POSIX.1, fcntl.h} | |
3797 | ||
3798 | This macro is a mask that can be bitwise-ANDed with the file status flag | |
3799 | value to recover the file access mode, assuming that a standard file | |
3800 | access mode is in use. | |
3801 | @end deftypevr | |
3802 | ||
3803 | If a non-standard file access mode is used (such as @code{O_PATH} or | |
3804 | @code{O_EXEC}), masking with @code{O_ACCMODE} may give incorrect | |
3805 | results. These non-standard access modes are identified by individual | |
3806 | bits and have to be checked directly (without masking with | |
3807 | @code{O_ACCMODE} first). | |
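As a minimal sketch of the masking step, the function below reports the portable access mode of a descriptor. The name @code{access_mode_name} and the returned strings are assumptions made for this illustration, and the function deliberately handles only the three portable modes; a non-standard mode such as @code{O_PATH} would have to be tested separately, before masking.

```c
#include <fcntl.h>
#include <stddef.h>

/* Return a short description of the access mode of FD, or NULL if
   the file status flags cannot be read.  Only the three portable
   access modes are recognized.
   (Hypothetical helper, for illustration only.)  */
const char *
access_mode_name (int fd)
{
  int flags = fcntl (fd, F_GETFL);
  if (flags == -1)
    return NULL;

  /* The access modes need not be individual bits, so compare the
     masked value instead of testing bits one at a time.  */
  switch (flags & O_ACCMODE)
    {
    case O_RDONLY:
      return "read-only";
    case O_WRONLY:
      return "write-only";
    case O_RDWR:
      return "read-write";
    default:
      return "other";
    }
}
```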
3808 | ||
3809 | On @gnuhurdsystems{} (but not on other systems), @code{O_RDONLY} and | |
28f540f4 RM |
3810 | @code{O_WRONLY} are independent bits that can be bitwise-ORed together, |
3811 | and it is valid for either bit to be set or clear. This means that | |
3812 | @code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. A file access | |
3813 | mode of zero is permissible; it allows no operations that do input or | |
3814 | output to the file, but does allow other operations such as | |
a7a93d50 | 3815 | @code{fchmod}. On @gnuhurdsystems{}, since ``read-only'' or ``write-only'' |
28f540f4 | 3816 | is a misnomer, @file{fcntl.h} defines additional names for the file |
e960d831 | 3817 | access modes. |
28f540f4 | 3818 | |
28f540f4 | 3819 | @deftypevr Macro int O_READ |
d08a7e4c | 3820 | @standards{GNU, fcntl.h (optional)} |
e960d831 | 3821 | Open the file for reading. Same as @code{O_RDONLY}; only defined on GNU/Hurd. |
28f540f4 RM |
3822 | @end deftypevr |
3823 | ||
28f540f4 | 3824 | @deftypevr Macro int O_WRITE |
d08a7e4c | 3825 | @standards{GNU, fcntl.h (optional)} |
e960d831 | 3826 | Open the file for writing. Same as @code{O_WRONLY}; only defined on GNU/Hurd. |
28f540f4 RM |
3827 | @end deftypevr |
3828 | ||
28f540f4 | 3829 | @deftypevr Macro int O_EXEC |
d08a7e4c | 3830 | @standards{GNU, fcntl.h (optional)} |
e960d831 | 3831 | Open the file for executing. Only defined on GNU/Hurd. |
28f540f4 RM |
3832 | @end deftypevr |
3833 | ||
3834 | @node Open-time Flags | |
3835 | @subsection Open-time Flags | |
3836 | ||
3837 | The open-time flags specify options affecting how @code{open} will behave. | |
3838 | These options are not preserved once the file is open. The exception to | |
3839 | this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it | |
3840 | @emph{is} saved. @xref{Opening and Closing Files}, for how to call | |
3841 | @code{open}. | |
3842 | ||
3843 | There are two sorts of options specified by open-time flags. | |
3844 | ||
3845 | @itemize @bullet | |
3846 | @item | |
3847 | @dfn{File name translation flags} affect how @code{open} looks up the | |
3848 | file name to locate the file, and whether the file can be created. | |
3849 | @cindex file name translation flags | |
3850 | @cindex flags, file name translation | |
3851 | ||
3852 | @item | |
3853 | @dfn{Open-time action flags} specify extra operations that @code{open} will | |
3854 | perform on the file once it is open. | |
3855 | @cindex open-time action flags | |
3856 | @cindex flags, open-time action | |
3857 | @end itemize | |
3858 | ||
3859 | Here are the file name translation flags. | |
3860 | ||
28f540f4 | 3861 | @deftypevr Macro int O_CREAT |
d08a7e4c | 3862 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3863 | If set, the file will be created if it doesn't already exist. |
3864 | @c !!! mode arg, umask | |
3865 | @cindex create on open (file status flag) | |
3866 | @end deftypevr | |
3867 | ||
28f540f4 | 3868 | @deftypevr Macro int O_EXCL |
d08a7e4c | 3869 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3870 | If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails |
3871 | if the specified file already exists. This is guaranteed to never | |
3872 | clobber an existing file. | |
b9af29c0 FW |
3873 | |
3874 | The @code{O_EXCL} flag has a special meaning in combination with | |
3875 | @code{O_TMPFILE}; see below. | |
3876 | @end deftypevr | |
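Combining @code{O_CREAT} with @code{O_EXCL} is a common way to create a file only if nobody else has created it first, for example a lock file. The sketch below assumes a hypothetical helper name, @code{try_create_lock_file}, and an arbitrary permission mode of @code{0600}.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to create the file PATH.  Return 1 if this call created it,
   0 if the file already existed, and -1 on any other error (with
   errno set).  (Hypothetical helper, for illustration only.)  */
int
try_create_lock_file (const char *path)
{
  /* With O_CREAT | O_EXCL, the existence check and the creation
     happen as one step, so two callers cannot both succeed.  */
  int fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd == -1)
    return errno == EEXIST ? 0 : -1;
  close (fd);
  return 1;
}
```

Checking for the file first and creating it afterwards would leave a window in which another process could create it; the single @code{open} call avoids that window.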
3877 | ||
fef7c63c FW |
3878 | @deftypevr Macro int O_DIRECTORY |
3879 | @standards{POSIX.1, fcntl.h} | |
3880 | If set, the open operation fails if the given name is not the name of | |
3881 | a directory. The @code{errno} variable is set to @code{ENOTDIR} for | |
3882 | this error condition. | |
3883 | @end deftypevr | |
3884 | ||
ad14f4f8 FW |
3885 | @deftypevr Macro int O_NOFOLLOW |
3886 | @standards{POSIX.1, fcntl.h} | |
3887 | If set, the open operation fails if the final component of the file name | |
3888 | refers to a symbolic link. The @code{errno} variable is set to | |
3889 | @code{ELOOP} for this error condition. | |
3890 | @end deftypevr | |
3891 | ||
b9af29c0 FW |
3892 | @deftypevr Macro int O_TMPFILE |
3893 | @standards{GNU, fcntl.h} | |
3894 | If this flag is specified, functions in the @code{open} family create an | |
3895 | unnamed temporary file. In this case, the pathname argument to the | |
3896 | @code{open} family of functions (@pxref{Opening and Closing Files}) is | |
3897 | interpreted as the directory in which the temporary file is created | |
3898 | (thus determining the file system which provides the storage for the | |
3899 | file). The @code{O_TMPFILE} flag must be combined with @code{O_WRONLY} | |
3900 | or @code{O_RDWR}, and the @var{mode} argument is required. | |
3901 | ||
3902 | The temporary file can later be given a name using @code{linkat}, | |
3903 | turning it into a regular file. This allows the atomic creation of a | |
3904 | file with specific file attributes (mode and extended attributes) | |
3905 | and file contents. If, for security reasons, it is not desirable that a | |
3906 | name can be given to the file, the @code{O_EXCL} flag can be specified | |
3907 | along with @code{O_TMPFILE}. | |
3908 | ||
3909 | Not all kernels support this open flag. If this flag is unsupported, an | |
3910 | attempt to create an unnamed temporary file fails with an error of | |
3911 | @code{EINVAL}. If the underlying file system does not support the | |
3912 | @code{O_TMPFILE} flag, an @code{EOPNOTSUPP} error is the result. | |
3913 | ||
3914 | The @code{O_TMPFILE} flag is a GNU extension. | |
28f540f4 RM |
3915 | @end deftypevr |
3916 | ||
28f540f4 | 3917 | @deftypevr Macro int O_NONBLOCK |
d08a7e4c | 3918 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3919 | @cindex non-blocking open |
3920 | This prevents @code{open} from blocking for a ``long time'' to open the | |
3921 | file. This is only meaningful for some kinds of files, usually devices | |
3922 | such as serial ports; when it is not meaningful, it is harmless and | |
9739d2d5 | 3923 | ignored. Often, opening a port to a modem blocks until the modem reports |
28f540f4 RM |
3924 | carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will |
3925 | return immediately without a carrier. | |
3926 | ||
3927 | Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating | |
3928 | mode and a file name translation flag. This means that specifying | |
3929 | @code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode; | |
3930 | @pxref{Operating Modes}. To open the file without blocking but do normal | |
3931 | I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and | |
3932 | then call @code{fcntl} to turn the bit off. | |
3933 | @end deftypevr | |
3934 | ||
28f540f4 | 3935 | @deftypevr Macro int O_NOCTTY |
d08a7e4c | 3936 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3937 | If the named file is a terminal device, don't make it the controlling |
3938 | terminal for the process. @xref{Job Control}, for information about | |
3939 | what it means to be the controlling terminal. | |
3940 | ||
a7a93d50 JM |
3941 | On @gnuhurdsystems{} and 4.4 BSD, opening a file never makes it the |
3942 | controlling terminal and @code{O_NOCTTY} is zero. However, @gnulinuxsystems{} | |
3943 | and some other systems use a nonzero value for @code{O_NOCTTY} and set the | |
28f540f4 RM |
3944 | controlling terminal when you open a file that is a terminal device; so |
3945 | to be portable, use @code{O_NOCTTY} when it is important to avoid this. | |
3946 | @cindex controlling terminal, setting | |
3947 | @end deftypevr | |
3948 | ||
a7a93d50 JM |
3949 | The following three file name translation flags exist only on |
3950 | @gnuhurdsystems{}. | |
28f540f4 | 3951 | |
28f540f4 | 3952 | @deftypevr Macro int O_IGNORE_CTTY |
d08a7e4c | 3953 | @standards{GNU, fcntl.h (optional)} |
28f540f4 RM |
3954 | Do not recognize the named file as the controlling terminal, even if it |
3955 | refers to the process's existing controlling terminal device. Operations | |
3956 | on the new file descriptor will never induce job control signals. | |
3957 | @xref{Job Control}. | |
3958 | @end deftypevr | |
3959 | ||
28f540f4 | 3960 | @deftypevr Macro int O_NOLINK |
d08a7e4c | 3961 | @standards{GNU, fcntl.h (optional)} |
28f540f4 RM |
3962 | If the named file is a symbolic link, open the link itself instead of |
3963 | the file it refers to. (@code{fstat} on the new file descriptor will | |
3964 | return the information returned by @code{lstat} on the link's name.) | |
3965 | @cindex symbolic link, opening | |
3966 | @end deftypevr | |
3967 | ||
28f540f4 | 3968 | @deftypevr Macro int O_NOTRANS |
d08a7e4c | 3969 | @standards{GNU, fcntl.h (optional)} |
28f540f4 RM |
3970 | If the named file is specially translated, do not invoke the translator. |
3971 | Open the bare file the translator itself sees. | |
3972 | @end deftypevr | |
3973 | ||
3974 | ||
3975 | The open-time action flags tell @code{open} to do additional operations | |
3976 | which are not really related to opening the file. The reason to do them | |
3977 | as part of @code{open} instead of in separate calls is that @code{open} | |
3978 | can do them @i{atomically}. | |
3979 | ||
28f540f4 | 3980 | @deftypevr Macro int O_TRUNC |
d08a7e4c | 3981 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
3982 | Truncate the file to zero length. This option is only useful for |
3983 | regular files, not special files such as directories or FIFOs. POSIX.1 | |
3984 | requires that you open the file for writing to use @code{O_TRUNC}. In | |
3985 | BSD and GNU you must have permission to write the file to truncate it, | |
3986 | but you need not open for write access. | |
3987 | ||
3988 | This is the only open-time action flag specified by POSIX.1. There is | |
3989 | no good reason for truncation to be done by @code{open}, instead of by | |
3990 | calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in | |
3991 | Unix before @code{ftruncate} was invented, and is retained for backward | |
3992 | compatibility. | |
3993 | @end deftypevr | |
3994 | ||
27e309c1 UD |
3995 | The remaining open-time action flags are BSD extensions. They exist only | |
3996 | on some systems. On other systems, these macros are not defined. | |
3997 | ||
28f540f4 | 3998 | @deftypevr Macro int O_SHLOCK |
d08a7e4c | 3999 | @standards{BSD, fcntl.h (optional)} |
28f540f4 RM |
4000 | Acquire a shared lock on the file, as with @code{flock}. |
4001 | @xref{File Locks}. | |
4002 | ||
4003 | If @code{O_CREAT} is specified, the locking is done atomically when | |
4004 | creating the file. You are guaranteed that no other process will get | |
4005 | the lock on the new file first. | |
4006 | @end deftypevr | |
4007 | ||
28f540f4 | 4008 | @deftypevr Macro int O_EXLOCK |
d08a7e4c | 4009 | @standards{BSD, fcntl.h (optional)} |
28f540f4 RM |
4010 | Acquire an exclusive lock on the file, as with @code{flock}. |
4011 | @xref{File Locks}. This is atomic like @code{O_SHLOCK}. | |
4012 | @end deftypevr | |
4013 | ||
4014 | @node Operating Modes | |
4015 | @subsection I/O Operating Modes | |
4016 | ||
4017 | The operating modes affect how input and output operations using a file | |
4018 | descriptor work. These flags are set by @code{open} and can be fetched | |
4019 | and changed with @code{fcntl}. | |
4020 | ||
28f540f4 | 4021 | @deftypevr Macro int O_APPEND |
d08a7e4c | 4022 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4023 | The bit that enables append mode for the file. If set, then all |
4024 | @code{write} operations write the data at the end of the file, extending | |
4025 | it, regardless of the current file position. This is the only reliable | |
4026 | way to append to a file. In append mode, you are guaranteed that the | |
4027 | data you write will always go to the current end of the file, regardless | |
4028 | of other processes writing to the file. Conversely, if you simply set | |
4029 | the file position to the end of file and write, then another process can | |
4030 | extend the file after you set the file position but before you write, | |
4031 | resulting in your data appearing someplace before the real end of file. | |
4032 | @end deftypevr | |
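The effect of append mode on the file position can be seen in a short experiment. This is only a sketch: the function name @code{append_demo} and the scratch file template under @file{/tmp} are assumptions made for this illustration. It writes three bytes, turns on @code{O_APPEND}, seeks back to the start, and writes three more bytes.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the size of a scratch file after writing to it in append
   mode from file position 0, or -1 on error.  A result of 6 means
   the second write went to the end of the file instead of
   overwriting the first three bytes.
   (Hypothetical helper, for illustration only.)  */
long
append_demo (void)
{
  char name[] = "/tmp/appenddemoXXXXXX";  /* assumed scratch location */
  int fd = mkstemp (name);
  if (fd < 0)
    return -1;
  unlink (name);

  long size = -1;
  int flags = fcntl (fd, F_GETFL);
  if (flags != -1
      && write (fd, "abc", 3) == 3
      /* Turn on append mode without disturbing the other flags.  */
      && fcntl (fd, F_SETFL, flags | O_APPEND) != -1
      /* Move to the start; without O_APPEND the next write would
         overwrite "abc".  */
      && lseek (fd, 0, SEEK_SET) == 0
      && write (fd, "def", 3) == 3)
    size = lseek (fd, 0, SEEK_END);

  close (fd);
  return size;
}
```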
4033 | ||
2c6fe0bd | 4034 | @deftypevr Macro int O_NONBLOCK |
d08a7e4c | 4035 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4036 | The bit that enables nonblocking mode for the file. If this bit is set, |
4037 | @code{read} requests on the file can return immediately with a failure | |
4038 | status if there is no input immediately available, instead of blocking. | |
4039 | Likewise, @code{write} requests can also return immediately with a | |
4040 | failure status if the output can't be written immediately. | |
4041 | ||
4042 | Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O | |
4043 | operating mode and a file name translation flag; @pxref{Open-time Flags}. | |
4044 | @end deftypevr | |
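The failure status mentioned above is reported through @code{errno}. As a sketch, the function below puts the read end of an empty pipe into nonblocking mode and checks that @code{read} fails at once with @code{EAGAIN} (or @code{EWOULDBLOCK}, which some systems use for the same condition). The function name @code{nonblocking_read_demo} is an assumption made for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if reading from an empty pipe in nonblocking mode fails
   immediately with EAGAIN or EWOULDBLOCK, 0 if it behaves some other
   way, and -1 if the experiment could not be set up.
   (Hypothetical helper, for illustration only.)  */
int
nonblocking_read_demo (void)
{
  int fds[2];
  if (pipe (fds) == -1)
    return -1;

  int result = -1;
  int flags = fcntl (fds[0], F_GETFL);
  if (flags != -1 && fcntl (fds[0], F_SETFL, flags | O_NONBLOCK) != -1)
    {
      char byte;
      /* The write end is still open and nothing has been written,
         so a blocking read would wait here.  */
      ssize_t n = read (fds[0], &byte, 1);
      result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    }

  close (fds[0]);
  close (fds[1]);
  return result;
}
```

A program that uses nonblocking mode normally treats this error as "try again later" and waits for the descriptor to become ready before retrying.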
4045 | ||
28f540f4 | 4046 | @deftypevr Macro int O_NDELAY |
d08a7e4c | 4047 | @standards{BSD, fcntl.h} |
28f540f4 RM |
4048 | This is an obsolete name for @code{O_NONBLOCK}, provided for |
4049 | compatibility with BSD. It is not defined by the POSIX.1 standard. | |
4050 | @end deftypevr | |
4051 | ||
4052 | The remaining operating modes are BSD and GNU extensions. They exist only | |
4053 | on some systems. On other systems, these macros are not defined. | |
4054 | ||
28f540f4 | 4055 | @deftypevr Macro int O_ASYNC |
d08a7e4c | 4056 | @standards{BSD, fcntl.h} |
28f540f4 RM |
4057 | The bit that enables asynchronous input mode. If set, then @code{SIGIO} |
4058 | signals will be generated when input is available. @xref{Interrupt Input}. | |
4059 | ||
4060 | Asynchronous input mode is a BSD feature. | |
4061 | @end deftypevr | |
4062 | ||
28f540f4 | 4063 | @deftypevr Macro int O_FSYNC |
d08a7e4c | 4064 | @standards{BSD, fcntl.h} |
28f540f4 RM |
4065 | The bit that enables synchronous writing for the file. If set, each |
4066 | @code{write} call will make sure the data is reliably stored on disk before | |
4067 | returning. @c !!! xref fsync | |
4068 | ||
4069 | Synchronous writing is a BSD feature. | |
4070 | @end deftypevr | |
4071 | ||
28f540f4 | 4072 | @deftypevr Macro int O_SYNC |
d08a7e4c | 4073 | @standards{BSD, fcntl.h} |
28f540f4 RM |
4074 | This is another name for @code{O_FSYNC}. They have the same value. |
4075 | @end deftypevr | |
4076 | ||
28f540f4 | 4077 | @deftypevr Macro int O_NOATIME |
d08a7e4c | 4078 | @standards{GNU, fcntl.h} |
28f540f4 RM |
4079 | If this bit is set, @code{read} will not update the access time of the |
4080 | file. @xref{File Times}. This is used by programs that do backups, so | |
4081 | that backing a file up does not count as reading it. | |
4082 | Only the owner of the file or the superuser may use this bit. | |
4083 | ||
4084 | This is a GNU extension. | |
4085 | @end deftypevr | |
4086 | ||
4087 | @node Getting File Status Flags | |
4088 | @subsection Getting and Setting File Status Flags | |
4089 | ||
4090 | The @code{fcntl} function can fetch or change file status flags. | |
4091 | ||
28f540f4 | 4092 | @deftypevr Macro int F_GETFL |
d08a7e4c | 4093 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4094 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4095 | read the file status flags for the open file with descriptor | |
4096 | @var{filedes}. | |
4097 | ||
4098 | The normal return value from @code{fcntl} with this command is a | |
4099 | nonnegative number which can be interpreted as the bitwise OR of the | |
4100 | individual flags. Because the file access modes are not single-bit values, | |
4101 | mask the returned flags with @code{O_ACCMODE} before comparing the result | |
4102 | against an access mode macro. | |
4103 | ||
07435eb4 | 4104 | In case of an error, @code{fcntl} returns @math{-1}. The following |
28f540f4 RM |
4105 | @code{errno} error conditions are defined for this command: |
4106 | ||
4107 | @table @code | |
4108 | @item EBADF | |
4109 | The @var{filedes} argument is invalid. | |
4110 | @end table | |
4111 | @end deftypevr | |
4112 | ||
28f540f4 | 4113 | @deftypevr Macro int F_SETFL |
d08a7e4c | 4114 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4115 | This macro is used as the @var{command} argument to @code{fcntl}, to set |
4116 | the file status flags for the open file corresponding to the | |
4117 | @var{filedes} argument. This command requires a third @code{int} | |
4118 | argument to specify the new flags, so the call looks like this: | |
4119 | ||
4120 | @smallexample | |
4121 | fcntl (@var{filedes}, F_SETFL, @var{new-flags}) | |
4122 | @end smallexample | |
4123 | ||
4124 | You can't change the access mode for the file in this way; that is, | |
4125 | whether the file descriptor was opened for reading or writing. | |
4126 | ||
4127 | The normal return value from @code{fcntl} with this command is an | |
07435eb4 | 4128 | unspecified value other than @math{-1}, which indicates an error. The |
28f540f4 RM |
4129 | error conditions are the same as for the @code{F_GETFL} command. |
4130 | @end deftypevr | |
4131 | ||
4132 | If you want to modify the file status flags, you should get the current | |
4133 | flags with @code{F_GETFL} and modify the value. Don't assume that the | |
4134 | flags listed here are the only ones that are implemented; your program | |
4135 | may be run years from now and more flags may exist then. For example, | |
4136 | here is a function to set or clear the flag @code{O_NONBLOCK} without | |
4137 | altering any other flags: | |
4138 | ||
4139 | @smallexample | |
4140 | @group | |
4141 | /* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,} | |
4142 | @r{or clear the flag if @var{value} is 0.} | |
2c6fe0bd | 4143 | @r{Return 0 on success, or -1 on error with @code{errno} set.} */ |
28f540f4 RM |
4144 | |
4145 | int | |
4146 | set_nonblock_flag (int desc, int value) | |
4147 | @{ | |
4148 | int oldflags = fcntl (desc, F_GETFL, 0); | |
4149 | /* @r{If reading the flags failed, return error indication now.} */ | |
4150 | if (oldflags == -1) | |
4151 | return -1; | |
4152 | /* @r{Set just the flag we want to set.} */ | |
4153 | if (value != 0) | |
4154 | oldflags |= O_NONBLOCK; | |
4155 | else | |
4156 | oldflags &= ~O_NONBLOCK; | |
4157 | /* @r{Store modified flag word in the descriptor.} */ | |
4158 | return fcntl (desc, F_SETFL, oldflags); | |
4159 | @} | |
4160 | @end group | |
4161 | @end smallexample | |
4162 | ||
4163 | @node File Locks | |
4164 | @section File Locks | |
4165 | ||
4166 | @cindex file locks | |
4167 | @cindex record locking | |
0961f7e1 JL |
4168 | This section describes record locks that are associated with the process. |
4169 | There is also a different type of record lock that is associated with the | |
4170 | open file description instead of the process. @xref{Open File Description Locks}. | |
4171 | ||
28f540f4 RM |
4172 | The remaining @code{fcntl} commands are used to support @dfn{record |
4173 | locking}, which permits multiple cooperating programs to prevent each | |
4174 | other from simultaneously accessing parts of a file in error-prone | |
4175 | ways. | |
4176 | ||
4177 | @cindex exclusive lock | |
4178 | @cindex write lock | |
4179 | An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access | |
4180 | for writing to the specified part of the file. While a write lock is in | |
4181 | place, no other process can lock that part of the file. | |
4182 | ||
4183 | @cindex shared lock | |
4184 | @cindex read lock | |
4185 | A @dfn{shared} or @dfn{read} lock prohibits any other process from | |
4186 | requesting a write lock on the specified part of the file. However, | |
4187 | other processes can request read locks. | |
4188 | ||
4189 | The @code{read} and @code{write} functions do not actually check to see | |
4190 | whether there are any locks in place. If you want to implement a | |
4191 | locking protocol for a file shared by multiple processes, your application | |
4192 | must do explicit @code{fcntl} calls to request and clear locks at the | |
4193 | appropriate points. | |
4194 | ||
4195 | Locks are associated with processes. A process can only have one kind | |
4196 | of lock set for each byte of a given file. When any file descriptor for | |
4197 | that file is closed by the process, all of the locks that process holds | |
4198 | on that file are released, even if the locks were made using other | |
4199 | descriptors that remain open. Likewise, locks are released when a | |
4200 | process exits, and are not inherited by child processes created using | |
4201 | @code{fork} (@pxref{Creating a Process}). | |
4202 | ||
4203 | When making a lock, use a @code{struct flock} to specify what kind of | |
4204 | lock and where. This data type and the associated macros for the | |
4205 | @code{fcntl} function are declared in the header file @file{fcntl.h}. | |
4206 | @pindex fcntl.h | |
4207 | ||
28f540f4 | 4208 | @deftp {Data Type} {struct flock} |
d08a7e4c | 4209 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4210 | This structure is used with the @code{fcntl} function to describe a file |
4211 | lock. It has these members: | |
4212 | ||
4213 | @table @code | |
4214 | @item short int l_type | |
4215 | Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or | |
4216 | @code{F_UNLCK}. | |
4217 | ||
4218 | @item short int l_whence | |
4219 | This corresponds to the @var{whence} argument to @code{fseek} or | |
4220 | @code{lseek}, and specifies what the offset is relative to. Its value | |
4221 | can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}. | |
4222 | ||
4223 | @item off_t l_start | |
4224 | This specifies the offset of the start of the region to which the lock | |
9739d2d5 | 4225 | applies, and is given in bytes relative to the point specified by the |
28f540f4 RM |
4226 | @code{l_whence} member. |
4227 | ||
4228 | @item off_t l_len | |
4229 | This specifies the length of the region to be locked. A value of | |
4230 | @code{0} is treated specially; it means the region extends to the end of | |
4231 | the file. | |
4232 | ||
4233 | @item pid_t l_pid | |
4234 | This field is the process ID (@pxref{Process Creation Concepts}) of the | |
4235 | process holding the lock. It is filled in by calling @code{fcntl} with | |
0961f7e1 JL |
4236 | the @code{F_GETLK} command, but is ignored when making a lock. If the |
4237 | conflicting lock is an open file description lock | |
4238 | (@pxref{Open File Description Locks}), then this field will be set to | |
4239 | @math{-1}. | |
28f540f4 RM |
4240 | @end table |
4241 | @end deftp | |
4242 | ||
28f540f4 | 4243 | @deftypevr Macro int F_GETLK |
d08a7e4c | 4244 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4245 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4246 | specify that it should get information about a lock. This command | |
4247 | requires a third argument of type @w{@code{struct flock *}} to be passed | |
4248 | to @code{fcntl}, so that the form of the call is: | |
4249 | ||
4250 | @smallexample | |
4251 | fcntl (@var{filedes}, F_GETLK, @var{lockp}) | |
4252 | @end smallexample | |
4253 | ||
4254 | If there is a lock already in place that would block the lock described | |
4255 | by the @var{lockp} argument, information about that lock overwrites | |
4256 | @code{*@var{lockp}}. Existing locks are not reported if they are | |
4257 | compatible with making a new lock as specified. Thus, you should | |
4258 | specify a lock type of @code{F_WRLCK} if you want to find out about both | |
4259 | read and write locks, or @code{F_RDLCK} if you want to find out about | |
4260 | write locks only. | |
4261 | ||
4262 | There might be more than one lock affecting the region specified by the | |
4263 | @var{lockp} argument, but @code{fcntl} only returns information about | |
4264 | one of them. The @code{l_whence} member of the @var{lockp} structure is | |
4265 | set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields | |
4266 | set to identify the locked region. | |
4267 | ||
4268 | If no lock applies, the only change to the @var{lockp} structure is to | |
4269 | update the @code{l_type} to a value of @code{F_UNLCK}. | |
4270 | ||
4271 | The normal return value from @code{fcntl} with this command is an | |
07435eb4 | 4272 | unspecified value other than @math{-1}, which is reserved to indicate an |
28f540f4 RM |
4273 | error. The following @code{errno} error conditions are defined for |
4274 | this command: | |
4275 | ||
4276 | @table @code | |
4277 | @item EBADF | |
4278 | The @var{filedes} argument is invalid. | |
4279 | ||
4280 | @item EINVAL | |
4281 | Either the @var{lockp} argument doesn't specify valid lock information, | |
4282 | or the file associated with @var{filedes} doesn't support locks. | |
4283 | @end table | |
4284 | @end deftypevr | |
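As a minimal sketch of how @code{F_GETLK} might be used, the following
function asks whether a write lock on the whole file would currently be
blocked.  The name @code{write_lock_is_blocked} and its return convention
are illustrative assumptions, not part of the library.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a library function.  Return 1 if some other
   process holds a lock that would block a write lock on the whole file
   open on FD, 0 if no such lock exists, or -1 on error with errno set.
   On a conflict, store the reported l_pid value in *HOLDER if HOLDER
   is not a null pointer.  */
int
write_lock_is_blocked (int fd, pid_t *holder)
{
  /* Describe the lock we would like to make.  F_WRLCK conflicts with
     both read and write locks.  The zero-initialized l_start and l_len
     members make the region cover the whole file.  */
  struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

  if (fcntl (fd, F_GETLK, &fl) == -1)
    return -1;

  /* If nothing conflicts, only l_type is changed, to F_UNLCK.  */
  if (fl.l_type == F_UNLCK)
    return 0;

  if (holder != NULL)
    *holder = fl.l_pid;
  return 1;
}
```

The answer is only a snapshot, because another process can set or clear a
lock immediately after the call returns.  A program that actually wants
the lock should request it with @code{F_SETLK} and handle the failure.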
4285 | ||
28f540f4 | 4286 | @deftypevr Macro int F_SETLK |
d08a7e4c | 4287 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4288 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4289 | specify that it should set or clear a lock. This command requires a | |
4290 | third argument of type @w{@code{struct flock *}} to be passed to | |
4291 | @code{fcntl}, so that the form of the call is: | |
4292 | ||
4293 | @smallexample | |
4294 | fcntl (@var{filedes}, F_SETLK, @var{lockp}) | |
4295 | @end smallexample | |
4296 | ||
4297 | If the process already has a lock on any part of the region, the old lock | |
4298 | on that part is replaced with the new lock. You can remove a lock | |
4299 | by specifying a lock type of @code{F_UNLCK}. | |
4300 | ||
4301 | If the lock cannot be set, @code{fcntl} returns immediately with a value | |
9739d2d5 RJ |
4302 | of @math{-1}. This function does not block while waiting for other processes |
4303 | to release locks. If @code{fcntl} succeeds, it returns a value other | |
07435eb4 | 4304 | than @math{-1}. |
28f540f4 RM |
4305 | |
4306 | The following @code{errno} error conditions are defined for this | |
4307 | function: | |
4308 | ||
4309 | @table @code | |
4310 | @item EAGAIN | |
4311 | @itemx EACCES | |
4312 | The lock cannot be set because it is blocked by an existing lock on the | |
4313 | file. Some systems use @code{EAGAIN} in this case, and other systems | |
4314 | use @code{EACCES}; your program should treat them alike, after | |
a7a93d50 | 4315 | @code{F_SETLK}. (@gnulinuxhurdsystems{} always use @code{EAGAIN}.) |
28f540f4 RM |
4316 | |
4317 | @item EBADF | |
4318 | Either: the @var{filedes} argument is invalid; you requested a read lock | |
4319 | but the @var{filedes} is not open for read access; or, you requested a | |
4320 | write lock but the @var{filedes} is not open for write access. | |
4321 | ||
4322 | @item EINVAL | |
4323 | Either the @var{lockp} argument doesn't specify valid lock information, | |
4324 | or the file associated with @var{filedes} doesn't support locks. | |
4325 | ||
4326 | @item ENOLCK | |
4327 | The system has run out of file lock resources; there are already too | |
4328 | many file locks in place. | |
4329 | ||
4330 | Well-designed file systems never report this error, because they have no | |
4331 | limitation on the number of locks. However, you must still take account | |
4332 | of the possibility of this error, as it could result from network access | |
4333 | to a file system on another machine. | |
4334 | @end table | |
4335 | @end deftypevr | |
4336 | ||
28f540f4 | 4337 | @deftypevr Macro int F_SETLKW |
d08a7e4c | 4338 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4339 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4340 | specify that it should set or clear a lock. It is just like the | |
4341 | @code{F_SETLK} command, but causes the process to block (or wait) | |
4342 | until the request can be satisfied. |
4343 | ||
4344 | This command requires a third argument of type @code{struct flock *}, as | |
4345 | for the @code{F_SETLK} command. | |
4346 | ||
4347 | The @code{fcntl} return values and errors are the same as for the | |
4348 | @code{F_SETLK} command, but these additional @code{errno} error conditions | |
4349 | are defined for this command: | |
4350 | ||
4351 | @table @code | |
4352 | @item EINTR | |
4353 | The function was interrupted by a signal while it was waiting. | |
4354 | @xref{Interrupted Primitives}. | |
4355 | ||
4356 | @item EDEADLK | |
4357 | The specified region is locked by another process, but that |
4358 | process is waiting to lock a region that the current process has |
4359 | locked, so waiting for the lock would result in deadlock. The system |
4360 | does not guarantee that it will detect every such condition, but it |
4361 | reports this error when it does detect one. |
4362 | @end table | |
4363 | @end deftypevr | |
4364 | ||
4365 | ||
4366 | The following macros are defined for use as values for the @code{l_type} | |
4367 | member of @code{struct flock}. The values are integer constants. |
4368 | ||
2fe82ca6 | 4369 | @vtable @code |
28f540f4 | 4370 | @item F_RDLCK |
d08a7e4c | 4371 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4372 | This macro is used to specify a read (or shared) lock. |
4373 | ||
28f540f4 | 4374 | @item F_WRLCK |
d08a7e4c | 4375 | @standards{POSIX.1, fcntl.h} |
28f540f4 RM |
4376 | This macro is used to specify a write (or exclusive) lock. |
4377 | ||
28f540f4 | 4378 | @item F_UNLCK |
d08a7e4c | 4379 | @standards{POSIX.1, fcntl.h} |
28f540f4 | 4380 | This macro is used to specify that the region is unlocked. |
2fe82ca6 | 4381 | @end vtable |
28f540f4 RM |
4382 | |
4383 | As an example of a situation where file locking is useful, consider a | |
4384 | program that can be run simultaneously by several different users and that |
4385 | logs status information to a common file. One example of such a program | |
4386 | might be a game that uses a file to keep track of high scores. Another | |
4387 | example might be a program that records usage or accounting information | |
4388 | for billing purposes. | |
4389 | ||
4390 | Having multiple copies of the program simultaneously writing to the | |
4391 | file could cause the contents of the file to become mixed up. But | |
4392 | you can prevent this kind of problem by setting a write lock on the | |
2c6fe0bd | 4393 | file before actually writing to the file. |
28f540f4 RM |
4394 | |
4395 | If the program also needs to read the file and wants to make sure that | |
4396 | the contents of the file are in a consistent state, then it can also use | |
4397 | a read lock. While the read lock is set, no other process can lock | |
4398 | that part of the file for writing. | |
4399 | ||
4400 | @c ??? This section could use an example program. | |
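The following is a minimal sketch of the write-lock pattern described
above, not a complete program.  The function name @code{append_locked} is
hypothetical, and treating a short write as an error is a simplifying
assumption made for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not a library function.  Append LINE to the
   file open on FD while holding a write lock on the whole file.
   Return 0 on success, or -1 on error.  */
int
append_locked (int fd, const char *line)
{
  /* l_start = 0 and l_len = 0 (zero-initialized) lock the whole file.  */
  struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
  size_t len = strlen (line);
  int result = 0;

  /* Wait for the lock, retrying if a signal interrupts the wait.  */
  while (fcntl (fd, F_SETLKW, &fl) == -1)
    if (errno != EINTR)
      return -1;

  /* The lock is held: move to the current end of file and write.
     A short write is treated as a failure in this sketch.  */
  if (lseek (fd, 0, SEEK_END) == -1
      || write (fd, line, len) != (ssize_t) len)
    result = -1;

  /* Release the lock whether or not the write succeeded.  */
  fl.l_type = F_UNLCK;
  if (fcntl (fd, F_SETLK, &fl) == -1)
    result = -1;

  return result;
}
```

Every cooperating program has to go through the same locking steps for
this to protect the file.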
4401 | ||
0961f7e1 | 4402 | Remember that file locks are only an @emph{advisory} protocol for |
28f540f4 RM |
4403 | controlling access to a file. There is still potential for access to |
4404 | the file by programs that don't use the lock protocol. | |
4405 | ||
0961f7e1 JL |
4406 | @node Open File Description Locks |
4407 | @section Open File Description Locks | |
4408 | ||
4409 | In contrast to process-associated record locks (@pxref{File Locks}), | |
4410 | open file description record locks are associated with an open file | |
4411 | description rather than a process. | |
4412 | ||
4413 | Using @code{fcntl} to apply an open file description lock on a region that | |
4414 | already has an existing open file description lock that was created via the | |
4415 | same file descriptor will never cause a lock conflict. | |
4416 | ||
4417 | Open file description locks are also inherited by child processes across | |
4418 | @code{fork}, or @code{clone} with @code{CLONE_FILES} set | |
4419 | (@pxref{Creating a Process}), along with the file descriptor. | |
4420 | ||
4421 | It is important to distinguish between the open file @emph{description} (an | |
4422 | instance of an open file, usually created by a call to @code{open}) and | |
4423 | an open file @emph{descriptor}, which is a numeric value that refers to the | |
4424 | open file description. The locks described here are associated with the | |
4425 | open file @emph{description} and not the open file @emph{descriptor}. | |
4426 | ||
4427 | Using @code{dup} (@pxref{Duplicating Descriptors}) to copy a file | |
4428 | descriptor does not give you a new open file description, but rather copies a | |
4429 | reference to an existing open file description and assigns it to a new | |
4430 | file descriptor. Thus, open file description locks set on a file | |
4431 | descriptor cloned by @code{dup} will never conflict with open file | |
4432 | description locks set on the original descriptor since they refer to the | |
4433 | same open file description. However, depending on the range and type of |
4434 | lock involved, an @code{F_OFD_SETLK} or @code{F_OFD_SETLKW} command may |
4435 | modify the original lock in this situation. |
4436 | ||
4437 | Open file description locks always conflict with process-associated locks, | |
4438 | even if acquired by the same process or on the same open file | |
4439 | descriptor. | |
4440 | ||
4441 | Open file description locks use the same @code{struct flock} as | |
4442 | process-associated locks as an argument (@pxref{File Locks}) and the | |
4443 | macros for the @code{command} values are also declared in the header file | |
4444 | @file{fcntl.h}. To use them, the macro @code{_GNU_SOURCE} must be | |
4445 | defined prior to including any header file. | |
4446 | ||
4447 | In contrast to process-associated locks, any @code{struct flock} used as | |
4448 | an argument to open file description lock commands must have the @code{l_pid} | |
4449 | value set to @math{0}. Also, when returning information about an | |
4450 | open file description lock in an @code{F_GETLK} or @code{F_OFD_GETLK} request, |
4451 | the @code{l_pid} field in @code{struct flock} will be set to @math{-1} | |
4452 | to indicate that the lock is not associated with a process. | |
4453 | ||
4454 | When the same @code{struct flock} is reused as an argument to a | |
4455 | @code{F_OFD_SETLK} or @code{F_OFD_SETLKW} request after being used for an | |
4456 | @code{F_OFD_GETLK} request, it is necessary to inspect and reset the | |
4457 | @code{l_pid} field to @math{0}. | |
4458 | ||
4459 | @pindex fcntl.h |
4460 | ||
4461 | @deftypevr Macro int F_OFD_GETLK | |
4462 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
4463 | specify that it should get information about a lock. This command | |
4464 | requires a third argument of type @w{@code{struct flock *}} to be passed | |
4465 | to @code{fcntl}, so that the form of the call is: | |
4466 | ||
4467 | @smallexample | |
4468 | fcntl (@var{filedes}, F_OFD_GETLK, @var{lockp}) | |
4469 | @end smallexample | |
4470 | ||
4471 | If there is a lock already in place that would block the lock described | |
4472 | by the @var{lockp} argument, information about that lock is written to | |
4473 | @code{*@var{lockp}}. Existing locks are not reported if they are | |
4474 | compatible with making a new lock as specified. Thus, you should | |
4475 | specify a lock type of @code{F_WRLCK} if you want to find out about both | |
4476 | read and write locks, or @code{F_RDLCK} if you want to find out about | |
4477 | write locks only. | |
4478 | ||
4479 | There might be more than one lock affecting the region specified by the | |
4480 | @var{lockp} argument, but @code{fcntl} only returns information about | |
4481 | one of them. Which lock is returned in this situation is undefined. | |
4482 | ||
4483 | The @code{l_whence} member of the @var{lockp} structure is set to |
4484 | @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields are set | |
4485 | to identify the locked region. | |
4486 | ||
4487 | If no conflicting lock exists, the only change to the @var{lockp} structure | |
4488 | is to update the @code{l_type} field to the value @code{F_UNLCK}. | |
4489 | ||
4490 | The normal return value from @code{fcntl} with this command is either @math{0} | |
4491 | on success or @math{-1}, which indicates an error. The following @code{errno} | |
4492 | error conditions are defined for this command: | |
4493 | ||
4494 | @table @code | |
4495 | @item EBADF | |
4496 | The @var{filedes} argument is invalid. | |
4497 | ||
4498 | @item EINVAL | |
4499 | Either the @var{lockp} argument doesn't specify valid lock information, | |
4500 | the operating system kernel doesn't support open file description locks, or the file | |
4501 | associated with @var{filedes} doesn't support locks. | |
4502 | @end table | |
4503 | @end deftypevr | |
4504 | ||
0961f7e1 | 4505 | @deftypevr Macro int F_OFD_SETLK |
d08a7e4c | 4506 | @standards{POSIX.1, fcntl.h} |
0961f7e1 JL |
4507 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4508 | specify that it should set or clear a lock. This command requires a | |
4509 | third argument of type @w{@code{struct flock *}} to be passed to | |
4510 | @code{fcntl}, so that the form of the call is: | |
4511 | ||
4512 | @smallexample | |
4513 | fcntl (@var{filedes}, F_OFD_SETLK, @var{lockp}) | |
4514 | @end smallexample | |
4515 | ||
4516 | If the open file description already has a lock on any part of the |
4517 | region, the old lock on that part is replaced with the new lock. You | |
4518 | can remove a lock by specifying a lock type of @code{F_UNLCK}. | |
4519 | ||
4520 | If the lock cannot be set, @code{fcntl} returns immediately with a value | |
4521 | of @math{-1}. This command does not wait for other tasks | |
4522 | to release locks. If @code{fcntl} succeeds, it returns @math{0}. | |
4523 | ||
4524 | The following @code{errno} error conditions are defined for this | |
4525 | command: | |
4526 | ||
4527 | @table @code | |
4528 | @item EAGAIN | |
4529 | The lock cannot be set because it is blocked by an existing lock on the | |
4530 | file. | |
4531 | ||
4532 | @item EBADF | |
4533 | Either: the @var{filedes} argument is invalid; you requested a read lock | |
4534 | but the @var{filedes} is not open for read access; or, you requested a | |
4535 | write lock but the @var{filedes} is not open for write access. | |
4536 | ||
4537 | @item EINVAL | |
4538 | Either the @var{lockp} argument doesn't specify valid lock information, | |
4539 | the operating system kernel doesn't support open file description locks, or the | |
4540 | file associated with @var{filedes} doesn't support locks. | |
4541 | ||
4542 | @item ENOLCK | |
4543 | The system has run out of file lock resources; there are already too | |
4544 | many file locks in place. | |
4545 | ||
4546 | Well-designed file systems never report this error, because they have no | |
4547 | limitation on the number of locks. However, you must still take account | |
4548 | of the possibility of this error, as it could result from network access | |
4549 | to a file system on another machine. | |
4550 | @end table | |
4551 | @end deftypevr | |
4552 | ||
0961f7e1 | 4553 | @deftypevr Macro int F_OFD_SETLKW |
d08a7e4c | 4554 | @standards{POSIX.1, fcntl.h} |
0961f7e1 JL |
4555 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4556 | specify that it should set or clear a lock. It is just like the | |
4557 | @code{F_OFD_SETLK} command, but causes the process to wait until the request | |
4558 | can be completed. | |
4559 | ||
4560 | This command requires a third argument of type @code{struct flock *}, as | |
4561 | for the @code{F_OFD_SETLK} command. | |
4562 | ||
4563 | The @code{fcntl} return values and errors are the same as for the | |
4564 | @code{F_OFD_SETLK} command, but these additional @code{errno} error conditions | |
4565 | are defined for this command: | |
4566 | ||
4567 | @table @code | |
4568 | @item EINTR | |
4569 | The function was interrupted by a signal while it was waiting. | |
4570 | @xref{Interrupted Primitives}. | |
4571 | ||
4572 | @end table | |
4573 | @end deftypevr | |
4574 | ||
4575 | Open file description locks are useful in the same sorts of situations as | |
4576 | process-associated locks. They can also be used to synchronize file | |
4577 | access between threads within the same process by having each thread perform | |
4578 | its own @code{open} of the file, to obtain its own open file description. | |
4579 | ||
4580 | Because open file description locks are automatically freed only upon | |
4581 | closing the last file descriptor that refers to the open file | |
4582 | description, this locking mechanism avoids the possibility that locks | |
4583 | are inadvertently released due to a library routine opening and closing | |
4584 | a file without the application being aware. | |
4585 | ||
4586 | As with process-associated locks, open file description locks are advisory. | |
4587 | ||
4588 | @node Open File Description Locks Example | |
4589 | @section Open File Description Locks Example | |
4590 | ||
4591 | Here is an example of using open file description locks in a threaded | |
4592 | program. If this program used process-associated locks, then it would be | |
4593 | subject to data corruption because process-associated locks are shared | |
4594 | by the threads inside a process, and thus cannot be used by one thread | |
4595 | to lock out another thread in the same process. | |
4596 | ||
4597 | Proper error handling has been omitted in the following program for | |
4598 | brevity. | |
4599 | ||
4600 | @smallexample | |
4601 | @include ofdlocks.c.texi | |
4602 | @end smallexample | |
4603 | ||
4604 | This example creates three threads, each of which loops five times, |
4605 | appending to the file. Access to the file is serialized via open file |
4606 | description locks. If we compile and run the above program, we'll end up |
4607 | with a file @file{/tmp/foo} that has 15 lines in it. |
4608 | ||
4609 | If we, however, were to replace the @code{F_OFD_SETLK} and | |
4610 | @code{F_OFD_SETLKW} commands with their process-associated lock | |
4611 | equivalents, the locking essentially becomes a no-op since it is all done |
4612 | within the context of the same process. That leads to data corruption | |
4613 | (typically manifested as missing lines) as some threads race in and | |
4614 | overwrite the data written by others. | |
4615 | ||
28f540f4 RM |
4616 | @node Interrupt Input |
4617 | @section Interrupt-Driven Input | |
4618 | ||
4619 | @cindex interrupt-driven input | |
4620 | If you set the @code{O_ASYNC} status flag on a file descriptor | |
4621 | (@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever | |
4622 | input or output becomes possible on that file descriptor. The process | |
4623 | or process group to receive the signal can be selected by using the | |
4624 | @code{F_SETOWN} command to the @code{fcntl} function. If the file | |
4625 | descriptor is a socket, this also selects the recipient of @code{SIGURG} | |
4626 | signals that are delivered when out-of-band data arrives on that socket; | |
4627 | see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation | |
4628 | where @code{select} would report the socket as having an ``exceptional | |
4629 | condition''. @xref{Waiting for I/O}.) | |
4630 | ||
4631 | If the file descriptor corresponds to a terminal device, then @code{SIGIO} | |
2c6fe0bd | 4632 | signals are sent to the foreground process group of the terminal. |
28f540f4 RM |
4633 | @xref{Job Control}. |
4634 | ||
4635 | @pindex fcntl.h | |
4636 | The symbols in this section are defined in the header file | |
4637 | @file{fcntl.h}. | |
4638 | ||
28f540f4 | 4639 | @deftypevr Macro int F_GETOWN |
d08a7e4c | 4640 | @standards{BSD, fcntl.h} |
28f540f4 RM |
4641 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4642 | specify that it should get information about the process or process | |
4643 | group to which @code{SIGIO} signals are sent. (For a terminal, this is | |
4644 | actually the foreground process group ID, which you can get using | |
4645 | @code{tcgetpgrp}; see @ref{Terminal Access Functions}.) | |
4646 | ||
4647 | The return value is interpreted as a process ID; if negative, its | |
4648 | absolute value is the process group ID. | |
4649 | ||
4650 | The following @code{errno} error condition is defined for this command: | |
4651 | ||
4652 | @table @code | |
4653 | @item EBADF | |
4654 | The @var{filedes} argument is invalid. | |
4655 | @end table | |
4656 | @end deftypevr | |
4657 | ||
28f540f4 | 4658 | @deftypevr Macro int F_SETOWN |
d08a7e4c | 4659 | @standards{BSD, fcntl.h} |
28f540f4 RM |
4660 | This macro is used as the @var{command} argument to @code{fcntl}, to |
4661 | specify that it should set the process or process group to which | |
4662 | @code{SIGIO} signals are sent. This command requires a third argument | |
4663 | of type @code{pid_t} to be passed to @code{fcntl}, so that the form of | |
4664 | the call is: | |
4665 | ||
4666 | @smallexample | |
4667 | fcntl (@var{filedes}, F_SETOWN, @var{pid}) | |
4668 | @end smallexample | |
4669 | ||
4670 | The @var{pid} argument should be a process ID. You can also pass a | |
4671 | negative number whose absolute value is a process group ID. | |
4672 | ||
07435eb4 | 4673 | The return value from @code{fcntl} with this command is @math{-1} |
28f540f4 RM |
4674 | in case of error and some other value if successful. The following |
4675 | @code{errno} error conditions are defined for this command: | |
4676 | ||
4677 | @table @code | |
4678 | @item EBADF | |
4679 | The @var{filedes} argument is invalid. | |
4680 | ||
4681 | @item ESRCH | |
4682 | There is no process or process group corresponding to @var{pid}. | |
4683 | @end table | |
4684 | @end deftypevr | |
4685 | ||
4686 | @c ??? This section could use an example program. | |
07435eb4 UD |
4687 | |
4688 | @node IOCTLs | |
4689 | @section Generic I/O Control operations | |
4690 | @cindex generic i/o control operations | |
4691 | @cindex IOCTLs | |
4692 | ||
a7a93d50 | 4693 | @gnusystems{} can handle most input/output operations on many different |
07435eb4 UD |
4694 | devices and objects in terms of a few file primitives: @code{read}, |
4695 | @code{write} and @code{lseek}. However, most devices also have a few | |
cf822e3c | 4696 | peculiar operations that do not fit into this model, such as: |
07435eb4 UD |
4697 | |
4698 | @itemize @bullet | |
4699 | ||
4700 | @item | |
4701 | Changing the character font used on a terminal. | |
4702 | ||
4703 | @item | |
4704 | Telling a magnetic tape system to rewind or fast forward. (Since tapes |
4705 | cannot be positioned in byte increments, @code{lseek} is inapplicable.) |
4706 | ||
4707 | @item | |
4708 | Ejecting a disk from a drive. | |
4709 | ||
4710 | @item | |
4711 | Playing an audio track from a CD-ROM drive. | |
4712 | ||
4713 | @item | |
4714 | Maintaining routing tables for a network. | |
4715 | ||
4716 | @end itemize | |
4717 | ||
4718 | Although some objects such as sockets and terminals |
4719 | @footnote{Actually, the terminal-specific functions are implemented with | |
4720 | IOCTLs on many platforms.} have special functions of their own, it would | |
4721 | not be practical to create functions for all these cases. | |
4722 | ||
4723 | Instead, these minor operations, known as @dfn{IOCTL}s, are assigned code |
4724 | numbers and multiplexed through the @code{ioctl} function, defined in |
4725 | @file{sys/ioctl.h}. The code numbers themselves are defined in many |
4726 | different headers. | |
4727 | ||
4728 | @deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{}) | |
d08a7e4c | 4729 | @standards{BSD, sys/ioctl.h} |
2cc3615c | 4730 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
07435eb4 UD |
4731 | |
4732 | The @code{ioctl} function performs the generic I/O operation | |
4733 | @var{command} on @var{filedes}. | |
4734 | ||
4735 | A third argument is usually present, either a single number or a pointer | |
4736 | to a structure. The meaning of this argument, the returned value, and | |
4737 | any error codes depend on the command used. Often @math{-1} is |
4738 | returned for a failure. | |
4739 | ||
4740 | @end deftypefun | |
4741 | ||
4742 | On some systems, IOCTLs used by different devices share the same numbers. | |
4743 | Thus, although use of an inappropriate IOCTL @emph{usually} only produces | |
4744 | an error, you should not attempt to use device-specific IOCTLs on an | |
4745 | unknown device. | |
4746 | ||
4747 | Most IOCTLs are OS-specific and/or only used in special system utilities, | |
4748 | and are thus beyond the scope of this document. For an example of the use | |
8b7fb588 | 4749 | of an IOCTL, see @ref{Out-of-Band Data}. |
2cc3615c | 4750 | |
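As a small illustration of the calling convention, the sketch below uses
the @code{FIONREAD} request, which is not described in this chapter.  It
is assumed here because Linux and BSD-derived systems commonly provide
it through @file{sys/ioctl.h}; other systems may declare it elsewhere or
not at all.  The function name @code{bytes_available} is hypothetical.

```c
#include <sys/ioctl.h>

/* Hypothetical helper, not a library function.  Return the number of
   bytes that can currently be read from FD, or -1 on error.
   Assumes FIONREAD is available and stores its result in an int.  */
int
bytes_available (int fd)
{
  int count;

  /* The third argument is a pointer that the request fills in.  */
  if (ioctl (fd, FIONREAD, &count) == -1)
    return -1;
  return count;
}
```

Here the third argument is a pointer to an @code{int} that receives the
result.  Other requests take a plain number or a pointer to a
request-specific structure.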
6c0be743 DD |
4751 | @node Other Low-Level I/O APIs |
4752 | @section Other low-level-I/O-related functions | |
4753 | ||
4754 | @deftp {Data Type} {struct pollfd} | |
4755 | @standards{POSIX.1,poll.h} | |
4756 | @end deftp | |
4757 | ||
4758 | @deftp {Data Type} {struct epoll_event} | |
4759 | @standards{Linux,sys/epoll.h} | |
4760 | @end deftp | |
4761 | ||
4762 | @deftypefun int poll (struct pollfd *@var{fds}, nfds_t @var{nfds}, int @var{timeout}) | |
4763 | ||
4764 | @manpagefunctionstub{poll,2} | |
4765 | @end deftypefun | |
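A minimal sketch of a typical @code{poll} call, based on the interface
described in the manual page, might look like this.  The function name
@code{wait_readable} and its return convention are assumptions made for
the example.

```c
#include <poll.h>

/* Hypothetical helper, not a library function.  Wait up to TIMEOUT_MS
   milliseconds for FD to become readable.  Return 1 if data (or end of
   file) can be read, 0 if the timeout expired, or -1 on error.  */
int
wait_readable (int fd, int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int n = poll (&pfd, 1, timeout_ms);

  if (n == -1)
    return -1;          /* poll itself failed.  */
  if (n == 0)
    return 0;           /* Nothing happened before the timeout.  */

  /* POLLHUP means the peer closed; a read would return end of file.  */
  if (pfd.revents & (POLLIN | POLLHUP))
    return 1;
  return -1;            /* POLLERR or POLLNVAL was reported.  */
}
```

A timeout of @math{0} makes the call return immediately, and a negative
timeout waits indefinitely.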
4766 | ||
4767 | @deftypefun int epoll_create (int @var{size}) |
4768 | ||
4769 | @manpagefunctionstub{epoll_create,2} | |
4770 | @end deftypefun | |
4771 | ||
4772 | @deftypefun int epoll_wait (int @var{epfd}, struct epoll_event *@var{events}, int @var{maxevents}, int @var{timeout}) |
4773 | ||
4774 | @manpagefunctionstub{epoll_wait,2} | |
4775 | @end deftypefun |