Commit | Line | Data |
---|---|---|
390955cb | 1 | @node String and Array Utilities, Character Set Handling, Character Handling, Top |
7a68c94a | 2 | @c %MENU% Utilities for copying and comparing strings and arrays |
28f540f4 RM |
3 | @chapter String and Array Utilities |
4 | ||
2cc4b9cc | 5 | Operations on strings (null-terminated byte sequences) are an important part of |
1f77f049 | 6 | many programs. @Theglibc{} provides an extensive set of string |
28f540f4 RM |
7 | utility functions, including functions for copying, concatenating, |
8 | comparing, and searching strings. Many of these functions can also | |
9 | operate on arbitrary regions of storage; for example, the @code{memcpy} | |
a5113b14 | 10 | function can be used to copy the contents of any kind of array. |
28f540f4 RM |
11 | |
12 | It's fairly common for beginning C programmers to ``reinvent the wheel'' | |
13 | by duplicating this functionality in their own code, but it pays to | |
14 | become familiar with the library functions and to make use of them, | |
15 | since this offers benefits in maintenance, efficiency, and portability. | |
16 | ||
17 | For instance, you could easily compare one string to another in two | |
18 | lines of C code, but if you use the built-in @code{strcmp} function, | |
19 | you're less likely to make a mistake. And, since these library | |
20 | functions are typically highly optimized, your program may run faster | |
21 | too. | |
22 | ||
23 | @menu | |
24 | * Representation of Strings:: Introduction to basic concepts. | |
25 | * String/Array Conventions:: Whether to use a string function or an | |
26 | arbitrary array function. | |
27 | * String Length:: Determining the length of a string. | |
0a13c9e9 PE |
28 | * Copying Strings and Arrays:: Functions to copy strings and arrays. |
29 | * Concatenating Strings:: Functions to concatenate strings while copying. | |
30 | * Truncating Strings:: Functions to truncate strings while copying. | |
28f540f4 RM |
31 | * String/Array Comparison:: Functions for byte-wise and character-wise |
32 | comparison. | |
33 | * Collation Functions:: Functions for collating strings. | |
34 | * Search Functions:: Searching for a specific element or substring. | |
35 | * Finding Tokens in a String:: Splitting a string into tokens by looking | |
36 | for delimiters. | |
ea1bd74d ZW |
37 | * Erasing Sensitive Data:: Clearing memory which contains sensitive |
38 | data, after it's no longer needed. | |
b10a0acc ZW |
39 | * Shuffling Bytes:: Or how to flash-cook a string. |
40 | * Obfuscating Data:: Reversibly obscuring data from casual view. | |
b4012b75 | 41 | * Encode Binary Data:: Encoding and Decoding of Binary Data. |
b13927da | 42 | * Argz and Envz Vectors:: Null-separated string vectors. |
28f540f4 RM |
43 | @end menu |
44 | ||
b4012b75 | 45 | @node Representation of Strings |
28f540f4 RM |
46 | @section Representation of Strings |
47 | @cindex string, representation of | |
48 | ||
49 | This section is a quick summary of string concepts for beginning C | |
2cc4b9cc | 50 | programmers. It describes how strings are represented in C |
28f540f4 RM |
51 | and some common pitfalls. If you are already familiar with this |
52 | material, you can skip this section. | |
53 | ||
54 | @cindex string | |
2cc4b9cc PE |
55 | A @dfn{string} is a null-terminated array of bytes of type @code{char}, |
56 | including the terminating null byte. String-valued | |
28f540f4 | 57 | variables are usually declared to be pointers of type @code{char *}. |
1fb22592 | 58 | Such variables do not include space for the contents of a string; that has |
28f540f4 RM |
59 | to be stored somewhere else---in an array variable, a string constant, |
60 | or dynamically allocated memory (@pxref{Memory Allocation}). It's up to | |
61 | you to store the address of the chosen memory space into the pointer | |
62 | variable. Alternatively you can store a @dfn{null pointer} in the | |
63 | pointer variable. The null pointer does not point anywhere, so | |
64 | attempting to reference the string it points to gets an error. | |
65 | ||
2cc4b9cc PE |
66 | @cindex multibyte character |
67 | @cindex multibyte string | |
68 | @cindex wide string | |
69 | A @dfn{multibyte character} is a sequence of one or more bytes that | |
70 | represents a single character using the locale's encoding scheme; a | |
71 | null byte always represents the null character. A @dfn{multibyte | |
72 | string} is a string that consists entirely of multibyte | |
73 | characters. In contrast, a @dfn{wide string} is a null-terminated | |
74 | sequence of @code{wchar_t} objects. A wide-string variable is usually | |
75 | declared to be a pointer of type @code{wchar_t *}, by analogy with | |
76 | string variables and @code{char *}. @xref{Extended Char Intro}. | |
77 | ||
78 | @cindex null byte | |
8a2f1f5b | 79 | @cindex null wide character |
2cc4b9cc PE |
80 | By convention, the @dfn{null byte}, @code{'\0'}, |
81 | marks the end of a string and the @dfn{null wide character}, | |
82 | @code{L'\0'}, marks the end of a wide string. For example, in | |
8a2f1f5b | 83 | testing to see whether the @code{char *} variable @var{p} points to a |
2cc4b9cc | 84 | null byte marking the end of a string, you can write |
8a2f1f5b | 85 | @code{!*@var{p}} or @code{*@var{p} == '\0'}. |
28f540f4 | 86 | |
2cc4b9cc PE |
87 | A null byte is quite different conceptually from a null pointer, |
88 | although both are represented by the integer constant @code{0}. | |
28f540f4 RM |
89 | |
90 | @cindex string literal | |
2cc4b9cc PE |
91 | A @dfn{string literal} appears in C program source as a multibyte |
92 | string between double-quote characters (@samp{"}). If the | |
93 | initial double-quote character is immediately preceded by a capital | |
94 | @samp{L} (ell) character (as in @code{L"foo"}), it is a wide string | |
95 | literal. String literals can also contribute to @dfn{string | |
96 | concatenation}: @code{"a" "b"} is the same as @code{"ab"}. | |
97 | For wide strings one can use either | |
8a2f1f5b UD |
98 | @code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is |
99 | not allowed by the GNU C compiler, because literals are placed in | |
100 | read-only storage. | |
28f540f4 | 101 | |
2cc4b9cc | 102 | Arrays that are declared @code{const} cannot be modified |
28f540f4 RM |
103 | either. It's generally good style to declare non-modifiable string |
104 | pointers to be of type @code{const char *}, since this often allows the | |
105 | C compiler to detect accidental modifications as well as providing some | |
106 | amount of documentation about what your program intends to do with the | |
107 | string. | |
108 | ||
2cc4b9cc PE |
109 | The amount of memory allocated for a byte array may extend past the null byte |
110 | that marks the end of the string that the array contains. In this | |
dd7d45e8 | 111 | document, the term @dfn{allocated size} is always used to refer to the |
2cc4b9cc PE |
112 | total amount of memory allocated for an array, while the term |
113 | @dfn{length} refers to the number of bytes up to (but not including) | |
114 | the terminating null byte. Wide strings are similar, except their | |
115 | sizes and lengths count wide characters, not bytes. | |
28f540f4 RM |
116 | @cindex length of string |
117 | @cindex allocation size of string | |
118 | @cindex size of string | |
119 | @cindex string length | |
120 | @cindex string allocation | |
121 | ||
2cc4b9cc | 122 | A notorious source of program bugs is trying to put more bytes into a |
28f540f4 | 123 | string than fit in its allocated size. When writing code that extends |
2cc4b9cc | 124 | strings or moves bytes into a pre-allocated array, you should be |
1fb22592 | 125 | very careful to keep track of the length of the string and make explicit |
28f540f4 RM |
126 | checks for overflowing the array. Many of the library functions |
127 | @emph{do not} do this for you! Remember also that you need to allocate | |
2cc4b9cc | 128 | an extra byte to hold the null byte that marks the end of the |
28f540f4 RM |
129 | string. |
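As a minimal sketch of such an explicit check, the following helper appends one string to another only when the result, including its terminating null byte, fits in the allocated size. The name @code{append_checked} and its parameters are hypothetical and are not part of the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a library function: append SRC to the
   string stored in BUF, whose allocated size is BUFSIZE bytes.
   Return false and leave BUF unchanged if the result would not fit.
   BUF must already hold a null-terminated string.  */
bool
append_checked (char *buf, size_t bufsize, const char *src)
{
  size_t used = strlen (buf);
  size_t extra = strlen (src);

  /* One byte beyond the two lengths is needed for the null byte.  */
  if (extra >= bufsize - used)
    return false;

  /* Copy the bytes of SRC together with its terminating null byte.  */
  memcpy (buf + used, src, extra + 1);
  return true;
}
```

A caller would typically pass @code{sizeof} of the destination array as the second argument and test the return value before relying on the contents.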
130 | ||
8a2f1f5b UD |
131 | @cindex single-byte string |
132 | @cindex multibyte string | |
2cc4b9cc | 133 | Originally strings were sequences of bytes where each byte represented a |
8a2f1f5b UD |
134 | single character. This is still true today if the strings are encoded |
135 | using a single-byte character encoding. Things are different if the | |
136 | strings are encoded using a multibyte encoding (for more information on | |
137 | encodings see @ref{Extended Char Intro}). There is no difference in | |
138 | the programming interface for these two kinds of strings; the programmer | |
139 | has to be aware of this and interpret the byte sequences accordingly. | |
140 | ||
141 | But since there is no separate interface taking care of these | |
142 | differences, the byte-based string functions are sometimes hard to use. | |
143 | Since the count parameters of these functions specify bytes, a call to | |
2cc4b9cc | 144 | @code{memcpy} could cut a multibyte character in the middle and put an |
8a2f1f5b UD |
145 | incomplete (and therefore unusable) byte sequence in the target buffer. |
146 | ||
2cc4b9cc | 147 | @cindex wide string |
8a2f1f5b UD |
148 | To avoid these problems, later versions of the @w{ISO C} standard | |
149 | introduce a second set of functions that operate on @dfn{wide | |
150 | characters} (@pxref{Extended Char Intro}). These functions don't have | |
151 | the problems the single-byte versions have since every wide character is | |
152 | a legal, interpretable value. This does not mean that cutting wide | |
2cc4b9cc | 153 | strings at arbitrary points is without problems. It normally |
8a2f1f5b UD |
154 | is for alphabet-based languages (except for non-normalized text) but |
155 | languages based on syllables still have the problem that more than one | |
156 | wide character is necessary to complete a logical unit. This is a | |
157 | higher level problem which the @w{C library} functions are not designed | |
158 | to solve. But it is at least good that no invalid byte sequences can be | |
2cc4b9cc PE |
159 | created. Also, the higher level functions can much more easily operate |
160 | on wide characters than on multibyte characters so that a common strategy | |
8a2f1f5b UD |
161 | is to use wide characters internally whenever text is more than simply |
162 | copied. | |
163 | ||
164 | The remainder of this chapter will discuss the functions for handling | |
2cc4b9cc PE |
165 | wide strings in parallel with the discussion of |
166 | strings since there is almost always an exact equivalent | |
8a2f1f5b UD |
167 | available. |
168 | ||
b4012b75 | 169 | @node String/Array Conventions |
28f540f4 RM |
170 | @section String and Array Conventions |
171 | ||
172 | This chapter describes both functions that work on arbitrary arrays or | |
2cc4b9cc PE |
173 | blocks of memory, and functions that are specific to strings and wide |
174 | strings. | |
28f540f4 RM |
175 | |
176 | Functions that operate on arbitrary blocks of memory have names | |
8a2f1f5b UD |
177 | beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and |
178 | @code{wmemcpy}) and invariably take an argument which specifies the size | |
179 | (in bytes and wide characters respectively) of the block of memory to | |
28f540f4 | 180 | operate on. The array arguments and return values for these functions |
d1dcb565 | 181 | have type @code{void *} or @code{wchar_t *}. As a matter of style, the |
8a2f1f5b UD |
182 | elements of the arrays used with the @samp{mem} functions are referred |
183 | to as ``bytes''. You can pass any kind of pointer to these functions, | |
184 | and the @code{sizeof} operator is useful in computing the value for the | |
185 | size argument. Parameters to the @samp{wmem} functions must be of type | |
186 | @code{wchar_t *}. These functions are not really usable with anything | |
187 | but arrays of this type. | |
188 | ||
189 | In contrast, functions that operate specifically on strings and wide | |
2cc4b9cc | 190 | strings have names beginning with @samp{str} and @samp{wcs} |
8a2f1f5b | 191 | respectively (such as @code{strcpy} and @code{wcscpy}) and look for a |
2cc4b9cc | 192 | terminating null byte or null wide character instead of requiring an explicit |
8a2f1f5b | 193 | size argument to be passed. (Some of these functions accept a specified |
2cc4b9cc PE |
194 | maximum length, but they also check for premature termination.) |
195 | The array arguments and return values for these | |
8a2f1f5b | 196 | functions have type @code{char *} and @code{wchar_t *} respectively, and |
2cc4b9cc | 197 | the array elements are referred to as ``bytes'' and ``wide |
8a2f1f5b UD |
198 | characters''. |
199 | ||
200 | In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs} | |
201 | versions of a function. The one that is more appropriate to use depends | |
202 | on the exact situation. When your program is manipulating arbitrary | |
203 | arrays or blocks of storage, then you should always use the @samp{mem} | |
2cc4b9cc | 204 | functions. On the other hand, when you are manipulating |
8a2f1f5b UD |
205 | strings it is usually more convenient to use the @samp{str}/@samp{wcs} |
206 | functions, unless you already know the length of the string in advance. | |
207 | The @samp{wmem} functions should be used for wide character arrays with | |
208 | known size. | |
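The trade-off can be illustrated with a short sketch. Once the lengths have been measured with @code{strlen}, the @samp{mem} functions can copy the bytes without scanning for the null byte again. The helper name @code{join2} is hypothetical, and the sketch assumes the combined length does not overflow @code{size_t}.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding A
   followed by B, or a null pointer if allocation fails.  The caller
   frees the result.  */
char *
join2 (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  char *result = malloc (alen + blen + 1);

  if (result == NULL)
    return NULL;

  /* Both lengths are known, so memcpy can be used directly.  */
  memcpy (result, a, alen);
  /* The "+ 1" copies the terminating null byte of B as well.  */
  memcpy (result + alen, b, blen + 1);
  return result;
}
```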
209 | ||
210 | @cindex wint_t | |
211 | @cindex parameter promotion | |
212 | Some of the memory and string functions take single characters as | |
213 | arguments. Since a value of type @code{char} is automatically promoted | |
9dcc8f11 | 214 | into a value of type @code{int} when used as a parameter, the functions |
8a2f1f5b | 215 | are declared with @code{int} as the type of the parameter in question. |
2cc4b9cc | 216 | In case of the wide character functions the situation is similar: the |
8a2f1f5b UD |
217 | parameter type for a single wide character is @code{wint_t} and not |
218 | @code{wchar_t}. For many implementations this would not be necessary | |
2cc4b9cc | 219 | since @code{wchar_t} is large enough not to be automatically |
8a2f1f5b UD |
220 | promoted, but since the @w{ISO C} standard does not require such a | |
221 | choice of types, the @code{wint_t} type is used. | |
28f540f4 | 222 | |
b4012b75 | 223 | @node String Length |
28f540f4 RM |
224 | @section String Length |
225 | ||
226 | You can get the length of a string using the @code{strlen} function. | |
227 | This function is declared in the header file @file{string.h}. | |
228 | @pindex string.h | |
229 | ||
28f540f4 | 230 | @deftypefun size_t strlen (const char *@var{s}) |
d08a7e4c | 231 | @standards{ISO, string.h} |
11087373 | 232 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
2cc4b9cc | 233 | The @code{strlen} function returns the length of the |
8a2f1f5b | 234 | string @var{s} in bytes. (In other words, it returns the offset of the |
2cc4b9cc | 235 | terminating null byte within the array.) |
28f540f4 RM |
236 | |
237 | For example, | |
238 | @smallexample | |
239 | strlen ("hello, world") | |
240 | @result{} 12 | |
241 | @end smallexample | |
242 | ||
2cc4b9cc | 243 | When applied to an array, the @code{strlen} function returns |
dd7d45e8 | 244 | the length of the string stored there, not its allocated size. You can |
2cc4b9cc | 245 | get the allocated size of the array that holds a string using |
28f540f4 RM |
246 | the @code{sizeof} operator: |
247 | ||
248 | @smallexample | |
a5113b14 | 249 | char string[32] = "hello, world"; |
28f540f4 RM |
250 | sizeof (string) |
251 | @result{} 32 | |
252 | strlen (string) | |
253 | @result{} 12 | |
254 | @end smallexample | |
dd7d45e8 | 255 | |
2cc4b9cc | 256 | But beware, this will not work unless @var{string} is the |
dd7d45e8 UD |
257 | array itself, not a pointer to it. For example: |
258 | ||
259 | @smallexample | |
260 | char string[32] = "hello, world"; | |
261 | char *ptr = string; | |
262 | sizeof (string) | |
263 | @result{} 32 | |
264 | sizeof (ptr) | |
265 | @result{} 4 /* @r{(on a machine with 4 byte pointers)} */ | |
266 | @end smallexample | |
267 | ||
268 | This is an easy mistake to make when you are working with functions that | |
269 | take string arguments; those arguments are always pointers, not arrays. | |
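One common way to avoid this mistake is to pass the allocated size as a separate argument, since @code{sizeof} applied to the parameter would yield only the size of a pointer. The function @code{room_left} below is a hypothetical illustration, not a library function, and it assumes the buffer already holds a null-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: BUF is a pointer here, so the allocated size
   of the caller's array must arrive separately as BUFSIZE.  Return
   the number of bytes that can still be appended, not counting the
   terminating null byte.  */
size_t
room_left (const char *buf, size_t bufsize)
{
  return bufsize - strlen (buf) - 1;
}
```

The caller, which still sees the array itself, can write @code{room_left (string, sizeof (string))}.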
270 | ||
8a2f1f5b UD |
271 | Note also that for multibyte-encoded strings the return value | |
272 | does not necessarily correspond to the number of characters in the | |
273 | string. To get that number, either convert the string to wide | |
274 | characters and call @code{wcslen}, or use something like the following | |
275 | code: | |
276 | ||
277 | @smallexample | |
278 | /* @r{The input is in @code{string}.} | |
279 | @r{The length is expected in @code{n}.} */ | |
280 | @{ | |
281 | mbstate_t t; | |
282 | const char *scopy = string; | |
283 | /* In initial state. */ | |
284 | memset (&t, '\0', sizeof (t)); | |
285 | /* Determine number of characters. */ | |
286 | n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t); | |
287 | @} | |
288 | @end smallexample | |
289 | ||
290 | This is cumbersome to do, so if the number of characters (as opposed to | |
291 | bytes) is needed often, it is better to work with wide characters. | |
292 | @end deftypefun | |
293 | ||
294 | The wide character equivalent is declared in @file{wchar.h}. | |
295 | ||
8a2f1f5b | 296 | @deftypefun size_t wcslen (const wchar_t *@var{ws}) |
d08a7e4c | 297 | @standards{ISO, wchar.h} |
11087373 | 298 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
299 | The @code{wcslen} function is the wide character equivalent to |
300 | @code{strlen}. The return value is the number of wide characters in the | |
2cc4b9cc | 301 | wide string pointed to by @var{ws} (this is also the offset of |
8a2f1f5b UD |
302 | the terminating null wide character of @var{ws}). |
303 | ||
2cc4b9cc | 304 | Since there are no multi wide character sequences making up one wide |
8a2f1f5b UD |
305 | character the return value is not only the offset in the array, it is |
306 | also the number of wide characters. | |
307 | ||
308 | This function was introduced in @w{Amendment 1} to @w{ISO C90}. | |
28f540f4 RM |
309 | @end deftypefun |
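Because @code{wcslen} counts wide characters rather than bytes, its result has to be multiplied by @code{sizeof (wchar_t)} when it is used to compute an allocation size. The following sketch shows this; the helper name @code{wide_copy} is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated copy of the wide
   string WS, or a null pointer if allocation fails.  */
wchar_t *
wide_copy (const wchar_t *ws)
{
  /* Number of wide characters, including the null wide character.  */
  size_t n = wcslen (ws) + 1;
  /* malloc takes a size in bytes, so scale by the element size.  */
  wchar_t *copy = malloc (n * sizeof (wchar_t));

  if (copy != NULL)
    wmemcpy (copy, ws, n);
  return copy;
}
```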
310 | ||
4547c1a4 | 311 | @deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen}) |
b79238db | 312 | @standards{POSIX.1, string.h} |
11087373 | 313 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b79238db PE |
314 | This returns the offset of the first null byte in the array @var{s}, |
315 | except that it returns @var{maxlen} if the first @var{maxlen} bytes | |
316 | are all non-null. | |
317 | Therefore this function is equivalent to | |
ebaf36eb JM |
318 | @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} |
319 | but it | |
2cc4b9cc PE |
320 | is more efficient and works even if @var{s} is not null-terminated so |
321 | long as @var{maxlen} does not exceed the size of @var{s}'s array. | |
4547c1a4 UD |
322 | |
323 | @smallexample | |
324 | char string[32] = "hello, world"; | |
325 | strnlen (string, 32) | |
326 | @result{} 12 | |
327 | strnlen (string, 5) | |
328 | @result{} 5 | |
329 | @end smallexample | |
330 | ||
b79238db PE |
331 | This function is part of POSIX.1-2008 and later editions, but was |
332 | available in @theglibc{} and other systems as an extension long before | |
333 | it was standardized. It is declared in @file{string.h}. | |
8a2f1f5b UD |
334 | @end deftypefun |
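A typical use of @code{strnlen} is measuring a fixed-width field that is not null-terminated when its contents fill the whole field. The sketch below assumes a hypothetical @code{struct record} with an 8-byte name field; neither the structure nor the helper is part of the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-width record.  The name is null-terminated only
   when it is shorter than the field.  */
struct record
{
  char name[8];
};

/* Return the length of the name without reading past the field.  */
size_t
record_name_length (const struct record *r)
{
  return strnlen (r->name, sizeof r->name);
}
```

Calling @code{strlen} on such a field would read past the end of the array whenever all eight bytes are non-null.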
335 | ||
8a2f1f5b | 336 | @deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen}) |
d08a7e4c | 337 | @standards{GNU, wchar.h} |
11087373 | 338 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
339 | @code{wcsnlen} is the wide character equivalent to @code{strnlen}. The |
340 | @var{maxlen} parameter specifies the maximum number of wide characters. | |
341 | ||
b79238db PE |
342 | This function is part of POSIX.1-2008 and later editions, and is |
343 | declared in @file{wchar.h}. | |
4547c1a4 UD |
344 | @end deftypefun |
345 | ||
0a13c9e9 PE |
346 | @node Copying Strings and Arrays |
347 | @section Copying Strings and Arrays | |
28f540f4 RM |
348 | |
349 | You can use the functions described in this section to copy the contents | |
0a13c9e9 PE |
350 | of strings, wide strings, and arrays. The @samp{str} and @samp{mem} |
351 | functions are declared in @file{string.h} while the @samp{w} functions | |
352 | are declared in @file{wchar.h}. | |
28f540f4 | 353 | @pindex string.h |
8a2f1f5b | 354 | @pindex wchar.h |
28f540f4 RM |
355 | @cindex copying strings and arrays |
356 | @cindex string copy functions | |
357 | @cindex array copy functions | |
358 | @cindex concatenating strings | |
359 | @cindex string concatenation functions | |
360 | ||
361 | A helpful way to remember the ordering of the arguments to the functions | |
362 | in this section is that it corresponds to an assignment expression, with | |
0a13c9e9 PE |
363 | the destination array specified to the left of the source array. Most |
364 | of these functions return the address of the destination array; a few | |
365 | return the address of the destination's terminating null, or of just | |
366 | past the destination. | |
28f540f4 RM |
367 | |
368 | Most of these functions do not work properly if the source and | |
369 | destination arrays overlap. For example, if the beginning of the | |
370 | destination array overlaps the end of the source array, the original | |
371 | contents of that part of the source array may get overwritten before it | |
372 | is copied. Even worse, in the case of the string functions, the null | |
2cc4b9cc | 373 | byte marking the end of the string may be lost, and the copy |
28f540f4 RM |
374 | function might get stuck in a loop trashing all the memory allocated to |
375 | your program. | |
376 | ||
377 | All functions that have problems copying between overlapping arrays are | |
378 | explicitly identified in this manual. In addition to functions in this | |
379 | section, there are a few others like @code{sprintf} (@pxref{Formatted | |
380 | Output Functions}) and @code{scanf} (@pxref{Formatted Input | |
381 | Functions}). | |
382 | ||
8a2f1f5b | 383 | @deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size}) |
d08a7e4c | 384 | @standards{ISO, string.h} |
11087373 | 385 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
386 | The @code{memcpy} function copies @var{size} bytes from the object |
387 | beginning at @var{from} into the object beginning at @var{to}. The | |
388 | behavior of this function is undefined if the two arrays @var{to} and | |
389 | @var{from} overlap; use @code{memmove} instead if overlapping is possible. | |
390 | ||
391 | The value returned by @code{memcpy} is the value of @var{to}. | |
392 | ||
393 | Here is an example of how you might use @code{memcpy} to copy the | |
394 | contents of an array: | |
395 | ||
396 | @smallexample | |
397 | struct foo *oldarray, *newarray; | |
398 | int arraysize; | |
399 | @dots{} | |
400 | memcpy (newarray, oldarray, arraysize * sizeof (struct foo)); | |
401 | @end smallexample | |
402 | @end deftypefun | |
403 | ||
79827876 | 404 | @deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) |
d08a7e4c | 405 | @standards{ISO, wchar.h} |
11087373 | 406 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
407 | The @code{wmemcpy} function copies @var{size} wide characters from the object |
408 | beginning at @var{wfrom} into the object beginning at @var{wto}. The | |
409 | behavior of this function is undefined if the two arrays @var{wto} and | |
410 | @var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible. | |
411 | ||
412 | The following is a possible implementation of @code{wmemcpy} but there | |
413 | are more optimizations possible. | |
414 | ||
415 | @smallexample | |
416 | wchar_t * | |
417 | wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, | |
418 | size_t size) | |
419 | @{ | |
420 | return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t)); | |
421 | @} | |
422 | @end smallexample | |
423 | ||
424 | The value returned by @code{wmemcpy} is the value of @var{wto}. | |
425 | ||
426 | This function was introduced in @w{Amendment 1} to @w{ISO C90}. | |
427 | @end deftypefun | |
428 | ||
8a2f1f5b | 429 | @deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size}) |
d08a7e4c | 430 | @standards{GNU, string.h} |
11087373 | 431 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
4547c1a4 | 432 | The @code{mempcpy} function is nearly identical to the @code{memcpy} |
f2ea0f5b | 433 | function. It copies @var{size} bytes from the object beginning at |
4547c1a4 | 434 | @var{from} into the object pointed to by @var{to}. But instead of |
976780fd | 435 | returning the value of @var{to} it returns a pointer to the byte |
4547c1a4 UD |
436 | following the last written byte in the object beginning at @var{to}. |
437 | I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}. | |
438 | ||
439 | This function is useful in situations where a number of objects shall be | |
440 | copied to consecutive memory positions. | |
441 | ||
442 | @smallexample | |
443 | void * | |
444 | combine (void *o1, size_t s1, void *o2, size_t s2) | |
445 | @{ | |
446 | void *result = malloc (s1 + s2); | |
447 | if (result != NULL) | |
448 | mempcpy (mempcpy (result, o1, s1), o2, s2); | |
449 | return result; | |
450 | @} | |
451 | @end smallexample | |
452 | ||
453 | This function is a GNU extension. | |
454 | @end deftypefun | |
455 | ||
8a2f1f5b | 456 | @deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) |
d08a7e4c | 457 | @standards{GNU, wchar.h} |
11087373 | 458 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
459 | The @code{wmempcpy} function is nearly identical to the @code{wmemcpy} |
460 | function. It copies @var{size} wide characters from the object | |
461 | beginning at @var{wfrom} into the object pointed to by @var{wto}. But | |
462 | instead of returning the value of @var{wto} it returns a pointer to the | |
463 | wide character following the last written wide character in the object | |
464 | beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}. | |
465 | ||
466 | This function is useful in situations where a number of objects shall be | |
467 | copied to consecutive memory positions. | |
468 | ||
469 | The following is a possible implementation of @code{wmempcpy} but there | |
470 | are more optimizations possible. | |
471 | ||
472 | @smallexample | |
473 | wchar_t * | |
474 | wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, | |
475 | size_t size) | |
476 | @{ | |
477 | return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t)); | |
478 | @} | |
479 | @end smallexample | |
480 | ||
481 | This function is a GNU extension. | |
482 | @end deftypefun | |
483 | ||
28f540f4 | 484 | @deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size}) |
d08a7e4c | 485 | @standards{ISO, string.h} |
11087373 | 486 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
487 | @code{memmove} copies the @var{size} bytes at @var{from} into the |
488 | @var{size} bytes at @var{to}, even if those two blocks of space | |
489 | overlap. In the case of overlap, @code{memmove} is careful to copy the | |
490 | original values of the bytes in the block at @var{from}, including those | |
491 | bytes which also belong to the block at @var{to}. | |
8a2f1f5b UD |
492 | |
493 | The value returned by @code{memmove} is the value of @var{to}. | |
494 | @end deftypefun | |
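A common case of overlapping blocks is deleting an element from the middle of an array by sliding the tail down one position. The following is a minimal sketch; the helper name @code{remove_at} is hypothetical, and it assumes the index is less than the element count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove element I from the COUNT-element array
   A and return the new element count.  Requires I < COUNT.  */
size_t
remove_at (int *a, size_t count, size_t i)
{
  /* Source and destination overlap, so memcpy would have undefined
     behavior here; memmove handles the overlap correctly.  */
  memmove (&a[i], &a[i + 1], (count - i - 1) * sizeof a[0]);
  return count - 1;
}
```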
495 | ||
8ded91fb | 496 | @deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size}) |
d08a7e4c | 497 | @standards{ISO, wchar.h} |
11087373 | 498 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
499 | @code{wmemmove} copies the @var{size} wide characters at @var{wfrom} |
500 | into the @var{size} wide characters at @var{wto}, even if those two | |
f0f308c1 | 501 | blocks of space overlap. In the case of overlap, @code{wmemmove} is |
8a2f1f5b UD |
502 | careful to copy the original values of the wide characters in the block |
503 | at @var{wfrom}, including those wide characters which also belong to the | |
504 | block at @var{wto}. | |
505 | ||
506 | The following is a possible implementation of @code{wmemmove} but there | |
507 | are more optimizations possible. | |
508 | ||
509 | @smallexample | |
510 | wchar_t * | |
511 | wmemmove (wchar_t *wto, const wchar_t *wfrom, | |
512 | size_t size) | |
513 | @{ | |
514 | return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t)); | |
515 | @} | |
516 | @end smallexample | |
517 | ||
518 | The value returned by @code{wmemmove} is the value of @var{wto}. | |
519 | ||
520 | This function was introduced in @w{Amendment 1} to @w{ISO C90}. | |
28f540f4 RM |
521 | @end deftypefun |
522 | ||
8a2f1f5b | 523 | @deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size}) |
d08a7e4c | 524 | @standards{SVID, string.h} |
11087373 | 525 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
526 | This function copies no more than @var{size} bytes from @var{from} to |
527 | @var{to}, stopping if a byte matching @var{c} is found. The return | |
528 | value is a pointer into @var{to} one byte past where @var{c} was copied, | |
529 | or a null pointer if no byte matching @var{c} appeared in the first | |
530 | @var{size} bytes of @var{from}. | |
531 | @end deftypefun | |
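One possible use of @code{memccpy} is a truncating string copy, passing the null byte as the byte to stop at. The sketch below relies only on the return value described above; the helper name @code{copy_truncated} is hypothetical, and it assumes the destination size is at least 1.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string SRC into DST, which has room
   for DSTSIZE bytes (at least 1), truncating if necessary.  Return
   the length of the string now stored in DST.  */
size_t
copy_truncated (char *dst, size_t dstsize, const char *src)
{
  /* Copying stops after the null byte or after DSTSIZE bytes,
     whichever comes first.  */
  char *end = memccpy (dst, src, '\0', dstsize);

  if (end == NULL)
    {
      /* No null byte was copied, so terminate the result by hand.  */
      dst[dstsize - 1] = '\0';
      return dstsize - 1;
    }

  /* END points one byte past the copied null byte.  */
  return (size_t) (end - dst) - 1;
}
```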
532 | ||
28f540f4 | 533 | @deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size}) |
d08a7e4c | 534 | @standards{ISO, string.h} |
11087373 | 535 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
536 | This function copies the value of @var{c} (converted to an |
537 | @code{unsigned char}) into each of the first @var{size} bytes of the | |
538 | object beginning at @var{block}. It returns the value of @var{block}. | |
539 | @end deftypefun | |
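As a brief illustration, @code{memset} is often used to zero an array of structures, with @code{sizeof} supplying the size in bytes. The structure and helper names below are hypothetical. Note that the fill value applies to every byte, so this technique is reliable for setting integer members to zero but not for setting them to other values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure used only for this illustration.  */
struct point
{
  int x;
  int y;
};

/* Set every byte of the N-element array P to zero, which makes all
   the integer members zero.  */
void
clear_points (struct point *p, size_t n)
{
  memset (p, 0, n * sizeof *p);
}
```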
540 | ||
8a2f1f5b | 541 | @deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) |
d08a7e4c | 542 | @standards{ISO, wchar.h} |
11087373 | 543 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
544 | This function copies the value of @var{wc} into each of the first |
545 | @var{size} wide characters of the object beginning at @var{block}. It | |
546 | returns the value of @var{block}. | |
547 | @end deftypefun | |
548 | ||
8a2f1f5b | 549 | @deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from}) |
d08a7e4c | 550 | @standards{ISO, string.h} |
11087373 | 551 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
2cc4b9cc PE |
552 | This copies bytes from the string @var{from} (up to and including |
553 | the terminating null byte) into the string @var{to}. Like | |
28f540f4 RM |
554 | @code{memcpy}, this function has undefined results if the strings |
555 | overlap. The return value is the value of @var{to}. | |
556 | @end deftypefun | |
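Since @code{strcpy} does not know the allocated size of the destination, the caller has to check it. A minimal sketch of such a check follows; the helper name @code{copy_if_fits} is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy FROM into TO, whose allocated size is
   TOSIZE bytes, only if the whole string and its terminating null
   byte fit.  Return false and leave TO untouched otherwise.  */
bool
copy_if_fits (char *to, size_t tosize, const char *from)
{
  if (strlen (from) >= tosize)
    return false;

  strcpy (to, from);
  return true;
}
```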
557 | ||
8a2f1f5b | 558 | @deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) |
d08a7e4c | 559 | @standards{ISO, wchar.h} |
11087373 | 560 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
2cc4b9cc | 561 | This copies wide characters from the wide string @var{wfrom} (up to and |
8a2f1f5b UD |
562 | including the terminating null wide character) into the string |
563 | @var{wto}. Like @code{wmemcpy}, this function has undefined results if | |
564 | the strings overlap. The return value is the value of @var{wto}. | |
565 | @end deftypefun | |
566 | ||
28f540f4 | 567 | @deftypefun {char *} strdup (const char *@var{s}) |
a448ee41 | 568 | @standards{SVID, string.h} |
11087373 | 569 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2cc4b9cc | 570 | This function copies the string @var{s} into a newly |
28f540f4 RM |
571 | allocated string. The string is allocated using @code{malloc}; see |
572 | @ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space | |
573 | for the new string, @code{strdup} returns a null pointer. Otherwise it | |
574 | returns a pointer to the new string. | |
575 | @end deftypefun | |
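A typical use of @code{strdup} is to obtain a private copy that can be
modified without touching the original.  The following sketch assumes a
hypothetical helper named @code{dup_upcase}; the caller is responsible
for passing the result to @code{free}.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* Assumed here so that strdup is declared.  */
#endif
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated upper-case copy of S,
   or NULL if memory could not be allocated.  S itself is not changed.  */
char *
dup_upcase (const char *s)
{
  char *copy = strdup (s);
  if (copy == NULL)
    return NULL;
  for (char *p = copy; *p != '\0'; p++)
    *p = (char) toupper ((unsigned char) *p);
  return copy;
}
```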
576 | ||
8a2f1f5b | 577 | @deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws}) |
d08a7e4c | 578 | @standards{GNU, wchar.h} |
11087373 | 579 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2cc4b9cc | 580 | This function copies the wide string @var{ws} |
8a2f1f5b UD |
581 | into a newly allocated string. The string is allocated using |
582 | @code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc} | |
583 | cannot allocate space for the new string, @code{wcsdup} returns a null | |
2cc4b9cc | 584 | pointer. Otherwise it returns a pointer to the new wide string. |
8a2f1f5b UD |
585 | |
586 | This function is a GNU extension. | |
587 | @end deftypefun | |
588 | ||
8a2f1f5b | 589 | @deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from}) |
d08a7e4c | 590 | @standards{Unknown origin, string.h} |
11087373 | 591 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
592 | This function is like @code{strcpy}, except that it returns a pointer to |
593 | the end of the string @var{to} (that is, the address of the terminating | |
2cc4b9cc | 594 | null byte @code{to + strlen (from)}) rather than the beginning. |
28f540f4 RM |
595 | |
596 | For example, this program uses @code{stpcpy} to concatenate @samp{foo} | |
597 | and @samp{bar} to produce @samp{foobar}, which it then prints. | |
598 | ||
599 | @smallexample | |
600 | @include stpcpy.c.texi | |
601 | @end smallexample | |
602 | ||
c30c3f46 RM |
603 | This function is part of POSIX.1-2008 and later editions, but was |
604 | available in @theglibc{} and other systems as an extension long before | |
605 | it was standardized. | |
28f540f4 | 606 | |
8a2f1f5b UD |
607 | Its behavior is undefined if the strings overlap. The function is |
608 | declared in @file{string.h}. | |
609 | @end deftypefun | |
610 | ||
8a2f1f5b | 611 | @deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) |
d08a7e4c | 612 | @standards{GNU, wchar.h} |
11087373 | 613 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
614 | This function is like @code{wcscpy}, except that it returns a pointer to |
615 | the end of the string @var{wto} (that is, the address of the terminating | |
2cc4b9cc | 616 | null wide character @code{wto + wcslen (wfrom)}) rather than the beginning. |
8a2f1f5b UD |
617 | |
618 | This function is not part of ISO or POSIX but was found useful while | |
1f77f049 | 619 | developing @theglibc{} itself. |
8a2f1f5b UD |
620 | |
621 | The behavior of @code{wcpcpy} is undefined if the strings overlap. | |
622 | ||
623 | @code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}. | |
28f540f4 RM |
624 | @end deftypefun |
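The returned end pointer makes it possible to chain copies without
rescanning the destination.  As a sketch, assuming a hypothetical helper
@code{wjoin} and a destination that the caller has made large enough:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* Assumed here so that wcpcpy is declared.  */
#endif
#include <wchar.h>

/* Hypothetical helper: write WA followed by WB into DST, which the
   caller guarantees has room for wcslen (WA) + wcslen (WB) + 1 wide
   characters.  Returns the length of the result.  */
size_t
wjoin (wchar_t *dst, const wchar_t *wa, const wchar_t *wb)
{
  wchar_t *end = wcpcpy (dst, wa);   /* END points at the null after WA.  */
  end = wcpcpy (end, wb);            /* Append WB without rescanning DST.  */
  return (size_t) (end - dst);
}
```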
625 | ||
26b4d766 | 626 | @deftypefn {Macro} {char *} strdupa (const char *@var{s}) |
d08a7e4c | 627 | @standards{GNU, string.h} |
11087373 | 628 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
976780fd | 629 | This macro is similar to @code{strdup} but allocates the new string |
dd7d45e8 UD |
630 | using @code{alloca} instead of @code{malloc} (@pxref{Variable Size |
631 | Automatic}). This means of course the returned string has the same | |
632 | limitations as any block of memory allocated using @code{alloca}. | |
706074a5 | 633 | |
dd7d45e8 | 634 | For obvious reasons @code{strdupa} is implemented only as a macro; |
40a55d20 | 635 | you cannot get the address of this function. Despite this limitation |
706074a5 UD |
636 | it is a useful function. The following code shows a situation where |
637 | using @code{malloc} would be a lot more expensive. | |
638 | ||
639 | @smallexample | |
640 | @include strdupa.c.texi | |
641 | @end smallexample | |
642 | ||
643 | Please note that calling @code{strtok} using @var{path} directly is | |
8a2f1f5b UD |
644 | invalid. It is also not allowed to call @code{strdupa} in the argument |
645 | list of @code{strtok}, since @code{strdupa} uses @code{alloca} |
646 | (@pxref{Variable Size Automatic}), which can interfere with parameter |
647 | passing. |
706074a5 UD |
648 | |
649 | This macro is only available if GNU CC is used. |
26b4d766 | 650 | @end deftypefn |
706074a5 | 651 | |
0a13c9e9 | 652 | @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size}) |
d08a7e4c | 653 | @standards{BSD, string.h} |
11087373 | 654 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
0a13c9e9 PE |
655 | This is a partially obsolete alternative for @code{memmove}, derived from |
656 | BSD. Note that it is not quite equivalent to @code{memmove}, because the | |
657 | arguments are not in the same order and there is no return value. | |
658 | @end deftypefun | |
706074a5 | 659 | |
0a13c9e9 | 660 | @deftypefun void bzero (void *@var{block}, size_t @var{size}) |
d08a7e4c | 661 | @standards{BSD, string.h} |
0a13c9e9 PE |
662 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
663 | This is a partially obsolete alternative for @code{memset}, derived from | |
664 | BSD. Note that it is not as general as @code{memset}, because the only | |
665 | value it can store is zero. | |
666 | @end deftypefun | |
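The difference in argument order between the BSD and ISO spellings is
easy to get wrong.  The following sketch places the two side by side;
the helper names are hypothetical, and the explicit inclusion of
@file{strings.h} is an assumption made for portability.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* Assumed here so that bcopy and bzero are declared.  */
#endif
#include <string.h>
#include <strings.h>

/* Shift the first N bytes of BUF one position to the right and clear
   the first byte, using the BSD functions.  BUF needs N + 1 bytes.  */
void
shift_right_bsd (char *buf, size_t n)
{
  bcopy (buf, buf + 1, n);     /* Source first, destination second.  */
  bzero (buf, 1);
}

/* The same operation using the ISO C functions.  */
void
shift_right_iso (char *buf, size_t n)
{
  memmove (buf + 1, buf, n);   /* Destination first, source second.  */
  memset (buf, 0, 1);
}
```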
706074a5 | 667 | |
0a13c9e9 PE |
668 | @node Concatenating Strings |
669 | @section Concatenating Strings | |
670 | @pindex string.h | |
671 | @pindex wchar.h | |
672 | @cindex concatenating strings | |
673 | @cindex string concatenation functions | |
674 | ||
675 | The functions described in this section concatenate the contents of a | |
676 | string or wide string to another. They follow the string-copying | |
677 | functions in their conventions. @xref{Copying Strings and Arrays}. | |
678 | @samp{strcat} is declared in the header file @file{string.h} while | |
679 | @samp{wcscat} is declared in @file{wchar.h}. | |
706074a5 | 680 | |
1fb22592 PE |
681 | As noted below, these functions are problematic as their callers may |
682 | have performance issues. | |
683 | ||
8a2f1f5b | 684 | @deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from}) |
d08a7e4c | 685 | @standards{ISO, string.h} |
11087373 | 686 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 687 | The @code{strcat} function is similar to @code{strcpy}, except that the |
2cc4b9cc PE |
688 | bytes from @var{from} are concatenated or appended to the end of |
689 | @var{to}, instead of overwriting it. That is, the first byte from | |
690 | @var{from} overwrites the null byte marking the end of @var{to}. | |
28f540f4 RM |
691 | |
692 | An equivalent definition for @code{strcat} would be: | |
693 | ||
694 | @smallexample | |
695 | char * | |
8a2f1f5b | 696 | strcat (char *restrict to, const char *restrict from) |
28f540f4 RM |
697 | @{ |
698 | strcpy (to + strlen (to), from); | |
699 | return to; | |
700 | @} | |
701 | @end smallexample | |
702 | ||
703 | This function has undefined results if the strings overlap. | |
0a13c9e9 PE |
704 | |
705 | As noted below, this function has significant performance issues. | |
28f540f4 RM |
706 | @end deftypefun |
707 | ||
8a2f1f5b | 708 | @deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) |
d08a7e4c | 709 | @standards{ISO, wchar.h} |
11087373 | 710 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b | 711 | The @code{wcscat} function is similar to @code{wcscpy}, except that the |
2cc4b9cc PE |
712 | wide characters from @var{wfrom} are concatenated or appended to the end of |
713 | @var{wto}, instead of overwriting it. That is, the first wide character from | |
714 | @var{wfrom} overwrites the null wide character marking the end of @var{wto}. | |
8a2f1f5b UD |
715 | |
716 | An equivalent definition for @code{wcscat} would be: | |
717 | ||
718 | @smallexample | |
719 | wchar_t * | |
720 | wcscat (wchar_t *restrict wto, const wchar_t *restrict wfrom) |
721 | @{ | |
722 | wcscpy (wto + wcslen (wto), wfrom); | |
723 | return wto; | |
724 | @} | |
725 | @end smallexample | |
726 | ||
727 | This function has undefined results if the strings overlap. | |
0a13c9e9 PE |
728 | |
729 | As noted below, this function has significant performance issues. | |
8a2f1f5b UD |
730 | @end deftypefun |
731 | ||
d2fda60e PE |
732 | Programmers using the @code{strcat} or @code{wcscat} functions (or the |
733 | @code{strlcat}, @code{strncat} and @code{wcsncat} functions defined in | |
0a13c9e9 | 734 | a later section, for that matter) |
8a2f1f5b UD |
735 | can easily be recognized as lazy and reckless. In almost all situations |
736 | the lengths of the participating strings are known (they had better be, |
737 | since how else can one ensure the allocated size of the buffer is |
738 | sufficient?), or at least could be known by keeping track of the |
ee2752ea | 739 | results of the various function calls. But then it is very inefficient |
8a2f1f5b UD |
740 | to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the |
741 | end of the destination string so that the actual copying can start. | |
742 | This is a common example: | |
ee2752ea | 743 | |
ee2752ea UD |
744 | @cindex va_copy |
745 | @smallexample | |
49c091e5 | 746 | /* @r{This function concatenates arbitrarily many strings. The last} |
ee2752ea UD |
747 | @r{parameter must be @code{NULL}.} */ |
748 | char * | |
8a2f1f5b | 749 | concat (const char *str, @dots{}) |
ee2752ea UD |
750 | @{ |
751 | va_list ap, ap2; | |
752 | size_t total = 1; | |
ee2752ea UD |
753 | |
754 | va_start (ap, str); | |
b5982523 | 755 | va_copy (ap2, ap); |
ee2752ea UD |
756 | |
757 | /* @r{Determine how much space we need.} */ | |
bdc674d9 | 758 | for (const char *s = str; s != NULL; s = va_arg (ap, const char *)) |
ee2752ea UD |
759 | total += strlen (s); |
760 | ||
761 | va_end (ap); | |
762 | ||
bdc674d9 | 763 | char *result = malloc (total); |
ee2752ea UD |
764 | if (result != NULL) |
765 | @{ | |
766 | result[0] = '\0'; | |
767 | ||
768 | /* @r{Copy the strings.} */ | |
769 | for (const char *s = str; s != NULL; s = va_arg (ap2, const char *)) |
770 | strcat (result, s); | |
771 | @} | |
772 | ||
773 | va_end (ap2); | |
774 | ||
775 | return result; | |
776 | @} | |
777 | @end smallexample | |
778 | ||
779 | This looks quite simple, especially the second loop where the strings | |
780 | are actually copied. But these innocent lines hide a major performance | |
781 | penalty. Just imagine that ten strings of 100 bytes each have to be | |
782 | concatenated. For the second string we search the already stored 100 | |
783 | bytes for the end of the string so that we can append the next string. | |
784 | For all strings in total the comparisons necessary to find the end of |
785 | the intermediate results add up to about 4500 (0 + 100 + @dots{} + 900)! If we |
786 | combine the copying with the length computation we can write this function more |
f0f308c1 | 787 | efficiently: |
ee2752ea UD |
788 | |
789 | @smallexample | |
790 | char * | |
8a2f1f5b | 791 | concat (const char *str, @dots{}) |
ee2752ea | 792 | @{ |
ee2752ea | 793 | size_t allocated = 100; |
bdc674d9 | 794 | char *result = malloc (allocated); |
ee2752ea | 795 | |
623281e0 | 796 | if (result != NULL) |
ee2752ea | 797 | @{ |
bdc674d9 PE |
798 | va_list ap; |
799 | size_t resultlen = 0; | |
ee2752ea UD |
800 | char *newp; |
801 | ||
623281e0 | 802 | va_start (ap, str); |
ee2752ea | 803 | |
bdc674d9 | 804 | for (const char *s = str; s != NULL; s = va_arg (ap, const char *)) |
ee2752ea UD |
805 | @{ |
806 | size_t len = strlen (s); | |
807 | ||
808 | /* @r{Resize the allocated memory if necessary.} */ | |
bdc674d9 | 809 | if (resultlen + len + 1 > allocated) |
ee2752ea | 810 | @{ |
bdc674d9 PE |
811 | allocated += len; |
812 | newp = reallocarray (result, allocated, 2); | |
813 | allocated *= 2; | |
ee2752ea UD |
814 | if (newp == NULL) |
815 | @{ | |
816 | free (result); | |
817 | return NULL; | |
818 | @} | |
ee2752ea UD |
819 | result = newp; |
820 | @} | |
821 | ||
bdc674d9 PE |
822 | memcpy (result + resultlen, s, len); |
823 | resultlen += len; | |
ee2752ea UD |
824 | @} |
825 | ||
826 | /* @r{Terminate the result string.} */ | |
bdc674d9 | 827 | result[resultlen++] = '\0'; |
ee2752ea UD |
828 | |
829 | /* @r{Resize memory to the optimal size.} */ | |
bdc674d9 | 830 | newp = realloc (result, resultlen); |
ee2752ea UD |
831 | if (newp != NULL) |
832 | result = newp; | |
833 | ||
834 | va_end (ap); | |
835 | @} | |
836 | ||
837 | return result; | |
838 | @} | |
839 | @end smallexample | |
840 | ||
841 | With a bit more knowledge about the input strings one could fine-tune | |
842 | the memory allocation. The difference we are pointing to here is that | |
843 | we don't use @code{strcat} anymore. We always keep track of the length | |
f0f308c1 | 844 | of the current intermediate result so we can save ourselves the search for the |
ee2752ea | 845 | end of the string and use @code{memcpy}. Please note that we also |
f0f308c1 RJ |
846 | don't use @code{stpcpy} which might seem more natural since we are handling |
847 | strings. But this is not necessary since we already know the | |
ee2752ea | 848 | length of the string and therefore can use the faster memory copying |
8a2f1f5b | 849 | function. The example would work for wide characters the same way. |
ee2752ea UD |
850 | |
851 | Whenever a programmer feels the need to use @code{strcat} she or he | |
f0f308c1 | 852 | should think twice and look through the program to see whether the code cannot |
1fb22592 | 853 | be rewritten to take advantage of already calculated results. |
d2fda60e PE |
854 | The related functions @code{strlcat}, @code{strncat}, |
855 | @code{wcscat} and @code{wcsncat} | |
1fb22592 PE |
856 | are almost always unnecessary, too. |
857 | Again: it is almost always unnecessary to use functions like @code{strcat}. | |
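For the common case of joining two strings, the advice above might be
applied as in the following sketch.  The helper name @code{join_path}
and the choice of @samp{/} as separator are assumptions made for
illustration; each length is computed once and then reused for both the
allocation and the copying.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding DIR,
   a slash, and NAME, or NULL if memory could not be allocated.  */
char *
join_path (const char *dir, const char *name)
{
  size_t dirlen = strlen (dir);
  size_t namelen = strlen (name);
  char *result = malloc (dirlen + 1 + namelen + 1);
  if (result == NULL)
    return NULL;
  memcpy (result, dir, dirlen);
  result[dirlen] = '/';
  /* Copy NAME together with its terminating null byte.  */
  memcpy (result + dirlen + 1, name, namelen + 1);
  return result;
}
```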
ee2752ea | 858 | |
0a13c9e9 PE |
859 | @node Truncating Strings |
860 | @section Truncating Strings while Copying | |
861 | @cindex truncating strings | |
862 | @cindex string truncation | |
863 | ||
864 | The functions described in this section copy or concatenate the | |
865 | possibly-truncated contents of a string or array to another, and | |
866 | similarly for wide strings. They follow the string-copying functions | |
867 | in their header conventions. @xref{Copying Strings and Arrays}. The | |
868 | @samp{str} functions are declared in the header file @file{string.h} | |
869 | and the @samp{wc} functions are declared in the file @file{wchar.h}. | |
870 | ||
1fb22592 PE |
871 | As noted below, these functions are problematic as their callers may |
872 | have truncation-related bugs and performance issues. | |
873 | ||
0a13c9e9 | 874 | @deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
a448ee41 | 875 | @standards{C90, string.h} |
0a13c9e9 PE |
876 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
877 | This function is similar to @code{strcpy} but always copies exactly | |
878 | @var{size} bytes into @var{to}. | |
879 | ||
880 | If @var{from} does not contain a null byte in its first @var{size} | |
881 | bytes, @code{strncpy} copies just the first @var{size} bytes. In this | |
882 | case no null terminator is written into @var{to}. | |
883 | ||
884 | Otherwise @var{from} must be a string with length less than | |
885 | @var{size}. In this case @code{strncpy} copies all of @var{from}, | |
886 | followed by enough null bytes to add up to @var{size} bytes in all. | |
887 | ||
888 | The behavior of @code{strncpy} is undefined if the strings overlap. | |
889 | ||
890 | This function was designed for now-rarely-used arrays consisting of | |
891 | non-null bytes followed by zero or more null bytes. It needs to set | |
892 | all @var{size} bytes of the destination, even when @var{size} is much | |
893 | greater than the length of @var{from}. As noted below, this function | |
1fb22592 | 894 | is generally a poor choice for processing strings. |
0a13c9e9 PE |
895 | @end deftypefun |
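The following sketch illustrates the kind of fixed-width, null-padded
field for which @code{strncpy} was designed.  The @code{struct record}
type and both helper names are hypothetical.  Note that the field is not
a string when the name fills all eight bytes, so its length has to be
found with a bounded search.

```c
#include <string.h>

/* Hypothetical record with a fixed-width name field that is padded
   with null bytes but not necessarily null-terminated.  */
struct record
{
  char name[8];
};

/* Store NAME in R, truncating or null-padding to the field width.  */
void
set_name (struct record *r, const char *name)
{
  strncpy (r->name, name, sizeof r->name);
}

/* Return the number of non-null bytes stored in the name field of R.  */
size_t
name_length (const struct record *r)
{
  const char *end = memchr (r->name, '\0', sizeof r->name);
  return end != NULL ? (size_t) (end - r->name) : sizeof r->name;
}
```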
896 | ||
0a13c9e9 | 897 | @deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) |
d08a7e4c | 898 | @standards{ISO, wchar.h} |
0a13c9e9 PE |
899 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
900 | This function is similar to @code{wcscpy} but always copies exactly | |
901 | @var{size} wide characters into @var{wto}. | |
902 | ||
903 | If @var{wfrom} does not contain a null wide character in its first | |
904 | @var{size} wide characters, then @code{wcsncpy} copies just the first | |
905 | @var{size} wide characters. In this case no null terminator is | |
906 | written into @var{wto}. | |
907 | ||
908 | Otherwise @var{wfrom} must be a wide string with length less than | |
909 | @var{size}. In this case @code{wcsncpy} copies all of @var{wfrom}, | |
910 | followed by enough null wide characters to add up to @var{size} wide | |
911 | characters in all. | |
912 | ||
913 | The behavior of @code{wcsncpy} is undefined if the strings overlap. | |
914 | ||
915 | This function is the wide-character counterpart of @code{strncpy} and | |
916 | suffers from most of the problems that @code{strncpy} does. For | |
917 | example, as noted below, this function is generally a poor choice for | |
1fb22592 | 918 | processing strings. |
0a13c9e9 PE |
919 | @end deftypefun |
920 | ||
0a13c9e9 | 921 | @deftypefun {char *} strndup (const char *@var{s}, size_t @var{size}) |
d08a7e4c | 922 | @standards{GNU, string.h} |
0a13c9e9 PE |
923 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
924 | This function is similar to @code{strdup} but always copies at most | |
925 | @var{size} bytes into the newly allocated string. | |
926 | ||
927 | If the length of @var{s} is more than @var{size}, then @code{strndup} | |
928 | copies just the first @var{size} bytes and adds a closing null byte. | |
929 | Otherwise all bytes are copied and the string is terminated. | |
930 | ||
931 | This function differs from @code{strncpy} in that it always terminates | |
932 | the destination string. | |
933 | ||
934 | As noted below, this function is generally a poor choice for | |
1fb22592 | 935 | processing strings. |
0a13c9e9 PE |
936 | |
937 | @code{strndup} is a GNU extension. | |
938 | @end deftypefun | |
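@code{strndup} is appropriate when the program's logic really calls for
a prefix of a string.  As a sketch, the hypothetical helper
@code{dup_key} below extracts the part of a @samp{key=value} line that
precedes the equal sign; the line format is an assumption made for
illustration.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* Assumed here so that strndup is declared.  */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   LINE that precedes the first '=' (or of all of LINE if there is
   none), or NULL if memory could not be allocated.  */
char *
dup_key (const char *line)
{
  return strndup (line, strcspn (line, "="));
}
```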
939 | ||
0a13c9e9 | 940 | @deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size}) |
d08a7e4c | 941 | @standards{GNU, string.h} |
0a13c9e9 PE |
942 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
943 | This macro is similar to @code{strndup} but like @code{strdupa} it |
944 | allocates the new string using @code{alloca} (@pxref{Variable Size |
945 | Automatic}). The same advantages and limitations of @code{strdupa} |
946 | apply to @code{strndupa}, too. |
947 | |
948 | It is implemented only as a macro, just like @code{strdupa}. |
949 | Like @code{strdupa}, this macro must not be used inside the |
950 | argument list of a function call. |
951 | ||
952 | As noted below, this function is generally a poor choice for | |
1fb22592 | 953 | processing strings. |
0a13c9e9 PE |
954 | |
955 | @code{strndupa} is only available if GNU CC is used. | |
956 | @end deftypefn | |
957 | ||
0a13c9e9 | 958 | @deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
d08a7e4c | 959 | @standards{GNU, string.h} |
0a13c9e9 PE |
960 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
961 | This function is similar to @code{stpcpy} but always copies exactly |
962 | @var{size} bytes into @var{to}. |
963 | |
964 | If the length of @var{from} is at least @var{size}, then @code{stpncpy} |
965 | copies just the first @var{size} bytes and returns a pointer to the |
966 | byte directly following the one which was copied last. Note that in |
967 | this case there is no null terminator written into @var{to}. |
968 | |
969 | If the length of @var{from} is less than @var{size}, then @code{stpncpy} |
970 | copies all of @var{from}, followed by enough null bytes to add up |
971 | to @var{size} bytes in all. This padding is rarely useful, but it |
972 | is provided for contexts that rely on the corresponding behavior of |
973 | @code{strncpy}. @code{stpncpy} returns a pointer to the |
974 | @emph{first} written null byte. |
975 | ||
976 | This function is not part of ISO or POSIX but was found useful while | |
977 | developing @theglibc{} itself. | |
978 | ||
979 | Its behavior is undefined if the strings overlap. The function is | |
980 | declared in @file{string.h}. | |
981 | ||
982 | As noted below, this function is generally a poor choice for | |
1fb22592 | 983 | processing strings. |
0a13c9e9 PE |
984 | @end deftypefun |
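The return value of @code{stpncpy} tells the caller how many non-null
bytes ended up in the destination.  A minimal sketch, assuming a
hypothetical helper named @code{fill_field}:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* Assumed here so that stpncpy is declared.  */
#endif
#include <string.h>

/* Hypothetical helper: fill the SIZE-byte field DST from SRC and return
   the number of non-null bytes stored.  The result equals SIZE when
   SRC did not fit and no null terminator was written.  */
size_t
fill_field (char *dst, const char *src, size_t size)
{
  char *end = stpncpy (dst, src, size);
  return (size_t) (end - dst);
}
```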
985 | ||
0a13c9e9 | 986 | @deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) |
d08a7e4c | 987 | @standards{GNU, wchar.h} |
0a13c9e9 PE |
988 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
989 | This function is similar to @code{wcpcpy} but always copies exactly |
990 | @var{size} wide characters into @var{wto}. |
991 | |
992 | If the length of @var{wfrom} is at least @var{size}, then |
993 | @code{wcpncpy} copies just the first @var{size} wide characters and |
994 | returns a pointer to the wide character directly following the last |
995 | wide character copied. Note that in this case |
996 | there is no null terminator written into @var{wto}. |
997 | |
998 | If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy} |
999 | copies all of @var{wfrom}, followed by enough null wide characters to add up |
1000 | to @var{size} wide characters in all. This padding is rarely useful, but it |
1001 | is provided for contexts that rely on the corresponding behavior of |
1002 | @code{wcsncpy}. @code{wcpncpy} returns a pointer to the |
1003 | @emph{first} written null wide character. |
1004 | ||
1005 | This function is not part of ISO or POSIX but was found useful while | |
1006 | developing @theglibc{} itself. | |
1007 | ||
1008 | Its behavior is undefined if the strings overlap. | |
1009 | ||
1010 | As noted below, this function is generally a poor choice for | |
1fb22592 | 1011 | processing strings. |
0a13c9e9 PE |
1012 | |
1013 | @code{wcpncpy} is a GNU extension. | |
1014 | @end deftypefun | |
1015 | ||
8a2f1f5b | 1016 | @deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
d08a7e4c | 1017 | @standards{ISO, string.h} |
11087373 | 1018 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 1019 | This function is like @code{strcat} except that not more than @var{size} |
2cc4b9cc PE |
1020 | bytes from @var{from} are appended to the end of @var{to}, and |
1021 | @var{from} need not be null-terminated. A single null byte is also | |
1022 | always appended to @var{to}, so the total | |
28f540f4 RM |
1023 | allocated size of @var{to} must be at least @code{@var{size} + 1} bytes |
1024 | longer than its initial length. | |
1025 | ||
1026 | The @code{strncat} function could be implemented like this: | |
1027 | ||
1028 | @smallexample | |
1029 | @group | |
1030 | char * | |
1031 | strncat (char *restrict to, const char *restrict from, size_t size) |
1032 | @{ | |
5d1d4918 PE |
1033 | size_t len = strlen (to); |
1034 | memcpy (to + len, from, strnlen (from, size)); | |
1035 | to[len + strnlen (from, size)] = '\0'; | |
28f540f4 RM |
1036 | return to; |
1037 | @} | |
1038 | @end group | |
1039 | @end smallexample | |
1040 | ||
1041 | The behavior of @code{strncat} is undefined if the strings overlap. | |
0a13c9e9 PE |
1042 | |
1043 | As a companion to @code{strncpy}, @code{strncat} was designed for | |
1044 | now-rarely-used arrays consisting of non-null bytes followed by zero | |
dff8da6b | 1045 | or more null bytes. However, as noted below, this function is generally a poor |
1fb22592 | 1046 | choice for processing strings. Also, this function has significant |
0a13c9e9 | 1047 | performance issues. @xref{Concatenating Strings}. |
28f540f4 RM |
1048 | @end deftypefun |
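As a sketch of the use case @code{strncat} was designed for, the
hypothetical helper @code{append_field} below appends a null-padded,
possibly unterminated eight-byte field to an ordinary string.  The field
width and the helper name are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical helper: append the null-padded, possibly unterminated
   8-byte field FIELD to the string TO.  The caller guarantees that TO
   has at least 8 + 1 free bytes after its current contents.  */
char *
append_field (char *to, const char field[8])
{
  return strncat (to, field, 8);
}
```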
1049 | ||
8a2f1f5b | 1050 | @deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) |
d08a7e4c | 1051 | @standards{ISO, wchar.h} |
11087373 | 1052 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b | 1053 | This function is like @code{wcscat} except that not more than @var{size} |
2cc4b9cc PE |
1054 | wide characters from @var{wfrom} are appended to the end of @var{wto}, |
1055 | and @var{wfrom} need not be null-terminated. A single null wide |
1056 | character is also always appended to @var{wto}, so the total allocated |
1057 | size of @var{wto} must be at least @code{wcsnlen (@var{wfrom}, |
1058 | @var{size}) + 1} wide characters longer than its initial length. |
8a2f1f5b UD |
1059 | |
1060 | The @code{wcsncat} function could be implemented like this: | |
1061 | ||
1062 | @smallexample | |
1063 | @group | |
1064 | wchar_t * | |
1065 | wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom, | |
1066 | size_t size) | |
1067 | @{ | |
5d1d4918 PE |
1068 | size_t len = wcslen (wto); |
1069 | memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t)); | |
1070 | wto[len + wcsnlen (wfrom, size)] = L'\0'; | |
8a2f1f5b UD |
1071 | return wto; |
1072 | @} | |
1073 | @end group | |
1074 | @end smallexample | |
1075 | ||
1076 | The behavior of @code{wcsncat} is undefined if the strings overlap. | |
28f540f4 | 1077 | |
0a13c9e9 | 1078 | As noted below, this function is generally a poor choice for |
1fb22592 | 1079 | processing strings. Also, this function has significant performance |
0a13c9e9 PE |
1080 | issues. @xref{Concatenating Strings}. |
1081 | @end deftypefun | |
1082 | ||
d2fda60e PE |
1083 | @deftypefun size_t strlcpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
1084 | @standards{BSD, string.h} | |
1085 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1086 | This function copies the string @var{from} to the destination array | |
1087 | @var{to}, limiting the result's size (including the null terminator) | |
1088 | to @var{size}. The caller should ensure that @var{size} includes room | |
1089 | for the result's terminating null byte. | |
1090 | ||
1091 | If @var{size} is greater than the length of the string @var{from}, | |
1092 | this function copies the non-null bytes of the string | |
1093 | @var{from} to the destination array @var{to}, | |
1094 | and terminates the copy with a null byte. Like other | |
1095 | string functions such as @code{strcpy}, but unlike @code{strncpy}, any | |
1096 | remaining bytes in the destination array remain unchanged. | |
1097 | ||
1098 | If @var{size} is nonzero and less than or equal to the length of the string |
1099 | @var{from}, this function copies only the first @samp{@var{size} - 1} | |
1100 | bytes to the destination array @var{to}, and writes a terminating null | |
1101 | byte to the last byte of the array. | |
1102 | ||
1103 | This function returns the length of the string @var{from}. This means | |
1104 | that truncation occurs if and only if the returned value is greater | |
1105 | than or equal to @var{size}. | |
1106 | ||
1107 | The behavior is undefined if @var{to} or @var{from} is a null pointer, | |
1108 | or if the destination array's size is less than @var{size}, or if the | |
1109 | string @var{from} overlaps the first @var{size} bytes of the | |
1110 | destination array. | |
1111 | ||
1112 | As noted below, this function is generally a poor choice for | |
1113 | processing strings. Also, this function has a performance issue, | |
1114 | as its time cost is proportional to the length of @var{from} | |
1115 | even when @var{size} is small. | |
1116 | ||
1117 | This function is derived from OpenBSD 2.4. | |
1118 | @end deftypefun | |
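The semantics of @code{strlcpy} described above can be summarized by
the following sketch.  It is an illustration only, under the assumption
that the preconditions listed above hold; the name
@code{sketch_strlcpy} is hypothetical and this is not the implementation
used by @theglibc{}.

```c
#include <string.h>

/* Illustrative sketch of the strlcpy semantics; not the library's
   implementation.  Returns the length of FROM.  */
size_t
sketch_strlcpy (char *to, const char *from, size_t size)
{
  size_t len = strlen (from);   /* Cost is proportional to FROM's length.  */
  if (size > len)
    memcpy (to, from, len + 1); /* Everything fits, terminator included.  */
  else if (size > 0)
    {
      memcpy (to, from, size - 1);
      to[size - 1] = '\0';      /* Truncated, but still terminated.  */
    }
  return len;
}
```

A caller can then detect truncation by testing whether the returned
value is greater than or equal to @var{size}.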
1119 | ||
1120 | @deftypefun size_t wcslcpy (wchar_t *restrict @var{to}, const wchar_t *restrict @var{from}, size_t @var{size}) | |
1121 | @standards{BSD, wchar.h} |
1122 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1123 | This function is a variant of @code{strlcpy} for wide strings. | |
1124 | The @var{size} argument counts the length of the destination buffer in | |
1125 | wide characters (and not bytes). | |
1126 | ||
1127 | This function is derived from BSD. | |
1128 | @end deftypefun | |
1129 | ||
1130 | @deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) | |
1131 | @standards{BSD, string.h} | |
1132 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1133 | This function appends the string @var{from} to the | |
1134 | string @var{to}, limiting the result's total size (including the null | |
1135 | terminator) to @var{size}. The caller should ensure that @var{size} | |
1136 | includes room for the result's terminating null byte. | |
1137 | ||
1138 | This function copies as much as possible of the string @var{from} into | |
1139 | the array at @var{to} of @var{size} bytes, starting at the terminating | |
1140 | null byte of the original string @var{to}. In effect, this appends | |
1141 | the string @var{from} to the string @var{to}. Although the resulting | |
1142 | string will contain a null terminator, it can be truncated (not all | |
1143 | bytes in @var{from} may be copied). | |
1144 | ||
1145 | This function returns the sum of the original length of @var{to} and | |
1146 | the length of @var{from}. This means that truncation occurs if and | |
1147 | only if the returned value is greater than or equal to @var{size}. | |
1148 | ||
1149 | The behavior is undefined if @var{to} or @var{from} is a null pointer, | |
1150 | or if the destination array's size is less than @var{size}, or if the | |
1151 | destination array does not contain a null byte in its first @var{size} | |
1152 | bytes, or if the string @var{from} overlaps the first @var{size} bytes | |
1153 | of the destination array. | |
1154 | ||
1155 | As noted below, this function is generally a poor choice for | |
1156 | processing strings. Also, this function has significant performance | |
1157 | issues. @xref{Concatenating Strings}. | |
1158 | ||
1159 | This function is derived from OpenBSD 2.4. | |
1160 | @end deftypefun | |
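The semantics of @code{strlcat} described above can be summarized by
the following sketch.  It is an illustration only, under the assumption
that the preconditions listed above hold (in particular that @var{to}
contains a null byte within its first @var{size} bytes); the name
@code{sketch_strlcat} is hypothetical and this is not the implementation
used by @theglibc{}.

```c
#include <string.h>

/* Illustrative sketch of the strlcat semantics; not the library's
   implementation.  Requires a null byte within the first SIZE bytes
   of TO.  Returns the length the untruncated result would have.  */
size_t
sketch_strlcat (char *to, const char *from, size_t size)
{
  size_t tolen = strlen (to);       /* Rescans TO on every call.  */
  size_t fromlen = strlen (from);
  size_t room = size - tolen - 1;   /* Bytes free before the terminator.  */
  size_t n = fromlen < room ? fromlen : room;
  memcpy (to + tolen, from, n);
  to[tolen + n] = '\0';
  return tolen + fromlen;
}
```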
1161 | ||
1162 | @deftypefun size_t wcslcat (wchar_t *restrict @var{to}, const wchar_t *restrict @var{from}, size_t @var{size}) | |
1163 | @standards{BSD, wchar.h} |
1164 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} | |
1165 | This function is a variant of @code{strlcat} for wide strings. | |
1166 | The @var{size} argument counts the length of the destination buffer in | |
1167 | wide characters (and not bytes). | |
1168 | ||
1169 | This function is derived from BSD. | |
1170 | @end deftypefun | |
1171 | ||
0a13c9e9 | 1172 | Because these functions can abruptly truncate strings or wide strings, |
1fb22592 | 1173 | they are generally poor choices for processing them. When copying or |
0a13c9e9 PE |
1174 | concatenating multibyte strings, they can truncate within a multibyte |
1175 | character so that the result is not a valid multibyte string. When | |
1176 | combining or concatenating multibyte or wide strings, they may | |
1177 | truncate the output after a combining character, resulting in a | |
1178 | corrupted grapheme. They can cause bugs even when processing | |
1179 | single-byte strings: for example, when calculating an ASCII-only user | |
1180 | name, a truncated name can identify the wrong user. | |
1181 | ||
1182 | Although some buffer overruns can be prevented by manually replacing | |
1183 | calls to copying functions with calls to truncation functions, there | |
54ae6d81 PE |
1184 | are often easier and safer automatic techniques, such as fortification |
1185 | (@pxref{Source Fortification}) and AddressSanitizer | |
1186 | (@pxref{Instrumentation Options,, Program Instrumentation Options, gcc, Using GCC}). | |
1187 | Because truncation functions can mask | |
0a13c9e9 PE |
1188 | application bugs that would otherwise be caught by the automatic |
1189 | techniques, these functions should be used only when the application's | |
1190 | underlying logic requires truncation. | |
1191 | ||
1192 | @strong{Note:} GNU programs should not truncate strings or wide | |
1193 | strings to fit arbitrary size limits. @xref{Semantics, , Writing | |
1194 | Robust Programs, standards, The GNU Coding Standards}. Instead of | |
1195 | string-truncation functions, it is usually better to use dynamic | |
1196 | memory allocation (@pxref{Unconstrained Allocation}) and functions | |
1197 | such as @code{strdup} or @code{asprintf} to construct strings. | |
28f540f4 | 1198 | |
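As a rough sketch of the dynamic-allocation approach recommended above,
the following joins two strings without any fixed size limit.  The name
@code{concat_alloc} is a hypothetical helper chosen for illustration, not
a library function, and it uses plain @code{malloc} rather than
@code{asprintf} so that it needs no GNU extensions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding A
   followed by B, or NULL if memory is exhausted.  The caller is
   responsible for freeing the result.  */
char *
concat_alloc (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  char *result = malloc (alen + blen + 1);

  if (result == NULL)
    return NULL;

  memcpy (result, a, alen);
  /* Copy B together with its terminating null byte.  */
  memcpy (result + alen, b, blen + 1);
  return result;
}
```

Because the buffer is sized from the inputs, nothing is truncated; the
only failure mode is running out of memory, which the caller can detect.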
b4012b75 | 1199 | @node String/Array Comparison |
28f540f4 RM |
1200 | @section String/Array Comparison |
1201 | @cindex comparing strings and arrays | |
1202 | @cindex string comparison functions | |
1203 | @cindex array comparison functions | |
1204 | @cindex predicates on strings | |
1205 | @cindex predicates on arrays | |
1206 | ||
1207 | You can use the functions in this section to perform comparisons on the | |
1208 | contents of strings and arrays. As well as checking for equality, these | |
1209 | functions can also be used as the ordering functions for sorting | |
1210 | operations. @xref{Searching and Sorting}, for an example of this. | |
1211 | ||
1212 | Unlike most comparison operations in C, the string comparison functions | |
1213 | return a nonzero value if the strings are @emph{not} equivalent rather | |
1214 | than if they are. The sign of the value indicates the relative ordering | |
2cc4b9cc | 1215 | of the first part of the strings that are not equivalent: a |
28f540f4 | 1216 | negative value indicates that the first string is ``less'' than the |
a5113b14 | 1217 | second, while a positive value indicates that the first string is |
28f540f4 RM |
1218 | ``greater''. |
1219 | ||
1220 | The most common use of these functions is to check only for equality. | |
1221 | This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}. | |
1222 | ||
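The conventions just described might be wrapped as in the following
sketch.  Both @code{streq} and @code{comparison_sign} are hypothetical
helper names introduced here for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if S1 and S2 have identical contents.
   Note the explicit test against zero; strcmp returns 0 for equal
   strings.  */
bool
streq (const char *s1, const char *s2)
{
  return strcmp (s1, s2) == 0;
}

/* Hypothetical helper: reduce a comparison result to -1, 0, or 1.
   Only the sign of a nonzero result is meaningful.  */
int
comparison_sign (int result)
{
  return (result > 0) - (result < 0);
}
```

Code that depends only on @code{comparison_sign (strcmp (a, b))} does
not rely on the particular magnitude an implementation happens to return.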
1223 | All of these functions are declared in the header file @file{string.h}. | |
1224 | @pindex string.h | |
1225 | ||
28f540f4 | 1226 | @deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) |
d08a7e4c | 1227 | @standards{ISO, string.h} |
11087373 | 1228 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
1229 | The function @code{memcmp} compares the @var{size} bytes of memory |
1230 | beginning at @var{a1} against the @var{size} bytes of memory beginning | |
1231 | at @var{a2}. The value returned has the same sign as the difference | |
1232 | between the first differing pair of bytes (interpreted as @code{unsigned | |
1233 | char} objects, then promoted to @code{int}). | |
1234 | ||
1235 | If the contents of the two blocks are equal, @code{memcmp} returns | |
1236 | @code{0}. | |
1237 | @end deftypefun | |
1238 | ||
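The following sketch illustrates the @code{unsigned char}
interpretation.  It assumes the usual two's-complement representation,
in which the @code{signed char} value @minus{}128 is stored as the byte
0x80.  The function name is hypothetical.

```c
#include <string.h>

/* Hypothetical demo: as signed char values, -128 is less than 127,
   but memcmp compares the underlying bytes as unsigned char, so
   0x80 (128) is greater than 0x7f (127) and the result is positive.  */
int
compare_high_and_low_byte (void)
{
  const signed char a[] = { -128 };  /* Stored as byte 0x80.  */
  const signed char b[] = { 127 };   /* Stored as byte 0x7f.  */

  return memcmp (a, b, sizeof a);
}
```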
8a2f1f5b | 1239 | @deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size}) |
d08a7e4c | 1240 | @standards{ISO, wchar.h} |
11087373 | 1241 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
The function @code{wmemcmp} compares the @var{size} wide characters
beginning at @var{a1} against the @var{size} wide characters beginning
at @var{a2}.  The value returned is less than or greater than zero
depending on whether the first differing wide character in @var{a1} is
smaller or larger than the corresponding wide character in @var{a2}.
1247 | |
1248 | If the contents of the two blocks are equal, @code{wmemcmp} returns | |
1249 | @code{0}. | |
1250 | @end deftypefun | |
1251 | ||
28f540f4 RM |
1252 | On arbitrary arrays, the @code{memcmp} function is mostly useful for |
1253 | testing equality. It usually isn't meaningful to do byte-wise ordering | |
1254 | comparisons on arrays of things other than bytes. For example, a | |
1255 | byte-wise comparison on the bytes that make up floating-point numbers | |
1256 | isn't likely to tell you anything about the relationship between the | |
1257 | values of the floating-point numbers. | |
1258 | ||
8a2f1f5b UD |
1259 | @code{wmemcmp} is really only useful to compare arrays of type |
1260 | @code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes | |
1261 | at a time and this number of bytes is system dependent. | |
1262 | ||
28f540f4 RM |
1263 | You should also be careful about using @code{memcmp} to compare objects |
1264 | that can contain ``holes'', such as the padding inserted into structure | |
1265 | objects to enforce alignment requirements, extra space at the end of | |
2cc4b9cc | 1266 | unions, and extra bytes at the ends of strings whose length is less |
28f540f4 RM |
1267 | than their allocated size. The contents of these ``holes'' are |
1268 | indeterminate and may cause strange behavior when performing byte-wise | |
1269 | comparisons. For more predictable results, perform an explicit | |
1270 | component-wise comparison. | |
1271 | ||
1272 | For example, given a structure type definition like: | |
1273 | ||
1274 | @smallexample | |
1275 | struct foo | |
1276 | @{ | |
1277 | unsigned char tag; | |
1278 | union | |
1279 | @{ | |
1280 | double f; | |
1281 | long i; | |
1282 | char *p; | |
1283 | @} value; | |
1284 | @}; | |
1285 | @end smallexample | |
1286 | ||
1287 | @noindent | |
1288 | you are better off writing a specialized comparison function to compare | |
1289 | @code{struct foo} objects instead of comparing them with @code{memcmp}. | |
1290 | ||
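One possible shape for such a specialized comparison is sketched below.
The manual does not say what the @code{tag} member means, so the
@code{FOO_*} constants and the rule that @code{tag} selects the active
union member are assumptions made for this example, as is the name
@code{foo_equal}.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed meaning of the tag member (not specified by the manual).  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

struct foo
{
  unsigned char tag;
  union
    {
      double f;
      long i;
      char *p;
    } value;
};

/* Hypothetical helper: compare only the members that carry data, so
   padding bytes and unused union space never affect the result.  */
bool
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return false;

  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return false;
    }
}
```

Two objects whose padding bytes differ still compare equal here, which
a @code{memcmp} over the whole structure cannot promise.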
28f540f4 | 1291 | @deftypefun int strcmp (const char *@var{s1}, const char *@var{s2}) |
d08a7e4c | 1292 | @standards{ISO, string.h} |
11087373 | 1293 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
1294 | The @code{strcmp} function compares the string @var{s1} against |
1295 | @var{s2}, returning a value that has the same sign as the difference | |
2cc4b9cc | 1296 | between the first differing pair of bytes (interpreted as |
28f540f4 RM |
1297 | @code{unsigned char} objects, then promoted to @code{int}). |
1298 | ||
1299 | If the two strings are equal, @code{strcmp} returns @code{0}. | |
1300 | ||
1301 | A consequence of the ordering used by @code{strcmp} is that if @var{s1} | |
1302 | is an initial substring of @var{s2}, then @var{s1} is considered to be | |
1303 | ``less than'' @var{s2}. | |
8a2f1f5b UD |
1304 | |
@code{strcmp} does not take into account the sorting conventions of the
language the strings are written in.  For locale-aware ordering, use
@code{strcoll}.
1308 | @end deftypefun | |
1309 | ||
8a2f1f5b | 1310 | @deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) |
d08a7e4c | 1311 | @standards{ISO, wchar.h} |
11087373 | 1312 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b | 1313 | |
The @code{wcscmp} function compares the wide string @var{ws1}
against @var{ws2}.  The value returned is less than or greater than zero
depending on whether the first differing wide character in @var{ws1} is
smaller or larger than the corresponding wide character in @var{ws2}.
1318 | |
1319 | If the two strings are equal, @code{wcscmp} returns @code{0}. | |
1320 | ||
1321 | A consequence of the ordering used by @code{wcscmp} is that if @var{ws1} | |
1322 | is an initial substring of @var{ws2}, then @var{ws1} is considered to be | |
1323 | ``less than'' @var{ws2}. | |
1324 | ||
@code{wcscmp} does not take into account the sorting conventions of the
language the strings are written in.  For locale-aware ordering, use
@code{wcscoll}.
28f540f4 RM |
1328 | @end deftypefun |
1329 | ||
28f540f4 | 1330 | @deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2}) |
d08a7e4c | 1331 | @standards{BSD, string.h} |
11087373 AO |
1332 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
1333 | @c Although this calls tolower multiple times, it's a macro, and | |
1334 | @c strcasecmp is optimized so that the locale pointer is read only once. | |
1335 | @c There are some asm implementations too, for which the single-read | |
1336 | @c from locale TLS pointers also applies. | |
4547c1a4 | 1337 | This function is like @code{strcmp}, except that differences in case are |
2cc4b9cc PE |
1338 | ignored, and its arguments must be multibyte strings. |
1339 | How uppercase and lowercase characters are related is | |
4547c1a4 UD |
1340 | determined by the currently selected locale. In the standard @code{"C"} |
1341 | locale the characters @"A and @"a do not match but in a locale which | |
dd7d45e8 | 1342 | regards these characters as parts of the alphabet they do match. |
28f540f4 | 1343 | |
85c165be | 1344 | @noindent |
28f540f4 RM |
1345 | @code{strcasecmp} is derived from BSD. |
1346 | @end deftypefun | |
1347 | ||
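A minimal sketch of a case-insensitive equality test follows.  The name
@code{caseless_equal} is hypothetical.  The example includes
@file{strings.h}, which is where POSIX declares @code{strcasecmp}; the
result depends on the locale in effect, and the comments assume the
default @code{"C"} locale.

```c
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: true if S1 and S2 are equal when differences
   in case are ignored, according to the current locale.  In the "C"
   locale only the ASCII letters have case.  */
bool
caseless_equal (const char *s1, const char *s2)
{
  return strcasecmp (s1, s2) == 0;
}
```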
8ded91fb | 1348 | @deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) |
d08a7e4c | 1349 | @standards{GNU, wchar.h} |
11087373 AO |
1350 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
1351 | @c Since towlower is not a macro, the locale object may be read multiple | |
1352 | @c times. | |
8a2f1f5b UD |
1353 | This function is like @code{wcscmp}, except that differences in case are |
1354 | ignored. How uppercase and lowercase characters are related is | |
1355 | determined by the currently selected locale. In the standard @code{"C"} | |
1356 | locale the characters @"A and @"a do not match but in a locale which | |
1357 | regards these characters as parts of the alphabet they do match. | |
1358 | ||
1359 | @noindent | |
1360 | @code{wcscasecmp} is a GNU extension. | |
1361 | @end deftypefun | |
1362 | ||
8a2f1f5b | 1363 | @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size}) |
d08a7e4c | 1364 | @standards{ISO, string.h} |
11087373 | 1365 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
This function is similar to @code{strcmp}, except that no more than
@var{size} bytes are compared.  In other words, if the two
strings are the same in their first @var{size} bytes, the
return value is zero.
1370 | @end deftypefun | |
1371 | ||
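A common use of @code{strncmp} is testing whether one string begins
with another.  The following is a sketch of that idiom;
@code{has_prefix} is a hypothetical helper name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if S begins with PREFIX.  If S is shorter
   than PREFIX, the comparison reaches the null byte of S, finds a
   difference there, and stops, so nothing past the end of S is read.  */
bool
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```

An empty prefix matches every string, because comparing zero bytes
always yields zero.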
8a2f1f5b | 1372 | @deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size}) |
d08a7e4c | 1373 | @standards{ISO, wchar.h} |
11087373 | 1374 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
f0f308c1 | 1375 | This function is similar to @code{wcscmp}, except that no more than |
8a2f1f5b UD |
1376 | @var{size} wide characters are compared. In other words, if the two |
1377 | strings are the same in their first @var{size} wide characters, the | |
1378 | return value is zero. | |
1379 | @end deftypefun | |
1380 | ||
28f540f4 | 1381 | @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n}) |
d08a7e4c | 1382 | @standards{BSD, string.h} |
11087373 | 1383 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
28f540f4 | 1384 | This function is like @code{strncmp}, except that differences in case |
2cc4b9cc PE |
1385 | are ignored, and the compared parts of the arguments should consist of |
1386 | valid multibyte characters. | |
1387 | Like @code{strcasecmp}, it is locale dependent how | |
dd7d45e8 | 1388 | uppercase and lowercase characters are related. |
28f540f4 | 1389 | |
85c165be | 1390 | @noindent |
@code{strncasecmp} is derived from BSD.
1392 | @end deftypefun | |
1393 | ||
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
d08a7e4c | 1395 | @standards{GNU, wchar.h} |
11087373 | 1396 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
8a2f1f5b UD |
1397 | This function is like @code{wcsncmp}, except that differences in case |
1398 | are ignored. Like @code{wcscasecmp}, it is locale dependent how | |
1399 | uppercase and lowercase characters are related. | |
1400 | ||
1401 | @noindent | |
1402 | @code{wcsncasecmp} is a GNU extension. | |
28f540f4 RM |
1403 | @end deftypefun |
1404 | ||
Here are some examples showing the use of @code{strcmp} and
@code{strncmp} (equivalent examples can be constructed for the wide
character functions).  These examples assume the use of the ASCII
character set.  (If some other character set, such as EBCDIC, is used
instead, then the glyphs are associated with different numeric codes,
and the return values and ordering may differ.)  Only the sign of a
nonzero result is specified; the magnitudes shown are those produced by
a plain byte-difference implementation and may differ in practice.
28f540f4 RM |
1411 | |
1412 | @smallexample | |
1413 | strcmp ("hello", "hello") | |
1414 | @result{} 0 /* @r{These two strings are the same.} */ | |
1415 | strcmp ("hello", "Hello") | |
1416 | @result{} 32 /* @r{Comparisons are case-sensitive.} */ | |
1417 | strcmp ("hello", "world") | |
2cc4b9cc | 1418 | @result{} -15 /* @r{The byte @code{'h'} comes before @code{'w'}.} */ |
28f540f4 | 1419 | strcmp ("hello", "hello, world") |
2cc4b9cc | 1420 | @result{} -44 /* @r{Comparing a null byte against a comma.} */ |
6952e59e | 1421 | strncmp ("hello", "hello, world", 5) |
2cc4b9cc | 1422 | @result{} 0 /* @r{The initial 5 bytes are the same.} */ |
28f540f4 | 1423 | strncmp ("hello, world", "hello, stupid world!!!", 5) |
2cc4b9cc | 1424 | @result{} 0 /* @r{The initial 5 bytes are the same.} */ |
28f540f4 RM |
1425 | @end smallexample |
1426 | ||
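The wide character counterparts of the examples above can be sketched
as follows.  The function name @code{wide_examples_hold} is
hypothetical, and the checks assume a wide character encoding that
orders the ASCII letters as ASCII does.  Only signs are tested, since
the magnitude of a nonzero result is unspecified.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical check: true if each wide comparison has the same sign
   as the corresponding byte-string example.  */
bool
wide_examples_hold (void)
{
  return (wcscmp (L"hello", L"hello") == 0
          && wcscmp (L"hello", L"Hello") > 0
          && wcscmp (L"hello", L"world") < 0
          && wcscmp (L"hello", L"hello, world") < 0
          && wcsncmp (L"hello", L"hello, world", 5) == 0
          && wcsncmp (L"hello, world", L"hello, stupid world!!!", 5) == 0);
}
```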
1f205a47 | 1427 | @deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2}) |
d08a7e4c | 1428 | @standards{GNU, string.h} |
11087373 AO |
1429 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
1430 | @c Calls isdigit multiple times, locale may change in between. | |
1f205a47 | 1431 | The @code{strverscmp} function compares the string @var{s1} against |
f2282d42 RM |
1432 | @var{s2}, considering them as holding indices/version numbers. The |
1433 | return value follows the same conventions as found in the | |
1434 | @code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no | |
f4a36548 FW |
1435 | digits, @code{strverscmp} behaves like @code{strcmp} |
1436 | (in the sense that the sign of the result is the same). | |
1f205a47 | 1437 | |
f4a36548 FW |
1438 | The comparison algorithm which the @code{strverscmp} function implements |
1439 | differs slightly from other version-comparison algorithms. The | |
1440 | implementation is based on a finite-state machine, whose behavior is | |
1441 | approximated below. | |
1f205a47 UD |
1442 | |
1443 | @itemize @bullet | |
1444 | @item | |
f4a36548 FW |
1445 | The input strings are each split into sequences of non-digits and |
1446 | digits. These sequences can be empty at the beginning and end of the | |
1447 | string. Digits are determined by the @code{isdigit} function and are | |
1448 | thus subject to the current locale. | |
1f205a47 UD |
1449 | |
1450 | @item | |
Comparison starts with a (possibly empty) non-digit sequence.  The first
pair of corresponding sequences (non-digit or digit) that differ
determines the outcome of the comparison.
1f205a47 UD |
1454 | |
1455 | @item | |
f4a36548 FW |
1456 | Corresponding non-digit sequences in both strings are compared |
1457 | lexicographically if their lengths are equal. If the lengths differ, | |
1458 | the shorter non-digit sequence is extended with the input string | |
1459 | character immediately following it (which may be the null terminator), | |
1460 | the other sequence is truncated to be of the same (extended) length, and | |
1461 | these two sequences are compared lexicographically. In the last case, | |
1462 | the sequence comparison determines the result of the function because | |
1463 | the extension character (or some character before it) is necessarily | |
1464 | different from the character at the same offset in the other input | |
1465 | string. | |
1466 | ||
1467 | @item | |
1468 | For two sequences of digits, the number of leading zeros is counted (which | |
1469 | can be zero). If the count differs, the string with more leading zeros | |
1470 | in the digit sequence is considered smaller than the other string. | |
1471 | ||
1472 | @item | |
1473 | If the two sequences of digits have no leading zeros, they are compared | |
1474 | as integers, that is, the string with the longer digit sequence is | |
1475 | deemed larger, and if both sequences are of equal length, they are | |
1476 | compared lexicographically. | |
1477 | ||
1478 | @item | |
1479 | If both digit sequences start with a zero and have an equal number of | |
1480 | leading zeros, they are compared lexicographically if their lengths are | |
1481 | the same. If the lengths differ, the shorter sequence is extended with | |
1482 | the following character in its input string, and the other sequence is | |
1483 | truncated to the same length, and both sequences are compared | |
1484 | lexicographically (similar to the non-digit sequence case above). | |
1f205a47 UD |
1485 | @end itemize |
1486 | ||
f4a36548 FW |
1487 | The treatment of leading zeros and the tie-breaking extension characters |
1488 | (which in effect propagate across non-digit/digit sequence boundaries) | |
1489 | differs from other version-comparison algorithms. | |
1490 | ||
1f205a47 UD |
1491 | @smallexample |
1492 | strverscmp ("no digit", "no digit") | |
0bc93a2f | 1493 | @result{} 0 /* @r{same behavior as strcmp.} */ |
1f205a47 UD |
1494 | strverscmp ("item#99", "item#100") |
1495 | @result{} <0 /* @r{same prefix, but 99 < 100.} */ | |
1496 | strverscmp ("alpha1", "alpha001") | |
f4a36548 | 1497 | @result{} >0 /* @r{different number of leading zeros (0 and 2).} */ |
1f205a47 | 1498 | strverscmp ("part1_f012", "part1_f01") |
f4a36548 | 1499 | @result{} >0 /* @r{lexicographical comparison with leading zeros.} */ |
1f205a47 | 1500 | strverscmp ("foo.009", "foo.0") |
f4a36548 | 1501 | @result{} <0 /* @r{different number of leading zeros (2 and 1).} */ |
1f205a47 UD |
1502 | @end smallexample |
1503 | ||
1f205a47 UD |
1504 | @code{strverscmp} is a GNU extension. |
1505 | @end deftypefun | |
1506 | ||
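As an illustration, @code{strverscmp} can serve as the ordering
function for @code{qsort} so that names with embedded numbers sort
numerically.  This is only a sketch: @code{compare_versions} and
@code{sort_by_version} are hypothetical names, and the explicit
prototype is a precaution for translation units where
@code{_GNU_SOURCE} is defined too late to take effect.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1  /* strverscmp is a GNU extension.  */
#endif
#include <stdlib.h>
#include <string.h>

/* string.h declares this when _GNU_SOURCE is in effect; repeating the
   declaration is harmless and keeps the sketch compiling if another
   header was included before the macro was defined.  */
int strverscmp (const char *s1, const char *s2);

/* Comparison function for qsort over an array of char * elements.  */
int
compare_versions (const void *v1, const void *v2)
{
  char *const *p1 = v1;
  char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical helper: sort COUNT names into version order.  */
void
sort_by_version (char **names, size_t count)
{
  qsort (names, count, sizeof *names, compare_versions);
}
```

With plain @code{strcmp}, @samp{item#100} would sort before
@samp{item#9}; with @code{strverscmp}, the digit sequences are ordered
by numeric value.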
28f540f4 | 1507 | @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) |
d08a7e4c | 1508 | @standards{BSD, string.h} |
11087373 | 1509 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
1510 | This is an obsolete alias for @code{memcmp}, derived from BSD. |
1511 | @end deftypefun | |
1512 | ||
b4012b75 | 1513 | @node Collation Functions |
28f540f4 RM |
1514 | @section Collation Functions |
1515 | ||
1516 | @cindex collating strings | |
1517 | @cindex string collation functions | |
1518 | ||
1519 | In some locales, the conventions for lexicographic ordering differ from | |
1520 | the strict numeric ordering of character codes. For example, in Spanish | |
1521 | most glyphs with diacritical marks such as accents are not considered | |
a5177499 BS |
1522 | distinct letters for the purposes of collation. On the other hand, in |
1523 | Czech the two-character sequence @samp{ch} is treated as a single letter | |
1524 | that is collated between @samp{h} and @samp{i}. | |
28f540f4 RM |
1525 | |
You can use the functions @code{strcoll} and @code{strxfrm} (declared in
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
collation ordering appropriate for the current locale.  The locale used
by these functions can be specified by setting the locale
for the @code{LC_COLLATE} category; see @ref{Locales}.
28f540f4 | 1532 | @pindex string.h |
8a2f1f5b | 1533 | @pindex wchar.h |
28f540f4 RM |
1534 | |
1535 | In the standard C locale, the collation sequence for @code{strcoll} is | |
8a2f1f5b UD |
1536 | the same as that for @code{strcmp}. Similarly, @code{wcscoll} and |
1537 | @code{wcscmp} are the same in this situation. | |
28f540f4 RM |
1538 | |
1539 | Effectively, the way these functions work is by applying a mapping to | |
2cc4b9cc PE |
1540 | transform the characters in a multibyte string to a byte |
1541 | sequence that represents | |
28f540f4 RM |
1542 | the string's position in the collating sequence of the current locale. |
1543 | Comparing two such byte sequences in a simple fashion is equivalent to | |
1544 | comparing the strings with the locale's collating sequence. | |
1545 | ||
8a2f1f5b UD |
1546 | The functions @code{strcoll} and @code{wcscoll} perform this translation |
1547 | implicitly, in order to do one comparison. By contrast, @code{strxfrm} | |
1548 | and @code{wcsxfrm} perform the mapping explicitly. If you are making | |
1549 | multiple comparisons using the same string or set of strings, it is | |
1550 | likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to | |
1551 | transform all the strings just once, and subsequently compare the | |
1552 | transformed strings with @code{strcmp} or @code{wcscmp}. | |
28f540f4 | 1553 | |
28f540f4 | 1554 | @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2}) |
d08a7e4c | 1555 | @standards{ISO, string.h} |
11087373 AO |
1556 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
1557 | @c Calls strcoll_l with the current locale, which dereferences only the | |
1558 | @c LC_COLLATE data pointer. | |
28f540f4 RM |
1559 | The @code{strcoll} function is similar to @code{strcmp} but uses the |
1560 | collating sequence of the current locale for collation (the | |
2cc4b9cc | 1561 | @code{LC_COLLATE} locale). The arguments are multibyte strings. |
28f540f4 RM |
1562 | @end deftypefun |
1563 | ||
8a2f1f5b | 1564 | @deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) |
d08a7e4c | 1565 | @standards{ISO, wchar.h} |
11087373 AO |
1566 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
1567 | @c Same as strcoll, but calling wcscoll_l. | |
8a2f1f5b UD |
1568 | The @code{wcscoll} function is similar to @code{wcscmp} but uses the |
1569 | collating sequence of the current locale for collation (the | |
1570 | @code{LC_COLLATE} locale). | |
1571 | @end deftypefun | |
1572 | ||
28f540f4 RM |
1573 | Here is an example of sorting an array of strings, using @code{strcoll} |
1574 | to compare them. The actual sort algorithm is not written here; it | |
1575 | comes from @code{qsort} (@pxref{Array Sort Function}). The job of the | |
1576 | code shown here is to say how to compare the strings while sorting them. | |
1577 | (Later on in this section, we will show a way to do this more | |
1578 | efficiently using @code{strxfrm}.) | |
1579 | ||
1580 | @smallexample | |
1581 | /* @r{This is the comparison function used with @code{qsort}.} */ | |
1582 | ||
1583 | int | |
e39745ff | 1584 | compare_elements (const void *v1, const void *v2) |
28f540f4 | 1585 | @{ |
e39745ff | 1586 | char * const *p1 = v1; |
a9f5ce09 | 1587 | char * const *p2 = v2; |
e39745ff | 1588 | |
28f540f4 RM |
1589 | return strcoll (*p1, *p2); |
1590 | @} | |
1591 | ||
1592 | /* @r{This is the entry point---the function to sort} | |
1593 | @r{strings using the locale's collating sequence.} */ | |
1594 | ||
1595 | void | |
1596 | sort_strings (char **array, int nstrings) | |
1597 | @{ | |
1598 | /* @r{Sort @code{temp_array} by comparing the strings.} */ | |
9fc19e48 UD |
1599 | qsort (array, nstrings, |
1600 | sizeof (char *), compare_elements); | |
28f540f4 RM |
1601 | @} |
1602 | @end smallexample | |
1603 | ||
1604 | @cindex converting string to collation order | |
8a2f1f5b | 1605 | @deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
d08a7e4c | 1606 | @standards{ISO, string.h} |
11087373 | 1607 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2cc4b9cc PE |
1608 | The function @code{strxfrm} transforms the multibyte string |
1609 | @var{from} using the | |
8a2f1f5b | 1610 | collation transformation determined by the locale currently selected for |
28f540f4 | 1611 | collation, and stores the transformed string in the array @var{to}. Up |
2cc4b9cc | 1612 | to @var{size} bytes (including a terminating null byte) are |
28f540f4 RM |
1613 | stored. |
1614 | ||
1615 | The behavior is undefined if the strings @var{to} and @var{from} | |
0a13c9e9 | 1616 | overlap; see @ref{Copying Strings and Arrays}. |
28f540f4 RM |
1617 | |
The return value is the length of the entire transformed string, not
counting the terminating null byte.  This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
entirely fit in the array @var{to}.  In this case, only as much of the
string as actually fits was stored.  To get the whole transformed
string, call @code{strxfrm} again with a bigger output array.
28f540f4 RM |
1624 | |
1625 | The transformed string may be longer than the original string, and it | |
1626 | may also be shorter. | |
1627 | ||
2cc4b9cc PE |
1628 | If @var{size} is zero, no bytes are stored in @var{to}. In this |
1629 | case, @code{strxfrm} simply returns the number of bytes that would | |
28f540f4 | 1630 | be the length of the transformed string. This is useful for determining |
8a2f1f5b UD |
1631 | what size the allocated array should be. It does not matter what |
1632 | @var{to} is if @var{size} is zero; @var{to} may even be a null pointer. | |
1633 | @end deftypefun | |
1634 | ||
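The zero-size call described above leads to a simple two-pass idiom,
sketched here.  The name @code{collation_key} is hypothetical, and the
sketch uses plain @code{malloc} so that it stands alone.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy of S
   for the current LC_COLLATE locale, or NULL if memory is exhausted.
   The caller frees the result.  */
char *
collation_key (const char *s)
{
  /* First pass: with a size of zero nothing is stored, so the
     destination may be a null pointer.  The return value is the
     length of the transformed string.  */
  size_t length = strxfrm (NULL, s, 0);

  /* One extra byte for the terminating null byte.  */
  char *key = malloc (length + 1);

  /* Second pass: the buffer is now known to be large enough.  */
  if (key != NULL)
    strxfrm (key, s, length + 1);
  return key;
}
```

Keys produced this way can be compared with @code{strcmp}, and the
outcome has the same sign as @code{strcoll} applied to the original
strings.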
8a2f1f5b | 1635 | @deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size}) |
d08a7e4c | 1636 | @standards{ISO, wchar.h} |
11087373 | 1637 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2cc4b9cc | 1638 | The function @code{wcsxfrm} transforms wide string @var{wfrom} |
8a2f1f5b UD |
1639 | using the collation transformation determined by the locale currently |
1640 | selected for collation, and stores the transformed string in the array | |
1641 | @var{wto}. Up to @var{size} wide characters (including a terminating null | |
2cc4b9cc | 1642 | wide character) are stored. |
8a2f1f5b UD |
1643 | |
1644 | The behavior is undefined if the strings @var{wto} and @var{wfrom} | |
0a13c9e9 | 1645 | overlap; see @ref{Copying Strings and Arrays}. |
8a2f1f5b | 1646 | |
The return value is the length of the entire transformed wide
string, not counting the terminating null wide character.  This value
is not affected by the value of @var{size}, but if
it is greater than or equal to @var{size}, it means that the transformed
wide string did not entirely fit in the array @var{wto}.  In
this case, only as much of the wide string as actually fits
was stored.  To get the whole transformed wide string, call
@code{wcsxfrm} again with a bigger output array.
1654 | ||
2cc4b9cc PE |
1655 | The transformed wide string may be longer than the original |
1656 | wide string, and it may also be shorter. | |
8a2f1f5b | 1657 | |
If @var{size} is zero, no wide characters are stored in @var{wto}.  In this
case, @code{wcsxfrm} simply returns the number of wide characters that
would be the length of the transformed wide string.  This is
useful for determining what size the allocated array should be (remember
to multiply by @code{sizeof (wchar_t)}).  It does not matter what
@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer.
28f540f4 RM |
1664 | @end deftypefun |
1665 | ||
1666 | Here is an example of how you can use @code{strxfrm} when | |
1667 | you plan to do many comparisons. It does the same thing as the previous | |
1668 | example, but much faster, because it has to transform each string only | |
1669 | once, no matter how many times it is compared with other strings. Even | |
1670 | the time needed to allocate and free storage is much less than the time | |
1671 | we save, when there are many strings. | |
1672 | ||
1673 | @smallexample | |
1674 | struct sorter @{ char *input; char *transformed; @}; | |
1675 | ||
1676 | /* @r{This is the comparison function used with @code{qsort}} | |
1677 | @r{to sort an array of @code{struct sorter}.} */ | |
1678 | ||
1679 | int | |
e39745ff | 1680 | compare_elements (const void *v1, const void *v2) |
28f540f4 | 1681 | @{ |
e39745ff AJ |
1682 | const struct sorter *p1 = v1; |
1683 | const struct sorter *p2 = v2; | |
1684 | ||
28f540f4 RM |
1685 | return strcmp (p1->transformed, p2->transformed); |
1686 | @} | |
1687 | ||
1688 | /* @r{This is the entry point---the function to sort} | |
1689 | @r{strings using the locale's collating sequence.} */ | |
1690 | ||
1691 | void | |
1692 | sort_strings_fast (char **array, int nstrings) | |
1693 | @{ | |
1694 | struct sorter temp_array[nstrings]; | |
1695 | int i; | |
1696 | ||
1697 | /* @r{Set up @code{temp_array}. Each element contains} | |
1698 | @r{one input string and its transformed string.} */ | |
1699 | for (i = 0; i < nstrings; i++) | |
1700 | @{ | |
1701 | size_t length = strlen (array[i]) * 2; | |
a5113b14 | 1702 | char *transformed; |
f2ea0f5b | 1703 | size_t transformed_length; |
28f540f4 RM |
1704 | |
1705 | temp_array[i].input = array[i]; | |
1706 | ||
a5113b14 UD |
1707 | /* @r{First try a buffer perhaps big enough.} */ |
1708 | transformed = (char *) xmalloc (length); | |
1709 | ||
1710 | /* @r{Transform @code{array[i]}.} */ | |
1711 | transformed_length = strxfrm (transformed, array[i], length); | |
1712 | ||
1713 | /* @r{If the buffer was not large enough, resize it} | |
1714 | @r{and try again.} */ | |
1715 | if (transformed_length >= length) | |
28f540f4 | 1716 | @{ |
a5113b14 | 1717 | /* @r{Allocate the needed space. +1 for terminating} |
2cc4b9cc | 1718 | @r{@code{'\0'} byte.} */ |
bdc674d9 PE |
1719 | transformed = xrealloc (transformed, |
1720 | transformed_length + 1); | |
a5113b14 UD |
1721 | |
1722 | /* @r{The return value is not interesting because we know} | |
1723 | @r{how long the transformed string is.} */ | |
dd7d45e8 UD |
1724 | (void) strxfrm (transformed, array[i], |
1725 | transformed_length + 1); | |
28f540f4 | 1726 | @} |
a5113b14 UD |
1727 | |
1728 | temp_array[i].transformed = transformed; | |
28f540f4 RM |
1729 | @} |
1730 | ||
1731 | /* @r{Sort @code{temp_array} by comparing transformed strings.} */ | |
89e691f2 AM |
1732 | qsort (temp_array, nstrings, |
1733 | sizeof (struct sorter), compare_elements); | |
28f540f4 RM |
1734 | |
1735 | /* @r{Put the elements back in the permanent array} | |
1736 | @r{in their sorted order.} */ | |
1737 | for (i = 0; i < nstrings; i++) | |
1738 | array[i] = temp_array[i].input; | |
1739 | ||
1740 | /* @r{Free the strings we allocated.} */ | |
1741 | for (i = 0; i < nstrings; i++) | |
1742 | free (temp_array[i].transformed); | |
1743 | @} | |
1744 | @end smallexample | |
1745 | ||
8a2f1f5b UD |
1746 | The interesting part of this code for the wide character version would |
1747 | look like this: | |
1748 | ||
1749 | @smallexample | |
void
sort_strings_fast (wchar_t **array, int nstrings)
@{
  @dots{}
      /* @r{Transform @code{array[i]}.}  */
      transformed_length = wcsxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.}  */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space.  +1 for terminating}
             @r{@code{L'\0'} wide character.}  */
          transformed = xreallocarray (transformed,
                                       transformed_length + 1,
                                       sizeof *transformed);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.}  */
          (void) wcsxfrm (transformed, array[i],
                          transformed_length + 1);
        @}
  @dots{}
1773 | @end smallexample | |
1774 | ||
@noindent
Note the additional element-size argument, @code{sizeof *transformed},
in the @code{xreallocarray} call: the buffer now holds wide characters
rather than bytes.
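
As an alternative to guessing a buffer size and retrying, @w{ISO C}
allows @code{strxfrm} to be called with a size of zero, in which case
the destination may be a null pointer and the return value is the
length that is needed.  The following is only a minimal sketch of that
approach.  The helper name @code{xfrm_dup} is hypothetical, and it uses
plain @code{malloc} instead of the @code{xmalloc} used in the examples
above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy
   of S, or a null pointer if memory is exhausted.  The caller
   frees the result.  */
char *
xfrm_dup (const char *s)
{
  assert (s != NULL);

  /* With a size of zero nothing is stored, and the return value is
     the length needed, not counting the terminating null byte.  */
  size_t needed = strxfrm (NULL, s, 0);

  char *result = malloc (needed + 1);
  if (result == NULL)
    return NULL;

  /* This call cannot truncate, so its return value is not needed.  */
  (void) strxfrm (result, s, needed + 1);
  return result;
}
```

This costs two transformation passes per string, so whether it beats
the resize-and-retry approach depends on how often the first guess is
too small.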
1778 | ||
1779 | @strong{Compatibility Note:} The string collation functions are a new | |
976780fd | 1780 | feature of @w{ISO C90}. Older C dialects have no equivalent feature. |
8a2f1f5b UD |
1781 | The wide character versions were introduced in @w{Amendment 1} to @w{ISO |
1782 | C90}. | |
28f540f4 | 1783 | |
b4012b75 | 1784 | @node Search Functions |
28f540f4 RM |
1785 | @section Search Functions |
1786 | ||
1787 | This section describes library functions which perform various kinds | |
1788 | of searching operations on strings and arrays. These functions are | |
1789 | declared in the header file @file{string.h}. | |
1790 | @pindex string.h | |
1791 | @cindex search functions (for strings) | |
1792 | @cindex string search functions | |
1793 | ||
28f540f4 | 1794 | @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
d08a7e4c | 1795 | @standards{ISO, string.h} |
11087373 | 1796 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
1797 | This function finds the first occurrence of the byte @var{c} (converted |
1798 | to an @code{unsigned char}) in the initial @var{size} bytes of the | |
1799 | object beginning at @var{block}. The return value is a pointer to the | |
1800 | located byte, or a null pointer if no match was found. | |
1801 | @end deftypefun | |
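
Because @code{memchr} takes an explicit size, it can search a buffer
that contains embedded null bytes, where @code{strchr} would stop too
early.  Here is a minimal sketch; the function name @code{count_lines}
is purely illustrative and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative helper: count the newline bytes in the SIZE bytes
   starting at BLOCK.  The block may contain null bytes.  */
size_t
count_lines (const void *block, size_t size)
{
  assert (block != NULL);

  const unsigned char *p = block;
  const unsigned char *end = p + size;
  size_t count = 0;

  while (p < end)
    {
      const unsigned char *hit = memchr (p, '\n', (size_t) (end - p));
      if (hit == NULL)
        break;
      count++;
      p = hit + 1;   /* Resume just past the byte that was found.  */
    }

  return count;
}
```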
1802 | ||
8a2f1f5b | 1803 | @deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) |
d08a7e4c | 1804 | @standards{ISO, wchar.h} |
11087373 | 1805 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
1806 | This function finds the first occurrence of the wide character @var{wc} |
1807 | in the initial @var{size} wide characters of the object beginning at | |
1808 | @var{block}. The return value is a pointer to the located wide | |
1809 | character, or a null pointer if no match was found. | |
1810 | @end deftypefun | |
1811 | ||
87b56f36 | 1812 | @deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c}) |
d08a7e4c | 1813 | @standards{GNU, string.h} |
11087373 | 1814 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
87b56f36 UD |
1815 | Often the @code{memchr} function is used with the knowledge that the |
1816 | byte @var{c} is available in the memory block specified by the | |
1817 | parameters. But this means that the @var{size} parameter is not really | |
1818 | needed and that the tests performed with it at runtime (to check whether | |
1819 | the end of the block is reached) are not needed. | |
1820 | ||
1821 | The @code{rawmemchr} function exists for just this situation which is | |
1822 | surprisingly frequent. The interface is similar to @code{memchr} except | |
1823 | that the @var{size} parameter is missing. The function will look beyond | |
1824 | the end of the block pointed to by @var{block} in case the programmer | |
6be569a4 | 1825 | made an error in assuming that the byte @var{c} is present in the block. |
87b56f36 UD |
1826 | In this case the result is unspecified. Otherwise the return value is a |
1827 | pointer to the located byte. | |
1828 | ||
32c7acd4 | 1829 | When looking for the end of a string, use @code{strchr}. |
87b56f36 UD |
1830 | |
1831 | This function is a GNU extension. | |
1832 | @end deftypefun | |
1833 | ||
ca747856 | 1834 | @deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
d08a7e4c | 1835 | @standards{GNU, string.h} |
11087373 | 1836 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
ca747856 RM |
1837 | The function @code{memrchr} is like @code{memchr}, except that it searches |
1838 | backwards from the end of the block defined by @var{block} and @var{size} | |
1839 | (instead of forwards from the front). | |
4efcb713 UD |
1840 | |
1841 | This function is a GNU extension. | |
a2d63612 | 1842 | @end deftypefun |
ca747856 | 1843 | |
28f540f4 | 1844 | @deftypefun {char *} strchr (const char *@var{string}, int @var{c}) |
d08a7e4c | 1845 | @standards{ISO, string.h} |
11087373 | 1846 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
2cc4b9cc PE |
1847 | The @code{strchr} function finds the first occurrence of the byte |
1848 | @var{c} (converted to a @code{char}) in the string | |
28f540f4 | 1849 | beginning at @var{string}. The return value is a pointer to the located |
2cc4b9cc | 1850 | byte, or a null pointer if no match was found. |
28f540f4 RM |
1851 | |
1852 | For example, | |
1853 | @smallexample | |
1854 | strchr ("hello, world", 'l') | |
1855 | @result{} "llo, world" | |
1856 | strchr ("hello, world", '?') | |
1857 | @result{} NULL | |
a5113b14 | 1858 | @end smallexample |
28f540f4 | 1859 | |
The terminating null byte is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying zero as the value of the @var{c} argument.
0520adde FB |
1863 | |
1864 | When @code{strchr} returns a null pointer, it does not let you know | |
2cc4b9cc | 1865 | the position of the terminating null byte it has found. If you |
0520adde FB |
1866 | need that information, it is better (but less portable) to use |
1867 | @code{strchrnul} than to search for it a second time. | |
8a2f1f5b UD |
1868 | @end deftypefun |
1869 | ||
f801cf7b | 1870 | @deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc}) |
d08a7e4c | 1871 | @standards{ISO, wchar.h} |
11087373 | 1872 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b | 1873 | The @code{wcschr} function finds the first occurrence of the wide |
2cc4b9cc | 1874 | character @var{wc} in the wide string |
8a2f1f5b UD |
1875 | beginning at @var{wstring}. The return value is a pointer to the |
1876 | located wide character, or a null pointer if no match was found. | |
1877 | ||
The terminating null wide character is considered to be part of the wide
string, so you can use this function to get a pointer to the end
of a wide string by specifying a null wide character as the
value of the @var{wc} argument.  It would be better (but less portable)
to use @code{wcschrnul} in this case, though.
28f540f4 RM |
1883 | @end deftypefun |
1884 | ||
0e4ee106 | 1885 | @deftypefun {char *} strchrnul (const char *@var{string}, int @var{c}) |
d08a7e4c | 1886 | @standards{GNU, string.h} |
11087373 | 1887 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
@code{strchrnul} is the same as @code{strchr} except that if it does
not find the byte, it returns a pointer to the string's terminating
null byte rather than a null pointer.
8a2f1f5b UD |
1891 | |
1892 | This function is a GNU extension. | |
1893 | @end deftypefun | |
1894 | ||
8a2f1f5b | 1895 | @deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc}) |
d08a7e4c | 1896 | @standards{GNU, wchar.h} |
11087373 | 1897 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b | 1898 | @code{wcschrnul} is the same as @code{wcschr} except that if it does not |
2cc4b9cc | 1899 | find the wide character, it returns a pointer to the wide string's |
8a2f1f5b UD |
1900 | terminating null wide character rather than a null pointer. |
1901 | ||
1902 | This function is a GNU extension. | |
28f540f4 RM |
1903 | @end deftypefun |
1904 | ||
ec28fc7c | 1905 | One useful, but unusual, use of the @code{strchr} |
2cc4b9cc | 1906 | function is when one wants to have a pointer pointing to the null byte |
ee2752ea UD |
1907 | terminating a string. This is often written in this way: |
1908 | ||
1909 | @smallexample | |
1910 | s += strlen (s); | |
1911 | @end smallexample | |
1912 | ||
@noindent
This is almost optimal, but the addition duplicates a bit of
the work already done in the @code{strlen} function.  A better solution
is this:
1917 | ||
1918 | @smallexample | |
1919 | s = strchr (s, '\0'); | |
1920 | @end smallexample | |
1921 | ||
There is no restriction on the second parameter of @code{strchr}, so it
may very well be zero.  Readers thinking carefully about this might
point out that the @code{strchr} function is more expensive than the
@code{strlen} function since it has two termination criteria.  This is
right.  But in @theglibc{} the implementation of @code{strchr} is
optimized in a special way so that @code{strchr} actually is faster.
ee2752ea | 1929 | |
28f540f4 | 1930 | @deftypefun {char *} strrchr (const char *@var{string}, int @var{c}) |
d08a7e4c | 1931 | @standards{ISO, string.h} |
11087373 | 1932 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
1933 | The function @code{strrchr} is like @code{strchr}, except that it searches |
1934 | backwards from the end of the string @var{string} (instead of forwards | |
1935 | from the front). | |
1936 | ||
1937 | For example, | |
1938 | @smallexample | |
1939 | strrchr ("hello, world", 'l') | |
1940 | @result{} "ld" | |
1941 | @end smallexample | |
1942 | @end deftypefun | |
1943 | ||
4315f45c | 1944 | @deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{wc}) |
d08a7e4c | 1945 | @standards{ISO, wchar.h} |
11087373 | 1946 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
1947 | The function @code{wcsrchr} is like @code{wcschr}, except that it searches |
1948 | backwards from the end of the string @var{wstring} (instead of forwards | |
1949 | from the front). | |
1950 | @end deftypefun | |
1951 | ||
28f540f4 | 1952 | @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle}) |
d08a7e4c | 1953 | @standards{ISO, string.h} |
11087373 | 1954 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 1955 | This is like @code{strchr}, except that it searches @var{haystack} for a |
2cc4b9cc | 1956 | substring @var{needle} rather than just a single byte. It |
28f540f4 | 1957 | returns a pointer into the string @var{haystack} that is the first |
2cc4b9cc | 1958 | byte of the substring, or a null pointer if no match was found. If |
28f540f4 RM |
1959 | @var{needle} is an empty string, the function returns @var{haystack}. |
1960 | ||
1961 | For example, | |
1962 | @smallexample | |
1963 | strstr ("hello, world", "l") | |
1964 | @result{} "llo, world" | |
1965 | strstr ("hello, world", "wo") | |
1966 | @result{} "world" | |
1967 | @end smallexample | |
1968 | @end deftypefun | |
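
Since @code{strstr} returns a pointer into @var{haystack}, a search
can be resumed from just past the previous match.  The following sketch
counts non-overlapping occurrences that way; the name
@code{count_substrings} and the decision to count an empty needle as
zero occurrences are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative helper: count non-overlapping occurrences of NEEDLE
   in HAYSTACK.  An empty NEEDLE is counted as zero occurrences.  */
size_t
count_substrings (const char *haystack, const char *needle)
{
  assert (haystack != NULL && needle != NULL);

  size_t needle_len = strlen (needle);
  size_t count = 0;

  /* strstr returns HAYSTACK itself for an empty needle, so the loop
     below would never advance.  */
  if (needle_len == 0)
    return 0;

  for (const char *p = strstr (haystack, needle);
       p != NULL;
       p = strstr (p + needle_len, needle))
    count++;

  return count;
}
```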
1969 | ||
8a2f1f5b | 1970 | @deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) |
d08a7e4c | 1971 | @standards{ISO, wchar.h} |
11087373 | 1972 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
1973 | This is like @code{wcschr}, except that it searches @var{haystack} for a |
1974 | substring @var{needle} rather than just a single wide character. It | |
1975 | returns a pointer into the string @var{haystack} that is the first wide | |
1976 | character of the substring, or a null pointer if no match was found. If | |
1977 | @var{needle} is an empty string, the function returns @var{haystack}. | |
1978 | @end deftypefun | |
1979 | ||
8a2f1f5b | 1980 | @deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) |
d08a7e4c | 1981 | @standards{XPG, wchar.h} |
11087373 | 1982 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
9dcc8f11 | 1983 | @code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the |
8a2f1f5b UD |
1984 | name originally used in the X/Open Portability Guide before the |
1985 | @w{Amendment 1} to @w{ISO C90} was published. | |
1986 | @end deftypefun | |
1987 | ||
28f540f4 | 1988 | |
0e4ee106 | 1989 | @deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle}) |
d08a7e4c | 1990 | @standards{GNU, string.h} |
11087373 AO |
1991 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
1992 | @c There may be multiple calls of strncasecmp, each accessing the locale | |
1993 | @c object independently. | |
0e4ee106 UD |
1994 | This is like @code{strstr}, except that it ignores case in searching for |
1995 | the substring. Like @code{strcasecmp}, it is locale dependent how | |
2cc4b9cc PE |
1996 | uppercase and lowercase characters are related, and arguments are |
1997 | multibyte strings. | |
0e4ee106 UD |
1998 | |
1999 | ||
2000 | For example, | |
2001 | @smallexample | |
d6868416 | 2002 | strcasestr ("hello, world", "L") |
0e4ee106 | 2003 | @result{} "llo, world" |
d6868416 | 2004 | strcasestr ("hello, World", "wo") |
0e4ee106 UD |
2005 | @result{} "World" |
2006 | @end smallexample | |
2007 | @end deftypefun | |
2008 | ||
2009 | ||
63551311 | 2010 | @deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len}) |
d08a7e4c | 2011 | @standards{GNU, string.h} |
11087373 | 2012 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 2013 | This is like @code{strstr}, but @var{needle} and @var{haystack} are byte |
2cc4b9cc | 2014 | arrays rather than strings. @var{needle-len} is the |
28f540f4 | 2015 | length of @var{needle} and @var{haystack-len} is the length of |
0005e54f | 2016 | @var{haystack}. |
28f540f4 RM |
2017 | |
2018 | This function is a GNU extension. | |
2019 | @end deftypefun | |
2020 | ||
28f540f4 | 2021 | @deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset}) |
d08a7e4c | 2022 | @standards{ISO, string.h} |
11087373 | 2023 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 2024 | The @code{strspn} (``string span'') function returns the length of the |
2cc4b9cc | 2025 | initial substring of @var{string} that consists entirely of bytes that |
28f540f4 | 2026 | are members of the set specified by the string @var{skipset}. The order |
2cc4b9cc | 2027 | of the bytes in @var{skipset} is not important. |
28f540f4 RM |
2028 | |
2029 | For example, | |
2030 | @smallexample | |
2031 | strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz") | |
2032 | @result{} 5 | |
2033 | @end smallexample | |
8a2f1f5b | 2034 | |
2cc4b9cc PE |
2035 | In a multibyte string, characters consisting of |
2036 | more than one byte are not treated as single entities. Each byte is treated | |
8a2f1f5b UD |
2037 | separately. The function is not locale-dependent. |
2038 | @end deftypefun | |
2039 | ||
8a2f1f5b | 2040 | @deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset}) |
d08a7e4c | 2041 | @standards{ISO, wchar.h} |
11087373 | 2042 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
2043 | The @code{wcsspn} (``wide character string span'') function returns the |
2044 | length of the initial substring of @var{wstring} that consists entirely | |
2045 | of wide characters that are members of the set specified by the string | |
2046 | @var{skipset}. The order of the wide characters in @var{skipset} is not | |
2047 | important. | |
28f540f4 RM |
2048 | @end deftypefun |
2049 | ||
28f540f4 | 2050 | @deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset}) |
d08a7e4c | 2051 | @standards{ISO, string.h} |
11087373 | 2052 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 2053 | The @code{strcspn} (``string complement span'') function returns the length |
2cc4b9cc | 2054 | of the initial substring of @var{string} that consists entirely of bytes |
28f540f4 | 2055 | that are @emph{not} members of the set specified by the string @var{stopset}. |
2cc4b9cc | 2056 | (In other words, it returns the offset of the first byte in @var{string} |
28f540f4 RM |
2057 | that is a member of the set @var{stopset}.) |
2058 | ||
2059 | For example, | |
2060 | @smallexample | |
2061 | strcspn ("hello, world", " \t\n,.;!?") | |
2062 | @result{} 5 | |
2063 | @end smallexample | |
8a2f1f5b | 2064 | |
In a multibyte string, characters consisting of
more than one byte are not treated as single entities.  Each byte is treated
separately.  The function is not locale-dependent.
2068 | @end deftypefun | |
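
Used together, @code{strspn} and @code{strcspn} can locate a word
without modifying the string: the first skips leading delimiters and
the second measures the word that follows.  This is only a sketch, and
the helper name @code{first_word} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative helper: find the first word in STRING, where words
   are separated by bytes from DELIMITERS.  Store the word's length
   in *LENGTH and return a pointer to its first byte, or return a
   null pointer if there is no word.  STRING is not modified.  */
const char *
first_word (const char *string, const char *delimiters, size_t *length)
{
  assert (string != NULL && delimiters != NULL && length != NULL);

  /* Skip any leading delimiters.  */
  const char *start = string + strspn (string, delimiters);
  if (*start == '\0')
    return NULL;

  /* The word extends up to the next delimiter or the end.  */
  *length = strcspn (start, delimiters);
  return start;
}
```

Calling the helper again on the address just past the word that was
found walks through the remaining words.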
2069 | ||
8a2f1f5b | 2070 | @deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset}) |
d08a7e4c | 2071 | @standards{ISO, wchar.h} |
11087373 | 2072 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
The @code{wcscspn} (``wide character string complement span'') function
returns the length of the initial substring of @var{wstring} that
consists entirely of wide characters that are @emph{not} members of the
set specified by the wide string @var{stopset}.  (In other words, it returns
the offset of the first wide character in @var{wstring} that is a member of
the set @var{stopset}.)
28f540f4 RM |
2079 | @end deftypefun |
2080 | ||
28f540f4 | 2081 | @deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset}) |
d08a7e4c | 2082 | @standards{ISO, string.h} |
11087373 | 2083 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 2084 | The @code{strpbrk} (``string pointer break'') function is related to |
2cc4b9cc | 2085 | @code{strcspn}, except that it returns a pointer to the first byte |
28f540f4 RM |
2086 | in @var{string} that is a member of the set @var{stopset} instead of the |
2087 | length of the initial substring. It returns a null pointer if no such | |
2cc4b9cc | 2088 | byte from @var{stopset} is found. |
28f540f4 RM |
2089 | |
2090 | @c @group Invalid outside the example. | |
2091 | For example, | |
2092 | ||
2093 | @smallexample | |
2094 | strpbrk ("hello, world", " \t\n,.;!?") | |
2095 | @result{} ", world" | |
2096 | @end smallexample | |
2097 | @c @end group | |
8a2f1f5b | 2098 | |
2cc4b9cc PE |
2099 | In a multibyte string, characters consisting of |
2100 | more than one byte are not treated as single entities. Each byte is treated | |
8a2f1f5b UD |
2101 | separately. The function is not locale-dependent. |
2102 | @end deftypefun | |
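
Because @code{strpbrk} returns a pointer rather than an offset, it is
convenient for visiting every byte that belongs to a set.  The sketch
below replaces each such byte in place.  The name @code{replace_any} is
an assumption for this illustration, and @var{replacement} is assumed
not to be the null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative helper: replace each byte of STRING that appears in
   STOPSET with REPLACEMENT, and return the number of bytes that
   were replaced.  */
size_t
replace_any (char *string, const char *stopset, char replacement)
{
  assert (string != NULL && stopset != NULL && replacement != '\0');

  size_t count = 0;

  for (char *p = strpbrk (string, stopset);
       p != NULL;
       p = strpbrk (p + 1, stopset))
    {
      *p = replacement;
      count++;
    }

  return count;
}
```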
2103 | ||
8a2f1f5b | 2104 | @deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset}) |
d08a7e4c | 2105 | @standards{ISO, wchar.h} |
11087373 | 2106 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
2107 | The @code{wcspbrk} (``wide character string pointer break'') function is |
2108 | related to @code{wcscspn}, except that it returns a pointer to the first | |
2109 | wide character in @var{wstring} that is a member of the set | |
2110 | @var{stopset} instead of the length of the initial substring. It | |
2cc4b9cc | 2111 | returns a null pointer if no such wide character from @var{stopset} is found. |
28f540f4 RM |
2112 | @end deftypefun |
2113 | ||
0e4ee106 UD |
2114 | |
2115 | @subsection Compatibility String Search Functions | |
2116 | ||
0e4ee106 | 2117 | @deftypefun {char *} index (const char *@var{string}, int @var{c}) |
d08a7e4c | 2118 | @standards{BSD, string.h} |
11087373 | 2119 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
0e4ee106 UD |
2120 | @code{index} is another name for @code{strchr}; they are exactly the same. |
2121 | New code should always use @code{strchr} since this name is defined in | |
2122 | @w{ISO C} while @code{index} is a BSD invention which never was available | |
2123 | on @w{System V} derived systems. | |
2124 | @end deftypefun | |
2125 | ||
0e4ee106 | 2126 | @deftypefun {char *} rindex (const char *@var{string}, int @var{c}) |
d08a7e4c | 2127 | @standards{BSD, string.h} |
11087373 | 2128 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
0e4ee106 UD |
2129 | @code{rindex} is another name for @code{strrchr}; they are exactly the same. |
2130 | New code should always use @code{strrchr} since this name is defined in | |
2131 | @w{ISO C} while @code{rindex} is a BSD invention which never was available | |
2132 | on @w{System V} derived systems. | |
2133 | @end deftypefun | |
2134 | ||
b4012b75 | 2135 | @node Finding Tokens in a String |
28f540f4 RM |
2136 | @section Finding Tokens in a String |
2137 | ||
28f540f4 RM |
2138 | @cindex tokenizing strings |
2139 | @cindex breaking a string into tokens | |
2140 | @cindex parsing tokens from a string | |
2141 | It's fairly common for programs to have a need to do some simple kinds | |
2142 | of lexical analysis and parsing, such as splitting a command string up | |
2143 | into tokens. You can do this with the @code{strtok} function, declared | |
2144 | in the header file @file{string.h}. | |
2145 | @pindex string.h | |
2146 | ||
8a2f1f5b | 2147 | @deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters}) |
d08a7e4c | 2148 | @standards{ISO, string.h} |
11087373 | 2149 | @safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}} |
28f540f4 RM |
2150 | A string can be split into tokens by making a series of calls to the |
2151 | function @code{strtok}. | |
2152 | ||
2153 | The string to be split up is passed as the @var{newstring} argument on | |
2154 | the first call only. The @code{strtok} function uses this to set up | |
2155 | some internal state information. Subsequent calls to get additional | |
2156 | tokens from the same string are indicated by passing a null pointer as | |
2157 | the @var{newstring} argument. Calling @code{strtok} with another | |
2158 | non-null @var{newstring} argument reinitializes the state information. | |
2159 | It is guaranteed that no other library function ever calls @code{strtok} | |
2160 | behind your back (which would mess up this internal state information). | |
2161 | ||
2162 | The @var{delimiters} argument is a string that specifies a set of delimiters | |
2cc4b9cc PE |
2163 | that may surround the token being extracted. All the initial bytes |
2164 | that are members of this set are discarded. The first byte that is | |
28f540f4 RM |
2165 | @emph{not} a member of this set of delimiters marks the beginning of the |
2166 | next token. The end of the token is found by looking for the next | |
2cc4b9cc PE |
2167 | byte that is a member of the delimiter set. This byte in the |
2168 | original string @var{newstring} is overwritten by a null byte, and the | |
28f540f4 RM |
2169 | pointer to the beginning of the token in @var{newstring} is returned. |
2170 | ||
2171 | On the next call to @code{strtok}, the searching begins at the next | |
2cc4b9cc | 2172 | byte beyond the one that marked the end of the previous token. |
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter bytes, @code{strtok} returns
a null pointer.
8a2f1f5b | 2179 | |
2cc4b9cc PE |
2180 | In a multibyte string, characters consisting of |
2181 | more than one byte are not treated as single entities. Each byte is treated | |
8a2f1f5b UD |
2182 | separately. The function is not locale-dependent. |
2183 | @end deftypefun | |
2184 | ||
1acd4371 | 2185 | @deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr}) |
d08a7e4c | 2186 | @standards{ISO, wchar.h} |
11087373 | 2187 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
8a2f1f5b UD |
2188 | A string can be split into tokens by making a series of calls to the |
2189 | function @code{wcstok}. | |
2190 | ||
The wide string to be split up is passed as the @var{newstring} argument on
the first call only.  Unlike @code{strtok}, the @code{wcstok} function
keeps no internal state; it records its position in the object pointed
to by @var{save_ptr}.  Subsequent calls to get additional
tokens from the same wide string are indicated by passing a
null pointer as the @var{newstring} argument, which causes the pointer
previously stored in @var{save_ptr} to be used instead.
8a2f1f5b | 2197 | |
2cc4b9cc | 2198 | The @var{delimiters} argument is a wide string that specifies |
8a2f1f5b UD |
2199 | a set of delimiters that may surround the token being extracted. All |
2200 | the initial wide characters that are members of this set are discarded. | |
2201 | The first wide character that is @emph{not} a member of this set of | |
2202 | delimiters marks the beginning of the next token. The end of the token | |
2203 | is found by looking for the next wide character that is a member of the | |
2cc4b9cc | 2204 | delimiter set. This wide character in the original wide |
1acd4371 AO |
2205 | string @var{newstring} is overwritten by a null wide character, the |
2206 | pointer past the overwritten wide character is saved in @var{save_ptr}, | |
2207 | and the pointer to the beginning of the token in @var{newstring} is | |
2208 | returned. | |
8a2f1f5b UD |
2209 | |
2210 | On the next call to @code{wcstok}, the searching begins at the next | |
2211 | wide character beyond the one that marked the end of the previous token. | |
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.

If the end of the wide string @var{newstring} is reached, or
if the remainder of the wide string consists only of delimiter wide characters,
@code{wcstok} returns a null pointer.
28f540f4 RM |
2218 | @end deftypefun |
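
A typical @code{wcstok} loop passes the wide string on the first call
and a null pointer afterwards, with the same @var{save_ptr} object each
time.  Here is a minimal sketch; the function name
@code{count_wide_tokens} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Illustrative helper: count the tokens in WSTRING, which is
   modified in the process.  */
size_t
count_wide_tokens (wchar_t *wstring, const wchar_t *delimiters)
{
  assert (wstring != NULL && delimiters != NULL);

  wchar_t *save_ptr;
  size_t count = 0;

  for (wchar_t *token = wcstok (wstring, delimiters, &save_ptr);
       token != NULL;
       token = wcstok (NULL, delimiters, &save_ptr))
    count++;

  return count;
}
```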
2219 | ||
@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings
and Arrays}).  If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.
28f540f4 RM |
2228 | |
2229 | The string that you are operating on might even be a constant. Then | |
8a2f1f5b UD |
2230 | when @code{strtok} or @code{wcstok} tries to modify it, your program |
2231 | will get a fatal signal for writing in read-only memory. @xref{Program | |
2232 | Error Signals}. Even if the operation of @code{strtok} or @code{wcstok} | |
2233 | would not require a modification of the string (e.g., if there is | |
1f77f049 | 2234 | exactly one token) the string can (and in the @glibcadj{} case will) be |
8a2f1f5b | 2235 | modified. |
28f540f4 RM |
2236 | |
2237 | This is a special case of a general principle: if a part of a program | |
2238 | does not have as its purpose the modification of a certain data | |
2239 | structure, then it is error-prone to modify the data structure | |
2240 | temporarily. | |
2241 | ||
1acd4371 | 2242 | The function @code{strtok} is not reentrant, whereas @code{wcstok} is. |
8a2f1f5b UD |
2243 | @xref{Nonreentrancy}, for a discussion of where and why reentrancy is |
2244 | important. | |
28f540f4 RM |
2245 | |
2246 | Here is a simple example showing the use of @code{strtok}. | |
2247 | ||
2248 | @comment Yes, this example has been tested. | |
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token, *cp;

@dots{}

cp = strdupa (string);                /* Make writable copy.  */
token = strtok (cp, delimiters);      /* token => "words" */
token = strtok (NULL, delimiters);    /* token => "separated" */
token = strtok (NULL, delimiters);    /* token => "by" */
token = strtok (NULL, delimiters);    /* token => "spaces" */
token = strtok (NULL, delimiters);    /* token => "and" */
token = strtok (NULL, delimiters);    /* token => "punctuation" */
token = strtok (NULL, delimiters);    /* token => NULL */
@end smallexample
a5113b14 | 2270 | |
@Theglibc{} contains two more functions for tokenizing a string
which overcome the limitation of non-reentrancy.  They are not
available for wide strings.
a5113b14 | 2274 | |
a5113b14 | 2275 | @deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr}) |
d08a7e4c | 2276 | @standards{POSIX, string.h} |
11087373 | 2277 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
dd7d45e8 UD |
2278 | Just like @code{strtok}, this function splits the string into several |
2279 | tokens which can be accessed by successive calls to @code{strtok_r}. | |
1acd4371 AO |
2280 | The difference is that, as in @code{wcstok}, the information about the |
2281 | next token is stored in the space pointed to by the third argument, | |
2282 | @var{save_ptr}, which is a pointer to a string pointer. Calling | |
2283 | @code{strtok_r} with a null pointer for @var{newstring} and leaving | |
2284 | @var{save_ptr} between the calls unchanged does the job without | |
2285 | hindering reentrancy. | |
a5113b14 | 2286 | |
976780fd | 2287 | This function is defined in POSIX.1 and can be found on many systems |
a5113b14 UD |
2288 | which support multi-threading. |
2289 | @end deftypefun | |
2290 | ||
a5113b14 | 2291 | @deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter}) |
d08a7e4c | 2292 | @standards{BSD, string.h} |
11087373 | 2293 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
This function has functionality similar to @code{strtok_r}, with the
@var{newstring} argument replaced by the @var{save_ptr} argument.  The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token.
2301 | ||
One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one byte from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of adjacent
bytes from @var{delimiter}.  This means that a program normally should test
for @code{strsep} returning an empty string before processing it.
9afc8a59 | 2307 | |
a5113b14 UD |
2308 | This function was introduced in 4.3BSD and therefore is widely available. |
2309 | @end deftypefun | |
2310 | ||
Here is what the above example looks like when @code{strsep} is used.
2312 | ||
2313 | @comment Yes, this example has been tested. | |
2314 | @smallexample | |
2315 | #include <string.h> | |
2316 | #include <stddef.h> | |
2317 | ||
2318 | @dots{} | |
2319 | ||
5649a1d6 | 2320 | const char string[] = "words separated by spaces -- and, punctuation!"; |
a5113b14 UD |
2321 | const char delimiters[] = " .,;:!-"; |
2322 | char *running; | |
2323 | char *token; | |
2324 | ||
2325 | @dots{} | |
2326 | ||
5649a1d6 | 2327 | running = strdupa (string); |
a5113b14 UD |
2328 | token = strsep (&running, delimiters); /* token => "words" */ |
2329 | token = strsep (&running, delimiters); /* token => "separated" */ | |
2330 | token = strsep (&running, delimiters); /* token => "by" */ | |
2331 | token = strsep (&running, delimiters); /* token => "spaces" */ | |
9afc8a59 UD |
2332 | token = strsep (&running, delimiters); /* token => "" */ |
2333 | token = strsep (&running, delimiters); /* token => "" */ | |
2334 | token = strsep (&running, delimiters); /* token => "" */ | |
a5113b14 | 2335 | token = strsep (&running, delimiters); /* token => "and" */ |
9afc8a59 | 2336 | token = strsep (&running, delimiters); /* token => "" */ |
a5113b14 | 2337 | token = strsep (&running, delimiters); /* token => "punctuation" */ |
9afc8a59 | 2338 | token = strsep (&running, delimiters); /* token => "" */ |
a5113b14 UD |
2339 | token = strsep (&running, delimiters); /* token => NULL */ |
2340 | @end smallexample | |
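
Because @code{strsep} reports an empty string for each pair of adjacent
delimiter bytes, a tokenizing loop usually skips those results.  The
following is a minimal sketch of such a loop, not part of the library;
the helper name @code{split_nonempty} and its parameters are invented
for illustration.

```c
#define _DEFAULT_SOURCE 1   /* strsep is a BSD extension */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split TEXT in place on any byte in DELIMS,
   store up to MAX non-empty tokens in OUT, and return how many
   were stored.  TEXT must be writable, because strsep overwrites
   delimiter bytes with '\0'.  */
size_t
split_nonempty (char *text, const char *delims, char **out, size_t max)
{
  size_t count = 0;
  char *running = text;
  char *token;

  while (count < max && (token = strsep (&running, delims)) != NULL)
    {
      if (*token == '\0')
        continue;               /* adjacent delimiters: skip */
      out[count++] = token;
    }
  return count;
}
```

The tokens stored in @code{out} point into @code{text}, so they remain
valid only as long as that buffer does.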
b4012b75 | 2341 | |
ec28fc7c | 2342 | @deftypefun {char *} basename (const char *@var{filename}) |
d08a7e4c | 2343 | @standards{GNU, string.h} |
11087373 | 2344 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
ec28fc7c | 2345 | The GNU version of the @code{basename} function returns the last |
9442cd75 | 2346 | component of the path in @var{filename}.  This version is preferred,
ec28fc7c UD |
2347 | since it does not modify its argument, @var{filename}, and | |
2348 | respects trailing slashes. The prototype for @code{basename} can be | |
ef48b196 | 2349 | found in @file{string.h}.  Note that this function is overridden by the XPG
ec28fc7c UD |
2350 | version if @file{libgen.h} is included.
2351 | ||
2352 | Example of using GNU @code{basename}: | |
2353 | ||
2354 | @smallexample | |
#define _GNU_SOURCE   /* for the GNU basename declared in string.h */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
2370 | @end smallexample | |
2371 | ||
2372 | @strong{Portability Note:} This function may produce different results | |
2373 | on different systems. | |
2374 | ||
2375 | @end deftypefun | |
2376 | ||
af85ebcd | 2377 | @deftypefun {char *} basename (char *@var{path}) |
d08a7e4c | 2378 | @standards{XPG, libgen.h} |
11087373 | 2379 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
cf822e3c | 2380 | This is the standard XPG-defined @code{basename}.  It is similar in
ec28fc7c | 2381 | spirit to the GNU version, but may modify the @var{path} by removing |
2cc4b9cc PE |
2382 | trailing '/' bytes. If the @var{path} is made up entirely of '/' |
2383 | bytes, then "/" will be returned. Also, if @var{path} is | |
ec28fc7c | 2384 | @code{NULL} or an empty string, then "." is returned. The prototype for |
e4a5f77d | 2385 | the XPG version can be found in @file{libgen.h}. |
ec28fc7c UD |
2386 | |
2387 | Example of using XPG @code{basename}: | |
2388 | ||
2389 | @smallexample | |
#define _GNU_SOURCE   /* for strdupa */
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  char *path = strdupa (argv[0]);

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}

@}
2409 | @end smallexample | |
2410 | @end deftypefun | |
2411 | ||
ec28fc7c | 2412 | @deftypefun {char *} dirname (char *@var{path}) |
d08a7e4c | 2413 | @standards{XPG, libgen.h} |
11087373 | 2414 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
ec28fc7c UD |
2415 | The @code{dirname} function is the complement to the XPG version of
2416 | @code{basename}. It returns the parent directory of the file specified | |
2417 | by @var{path}. If @var{path} is @code{NULL}, an empty string, or | |
2cc4b9cc | 2418 | contains no '/' bytes, then "." is returned. The prototype for this |
ec28fc7c UD |
2419 | function can be found in @file{libgen.h}. |
2420 | @end deftypefun | |
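
Since the XPG @code{basename} and @code{dirname} may both modify their
argument, a caller that needs both parts of a path typically works on
two private copies.  The sketch below illustrates this under stated
assumptions: the helper @code{split_path}, its 4096-byte limit, and the
@samp{|} separator are invented for the example.

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "DIR|BASE" for PATH into OUT.
   Returns 0 on success, or -1 if PATH is too long for the
   local copies.  Each function gets its own copy because
   either one may modify the string it is given.  */
int
split_path (const char *path, char *out, size_t outsize)
{
  char dircopy[4096];
  char basecopy[4096];

  if (strlen (path) >= sizeof dircopy)
    return -1;
  strcpy (dircopy, path);
  strcpy (basecopy, path);
  snprintf (out, outsize, "%s|%s", dirname (dircopy), basename (basecopy));
  return 0;
}
```

With this helper, @code{"/usr/lib"} yields @code{"/usr|lib"} and a bare
name such as @code{"usr"} yields @code{".|usr"}.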
0e4ee106 | 2421 | |
ea1bd74d ZW |
2422 | @node Erasing Sensitive Data |
2423 | @section Erasing Sensitive Data | |
2424 | ||
2425 | Sensitive data, such as cryptographic keys, should be erased from | |
2426 | memory after use, to reduce the risk that a bug will expose it to the | |
2427 | outside world. However, compiler optimizations may determine that an | |
2428 | erasure operation is ``unnecessary,'' and remove it from the generated | |
2429 | code, because no @emph{correct} program could access the variable or | |
2430 | heap object containing the sensitive data after it's deallocated. | |
2431 | Since erasure is a precaution against bugs, this optimization is | |
2432 | inappropriate. | |
2433 | ||
2434 | The function @code{explicit_bzero} erases a block of memory, and | |
2435 | guarantees that the compiler will not remove the erasure as | |
2436 | ``unnecessary.'' | |
2437 | ||
2438 | @smallexample | |
2439 | @group | |
2440 | #include <string.h> | |
2441 | ||
2442 | extern void encrypt (const char *key, const char *in, | |
2443 | char *out, size_t n); | |
2444 | extern void genkey (const char *phrase, char *key); | |
2445 | ||
2446 | void encrypt_with_phrase (const char *phrase, const char *in, | |
2447 | char *out, size_t n) | |
2448 | @{ | |
2449 | char key[16]; | |
2450 | genkey (phrase, key); | |
2451 | encrypt (key, in, out, n); | |
2452 | explicit_bzero (key, 16); | |
2453 | @} | |
2454 | @end group | |
2455 | @end smallexample | |
2456 | ||
2457 | @noindent | |
2458 | In this example, if @code{memset}, @code{bzero}, or a hand-written | |
2459 | loop had been used, the compiler might remove them as ``unnecessary.'' | |
2460 | ||
2461 | @strong{Warning:} @code{explicit_bzero} does not guarantee that | |
2462 | sensitive data is @emph{completely} erased from the computer's memory. | |
2463 | There may be copies in temporary storage areas, such as registers and | |
2464 | ``scratch'' stack space; since these are invisible to the source code, | |
2465 | a library function cannot erase them. | |
2466 | ||
2467 | Also, @code{explicit_bzero} only operates on RAM. If a sensitive data | |
2468 | object never needs to have its address taken other than to call | |
2469 | @code{explicit_bzero}, it might be stored entirely in CPU registers | |
2470 | @emph{until} the call to @code{explicit_bzero}. Then it will be | |
2471 | copied into RAM, the copy will be erased, and the original will remain | |
2472 | intact. Data in RAM is more likely to be exposed by a bug than data | |
2473 | in registers, so this creates a brief window where the data is at | |
2474 | greater risk of exposure than it would have been if the program didn't | |
2475 | try to erase it at all. | |
2476 | ||
2477 | Declaring sensitive variables as @code{volatile} will make both the | |
2478 | above problems @emph{worse}; a @code{volatile} variable will be stored | |
2479 | in memory for its entire lifetime, and the compiler will make | |
2480 | @emph{more} copies of it than it would otherwise have. Attempting to | |
2481 | erase a normal variable ``by hand'' through a | |
2482 | @code{volatile}-qualified pointer doesn't work at all---because the | |
2483 | variable itself is not @code{volatile}, some compilers will ignore the | |
2484 | qualification on the pointer and remove the erasure anyway. | |
2485 | ||
2486 | Having said all that, in most situations, using @code{explicit_bzero} | |
2487 | is better than not using it. At present, the only way to do a more | |
2488 | thorough job is to write the entire sensitive operation in assembly | |
2489 | language. We anticipate that future compilers will recognize calls to | |
2490 | @code{explicit_bzero} and take appropriate steps to erase all the | |
8394b8c4 | 2491 | copies of the affected data, wherever they may be. |
ea1bd74d | 2492 | |
ea1bd74d | 2493 | @deftypefun void explicit_bzero (void *@var{block}, size_t @var{len}) |
d08a7e4c | 2494 | @standards{BSD, string.h} |
ea1bd74d ZW |
2495 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
2496 | ||
2497 | @code{explicit_bzero} writes zero into @var{len} bytes of memory | |
2498 | beginning at @var{block}, just as @code{bzero} would. The zeroes are | |
2499 | always written, even if the compiler could determine that this is | |
2500 | ``unnecessary'' because no correct program could read them back. | |
2501 | ||
2502 | @strong{Note:} The @emph{only} optimization that @code{explicit_bzero} | |
2503 | disables is removal of ``unnecessary'' writes to memory. The compiler | |
2504 | can perform all the other optimizations that it could for a call to | |
2505 | @code{memset}. For instance, it may replace the function call with | |
2506 | inline memory writes, and it may assume that @var{block} cannot be a | |
2507 | null pointer. | |
2508 | ||
2509 | @strong{Portability Note:} This function first appeared in OpenBSD 5.5 | |
2510 | and has not been standardized. Other systems may provide the same | |
2511 | functionality under a different name, such as @code{explicit_memset}, | |
2512 | @code{memset_s}, or @code{SecureZeroMemory}. | |
2513 | ||
2514 | @Theglibc{} declares this function in @file{string.h}, but on other | |
2515 | systems it may be in @file{strings.h} instead. | |
2516 | @end deftypefun | |
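
As a further illustration, here is a small sketch that erases one member
of a structure while leaving the rest intact.  The @code{struct session}
type and the function @code{session_forget_key} are hypothetical; the
point is only that passing @code{sizeof} of the object keeps the length
in step with its declaration.

```c
#define _DEFAULT_SOURCE 1   /* explicit_bzero is a BSD extension */
#include <string.h>

/* Hypothetical record holding a secret next to ordinary data.  */
struct session
{
  char key[32];
  int id;
};

/* Erase only the key.  Using sizeof avoids repeating the array
   length as a literal that could drift out of date.  */
void
session_forget_key (struct session *s)
{
  explicit_bzero (s->key, sizeof s->key);
}
```

The caveats described above still apply: copies of the key in registers
or other temporary storage are not affected.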
2517 | ||
b10a0acc ZW |
2518 | |
2519 | @node Shuffling Bytes | |
2520 | @section Shuffling Bytes | |
0e4ee106 UD |
2521 | |
2522 | The function below addresses the perennial programming quandary: ``How do | |
2523 | I take good data in string form and painlessly turn it into garbage?'' | |
b10a0acc ZW |
2524 | This is not a difficult thing to code for oneself, but the authors of |
2525 | @theglibc{} wish to make it as convenient as possible. | |
0e4ee106 | 2526 | |
b10a0acc ZW |
2527 | To @emph{erase} data, use @code{explicit_bzero} (@pxref{Erasing |
2528 | Sensitive Data}); to obfuscate it reversibly, use @code{memfrob} | |
2529 | (@pxref{Obfuscating Data}). | |
0e4ee106 | 2530 | |
ec28fc7c | 2531 | @deftypefun {char *} strfry (char *@var{string}) |
d08a7e4c | 2532 | @standards{GNU, string.h} |
11087373 AO |
2533 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
2534 | @c Calls initstate_r, time, getpid, strlen, and random_r. | |
0e4ee106 | 2535 | |
b10a0acc ZW |
2536 | @code{strfry} performs an in-place shuffle on @var{string}. Each |
2537 | character is swapped to a position selected at random, within the | |
2538 | portion of the string starting with the character's original position. | |
2539 | (This is the Fisher-Yates algorithm for unbiased shuffling.) | |
2540 | ||
2541 | Calling @code{strfry} will not disturb any of the random number | |
2542 | generators that have global state (@pxref{Pseudo-Random Numbers}). | |
0e4ee106 UD |
2543 | |
2544 | The return value of @code{strfry} is always @var{string}. | |
2545 | ||
1f77f049 | 2546 | @strong{Portability Note:} This function is unique to @theglibc{}. |
b10a0acc | 2547 | It is declared in @file{string.h}. |
0e4ee106 UD |
2548 | @end deftypefun |
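
The shuffle described above can be sketched by hand as follows.  This is
not the library's implementation: the name @code{shuffle_bytes} is
invented, and the sketch draws from @code{rand}, so unlike
@code{strfry} it does disturb a generator with global state.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical Fisher-Yates shuffle of the bytes of S, in place.
   Position i is swapped with a position chosen at random from
   i .. len-1.  Taking rand () modulo the range has a slight bias,
   which is acceptable for an illustration.  */
char *
shuffle_bytes (char *s)
{
  size_t len = strlen (s);

  for (size_t i = 0; i + 1 < len; i++)
    {
      size_t j = i + (size_t) rand () % (len - i);
      char tmp = s[i];
      s[i] = s[j];
      s[j] = tmp;
    }
  return s;
}
```

The result always contains the same bytes as the input, only in a
different order.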
2549 | ||
2550 | ||
b10a0acc ZW |
2551 | @node Obfuscating Data |
2552 | @section Obfuscating Data | |
0e4ee106 UD |
2553 | @cindex Rot13 |
2554 | ||
b10a0acc ZW |
2555 | The @code{memfrob} function reversibly obfuscates an array of binary |
2556 | data. This is not true encryption; the obfuscated data still bears a | |
2557 | clear relationship to the original, and no secret key is required to | |
2558 | undo the obfuscation. It is analogous to the ``Rot13'' cipher used on | |
2559 | Usenet for obscuring offensive jokes, spoilers for works of fiction, | |
2560 | and so on, but it can be applied to arbitrary binary data. | |
0e4ee106 | 2561 | |
b10a0acc ZW |
2562 | Programs that need true encryption---a transformation that completely |
2563 | obscures the original and cannot be reversed without knowledge of a | |
2564 | secret key---should use a dedicated cryptography library, such as | |
2565 | @uref{https://www.gnu.org/software/libgcrypt/,,libgcrypt}. | |
2566 | ||
2567 | Programs that need to @emph{destroy} data should use | |
2568 | @code{explicit_bzero} (@pxref{Erasing Sensitive Data}), or possibly | |
2569 | @code{strfry} (@pxref{Shuffling Bytes}). | |
0e4ee106 | 2570 | |
0e4ee106 | 2571 | @deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length}) |
d08a7e4c | 2572 | @standards{GNU, string.h} |
11087373 | 2573 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
0e4ee106 | 2574 | |
b10a0acc ZW |
2575 | The function @code{memfrob} obfuscates @var{length} bytes of data |
2576 | beginning at @var{mem}, in place. Each byte is bitwise xor-ed with | |
2577 | the binary pattern 00101010 (hexadecimal 0x2A). The return value is | |
2578 | always @var{mem}. | |
0e4ee106 | 2579 | |
b10a0acc ZW |
2580 | Calling @code{memfrob} a second time on the same data returns it to
2581 | its original state. | |
0e4ee106 | 2582 | |
1f77f049 | 2583 | @strong{Portability Note:} This function is unique to @theglibc{}. |
b10a0acc | 2584 | It is declared in @file{string.h}. |
0e4ee106 UD |
2585 | @end deftypefun |
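
The transformation is simple enough to write out by hand, which also
shows why applying it twice restores the data.  The function
@code{xor42} below is a hypothetical stand-in written for illustration;
it is not the library routine.

```c
#include <stddef.h>

/* Hypothetical equivalent of the transformation described for
   memfrob: XOR each of LENGTH bytes at MEM with 0x2A.  Since
   (b ^ 0x2A) ^ 0x2A == b, a second call undoes the first.  */
void *
xor42 (void *mem, size_t length)
{
  unsigned char *p = mem;

  for (size_t i = 0; i < length; i++)
    p[i] ^= 0x2A;
  return mem;
}
```

For instance, the byte @code{'s'} (0x73) becomes @code{'Y'} (0x59) after
one call and @code{'s'} again after a second.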
2586 | ||
b4012b75 UD |
2587 | @node Encode Binary Data |
2588 | @section Encode Binary Data | |
2589 | ||
2590 | To store or transfer binary data in environments that support only text, | |
2591 | one has to encode the binary data by mapping the input bytes to | |
2cc4b9cc | 2592 | bytes in the range allowed for storing or transferring. SVID |
dd7d45e8 UD |
2593 | systems (and nowadays XPG compliant systems) provide minimal support for |
2594 | this task. | |
b4012b75 | 2595 | |
b4012b75 | 2596 | @deftypefun {char *} l64a (long int @var{n}) |
d08a7e4c | 2597 | @standards{XPG, stdlib.h} |
11087373 | 2598 | @safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}} |
2cc4b9cc PE |
2599 | This function encodes a 32-bit input value using bytes from the |
2600 | basic character set.  It returns a pointer to a 7-byte buffer which | |
dd7d45e8 UD |
2601 | contains an encoded version of @var{n}. To encode a series of bytes the |
2602 | user must copy the returned string to a destination buffer. It returns | |
2603 | the empty string if @var{n} is zero, which is somewhat bizarre but | |
2604 | mandated by the standard.@* | |
2605 | @strong{Warning:} Since a static buffer is used this function should not | |
5649a1d6 | 2606 | be used in multi-threaded programs. There is no thread-safe alternative |
dd7d45e8 UD |
2607 | to this function in the C library.@* |
2608 | @strong{Compatibility Note:} The XPG standard states that the return | |
2609 | value of @code{l64a} is undefined if @var{n} is negative. In the GNU | |
2610 | implementation, @code{l64a} treats its argument as unsigned, so it will | |
2611 | return a sensible encoding for any nonzero @var{n}; however, portable | |
2612 | programs should not rely on this. | |
b4012b75 | 2613 | |
dd7d45e8 UD |
2614 | To encode a large buffer @code{l64a} must be called in a loop, once for |
2615 | each 32-bit word of the buffer. For example, one could do something | |
2616 | like this: | |
5649a1d6 UD |
2617 | |
2618 | @smallexample | |
#define _GNU_SOURCE      /* @r{For `stpcpy' and `mempcpy'.} */
#include <arpa/inet.h>   /* @r{For `htonl'.} */
#include <stdlib.h>
#include <string.h>

char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  const unsigned char *in = buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  if (out == NULL)
    return NULL;

  /* @r{Encode the length.} */
  /* @r{Using `htonl' is necessary so that the data can be}
     @r{decoded even on machines with different byte order.}
     @r{`l64a' can return a string shorter than 6 bytes, so }
     @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
     @r{hand.} */

  p = stpcpy (cp, l64a (htonl (len)));
  cp = mempcpy (p, "......", 6 - (p - cp));

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      p = stpcpy (cp, l64a (htonl (n)));
      cp = mempcpy (p, "......", 6 - (p - cp));
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      cp = stpcpy (cp, l64a (htonl (n)));
    @}
  *cp = '\0';
  return out;
@}
2661 | @end smallexample | |
2662 | ||
2663 | It is strange that the library does not provide the complete | |
dd7d45e8 UD |
2664 | functionality needed but so be it. |
2665 | ||
2666 | @end deftypefun | |
5649a1d6 | 2667 | |
b4012b75 UD |
2668 | To decode data produced with @code{l64a} the following function should be |
2669 | used. | |
2670 | ||
2671 | @deftypefun {long int} a64l (const char *@var{string}) | |
d08a7e4c | 2672 | @standards{XPG, stdlib.h} |
11087373 | 2673 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b4012b75 | 2674 | The parameter @var{string} should contain a string which was produced by |
2cc4b9cc PE |
2675 | a call to @code{l64a}.  The function processes at most 6 bytes of
2676 | this string, and decodes the bytes it finds according to the table | |
2677 | below. It stops decoding when it finds a byte not in the table, | |
dd7d45e8 | 2678 | rather like @code{atoi}; if you have a buffer which has been broken into |
2cc4b9cc | 2679 | lines, you must be careful to skip over the end-of-line bytes. |
dd7d45e8 UD |
2680 | |
2681 | The decoded number is returned as a @code{long int} value. | |
b4012b75 | 2682 | @end deftypefun |
b13927da | 2683 | |
dd7d45e8 | 2684 | The @code{l64a} and @code{a64l} functions use a base 64 encoding, in |
2cc4b9cc | 2685 | which each byte of an encoded string represents six bits of an |
dd7d45e8 UD |
2686 | input word. These symbols are used for the base 64 digits: |
2687 | ||
2688 | @multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} | |
2689 | @item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7 | |
2690 | @item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1} | |
2691 | @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5} | |
2692 | @item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9} | |
2693 | @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D} | |
2694 | @item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H} | |
2695 | @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L} | |
2696 | @item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P} | |
2697 | @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T} | |
2698 | @item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X} | |
2699 | @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b} | |
2700 | @item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f} | |
2701 | @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j} | |
2702 | @item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n} | |
2703 | @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r} | |
2704 | @item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v} | |
2705 | @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z} | |
2706 | @end multitable | |
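
The digit table can be exercised with a small round trip.  The helper
@code{roundtrip64} is hypothetical; it copies the result of @code{l64a}
out of the static buffer before decoding it with @code{a64l}.  Digits
are produced least significant first, so the value 64 encodes as
@code{"./"} (digit 0, then digit 1).

```c
#define _DEFAULT_SOURCE 1   /* for the declarations of l64a and a64l */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode N with l64a, copy the text into OUT
   (which must hold at least 7 bytes), and decode it again with
   a64l.  The copy is needed because l64a returns a pointer to a
   static buffer.  */
long int
roundtrip64 (long int n, char out[7])
{
  strcpy (out, l64a (n));
  return a64l (out);
}
```

For non-negative values that fit in 32 bits, the returned number equals
@code{n}.  Zero encodes as the empty string, which decodes back to zero.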
2707 | ||
2708 | This encoding scheme is not standard. There are some other encoding | |
2709 | methods which are much more widely used (UU encoding, MIME encoding). | |
2710 | Generally, it is better to use one of these encodings. | |
2711 | ||
b13927da UD |
2712 | @node Argz and Envz Vectors |
2713 | @section Argz and Envz Vectors | |
2714 | ||
5649a1d6 | 2715 | @cindex argz vectors (string vectors) |
2cc4b9cc PE |
2716 | @cindex string vectors, null-byte separated |
2717 | @cindex argument vectors, null-byte separated | |
b13927da | 2718 | @dfn{Argz vectors} are vectors of strings in a contiguous block of
2cc4b9cc | 2719 | memory, each element separated from its neighbors by null bytes |
b13927da UD |
2720 | (@code{'\0'}). |
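
To make the layout concrete, the following sketch walks such a block by
hand.  The function @code{count_elements} is invented for illustration
and is not part of the argz interface; it assumes that a non-empty block
ends with a null byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the elements of an argz-style block
   by stepping from each element to the byte just past its
   terminating '\0'.  An empty vector (length 0) has no elements,
   and its pointer is never dereferenced.  */
size_t
count_elements (const char *argz, size_t argz_len)
{
  size_t count = 0;
  size_t pos = 0;

  while (pos < argz_len)
    {
      pos += strlen (argz + pos) + 1;
      count++;
    }
  return count;
}
```

For example, the 11-byte block @code{"ls\0-l\0/tmp"} (including the
final null byte added by the string literal) holds three elements.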
2721 | ||
5649a1d6 | 2722 | @cindex envz vectors (environment vectors) |
2cc4b9cc | 2723 | @cindex environment vectors, null-byte separated |
b13927da | 2724 | @dfn{Envz vectors} are an extension of argz vectors where each element is a |
2cc4b9cc | 2725 | name-value pair, separated by a @code{'='} byte (as in a Unix |
b13927da UD |
2726 | environment). |
2727 | ||
2728 | @menu | |
2729 | * Argz Functions:: Operations on argz vectors. | |
2730 | * Envz Functions:: Additional operations on environment vectors. | |
2731 | @end menu | |
2732 | ||
2733 | @node Argz Functions, Envz Functions, , Argz and Envz Vectors | |
2734 | @subsection Argz Functions | |
2735 | ||
2736 | Each argz vector is represented by a pointer to the first element, of | |
2737 | type @code{char *}, and a size, of type @code{size_t}, both of which can | |
2738 | be initialized to @code{0} to represent an empty argz vector. All argz | |
2739 | functions accept either a pointer and a size argument, or pointers to | |
2740 | them, if they will be modified. | |
2741 | ||
2742 | The argz functions use @code{malloc}/@code{realloc} to allocate/grow | |
f0f308c1 | 2743 | argz vectors, and so any argz vector created using these functions may |
b13927da UD |
2744 | be freed by using @code{free}; conversely, any argz function that may |
2745 | grow a string expects that string to have been allocated using | |
2746 | @code{malloc} (those argz functions that only examine their arguments or | |
2747 | modify them in place will work on any sort of memory). | |
2748 | @xref{Unconstrained Allocation}. | |
2749 | ||
2750 | All argz functions that do memory allocation have a return type of | |
2751 | @code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an | |
2752 | allocation error occurs. | |
2753 | ||
2754 | @pindex argz.h | |
2755 | These functions are declared in the standard include file @file{argz.h}. | |
2756 | ||
2757 | @deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len}) | |
d08a7e4c | 2758 | @standards{GNU, argz.h} |
11087373 | 2759 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
5649a1d6 | 2760 | The @code{argz_create} function converts the Unix-style argument vector |
b13927da UD |
2761 | @var{argv} (a vector of pointers to normal C strings, terminated by |
2762 | @code{(char *)0}; @pxref{Program Arguments}) into an argz vector with | |
2763 | the same elements, which is returned in @var{argz} and @var{argz_len}. | |
2764 | @end deftypefun | |
2765 | ||
2766 | @deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len}) | |
d08a7e4c | 2767 | @standards{GNU, argz.h} |
11087373 | 2768 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2cc4b9cc | 2769 | The @code{argz_create_sep} function converts the string |
b13927da | 2770 | @var{string} into an argz vector (returned in @var{argz} and |
49c091e5 | 2771 | @var{argz_len}) by splitting it into elements at every occurrence of the |
2cc4b9cc | 2772 | byte @var{sep}. |
b13927da UD |
2773 | @end deftypefun |
2774 | ||
f0f308c1 | 2775 | @deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len}) |
d08a7e4c | 2776 | @standards{GNU, argz.h} |
11087373 | 2777 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da UD |
2778 | Returns the number of elements in the argz vector @var{argz} and |
2779 | @var{argz_len}. | |
2780 | @end deftypefun | |
2781 | ||
8ded91fb | 2782 | @deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv}) |
d08a7e4c | 2783 | @standards{GNU, argz.h} |
11087373 | 2784 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da | 2785 | The @code{argz_extract} function converts the argz vector @var{argz} and |
5649a1d6 | 2786 | @var{argz_len} into a Unix-style argument vector stored in @var{argv}, |
b13927da UD |
2787 | by putting pointers to every element in @var{argz} into successive |
2788 | positions in @var{argv}, followed by a terminator of @code{0}. | |
2789 | The @var{argv} array must be pre-allocated with enough space to hold all the | |
2790 | elements in @var{argz} plus the terminating @code{(char *)0} | |
2791 | (@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)} | |
2792 | bytes should be enough). Note that the string pointers stored into | |
2793 | @var{argv} point into @var{argz}---they are not copies---and so | |
2794 | @var{argz} must be copied if it will be changed while @var{argv} is | |
2795 | still active. This function is useful for passing the elements in | |
2796 | @var{argz} to an exec function (@pxref{Executing a File}). | |
2797 | @end deftypefun | |
2798 | ||
2799 | @deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep}) | |
d08a7e4c | 2800 | @standards{GNU, argz.h} |
11087373 | 2801 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da | 2802 | The @code{argz_stringify} function converts @var{argz} into a normal string with
2cc4b9cc | 2803 | the elements separated by the byte @var{sep}, by replacing each |
b13927da UD |
2804 | @code{'\0'} inside @var{argz} (except the last one, which terminates the |
2805 | string) with @var{sep}. This is handy for printing @var{argz} in a | |
2806 | readable manner. | |
2807 | @end deftypefun | |
2808 | ||
2809 | @deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}) | |
d08a7e4c | 2810 | @standards{GNU, argz.h} |
11087373 AO |
2811 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2812 | @c Calls strlen and argz_append. | |
b13927da UD |
2813 | The @code{argz_add} function adds the string @var{str} to the end of the |
2814 | argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and | |
2815 | @code{*@var{argz_len}} accordingly. | |
2816 | @end deftypefun | |
2817 | ||
2818 | @deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim}) | |
d08a7e4c | 2819 | @standards{GNU, argz.h} |
11087373 | 2820 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
b13927da | 2821 | The @code{argz_add_sep} function is similar to @code{argz_add}, but |
49c091e5 | 2822 | @var{str} is split into separate elements in the result at occurrences of |
2cc4b9cc | 2823 | the byte @var{delim}. This is useful, for instance, for |
5649a1d6 | 2824 | adding the components of a Unix search path to an argz vector, by using |
b13927da UD |
2825 | a value of @code{':'} for @var{delim}. |
2826 | @end deftypefun | |
2827 | ||
2828 | @deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len}) | |
d08a7e4c | 2829 | @standards{GNU, argz.h} |
11087373 | 2830 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
b13927da UD |
2831 | The @code{argz_append} function appends @var{buf_len} bytes starting at |
2832 | @var{buf} to the argz vector @code{*@var{argz}}, reallocating | |
2833 | @code{*@var{argz}} to accommodate it, and adding @var{buf_len} to | |
2834 | @code{*@var{argz_len}}. | |
2835 | @end deftypefun | |
2836 | ||
30aa5785 | 2837 | @deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry}) |
d08a7e4c | 2838 | @standards{GNU, argz.h} |
11087373 AO |
2839 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2840 | @c Calls free if no argument is left. | |
b13927da UD |
2841 | If @var{entry} points to the beginning of one of the elements in the |
2842 | argz vector @code{*@var{argz}}, the @code{argz_delete} function will | |
2843 | remove this entry and reallocate @code{*@var{argz}}, modifying | |
2844 | @code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as | |
2845 | destructive argz functions usually reallocate their argz argument, | |
2846 | pointers into argz vectors such as @var{entry} will then become invalid. | |
2847 | @end deftypefun | |
2848 | ||
2849 | @deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry}) | |
d08a7e4c | 2850 | @standards{GNU, argz.h} |
11087373 AO |
2851 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2852 | @c Calls argz_add or realloc and memmove. | |
b13927da UD |
2853 | The @code{argz_insert} function inserts the string @var{entry} into the |
2854 | argz vector @code{*@var{argz}} at a point just before the existing | |
2855 | element pointed to by @var{before}, reallocating @code{*@var{argz}} and | |
2856 | updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before} | |
2857 | is @code{0}, @var{entry} is added to the end instead (as if by | |
2858 | @code{argz_add}). Since the first element is in fact the same as | |
2859 | @code{*@var{argz}}, passing in @code{*@var{argz}} as the value of | |
2860 | @var{before} will result in @var{entry} being inserted at the beginning. | |
2861 | @end deftypefun | |
2862 | ||
8ded91fb | 2863 | @deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry}) |
d08a7e4c | 2864 | @standards{GNU, argz.h} |
11087373 | 2865 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da UD |
2866 | The @code{argz_next} function provides a convenient way of iterating |
2867 | over the elements in the argz vector @var{argz}. It returns a pointer | |
2868 | to the next element in @var{argz} after the element @var{entry}, or | |
2869 | @code{0} if there are no elements following @var{entry}. If @var{entry} | |
2870 | is @code{0}, the first element of @var{argz} is returned. | |
2871 | ||
2872 | This behavior suggests two styles of iteration: | |
2873 | ||
2874 | @smallexample | |
2875 | char *entry = 0; | |
2876 | while ((entry = argz_next (@var{argz}, @var{argz_len}, entry))) | |
2877 | @var{action}; | |
2878 | @end smallexample | |
2879 | ||
2880 | (the double parentheses are necessary to make some C compilers shut up | |
2881 | about what they consider a questionable @code{while}-test) and: | |
2882 | ||
2883 | @smallexample | |
2884 | char *entry; | |
2885 | for (entry = @var{argz}; | |
2886 | entry; | |
2887 | entry = argz_next (@var{argz}, @var{argz_len}, entry)) | |
2888 | @var{action}; | |
2889 | @end smallexample | |
2890 | ||
2891 | Note that the latter depends on @var{argz} having a value of @code{0} if | |
2892 | it is empty (rather than a pointer to an empty block of memory); this | |
2893 | invariant is maintained for argz vectors created by the functions here. | |
2894 | @end deftypefun | |
2895 | ||
d705269e | 2896 | @deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}}) |
d08a7e4c | 2897 | @standards{GNU, argz.h} |
11087373 | 2898 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
49c091e5 | 2899 | Replace any occurrences of the string @var{str} in @var{argz} with |
d705269e UD |
2900 | @var{with}, reallocating @var{argz} as necessary. If |
2901 | @var{replace_count} is not a null pointer, @code{*@var{replace_count}} will be | |
f0f308c1 | 2902 | incremented by the number of replacements performed. |
d705269e UD |
2903 | @end deftypefun |
2904 | ||
b13927da UD |
2905 | @node Envz Functions, , Argz Functions, Argz and Envz Vectors |
2906 | @subsection Envz Functions | |
2907 | ||
2908 | Envz vectors are just argz vectors with additional constraints on the form | |
2909 | of each element; as such, argz functions can also be used on them, where it | |
2910 | makes sense. | |
2911 | ||
2912 | Each element in an envz vector is a name-value pair, separated by a @code{'='} | |
2cc4b9cc | 2913 | byte; if multiple @code{'='} bytes are present in an element, those |
b13927da | 2914 | after the first are considered part of the value, and treated like all other |
2cc4b9cc | 2915 | non-@code{'\0'} bytes. |
b13927da | 2916 | |
2cc4b9cc | 2917 | If @emph{no} @code{'='} bytes are present in an element, that element is |
b13927da UD |
2918 | considered the name of a ``null'' entry, as distinct from an entry with an |
2919 | empty value: @code{envz_get} will return @code{0} if given the name of a null | |
2920 | entry, whereas an entry with an empty value would result in a value of | |
2921 | @code{""}; @code{envz_entry} will still find such entries, however. Null | |
f0f308c1 | 2922 | entries can be removed with the @code{envz_strip} function. |
b13927da UD |
2923 | |
2924 | As with argz functions, envz functions that may allocate memory (and thus | |
2925 | fail) have a return type of @code{error_t}, and return either @code{0} or | |
2926 | @code{ENOMEM}. | |
2927 | ||
2928 | @pindex envz.h | |
2929 | These functions are declared in the standard include file @file{envz.h}. | |
2930 | ||
2931 | @deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) | |
d08a7e4c | 2932 | @standards{GNU, envz.h} |
11087373 | 2933 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da UD |
2934 | The @code{envz_entry} function finds the entry in @var{envz} with the name |
2935 | @var{name}, and returns a pointer to the whole entry---that is, the argz | |
2cc4b9cc | 2936 | element which begins with @var{name} followed by a @code{'='} byte. If |
b13927da UD |
2937 | there is no entry with that name, @code{0} is returned. |
2938 | @end deftypefun | |
2939 | ||
2940 | @deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) | |
d08a7e4c | 2941 | @standards{GNU, envz.h} |
11087373 | 2942 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da UD |
2943 | The @code{envz_get} function finds the entry in @var{envz} with the name |
2944 | @var{name} (like @code{envz_entry}), and returns a pointer to the value | |
2945 | portion of that entry (following the @code{'='}). If there is no entry with | |
2946 | that name (or only a null entry), @code{0} is returned. | |
2947 | @end deftypefun | |
2948 | ||
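The difference between a missing entry, a null entry, and an entry with a
value (possibly empty) can be observed by combining @code{envz_entry} and
@code{envz_get}. The following is a minimal sketch; the names
@code{entry_kind} and @code{classify_entry} are hypothetical.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a lookup result.  */
enum entry_kind { ENTRY_MISSING, ENTRY_NULL, ENTRY_VALUED };

/* Hypothetical helper, not part of the library: report whether NAME
   is absent from ENVZ, present as a null entry, or present with a
   value (which may be the empty string).  */
enum entry_kind
classify_entry (const char *envz, size_t envz_len, const char *name)
{
  assert (name != NULL);

  /* envz_entry finds null entries as well as ordinary ones.  */
  if (envz_entry (envz, envz_len, name) == NULL)
    return ENTRY_MISSING;

  /* envz_get returns a null pointer for a null entry, but a pointer
     to an empty string for an entry such as "NAME=".  */
  if (envz_get (envz, envz_len, name) == NULL)
    return ENTRY_NULL;

  return ENTRY_VALUED;
}
```

An envz vector can be written as a character array for experiments, for
example @code{"HOME=/root\0EMPTY=\0FLAG"}, whose length (including the final
null byte) is given by @code{sizeof}. Here @code{EMPTY} has an empty value
and @code{FLAG} is a null entry.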
2949 | @deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value}) | |
d08a7e4c | 2950 | @standards{GNU, envz.h} |
11087373 AO |
2951 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
2952 | @c Calls envz_remove, which calls envz_entry and argz_delete, and then | |
2953 | @c argz_add or equivalent code that reallocs and appends name=value. | |
b13927da UD |
2954 | The @code{envz_add} function adds an entry to @code{*@var{envz}} |
2955 | (updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name | |
2956 | @var{name}, and value @var{value}. If an entry with the same name | |
2957 | already exists in @var{envz}, it is removed first. If @var{value} is | |
f0f308c1 | 2958 | @code{0}, then the new entry will be the special null type of entry |
b13927da UD |
2959 | (mentioned above). |
2960 | @end deftypefun | |
2961 | ||
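As an illustration, the sketch below adds the same name twice and then adds a
null entry. The helper @code{build_demo_env} and the names and values it uses
are hypothetical; the vector is assumed to start out empty
(@code{*@var{envz}} is @code{0} and @code{*@var{envz_len}} is zero).

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: populate *ENVZ with
   example entries.  Returns 0 on success or ENOMEM on failure.  */
error_t
build_demo_env (char **envz, size_t *envz_len)
{
  error_t err;

  assert (envz != NULL && envz_len != NULL);

  err = envz_add (envz, envz_len, "PATH", "/bin");
  if (!err)
    /* Same name again: the earlier PATH entry is removed first.  */
    err = envz_add (envz, envz_len, "PATH", "/usr/bin");
  if (!err)
    /* A null value creates a null entry, stored without any '='.  */
    err = envz_add (envz, envz_len, "DEBUG", NULL);

  return err;
}
```

After a successful call the vector holds two elements: @code{PATH} with the
second value, and the null entry @code{DEBUG}.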
2962 | @deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override}) | |
d08a7e4c | 2963 | @standards{GNU, envz.h} |
11087373 | 2964 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
b13927da UD |
2965 | The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz}, |
2966 | as if with @code{envz_add}, updating @code{*@var{envz}} and | |
2967 | @code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2} | |
2968 | will supersede those with the same name in @var{envz}, otherwise not. | |
2969 | ||
2970 | Null entries are treated just like other entries in this respect, so a null | |
2971 | entry in @var{envz} can prevent an entry of the same name in @var{envz2} from | |
2972 | being added to @var{envz}, if @var{override} is false. | |
2973 | @end deftypefun | |
2974 | ||
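The effect of @var{override} can be sketched as follows. The helper
@code{editor_after_merge} and the entry names are hypothetical; the second
vector is written as a character array whose size includes the final null
byte.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the library: start with EDITOR=vi,
   merge in a set of defaults, and return a malloc'd copy of the
   resulting EDITOR value (a null pointer on failure).  */
char *
editor_after_merge (int override)
{
  static const char defaults[] = "EDITOR=ed\0PAGER=more";
  char *envz = NULL;
  size_t envz_len = 0;
  char *result = NULL;

  assert (override == 0 || override == 1);

  if (envz_add (&envz, &envz_len, "EDITOR", "vi") == 0
      && envz_merge (&envz, &envz_len, defaults, sizeof defaults,
                     override) == 0)
    {
      const char *value = envz_get (envz, envz_len, "EDITOR");
      if (value != NULL)
        result = strdup (value);
    }

  free (envz);
  return result;
}
```

With @var{override} false the existing @code{EDITOR} value is kept; with
@var{override} true the value from the second vector replaces it. In both
cases @code{PAGER} is added, since the first vector has no entry of that
name.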
2975 | @deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len}) | |
d08a7e4c | 2976 | @standards{GNU, envz.h} |
11087373 | 2977 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
b13927da UD |
2978 | The @code{envz_strip} function removes any null entries from @var{envz}, |
2979 | updating @code{*@var{envz}} and @code{*@var{envz_len}}. | |
2980 | @end deftypefun | |
11087373 | 2981 | |
920d7012 | 2982 | @deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}) |
d08a7e4c | 2983 | @standards{GNU, envz.h} |
654055e0 | 2984 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
920d7012 SP |
2985 | The @code{envz_remove} function removes an entry named @var{name} from |
2986 | @var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}. | |
2987 | @end deftypefun | |
2988 | ||
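A short sketch combining @code{envz_remove} and @code{envz_strip} follows.
The helper @code{prune} is hypothetical; it uses @code{argz_count} to report
how many elements remain.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: remove the entry
   called NAME, drop every null entry, and return the number of
   elements left in *ENVZ.  */
size_t
prune (char **envz, size_t *envz_len, const char *name)
{
  assert (envz != NULL && envz_len != NULL && name != NULL);

  /* Has no effect if there is no entry called NAME.  */
  envz_remove (envz, envz_len, name);

  /* Removes entries that contain no '=' byte.  */
  envz_strip (envz, envz_len);

  return argz_count (*envz, *envz_len);
}
```

Both functions update @code{*@var{envz}} and @code{*@var{envz_len}} in place,
so the caller continues to use the same variables afterwards.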
11087373 AO |
2989 | @c FIXME these are undocumented: |
2990 | @c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp |