Commit | Line | Data |
---|---|---|
28f540f4 RM |
1 | @node String and Array Utilities, Extended Characters, Character Handling, Top |
2 | @chapter String and Array Utilities | |
3 | ||
4 | Operations on strings (or arrays of characters) are an important part of | |
5 | many programs. The GNU C library provides an extensive set of string | |
6 | utility functions, including functions for copying, concatenating, | |
7 | comparing, and searching strings. Many of these functions can also | |
8 | operate on arbitrary regions of storage; for example, the @code{memcpy} | |
a5113b14 | 9 | function can be used to copy the contents of any kind of array. |
28f540f4 RM |
10 | |
11 | It's fairly common for beginning C programmers to ``reinvent the wheel'' | |
12 | by duplicating this functionality in their own code, but it pays to | |
13 | become familiar with the library functions and to make use of them, | |
14 | since this offers benefits in maintenance, efficiency, and portability. | |
15 | ||
16 | For instance, you could easily compare one string to another in two | |
17 | lines of C code, but if you use the built-in @code{strcmp} function, | |
18 | you're less likely to make a mistake. And, since these library | |
19 | functions are typically highly optimized, your program may run faster | |
20 | too. | |
21 | ||
22 | @menu | |
23 | * Representation of Strings:: Introduction to basic concepts. | |
24 | * String/Array Conventions:: Whether to use a string function or an | |
25 | arbitrary array function. | |
26 | * String Length:: Determining the length of a string. | |
27 | * Copying and Concatenation:: Functions to copy the contents of strings | |
28 | and arrays. | |
29 | * String/Array Comparison:: Functions for byte-wise and character-wise | |
30 | comparison. | |
31 | * Collation Functions:: Functions for collating strings. | |
32 | * Search Functions:: Searching for a specific element or substring. | |
33 | * Finding Tokens in a String:: Splitting a string into tokens by looking | |
34 | for delimiters. | |
b4012b75 | 35 | * Encode Binary Data:: Encoding and Decoding of Binary Data. |
b13927da | 36 | * Argz and Envz Vectors:: Null-separated string vectors. |
28f540f4 RM |
37 | @end menu |
38 | ||
b4012b75 | 39 | @node Representation of Strings |
28f540f4 RM |
40 | @section Representation of Strings |
41 | @cindex string, representation of | |
42 | ||
43 | This section is a quick summary of string concepts for beginning C | |
44 | programmers. It describes how character strings are represented in C | |
45 | and some common pitfalls. If you are already familiar with this | |
46 | material, you can skip this section. | |
47 | ||
48 | @cindex string | |
49 | @cindex null character | |
50 | A @dfn{string} is an array of @code{char} objects. But string-valued | |
51 | variables are usually declared to be pointers of type @code{char *}. | |
52 | Such variables do not include space for the text of a string; that has | |
53 | to be stored somewhere else---in an array variable, a string constant, | |
54 | or dynamically allocated memory (@pxref{Memory Allocation}). It's up to | |
55 | you to store the address of the chosen memory space into the pointer | |
56 | variable. Alternatively you can store a @dfn{null pointer} in the | |
57 | pointer variable. The null pointer does not point anywhere, so | |
58 | attempting to reference the string it points to gets an error. | |
59 | ||
60 | By convention, a @dfn{null character}, @code{'\0'}, marks the end of a | |
61 | string. For example, in testing to see whether the @code{char *} | |
62 | variable @var{p} points to a null character marking the end of a string, | |
63 | you can write @code{!*@var{p}} or @code{*@var{p} == '\0'}. | |
64 | ||
65 | A null character is quite different conceptually from a null pointer, | |
66 | although both are represented by the integer @code{0}. | |
67 | ||
68 | @cindex string literal | |
69 | @dfn{String literals} appear in C program source as strings of | |
f65fd747 | 70 | characters between double-quote characters (@samp{"}). In @w{ISO C}, |
28f540f4 RM |
71 | string literals can also be formed by @dfn{string concatenation}: |
72 | @code{"a" "b"} is the same as @code{"ab"}. Modification of string | |
73 | literals is not allowed by the GNU C compiler, because literals | |
74 | are placed in read-only storage. | |
75 | ||
76 | Character arrays that are declared @code{const} cannot be modified | |
77 | either. It's generally good style to declare non-modifiable string | |
78 | pointers to be of type @code{const char *}, since this often allows the | |
79 | C compiler to detect accidental modifications as well as providing some | |
80 | amount of documentation about what your program intends to do with the | |
81 | string. | |
82 | ||
83 | The amount of memory allocated for the character array may extend past | |
84 | the null character that normally marks the end of the string. In this | |
85 | document, the term @dfn{allocation size} is always used to refer to the | |
86 | total amount of memory allocated for the string, while the term | |
87 | @dfn{length} refers to the number of characters up to (but not | |
88 | including) the terminating null character. | |
89 | @cindex length of string | |
90 | @cindex allocation size of string | |
91 | @cindex size of string | |
92 | @cindex string length | |
93 | @cindex string allocation | |
94 | ||
95 | A notorious source of program bugs is trying to put more characters in a | |
96 | string than fit in its allocated size. When writing code that extends | |
97 | strings or moves characters into a pre-allocated array, you should be | |
98 | very careful to keep track of the length of the text and make explicit | |
99 | checks for overflowing the array. Many of the library functions | |
100 | @emph{do not} do this for you! Remember also that you need to allocate | |
101 | an extra byte to hold the null character that marks the end of the | |
102 | string. | |
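As a minimal sketch of the explicit checking described above, the following helper refuses to append when the result, including the terminating null character, would not fit. The name @code{append_checked} and its interface are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a library function: append SRC to the
   string stored in DST, whose allocation size is DST_SIZE bytes.
   Returns false and leaves DST unchanged if the result would not fit.  */
bool
append_checked (char *dst, size_t dst_size, const char *src)
{
  size_t dst_len = strlen (dst);
  size_t src_len = strlen (src);

  /* DST must already hold a properly terminated string.  */
  assert (dst_len < dst_size);

  /* Room is needed for both texts plus one byte for the null character.  */
  if (src_len >= dst_size - dst_len)
    return false;

  /* Copy the text of SRC together with its terminating null character.  */
  memcpy (dst + dst_len, src, src_len + 1);
  return true;
}
```

The comparison is written as a subtraction on the allocation size so that the sum of the two lengths is never computed and cannot wrap around.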
103 | ||
b4012b75 | 104 | @node String/Array Conventions |
28f540f4 RM |
105 | @section String and Array Conventions |
106 | ||
107 | This chapter describes both functions that work on arbitrary arrays or | |
108 | blocks of memory, and functions that are specific to null-terminated | |
109 | arrays of characters. | |
110 | ||
111 | Functions that operate on arbitrary blocks of memory have names | |
112 | beginning with @samp{mem} (such as @code{memcpy}) and invariably take an | |
113 | argument which specifies the size (in bytes) of the block of memory to | |
114 | operate on. The array arguments and return values for these functions | |
115 | have type @code{void *}, and as a matter of style, the elements of these | |
116 | arrays are referred to as ``bytes''. You can pass any kind of pointer | |
117 | to these functions, and the @code{sizeof} operator is useful in | |
118 | computing the value for the size argument. | |
119 | ||
120 | In contrast, functions that operate specifically on strings have names | |
121 | beginning with @samp{str} (such as @code{strcpy}) and look for a null | |
122 | character to terminate the string instead of requiring an explicit size | |
123 | argument to be passed. (Some of these functions accept a specified | |
124 | maximum length, but they also check for premature termination with a | |
125 | null character.) The array arguments and return values for these | |
126 | functions have type @code{char *}, and the array elements are referred | |
127 | to as ``characters''. | |
128 | ||
129 | In many cases, there are both @samp{mem} and @samp{str} versions of a | |
130 | function. The one that is more appropriate to use depends on the exact | |
131 | situation. When your program is manipulating arbitrary arrays or blocks of | |
132 | storage, then you should always use the @samp{mem} functions. On the | |
133 | other hand, when you are manipulating null-terminated strings it is | |
134 | usually more convenient to use the @samp{str} functions, unless you | |
135 | already know the length of the string in advance. | |
136 | ||
b4012b75 | 137 | @node String Length |
28f540f4 RM |
138 | @section String Length |
139 | ||
140 | You can get the length of a string using the @code{strlen} function. | |
141 | This function is declared in the header file @file{string.h}. | |
142 | @pindex string.h | |
143 | ||
144 | @comment string.h | |
f65fd747 | 145 | @comment ISO |
28f540f4 RM |
146 | @deftypefun size_t strlen (const char *@var{s}) |
147 | The @code{strlen} function returns the length of the null-terminated | |
148 | string @var{s}. (In other words, it returns the offset of the terminating | |
149 | null character within the array.) | |
150 | ||
151 | For example, | |
152 | @smallexample | |
153 | strlen ("hello, world") | |
154 | @result{} 12 | |
155 | @end smallexample | |
156 | ||
157 | When applied to a character array, the @code{strlen} function returns | |
158 | the length of the string stored there, not its allocation size. You can | |
159 | get the allocation size of the character array that holds a string using | |
160 | the @code{sizeof} operator: | |
161 | ||
162 | @smallexample | |
a5113b14 | 163 | char string[32] = "hello, world"; |
28f540f4 RM |
164 | sizeof (string) |
165 | @result{} 32 | |
166 | strlen (string) | |
167 | @result{} 12 | |
168 | @end smallexample | |
169 | @end deftypefun | |
170 | ||
4547c1a4 UD |
171 | @comment string.h |
172 | @comment GNU | |
173 | @deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen}) | |
174 | The @code{strnlen} function returns the length of the null-terminated | |
175 | string @var{s} if this length is smaller than @var{maxlen}. Otherwise | |
176 | it returns @var{maxlen}. Therefore this function is equivalent to | |
177 | @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} but it | |
f2ea0f5b | 178 | is more efficient. |
4547c1a4 UD |
179 | |
180 | @smallexample | |
181 | char string[32] = "hello, world"; | |
182 | strnlen (string, 32) | |
183 | @result{} 12 | |
184 | strnlen (string, 5) | |
185 | @result{} 5 | |
186 | @end smallexample | |
187 | ||
188 | This function is a GNU extension. | |
189 | @end deftypefun | |
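One situation where @code{strnlen} may be useful is a fixed-width field that is null-terminated only when its text is shorter than the field. The sketch below assumes a hypothetical @code{struct record} with an 8-byte name field. Neither the structure nor the helper comes from the library.

```c
#define _GNU_SOURCE             /* strnlen is an extension to ISO C.  */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout: NAME holds up to 8 characters and is
   null-terminated only if the name is shorter than 8 characters.  */
struct record
{
  char name[8];
};

/* Return the length of the name without reading past the field.  */
size_t
record_name_length (const struct record *r)
{
  assert (r != NULL);
  return strnlen (r->name, sizeof r->name);
}
```

Calling @code{strlen} on such a field could read past the end of the array when all 8 bytes are in use.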
190 | ||
b4012b75 | 191 | @node Copying and Concatenation |
28f540f4 RM |
192 | @section Copying and Concatenation |
193 | ||
194 | You can use the functions described in this section to copy the contents | |
195 | of strings and arrays, or to append the contents of one string to | |
196 | another. These functions are declared in the header file | |
197 | @file{string.h}. | |
198 | @pindex string.h | |
199 | @cindex copying strings and arrays | |
200 | @cindex string copy functions | |
201 | @cindex array copy functions | |
202 | @cindex concatenating strings | |
203 | @cindex string concatenation functions | |
204 | ||
205 | A helpful way to remember the ordering of the arguments to the functions | |
206 | in this section is that it corresponds to an assignment expression, with | |
207 | the destination array specified to the left of the source array. Most | |
208 | of these functions return the address of the destination array. | |
209 | ||
210 | Most of these functions do not work properly if the source and | |
211 | destination arrays overlap. For example, if the beginning of the | |
212 | destination array overlaps the end of the source array, the original | |
213 | contents of that part of the source array may get overwritten before it | |
214 | is copied. Even worse, in the case of the string functions, the null | |
215 | character marking the end of the string may be lost, and the copy | |
216 | function might get stuck in a loop trashing all the memory allocated to | |
217 | your program. | |
218 | ||
219 | All functions that have problems copying between overlapping arrays are | |
220 | explicitly identified in this manual. In addition to functions in this | |
221 | section, there are a few others like @code{sprintf} (@pxref{Formatted | |
222 | Output Functions}) and @code{scanf} (@pxref{Formatted Input | |
223 | Functions}). | |
224 | ||
225 | @comment string.h | |
f65fd747 | 226 | @comment ISO |
28f540f4 RM |
227 | @deftypefun {void *} memcpy (void *@var{to}, const void *@var{from}, size_t @var{size}) |
228 | The @code{memcpy} function copies @var{size} bytes from the object | |
229 | beginning at @var{from} into the object beginning at @var{to}. The | |
230 | behavior of this function is undefined if the two arrays @var{to} and | |
231 | @var{from} overlap; use @code{memmove} instead if overlapping is possible. | |
232 | ||
233 | The value returned by @code{memcpy} is the value of @var{to}. | |
234 | ||
235 | Here is an example of how you might use @code{memcpy} to copy the | |
236 | contents of an array: | |
237 | ||
238 | @smallexample | |
239 | struct foo *oldarray, *newarray; | |
240 | int arraysize; | |
241 | @dots{} | |
242 | memcpy (newarray, oldarray, arraysize * sizeof (struct foo)); | |
243 | @end smallexample | |
244 | @end deftypefun | |
245 | ||
4547c1a4 UD |
246 | @comment string.h |
247 | @comment GNU | |
248 | @deftypefun {void *} mempcpy (void *@var{to}, const void *@var{from}, size_t @var{size}) | |
249 | The @code{mempcpy} function is nearly identical to the @code{memcpy} | |
f2ea0f5b | 250 | function. It copies @var{size} bytes from the object beginning at |
4547c1a4 UD |
251 | @var{from} into the object pointed to by @var{to}. But instead of | |
252 | returning the value of @var{to} it returns a pointer to the byte | |
253 | following the last written byte in the object beginning at @var{to}. | |
254 | I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}. | |
255 | ||
256 | This function is useful in situations where a number of objects shall be | |
257 | copied to consecutive memory positions. | |
258 | ||
259 | @smallexample | |
260 | void * | |
261 | combine (void *o1, size_t s1, void *o2, size_t s2) | |
262 | @{ | |
263 | void *result = malloc (s1 + s2); | |
264 | if (result != NULL) | |
265 | mempcpy (mempcpy (result, o1, s1), o2, s2); | |
266 | return result; | |
267 | @} | |
268 | @end smallexample | |
269 | ||
270 | This function is a GNU extension. | |
271 | @end deftypefun | |
272 | ||
28f540f4 | 273 | @comment string.h |
f65fd747 | 274 | @comment ISO |
28f540f4 RM |
275 | @deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size}) |
276 | @code{memmove} copies the @var{size} bytes at @var{from} into the | |
277 | @var{size} bytes at @var{to}, even if those two blocks of space | |
278 | overlap. In the case of overlap, @code{memmove} is careful to copy the | |
279 | original values of the bytes in the block at @var{from}, including those | |
280 | bytes which also belong to the block at @var{to}. | |
281 | @end deftypefun | |
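A typical case of overlapping blocks is closing a gap inside a single array. The following sketch uses @code{memmove} to delete one element by shifting the tail of the array down by one position. The helper name @code{remove_element} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove the element at INDEX from the
   COUNT-element array ARRAY and return the new element count.
   The source and destination blocks overlap, so memcpy would
   not be safe here.  */
size_t
remove_element (int *array, size_t count, size_t index)
{
  assert (array != NULL);

  if (index >= count)
    return count;               /* Nothing to remove.  */

  memmove (array + index, array + index + 1,
           (count - index - 1) * sizeof *array);
  return count - 1;
}
```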
282 | ||
283 | @comment string.h | |
284 | @comment SVID | |
285 | @deftypefun {void *} memccpy (void *@var{to}, const void *@var{from}, int @var{c}, size_t @var{size}) | |
286 | This function copies no more than @var{size} bytes from @var{from} to | |
287 | @var{to}, stopping if a byte matching @var{c} is found. The return | |
288 | value is a pointer into @var{to} one byte past where @var{c} was copied, | |
289 | or a null pointer if no byte matching @var{c} appeared in the first | |
290 | @var{size} bytes of @var{from}. | |
291 | @end deftypefun | |
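Because @code{memccpy} reports whether the stop byte was seen, it can serve as a bounded string copy that detects truncation. This is only a sketch. The helper @code{copy_string_bounded} and its truncation policy are assumptions made for illustration.

```c
#define _GNU_SOURCE             /* memccpy is not part of ISO C.  */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string FROM into the SIZE-byte
   buffer TO.  Returns true if the whole string fit; otherwise the
   copy is truncated, terminated, and false is returned.  */
bool
copy_string_bounded (char *to, size_t size, const char *from)
{
  assert (to != NULL && from != NULL);

  if (size == 0)
    return false;

  /* Stop after copying the null character, or after SIZE bytes.  */
  if (memccpy (to, from, '\0', size) != NULL)
    return true;

  /* No null character was copied: terminate the truncated text.  */
  to[size - 1] = '\0';
  return false;
}
```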
292 | ||
293 | @comment string.h | |
f65fd747 | 294 | @comment ISO |
28f540f4 RM |
295 | @deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size}) |
296 | This function copies the value of @var{c} (converted to an | |
297 | @code{unsigned char}) into each of the first @var{size} bytes of the | |
298 | object beginning at @var{block}. It returns the value of @var{block}. | |
299 | @end deftypefun | |
300 | ||
301 | @comment string.h | |
f65fd747 | 302 | @comment ISO |
28f540f4 RM |
303 | @deftypefun {char *} strcpy (char *@var{to}, const char *@var{from}) |
304 | This copies characters from the string @var{from} (up to and including | |
305 | the terminating null character) into the string @var{to}. Like | |
306 | @code{memcpy}, this function has undefined results if the strings | |
307 | overlap. The return value is the value of @var{to}. | |
308 | @end deftypefun | |
309 | ||
310 | @comment string.h | |
f65fd747 | 311 | @comment ISO |
28f540f4 RM |
312 | @deftypefun {char *} strncpy (char *@var{to}, const char *@var{from}, size_t @var{size}) |
313 | This function is similar to @code{strcpy} but always copies exactly | |
314 | @var{size} characters into @var{to}. | |
315 | ||
316 | If the length of @var{from} is more than @var{size}, then @code{strncpy} | |
317 | copies just the first @var{size} characters. Note that in this case | |
318 | there is no null terminator written into @var{to}. | |
319 | ||
320 | If the length of @var{from} is less than @var{size}, then @code{strncpy} | |
321 | copies all of @var{from}, followed by enough null characters to add up | |
322 | to @var{size} characters in all. This behavior is rarely useful, but it | |
f65fd747 | 323 | is specified by the @w{ISO C} standard. |
28f540f4 RM |
324 | |
325 | The behavior of @code{strncpy} is undefined if the strings overlap. | |
326 | ||
327 | Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs | |
328 | relating to writing past the end of the allocated space for @var{to}. | |
329 | However, it can also make your program much slower in one common case: | |
330 | copying a string which is probably small into a potentially large buffer. | |
331 | In this case, @var{size} may be large, and when it is, @code{strncpy} will | |
332 | waste a considerable amount of time copying null characters. | |
333 | @end deftypefun | |
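Since @code{strncpy} writes no null terminator when @var{from} is too long, callers commonly reserve the last byte of the destination and terminate it explicitly. The sketch below shows one way to do that. The name @code{copy_truncating} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte buffer TO,
   silently truncating if necessary, and always leave TO terminated.  */
char *
copy_truncating (char *to, size_t size, const char *from)
{
  assert (size > 0);

  /* Copy at most SIZE - 1 characters so the last byte stays free.  */
  strncpy (to, from, size - 1);

  /* strncpy does not terminate TO when FROM is too long.  */
  to[size - 1] = '\0';
  return to;
}
```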
334 | ||
335 | @comment string.h | |
336 | @comment SVID | |
337 | @deftypefun {char *} strdup (const char *@var{s}) | |
338 | This function copies the null-terminated string @var{s} into a newly | |
339 | allocated string. The string is allocated using @code{malloc}; see | |
340 | @ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space | |
341 | for the new string, @code{strdup} returns a null pointer. Otherwise it | |
342 | returns a pointer to the new string. | |
343 | @end deftypefun | |
344 | ||
706074a5 UD |
345 | @comment string.h |
346 | @comment GNU | |
347 | @deftypefun {char *} strndup (const char *@var{s}, size_t @var{size}) | |
348 | This function is similar to @code{strdup} but always copies at most | |
349 | @var{size} characters into the newly allocated string. | |
350 | ||
351 | If the length of @var{s} is more than @var{size}, then @code{strndup} | |
352 | copies just the first @var{size} characters and adds a closing null | |
353 | terminator. Otherwise all characters are copied and the string is | |
354 | terminated. | |
355 | ||
356 | This function differs from @code{strncpy} in that it always | |
357 | terminates the destination string. | |
358 | @end deftypefun | |
359 | ||
28f540f4 RM |
360 | @comment string.h |
361 | @comment Unknown origin | |
362 | @deftypefun {char *} stpcpy (char *@var{to}, const char *@var{from}) | |
363 | This function is like @code{strcpy}, except that it returns a pointer to | |
364 | the end of the string @var{to} (that is, the address of the terminating | |
365 | null character) rather than the beginning. | |
366 | ||
367 | For example, this program uses @code{stpcpy} to concatenate @samp{foo} | |
368 | and @samp{bar} to produce @samp{foobar}, which it then prints. | |
369 | ||
370 | @smallexample | |
371 | @include stpcpy.c.texi | |
372 | @end smallexample | |
373 | ||
f65fd747 | 374 | This function is not part of the ISO or POSIX standards, and is not |
28f540f4 RM |
375 | customary on Unix systems, but we did not invent it either. Perhaps it |
376 | comes from MS-DOG. | |
377 | ||
378 | Its behavior is undefined if the strings overlap. | |
379 | @end deftypefun | |
380 | ||
706074a5 UD |
381 | @comment string.h |
382 | @comment GNU | |
383 | @deftypefun {char *} stpncpy (char *@var{to}, const char *@var{from}, size_t @var{size}) | |
384 | This function is similar to @code{stpcpy} but always copies exactly | |
385 | @var{size} characters into @var{to}. | |
386 | ||
387 | If the length of @var{from} is more than @var{size}, then @code{stpncpy} | |
388 | copies just the first @var{size} characters and returns a pointer to the | |
389 | character directly following the one which was copied last. Note that in | |
390 | this case there is no null terminator written into @var{to}. | |
391 | ||
392 | If the length of @var{from} is less than @var{size}, then @code{stpncpy} | |
393 | copies all of @var{from}, followed by enough null characters to add up | |
394 | to @var{size} characters in all. This behavior is rarely useful, but it | |
395 | is provided for contexts that rely on the corresponding behavior of | |
396 | @code{strncpy}. @code{stpncpy} returns a pointer to the | |
397 | @emph{first} written null character. | |
398 | ||
f65fd747 | 399 | This function is not part of ISO or POSIX but was found useful while |
706074a5 UD |
400 | developing the GNU C Library itself. | |
401 | ||
402 | Its behavior is undefined if the strings overlap. | |
403 | @end deftypefun | |
404 | ||
405 | @comment string.h | |
406 | @comment GNU | |
26b4d766 | 407 | @deftypefn {Macro} {char *} strdupa (const char *@var{s}) |
706074a5 UD |
408 | This function is similar to @code{strdup} but allocates the new string |
409 | using @code{alloca} instead of @code{malloc} | |
410 | (@pxref{Variable Size Automatic}). This means of course the returned | |
411 | string has the same limitations as any block of memory allocated using | |
412 | @code{alloca}. | |
413 | ||
414 | For obvious reasons @code{strdupa} is implemented only as a macro. I.e., | |
40a55d20 | 415 | you cannot get the address of this function. Despite this limitation |
706074a5 UD |
416 | it is a useful function. The following code shows a situation where |
417 | using @code{malloc} would be a lot more expensive. | |
418 | ||
419 | @smallexample | |
420 | @include strdupa.c.texi | |
421 | @end smallexample | |
422 | ||
423 | Please note that calling @code{strtok} using @var{path} directly is | |
40a55d20 | 424 | invalid. |
706074a5 UD |
425 | |
426 | This function is only available if GNU CC is used. | |
26b4d766 | 427 | @end deftypefn |
706074a5 UD |
428 | |
429 | @comment string.h | |
430 | @comment GNU | |
26b4d766 | 431 | @deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size}) |
706074a5 UD |
432 | This function is similar to @code{strndup} but like @code{strdupa} it |
433 | allocates the new string using @code{alloca} | |
434 | (@pxref{Variable Size Automatic}). The same advantages and limitations | |
435 | of @code{strdupa} are valid for @code{strndupa}, too. | |
436 | ||
437 | This function is implemented only as a macro which means one cannot | |
438 | get the address of it. | |
439 | ||
440 | @code{strndupa} is only available if GNU CC is used. | |
26b4d766 | 441 | @end deftypefn |
706074a5 | 442 | |
28f540f4 | 443 | @comment string.h |
f65fd747 | 444 | @comment ISO |
28f540f4 RM |
445 | @deftypefun {char *} strcat (char *@var{to}, const char *@var{from}) |
446 | The @code{strcat} function is similar to @code{strcpy}, except that the | |
447 | characters from @var{from} are concatenated or appended to the end of | |
448 | @var{to}, instead of overwriting it. That is, the first character from | |
449 | @var{from} overwrites the null character marking the end of @var{to}. | |
450 | ||
451 | An equivalent definition for @code{strcat} would be: | |
452 | ||
453 | @smallexample | |
454 | char * | |
455 | strcat (char *to, const char *from) | |
456 | @{ | |
457 | strcpy (to + strlen (to), from); | |
458 | return to; | |
459 | @} | |
460 | @end smallexample | |
461 | ||
462 | This function has undefined results if the strings overlap. | |
463 | @end deftypefun | |
464 | ||
465 | @comment string.h | |
f65fd747 | 466 | @comment ISO |
28f540f4 RM |
467 | @deftypefun {char *} strncat (char *@var{to}, const char *@var{from}, size_t @var{size}) |
468 | This function is like @code{strcat} except that not more than @var{size} | |
469 | characters from @var{from} are appended to the end of @var{to}. A | |
470 | single null character is also always appended to @var{to}, so the total | |
471 | allocated size of @var{to} must be at least @code{@var{size} + 1} bytes | |
472 | longer than its initial length. | |
473 | ||
474 | The @code{strncat} function could be implemented like this: | |
475 | ||
476 | @smallexample | |
477 | @group | |
478 | char * | |
479 | strncat (char *to, const char *from, size_t size) | |
480 | @{ | |
481 | *(char *) mempcpy (to + strlen (to), from, strnlen (from, size)) = '\0'; | |
482 | return to; | |
483 | @} | |
484 | @end group | |
485 | @end smallexample | |
486 | ||
487 | The behavior of @code{strncat} is undefined if the strings overlap. | |
488 | @end deftypefun | |
489 | ||
490 | Here is an example showing the use of @code{strncpy} and @code{strncat}. | |
491 | Notice how, in the call to @code{strncat}, the @var{size} parameter | |
492 | is computed to avoid overflowing the character array @code{buffer}. | |
493 | ||
494 | @smallexample | |
495 | @include strncat.c.texi | |
496 | @end smallexample | |
497 | ||
498 | @noindent | |
499 | The output produced by this program looks like: | |
500 | ||
501 | @smallexample | |
502 | hello | |
503 | hello, wo | |
504 | @end smallexample | |
505 | ||
506 | @comment string.h | |
507 | @comment BSD | |
508 | @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size}) | |
509 | This is a partially obsolete alternative for @code{memmove}, derived from | |
510 | BSD. Note that it is not quite equivalent to @code{memmove}, because the | |
511 | arguments are not in the same order. | |
512 | @end deftypefun | |
513 | ||
514 | @comment string.h | |
515 | @comment BSD | |
516 | @deftypefun void bzero (void *@var{block}, size_t @var{size}) | |
517 | This is a partially obsolete alternative for @code{memset}, derived from | |
518 | BSD. Note that it is not as general as @code{memset}, because the only | |
519 | value it can store is zero. | |
520 | @end deftypefun | |
521 | ||
b4012b75 | 522 | @node String/Array Comparison |
28f540f4 RM |
523 | @section String/Array Comparison |
524 | @cindex comparing strings and arrays | |
525 | @cindex string comparison functions | |
526 | @cindex array comparison functions | |
527 | @cindex predicates on strings | |
528 | @cindex predicates on arrays | |
529 | ||
530 | You can use the functions in this section to perform comparisons on the | |
531 | contents of strings and arrays. As well as checking for equality, these | |
532 | functions can also be used as the ordering functions for sorting | |
533 | operations. @xref{Searching and Sorting}, for an example of this. | |
534 | ||
535 | Unlike most comparison operations in C, the string comparison functions | |
536 | return a nonzero value if the strings are @emph{not} equivalent rather | |
537 | than if they are. The sign of the value indicates the relative ordering | |
538 | of the first characters in the strings that are not equivalent: a | |
539 | negative value indicates that the first string is ``less'' than the | |
a5113b14 | 540 | second, while a positive value indicates that the first string is |
28f540f4 RM |
541 | ``greater''. |
542 | ||
543 | The most common use of these functions is to check only for equality. | |
544 | This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}. | |
545 | ||
546 | All of these functions are declared in the header file @file{string.h}. | |
547 | @pindex string.h | |
548 | ||
549 | @comment string.h | |
f65fd747 | 550 | @comment ISO |
28f540f4 RM |
551 | @deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) |
552 | The function @code{memcmp} compares the @var{size} bytes of memory | |
553 | beginning at @var{a1} against the @var{size} bytes of memory beginning | |
554 | at @var{a2}. The value returned has the same sign as the difference | |
555 | between the first differing pair of bytes (interpreted as @code{unsigned | |
556 | char} objects, then promoted to @code{int}). | |
557 | ||
558 | If the contents of the two blocks are equal, @code{memcmp} returns | |
559 | @code{0}. | |
560 | @end deftypefun | |
561 | ||
562 | On arbitrary arrays, the @code{memcmp} function is mostly useful for | |
563 | testing equality. It usually isn't meaningful to do byte-wise ordering | |
564 | comparisons on arrays of things other than bytes. For example, a | |
565 | byte-wise comparison on the bytes that make up floating-point numbers | |
566 | isn't likely to tell you anything about the relationship between the | |
567 | values of the floating-point numbers. | |
568 | ||
569 | You should also be careful about using @code{memcmp} to compare objects | |
570 | that can contain ``holes'', such as the padding inserted into structure | |
571 | objects to enforce alignment requirements, extra space at the end of | |
572 | unions, and extra characters at the ends of strings whose length is less | |
573 | than their allocated size. The contents of these ``holes'' are | |
574 | indeterminate and may cause strange behavior when performing byte-wise | |
575 | comparisons. For more predictable results, perform an explicit | |
576 | component-wise comparison. | |
577 | ||
578 | For example, given a structure type definition like: | |
579 | ||
580 | @smallexample | |
581 | struct foo | |
582 | @{ | |
583 | unsigned char tag; | |
584 | union | |
585 | @{ | |
586 | double f; | |
587 | long i; | |
588 | char *p; | |
589 | @} value; | |
590 | @}; | |
591 | @end smallexample | |
592 | ||
593 | @noindent | |
594 | you are better off writing a specialized comparison function to compare | |
595 | @code{struct foo} objects instead of comparing them with @code{memcmp}. | |
596 | ||
597 | @comment string.h | |
f65fd747 | 598 | @comment ISO |
28f540f4 RM |
599 | @deftypefun int strcmp (const char *@var{s1}, const char *@var{s2}) |
600 | The @code{strcmp} function compares the string @var{s1} against | |
601 | @var{s2}, returning a value that has the same sign as the difference | |
602 | between the first differing pair of characters (interpreted as | |
603 | @code{unsigned char} objects, then promoted to @code{int}). | |
604 | ||
605 | If the two strings are equal, @code{strcmp} returns @code{0}. | |
606 | ||
607 | A consequence of the ordering used by @code{strcmp} is that if @var{s1} | |
608 | is an initial substring of @var{s2}, then @var{s1} is considered to be | |
609 | ``less than'' @var{s2}. | |
610 | @end deftypefun | |
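Because the sign of the result reflects ordering, @code{strcmp} can be wrapped as the comparison function for @code{qsort}. The following is a small sketch for sorting an array of string pointers. The names @code{compare_strings} and @code{sort_strings} are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, and each element is
   itself a pointer to a string, hence the extra level of indirection.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Hypothetical helper: sort COUNT string pointers into the order
   defined by strcmp.  */
void
sort_strings (const char **strings, size_t count)
{
  assert (strings != NULL);
  qsort (strings, count, sizeof *strings, compare_strings);
}
```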
611 | ||
612 | @comment string.h | |
613 | @comment BSD | |
614 | @deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2}) | |
4547c1a4 UD |
615 | This function is like @code{strcmp}, except that differences in case are |
616 | ignored. How uppercase and lowercase characters are related is | |
617 | determined by the currently selected locale. In the standard @code{"C"} | |
618 | locale the characters @"A and @"a do not match but in a locale which | |
f2ea0f5b | 619 | regards these characters as part of the alphabet they do match. |
28f540f4 RM |
620 | |
621 | @code{strcasecmp} is derived from BSD. | |
622 | @end deftypefun | |
623 | ||
624 | @comment string.h | |
625 | @comment BSD | |
626 | @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n}) | |
627 | This function is like @code{strncmp}, except that differences in case | |
4547c1a4 UD |
628 | are ignored. As with @code{strcasecmp}, how uppercase and lowercase | |
629 | characters are related depends on the locale. | |
28f540f4 RM |
630 | |
631 | @code{strncasecmp} is a GNU extension. | |
632 | @end deftypefun | |
633 | ||
634 | @comment string.h | |
f65fd747 | 635 | @comment ISO |
28f540f4 RM |
636 | @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size}) |
637 | This function is similar to @code{strcmp}, except that no more than | |
638 | @var{size} characters are compared. In other words, if the two strings are | |
639 | the same in their first @var{size} characters, the return value is zero. | |
640 | @end deftypefun | |
641 | ||
642 | Here are some examples showing the use of @code{strcmp} and @code{strncmp}. | |
643 | These examples assume the use of the ASCII character set. (If some | |
644 | other character set---say, EBCDIC---is used instead, then the glyphs | |
645 | are associated with different numeric codes, and the return values | |
646 | and ordering may differ.) | |
647 | ||
648 | @smallexample | |
649 | strcmp ("hello", "hello") | |
650 | @result{} 0 /* @r{These two strings are the same.} */ | |
651 | strcmp ("hello", "Hello") | |
652 | @result{} 32 /* @r{Comparisons are case-sensitive.} */ | |
653 | strcmp ("hello", "world") | |
654 | @result{} -15 /* @r{The character @code{'h'} comes before @code{'w'}.} */ | |
655 | strcmp ("hello", "hello, world") | |
656 | @result{} -44 /* @r{Comparing a null character against a comma.} */ | |
6952e59e | 657 | strncmp ("hello", "hello, world", 5) |
28f540f4 RM |
658 | @result{} 0 /* @r{The initial 5 characters are the same.} */ |
659 | strncmp ("hello, world", "hello, stupid world!!!", 5) | |
660 | @result{} 0 /* @r{The initial 5 characters are the same.} */ | |
661 | @end smallexample | |
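The exact values shown above are those of one implementation; @w{ISO C} specifies only the sign of the result. Code that needs a canonical value can reduce the result to @code{-1}, @code{0} or @code{1}. Here is a minimal sketch; @code{cmp_sign} is a hypothetical helper name.

```c
#include <string.h>

/* Reduce the result of strcmp to -1, 0, or 1, so that callers do not
   depend on the magnitude of the value.  cmp_sign is a hypothetical
   helper name, not a library function.  */
int
cmp_sign (const char *s1, const char *s2)
{
  int r = strcmp (s1, s2);
  return (r > 0) - (r < 0);
}
```

With the ASCII character set, @code{cmp_sign ("hello", "Hello")} is @code{1} and @code{cmp_sign ("hello", "world")} is @code{-1}.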
662 | ||
1f205a47 UD |
663 | @comment string.h |
664 | @comment GNU | |
665 | @deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2}) | |
666 | The @code{strverscmp} function compares the string @var{s1} against | |
667 | @var{s2}, considering them as holding indices/version numbers. The return | |
668 | value follows the same conventions as found in the @code{strcmp} | |
669 | function. In fact, if @var{s1} and @var{s2} contain no digits, | |
670 | @code{strverscmp} behaves like @code{strcmp}. | |
671 | ||
f2ea0f5b | 672 | Basically, we compare strings normally (character by character), until |
1f205a47 UD |
673 | we find a digit in each string; then we enter a special comparison |
674 | mode, where each sequence of digits is taken as a whole. If we reach the | |
675 | end of these two parts without noticing a difference, we return to the | |
676 | standard comparison mode. There are two types of numeric parts: | |
f2ea0f5b | 677 | "integral" and "fractional" (the latter begin with a '0'). The types |
1f205a47 UD |
678 | of the numeric parts affect the way we sort them: |
679 | ||
680 | @itemize @bullet | |
681 | @item | |
682 | integral/integral: we compare values as you would expect. | |
683 | ||
684 | @item | |
f2ea0f5b | 685 | fractional/integral: the fractional part is less than the integral one. |
1f205a47 UD |
686 | Again, no surprise. |
687 | ||
688 | @item | |
f2ea0f5b UD |
689 | fractional/fractional: here things become a bit more complex. |
690 | If the common prefix contains only leading zeroes, the longer part is less | |
691 | than the other one; otherwise the comparison behaves normally. | |
1f205a47 UD |
692 | @end itemize |
693 | ||
694 | @smallexample | |
695 | strverscmp ("no digit", "no digit") | |
696 | @result{} 0 /* @r{same behaviour as strcmp.} */ | |
697 | strverscmp ("item#99", "item#100") | |
698 | @result{} <0 /* @r{same prefix, but 99 < 100.} */ | |
699 | strverscmp ("alpha1", "alpha001") | |
f2ea0f5b | 700 | @result{} >0 /* @r{fractional part inferior to integral one.} */ |
1f205a47 | 701 | strverscmp ("part1_f012", "part1_f01") |
f2ea0f5b | 702 | @result{} >0 /* @r{two fractional parts.} */ |
1f205a47 UD |
703 | strverscmp ("foo.009", "foo.0") |
704 | @result{} <0 /* @r{idem, but with leading zeroes only.} */ | |
705 | @end smallexample | |
706 | ||
f2ea0f5b | 707 | This function is especially useful when dealing with filename sorting, |
1f205a47 UD |
708 | because filenames frequently hold indices/version numbers. |
709 | ||
710 | @code{strverscmp} is a GNU extension. | |
711 | @end deftypefun | |
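Since the text above mentions filename sorting, here is a sketch of how @code{strverscmp} might be combined with @code{qsort} to put an array of names into version order. The names @code{compare_versions} and @code{sort_by_version} are hypothetical, and the feature test macro is an assumption about how the GNU extension is made visible on the system.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension.  */
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort.  Each argument points to an element
   of the array, that is, to a `const char *'.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;
  return strverscmp (*p1, *p2);
}

/* Sort NAMES, an array of COUNT strings, into version order.
   sort_by_version is a hypothetical helper name.  */
void
sort_by_version (const char **names, size_t count)
{
  qsort (names, count, sizeof *names, compare_versions);
}
```

Sorting @code{"item#100"}, @code{"item#9"} and @code{"item#99"} this way yields @code{"item#9"}, @code{"item#99"}, @code{"item#100"}, whereas @code{strcmp} would place @code{"item#100"} first.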
712 | ||
28f540f4 RM |
713 | @comment string.h |
714 | @comment BSD | |
715 | @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) | |
716 | This is an obsolete alias for @code{memcmp}, derived from BSD. | |
717 | @end deftypefun | |
718 | ||
b4012b75 | 719 | @node Collation Functions |
28f540f4 RM |
720 | @section Collation Functions |
721 | ||
722 | @cindex collating strings | |
723 | @cindex string collation functions | |
724 | ||
725 | In some locales, the conventions for lexicographic ordering differ from | |
726 | the strict numeric ordering of character codes. For example, in Spanish | |
727 | most glyphs with diacritical marks such as accents are not considered | |
728 | distinct letters for the purposes of collation. On the other hand, the | |
729 | two-character sequence @samp{ll} is treated as a single letter that is | |
730 | collated immediately after @samp{l}. | |
731 | ||
732 | You can use the functions @code{strcoll} and @code{strxfrm} (declared in | |
733 | the header file @file{string.h}) to compare strings using a collation | |
734 | ordering appropriate for the current locale. The locale used by these | |
735 | functions in particular can be specified by setting the locale for the | |
736 | @code{LC_COLLATE} category; see @ref{Locales}. | |
737 | @pindex string.h | |
738 | ||
739 | In the standard C locale, the collation sequence for @code{strcoll} is | |
740 | the same as that for @code{strcmp}. | |
741 | ||
742 | Effectively, the way these functions work is by applying a mapping to | |
743 | transform the characters in a string to a byte sequence that represents | |
744 | the string's position in the collating sequence of the current locale. | |
745 | Comparing two such byte sequences in a simple fashion is equivalent to | |
746 | comparing the strings with the locale's collating sequence. | |
747 | ||
748 | The function @code{strcoll} performs this translation implicitly, in | |
749 | order to do one comparison. By contrast, @code{strxfrm} performs the | |
750 | mapping explicitly. If you are making multiple comparisons using the | |
751 | same string or set of strings, it is likely to be more efficient to use | |
752 | @code{strxfrm} to transform all the strings just once, and subsequently | |
753 | compare the transformed strings with @code{strcmp}. | |
754 | ||
755 | @comment string.h | |
f65fd747 | 756 | @comment ISO |
28f540f4 RM |
757 | @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2}) |
758 | The @code{strcoll} function is similar to @code{strcmp} but uses the | |
759 | collating sequence of the current locale for collation (the | |
760 | @code{LC_COLLATE} locale). | |
761 | @end deftypefun | |
762 | ||
763 | Here is an example of sorting an array of strings, using @code{strcoll} | |
764 | to compare them. The actual sort algorithm is not written here; it | |
765 | comes from @code{qsort} (@pxref{Array Sort Function}). The job of the | |
766 | code shown here is to say how to compare the strings while sorting them. | |
767 | (Later on in this section, we will show a way to do this more | |
768 | efficiently using @code{strxfrm}.) | |
769 | ||
770 | @smallexample | |
771 | /* @r{This is the comparison function used with @code{qsort}.} */ | |
772 | ||
773 | int | |
774 | compare_elements (const void *p1, const void *p2) | |
775 | @{ | |
776 | return strcoll (*(char *const *) p1, *(char *const *) p2); | |
777 | @} | |
778 | ||
779 | /* @r{This is the entry point---the function to sort} | |
780 | @r{strings using the locale's collating sequence.} */ | |
781 | ||
782 | void | |
783 | sort_strings (char **array, int nstrings) | |
784 | @{ | |
785 | /* @r{Sort @code{array} by comparing the strings.} */ | |
786 | qsort (array, nstrings, | |
787 | sizeof (char *), compare_elements); | |
788 | @} | |
789 | @end smallexample | |
790 | ||
791 | @cindex converting string to collation order | |
792 | @comment string.h | |
f65fd747 | 793 | @comment ISO |
28f540f4 RM |
794 | @deftypefun size_t strxfrm (char *@var{to}, const char *@var{from}, size_t @var{size}) |
795 | The function @code{strxfrm} transforms @var{from} using the collation | |
796 | transformation determined by the locale currently selected for | |
797 | collation, and stores the transformed string in the array @var{to}. Up | |
798 | to @var{size} characters (including a terminating null character) are | |
799 | stored. | |
800 | ||
801 | The behavior is undefined if the strings @var{to} and @var{from} | |
802 | overlap; see @ref{Copying and Concatenation}. | |
803 | ||
804 | The return value is the length of the entire transformed string. This | |
805 | value is not affected by the value of @var{size}, but if it is greater | |
a5113b14 UD |
806 | than or equal to @var{size}, it means that the transformed string did not |
807 | entirely fit in the array @var{to}. In this case, only as much of the | |
808 | string as actually fits was stored. To get the whole transformed | |
809 | string, call @code{strxfrm} again with a bigger output array. | |
28f540f4 RM |
810 | |
811 | The transformed string may be longer than the original string, and it | |
812 | may also be shorter. | |
813 | ||
814 | If @var{size} is zero, no characters are stored in @var{to}. In this | |
815 | case, @code{strxfrm} simply returns the number of characters that would | |
816 | be the length of the transformed string. This is useful for determining | |
817 | what size string to allocate. It does not matter what @var{to} is if | |
818 | @var{size} is zero; @var{to} may even be a null pointer. | |
819 | @end deftypefun | |
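The zero-size call described above can be used to allocate a buffer of exactly the right size. Here is a minimal sketch; @code{make_collation_key} is a hypothetical helper name, and the sketch uses plain @code{malloc} and leaves error handling to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated transformed copy of S, or a null pointer
   if memory runs out.  The caller must free the result.
   make_collation_key is a hypothetical helper name.  */
char *
make_collation_key (const char *s)
{
  /* With a size of zero nothing is stored, so TO may be null.  */
  size_t needed = strxfrm (NULL, s, 0);
  char *key = malloc (needed + 1);   /* +1 for the terminating null.  */

  if (key != NULL)
    strxfrm (key, s, needed + 1);
  return key;
}
```

Comparing two such keys with @code{strcmp} gives a result with the same sign as calling @code{strcoll} on the original strings.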
820 | ||
821 | Here is an example of how you can use @code{strxfrm} when | |
822 | you plan to do many comparisons. It does the same thing as the previous | |
823 | example, but much faster, because it has to transform each string only | |
824 | once, no matter how many times it is compared with other strings. Even | |
825 | the time needed to allocate and free storage is much less than the time | |
826 | we save, when there are many strings. | |
827 | ||
828 | @smallexample | |
829 | struct sorter @{ char *input; char *transformed; @}; | |
830 | ||
831 | /* @r{This is the comparison function used with @code{qsort}} | |
832 | @r{to sort an array of @code{struct sorter}.} */ | |
833 | ||
834 | int | |
835 | compare_elements (const void *p1, const void *p2) | |
836 | @{ | |
837 | return strcmp (((const struct sorter *) p1)->transformed, | |
 | ((const struct sorter *) p2)->transformed); | |
838 | @} | |
839 | ||
840 | /* @r{This is the entry point---the function to sort} | |
841 | @r{strings using the locale's collating sequence.} */ | |
842 | ||
843 | void | |
844 | sort_strings_fast (char **array, int nstrings) | |
845 | @{ | |
846 | struct sorter temp_array[nstrings]; | |
847 | int i; | |
848 | ||
849 | /* @r{Set up @code{temp_array}. Each element contains} | |
850 | @r{one input string and its transformed string.} */ | |
851 | for (i = 0; i < nstrings; i++) | |
852 | @{ | |
853 | size_t length = strlen (array[i]) * 2; | |
a5113b14 | 854 | char *transformed; |
f2ea0f5b | 855 | size_t transformed_length; |
28f540f4 RM |
856 | |
857 | temp_array[i].input = array[i]; | |
858 | ||
a5113b14 UD |
859 | /* @r{First try a buffer perhaps big enough.} */ |
860 | transformed = (char *) xmalloc (length); | |
861 | ||
862 | /* @r{Transform @code{array[i]}.} */ | |
863 | transformed_length = strxfrm (transformed, array[i], length); | |
864 | ||
865 | /* @r{If the buffer was not large enough, resize it} | |
866 | @r{and try again.} */ | |
867 | if (transformed_length >= length) | |
28f540f4 | 868 | @{ |
a5113b14 UD |
869 | /* @r{Allocate the needed space. +1 for terminating} |
870 | @r{@code{NUL} character.} */ | |
871 | transformed = (char *) xrealloc (transformed, | |
872 | transformed_length + 1); | |
873 | ||
874 | /* @r{The return value is not interesting because we know} | |
875 | @r{how long the transformed string is.} */ | |
876 | (void) strxfrm (transformed, array[i], transformed_length + 1); | |
28f540f4 | 877 | @} |
a5113b14 UD |
878 | |
879 | temp_array[i].transformed = transformed; | |
28f540f4 RM |
880 | @} |
881 | ||
882 | /* @r{Sort @code{temp_array} by comparing transformed strings.} */ | |
883 | qsort (temp_array, nstrings, | |
884 | sizeof (struct sorter), compare_elements); | |
885 | ||
886 | /* @r{Put the elements back in the permanent array} | |
887 | @r{in their sorted order.} */ | |
888 | for (i = 0; i < nstrings; i++) | |
889 | array[i] = temp_array[i].input; | |
890 | ||
891 | /* @r{Free the strings we allocated.} */ | |
892 | for (i = 0; i < nstrings; i++) | |
893 | free (temp_array[i].transformed); | |
894 | @} | |
895 | @end smallexample | |
896 | ||
897 | @strong{Compatibility Note:} The string collation functions are a new | |
b4012b75 | 898 | feature of @w{ISO C 89}. Older C dialects have no equivalent feature. |
28f540f4 | 899 | |
b4012b75 | 900 | @node Search Functions |
28f540f4 RM |
901 | @section Search Functions |
902 | ||
903 | This section describes library functions which perform various kinds | |
904 | of searching operations on strings and arrays. These functions are | |
905 | declared in the header file @file{string.h}. | |
906 | @pindex string.h | |
907 | @cindex search functions (for strings) | |
908 | @cindex string search functions | |
909 | ||
910 | @comment string.h | |
f65fd747 | 911 | @comment ISO |
28f540f4 RM |
912 | @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
913 | This function finds the first occurrence of the byte @var{c} (converted | |
914 | to an @code{unsigned char}) in the initial @var{size} bytes of the | |
915 | object beginning at @var{block}. The return value is a pointer to the | |
916 | located byte, or a null pointer if no match was found. | |
917 | @end deftypefun | |
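Unlike the string functions, @code{memchr} does not stop at a null character; it examines exactly @var{size} bytes. The following sketch turns the returned pointer into an offset; @code{byte_offset} is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first byte equal to C within the SIZE
   bytes at BLOCK, or SIZE if there is no such byte.  byte_offset is
   a hypothetical helper name.  */
size_t
byte_offset (const void *block, int c, size_t size)
{
  const unsigned char *start = block;
  const unsigned char *hit = memchr (block, c, size);
  return hit != NULL ? (size_t) (hit - start) : size;
}
```

For the four bytes @code{'a'}, @code{'\0'}, @code{'b'}, @code{'c'}, searching for @code{'b'} gives offset 2 even though a null byte precedes it.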
918 | ||
919 | @comment string.h | |
f65fd747 | 920 | @comment ISO |
28f540f4 RM |
921 | @deftypefun {char *} strchr (const char *@var{string}, int @var{c}) |
922 | The @code{strchr} function finds the first occurrence of the character | |
923 | @var{c} (converted to a @code{char}) in the null-terminated string | |
924 | beginning at @var{string}. The return value is a pointer to the located | |
925 | character, or a null pointer if no match was found. | |
926 | ||
927 | For example, | |
928 | @smallexample | |
929 | strchr ("hello, world", 'l') | |
930 | @result{} "llo, world" | |
931 | strchr ("hello, world", '?') | |
932 | @result{} NULL | |
a5113b14 | 933 | @end smallexample |
28f540f4 RM |
934 | |
935 | The terminating null character is considered to be part of the string, | |
936 | so you can use this function to get a pointer to the end of a string by | |
937 | specifying a null character as the value of the @var{c} argument. | |
938 | @end deftypefun | |
939 | ||
940 | @comment string.h | |
941 | @comment BSD | |
942 | @deftypefun {char *} index (const char *@var{string}, int @var{c}) | |
943 | @code{index} is another name for @code{strchr}; they are exactly the same. | |
5649a1d6 UD |
944 | New code should always use @code{strchr} since this name is defined in |
945 | @w{ISO C} while @code{index} is a BSD invention which was never available | |
946 | on @w{System V} derived systems. | |
28f540f4 RM |
947 | @end deftypefun |
948 | ||
949 | @comment string.h | |
f65fd747 | 950 | @comment ISO |
28f540f4 RM |
951 | @deftypefun {char *} strrchr (const char *@var{string}, int @var{c}) |
952 | The function @code{strrchr} is like @code{strchr}, except that it searches | |
953 | backwards from the end of the string @var{string} (instead of forwards | |
954 | from the front). | |
955 | ||
956 | For example, | |
957 | @smallexample | |
958 | strrchr ("hello, world", 'l') | |
959 | @result{} "ld" | |
960 | @end smallexample | |
961 | @end deftypefun | |
962 | ||
963 | @comment string.h | |
964 | @comment BSD | |
965 | @deftypefun {char *} rindex (const char *@var{string}, int @var{c}) | |
966 | @code{rindex} is another name for @code{strrchr}; they are exactly the same. | |
5649a1d6 UD |
967 | New code should always use @code{strrchr} since this name is defined in |
968 | @w{ISO C} while @code{rindex} is a BSD invention which was never available | |
969 | on @w{System V} derived systems. | |
28f540f4 RM |
970 | @end deftypefun |
971 | ||
972 | @comment string.h | |
f65fd747 | 973 | @comment ISO |
28f540f4 RM |
974 | @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle}) |
975 | This is like @code{strchr}, except that it searches @var{haystack} for a | |
976 | substring @var{needle} rather than just a single character. It | |
977 | returns a pointer into the string @var{haystack} that is the first | |
978 | character of the substring, or a null pointer if no match was found. If | |
979 | @var{needle} is an empty string, the function returns @var{haystack}. | |
980 | ||
981 | For example, | |
982 | @smallexample | |
983 | strstr ("hello, world", "l") | |
984 | @result{} "llo, world" | |
985 | strstr ("hello, world", "wo") | |
986 | @result{} "world" | |
987 | @end smallexample | |
988 | @end deftypefun | |
989 | ||
990 | ||
991 | @comment string.h | |
992 | @comment GNU | |
63551311 | 993 | @deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len}) |
28f540f4 RM |
994 | This is like @code{strstr}, but @var{needle} and @var{haystack} are byte |
995 | arrays rather than null-terminated strings. @var{needle-len} is the | |
996 | length of @var{needle} and @var{haystack-len} is the length of | |
997 | @var{haystack}.@refill | |
998 | ||
999 | This function is a GNU extension. | |
1000 | @end deftypefun | |
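Because @code{memmem} takes explicit lengths, it can search data that contains null bytes. Here is a sketch that counts non-overlapping occurrences of a byte sequence. The name @code{count_occurrences} is hypothetical, returning zero for an empty needle is a choice made for this sketch, and the feature test macro is an assumption about how the GNU extension is made visible.

```c
#define _GNU_SOURCE   /* memmem is a GNU extension.  */
#include <stddef.h>
#include <string.h>

/* Count the non-overlapping occurrences of NEEDLE in HAYSTACK.
   count_occurrences is a hypothetical helper name.  */
size_t
count_occurrences (const void *haystack, size_t haystack_len,
                   const void *needle, size_t needle_len)
{
  const unsigned char *p = haystack;
  const unsigned char *end = p + haystack_len;
  size_t count = 0;

  if (needle_len == 0)
    return 0;   /* An empty needle would never advance the search.  */

  while ((p = memmem (p, (size_t) (end - p), needle, needle_len)) != NULL)
    {
      count++;
      p += needle_len;   /* Resume just past this match.  */
    }
  return count;
}
```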
1001 | ||
1002 | @comment string.h | |
f65fd747 | 1003 | @comment ISO |
28f540f4 RM |
1004 | @deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset}) |
1005 | The @code{strspn} (``string span'') function returns the length of the | |
1006 | initial substring of @var{string} that consists entirely of characters that | |
1007 | are members of the set specified by the string @var{skipset}. The order | |
1008 | of the characters in @var{skipset} is not important. | |
1009 | ||
1010 | For example, | |
1011 | @smallexample | |
1012 | strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz") | |
1013 | @result{} 5 | |
1014 | @end smallexample | |
1015 | @end deftypefun | |
1016 | ||
1017 | @comment string.h | |
f65fd747 | 1018 | @comment ISO |
28f540f4 RM |
1019 | @deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset}) |
1020 | The @code{strcspn} (``string complement span'') function returns the length | |
1021 | of the initial substring of @var{string} that consists entirely of characters | |
1022 | that are @emph{not} members of the set specified by the string @var{stopset}. | |
1023 | (In other words, it returns the offset of the first character in @var{string} | |
1024 | that is a member of the set @var{stopset}.) | |
1025 | ||
1026 | For example, | |
1027 | @smallexample | |
1028 | strcspn ("hello, world", " \t\n,.;!?") | |
1029 | @result{} 5 | |
1030 | @end smallexample | |
1031 | @end deftypefun | |
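The two span functions complement each other: @code{strspn} can skip leading separators and @code{strcspn} can then measure the word that follows. Here is a minimal sketch that does not modify the string; @code{first_word} is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Find the first word in TEXT, where words are separated by the
   characters in SEPARATORS.  Store the length of the word in *LEN and
   return a pointer to its start, or return a null pointer if TEXT
   contains no word.  first_word is a hypothetical helper name.  */
const char *
first_word (const char *text, const char *separators, size_t *len)
{
  /* Skip any leading separators.  */
  const char *start = text + strspn (text, separators);

  /* The word runs up to the next separator or the end of the string.  */
  *len = strcspn (start, separators);
  return *len > 0 ? start : NULL;
}
```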
1032 | ||
1033 | @comment string.h | |
f65fd747 | 1034 | @comment ISO |
28f540f4 RM |
1035 | @deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset}) |
1036 | The @code{strpbrk} (``string pointer break'') function is related to | |
1037 | @code{strcspn}, except that it returns a pointer to the first character | |
1038 | in @var{string} that is a member of the set @var{stopset} instead of the | |
1039 | length of the initial substring. It returns a null pointer if no such | |
1040 | character from @var{stopset} is found. | |
1041 | ||
1042 | @c @group Invalid outside the example. | |
1043 | For example, | |
1044 | ||
1045 | @smallexample | |
1046 | strpbrk ("hello, world", " \t\n,.;!?") | |
1047 | @result{} ", world" | |
1048 | @end smallexample | |
1049 | @c @end group | |
1050 | @end deftypefun | |
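Calling @code{strpbrk} repeatedly visits every character of a string that belongs to a set. The following sketch replaces each such character in place; @code{replace_chars} is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Replace every character of S that is a member of STOPSET with
   REPLACEMENT, and return the number of characters replaced.
   replace_chars is a hypothetical helper name.  */
size_t
replace_chars (char *s, const char *stopset, char replacement)
{
  size_t count = 0;

  while ((s = strpbrk (s, stopset)) != NULL)
    {
      *s++ = replacement;   /* Overwrite, then search the rest.  */
      count++;
    }
  return count;
}
```

Applied to a writable copy of @code{"hello, world!"} with the set @code{" ,!"} and the replacement @code{'_'}, it replaces three characters and leaves @code{"hello__world_"}.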
1051 | ||
b4012b75 | 1052 | @node Finding Tokens in a String |
28f540f4 RM |
1053 | @section Finding Tokens in a String |
1054 | ||
28f540f4 RM |
1055 | @cindex tokenizing strings |
1056 | @cindex breaking a string into tokens | |
1057 | @cindex parsing tokens from a string | |
1058 | It's fairly common for programs to have a need to do some simple kinds | |
1059 | of lexical analysis and parsing, such as splitting a command string up | |
1060 | into tokens. You can do this with the @code{strtok} function, declared | |
1061 | in the header file @file{string.h}. | |
1062 | @pindex string.h | |
1063 | ||
1064 | @comment string.h | |
f65fd747 | 1065 | @comment ISO |
28f540f4 RM |
1066 | @deftypefun {char *} strtok (char *@var{newstring}, const char *@var{delimiters}) |
1067 | A string can be split into tokens by making a series of calls to the | |
1068 | function @code{strtok}. | |
1069 | ||
1070 | The string to be split up is passed as the @var{newstring} argument on | |
1071 | the first call only. The @code{strtok} function uses this to set up | |
1072 | some internal state information. Subsequent calls to get additional | |
1073 | tokens from the same string are indicated by passing a null pointer as | |
1074 | the @var{newstring} argument. Calling @code{strtok} with another | |
1075 | non-null @var{newstring} argument reinitializes the state information. | |
1076 | It is guaranteed that no other library function ever calls @code{strtok} | |
1077 | behind your back (which would mess up this internal state information). | |
1078 | ||
1079 | The @var{delimiters} argument is a string that specifies a set of delimiters | |
1080 | that may surround the token being extracted. All the initial characters | |
1081 | that are members of this set are discarded. The first character that is | |
1082 | @emph{not} a member of this set of delimiters marks the beginning of the | |
1083 | next token. The end of the token is found by looking for the next | |
1084 | character that is a member of the delimiter set. This character in the | |
1085 | original string @var{newstring} is overwritten by a null character, and the | |
1086 | pointer to the beginning of the token in @var{newstring} is returned. | |
1087 | ||
1088 | On the next call to @code{strtok}, the searching begins at the next | |
1089 | character beyond the one that marked the end of the previous token. | |
1090 | Note that the set of delimiters @var{delimiters} does not have to be the | |
1091 | same on every call in a series of calls to @code{strtok}. | |
1092 | ||
1093 | If the end of the string @var{newstring} is reached, or if the remainder of | |
1094 | the string consists only of delimiter characters, @code{strtok} returns | |
1095 | a null pointer. | |
1096 | @end deftypefun | |
1097 | ||
1098 | @strong{Warning:} Since @code{strtok} alters the string it is parsing, | |
1099 | you should always copy the string to a temporary buffer before parsing it with | |
1100 | @code{strtok}. If you allow @code{strtok} to modify a string that came | |
1101 | from another part of your program, you are asking for trouble; that | |
1102 | string may be part of a data structure that could be used for other | |
1103 | purposes during the parsing, when alteration by @code{strtok} makes the | |
1104 | data structure temporarily inaccurate. | |
1105 | ||
1106 | The string that you are operating on might even be a constant. Then | |
1107 | when @code{strtok} tries to modify it, your program will get a fatal | |
1108 | signal for writing in read-only memory. @xref{Program Error Signals}. | |
1109 | ||
1110 | This is a special case of a general principle: if a part of a program | |
1111 | does not have as its purpose the modification of a certain data | |
1112 | structure, then it is error-prone to modify the data structure | |
1113 | temporarily. | |
1114 | ||
1115 | The function @code{strtok} is not reentrant. @xref{Nonreentrancy}, for | |
1116 | a discussion of where and why reentrancy is important. | |
1117 | ||
1118 | Here is a simple example showing the use of @code{strtok}. | |
1119 | ||
1120 | @comment Yes, this example has been tested. | |
1121 | @smallexample | |
1122 | #include <string.h> | |
1123 | #include <stddef.h> | |
1124 | ||
1125 | @dots{} | |
1126 | ||
5649a1d6 | 1127 | const char string[] = "words separated by spaces -- and, punctuation!"; |
28f540f4 | 1128 | const char delimiters[] = " .,;:!-"; |
5649a1d6 | 1129 | char *token, *cp; |
28f540f4 RM |
1130 | |
1131 | @dots{} | |
1132 | ||
5649a1d6 UD |
1133 | cp = strdupa (string); /* Make writable copy. */ |
1134 | token = strtok (cp, delimiters); /* token => "words" */ | |
28f540f4 RM |
1135 | token = strtok (NULL, delimiters); /* token => "separated" */ |
1136 | token = strtok (NULL, delimiters); /* token => "by" */ | |
1137 | token = strtok (NULL, delimiters); /* token => "spaces" */ | |
1138 | token = strtok (NULL, delimiters); /* token => "and" */ | |
1139 | token = strtok (NULL, delimiters); /* token => "punctuation" */ | |
1140 | token = strtok (NULL, delimiters); /* token => NULL */ | |
1141 | @end smallexample | |
a5113b14 UD |
1142 | |
1143 | The GNU C library contains two more functions for tokenizing a string | |
1144 | which overcome the limitation of non-reentrancy. | |
1145 | ||
1146 | @comment string.h | |
1147 | @comment POSIX | |
1148 | @deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr}) | |
1149 | Just like @code{strtok} this function splits the string into several | |
1150 | tokens which can be accessed by successive calls to @code{strtok_r}. | |
1151 | The difference is that the information about the next token is not set | |
1152 | up in some internal state information. Instead the caller has to | |
1153 | provide another argument @var{save_ptr} which is a pointer to a string | |
1154 | pointer. Calling @code{strtok_r} with a null pointer for | |
1155 | @var{newstring} and leaving @var{save_ptr} between the calls unchanged | |
1156 | does the job without limiting reentrancy. | |
1157 | ||
5649a1d6 | 1158 | This function is defined in POSIX.1 and can be found on many systems |
a5113b14 UD |
1159 | which support multi-threading. |
1160 | @end deftypefun | |
1161 | ||
1162 | @comment string.h | |
1163 | @comment BSD | |
1164 | @deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter}) | |
1165 | A second reentrant approach is to avoid the additional first argument. | |
1166 | The initialization of the moving pointer has to be done by the user. | |
1167 | Successive calls of @code{strsep} move the pointer along the tokens | |
1168 | separated by @var{delimiter}, returning the address of the next token | |
1169 | and updating @var{string_ptr} to point to the beginning of the next | |
1170 | token. | |
1171 | ||
1172 | This function was introduced in 4.3BSD and therefore is widely available. | |
1173 | @end deftypefun | |
1174 | ||
1175 | Here is how the above example looks when @code{strsep} is used. | |
1176 | ||
1177 | @comment Yes, this example has been tested. | |
1178 | @smallexample | |
1179 | #include <string.h> | |
1180 | #include <stddef.h> | |
1181 | ||
1182 | @dots{} | |
1183 | ||
5649a1d6 | 1184 | const char string[] = "words separated by spaces -- and, punctuation!"; |
a5113b14 UD |
1185 | const char delimiters[] = " .,;:!-"; |
1186 | char *running; | |
1187 | char *token; | |
1188 | ||
1189 | @dots{} | |
1190 | ||
5649a1d6 | 1191 | running = strdupa (string); |
a5113b14 UD |
1192 | token = strsep (&running, delimiters); /* token => "words" */ |
1193 | token = strsep (&running, delimiters); /* token => "separated" */ | |
1194 | token = strsep (&running, delimiters); /* token => "by" */ | |
1195 | token = strsep (&running, delimiters); /* token => "spaces" */ | |
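 | token = strsep (&running, delimiters); /* token => "" */ | |
 | token = strsep (&running, delimiters); /* token => "" */ | |
 | token = strsep (&running, delimiters); /* token => "" */ | |
1196 | token = strsep (&running, delimiters); /* token => "and" */ | |
 | token = strsep (&running, delimiters); /* token => "" */ | |
1197 | token = strsep (&running, delimiters); /* token => "punctuation" */ | |
 | token = strsep (&running, delimiters); /* token => "" */ | |
1198 | token = strsep (&running, delimiters); /* token => NULL */ | |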
1199 | @end smallexample | |
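Unlike @code{strtok}, @code{strsep} returns an empty token whenever two delimiters are adjacent, and also when the string ends with a delimiter. A caller that wants only the non-empty tokens can skip the empty ones in a loop. Here is a sketch of one way to do that; @code{split_nonempty} is a hypothetical helper name, and the feature test macro is an assumption about how this BSD function is made visible.

```c
#define _DEFAULT_SOURCE   /* strsep is a BSD function.  */
#include <stddef.h>
#include <string.h>

/* Split TEXT in place at the characters in DELIMITERS.  Store
   pointers to at most MAX_TOKENS non-empty tokens in TOKENS and
   return the number stored.  split_nonempty is a hypothetical
   helper name.  */
size_t
split_nonempty (char *text, const char *delimiters,
                char **tokens, size_t max_tokens)
{
  size_t count = 0;
  char *running = text;
  char *token;

  while (count < max_tokens
         && (token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')     /* Skip the empty tokens.  */
      tokens[count++] = token;

  return count;
}
```

For the string and delimiters used in the example above, this stores the six tokens @code{"words"}, @code{"separated"}, @code{"by"}, @code{"spaces"}, @code{"and"} and @code{"punctuation"}.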
b4012b75 UD |
1200 | |
1201 | @node Encode Binary Data | |
1202 | @section Encode Binary Data | |
1203 | ||
1204 | To store or transfer binary data in environments which only support text | |
1205 | one has to encode the binary data by mapping the input bytes to | |
1206 | characters in the range allowed for storing or transferring. SVID | |
1207 | systems (and nowadays XPG compliant systems) have such a function in the | |
1208 | C library. | |
1209 | ||
1210 | @comment stdlib.h | |
1211 | @comment XPG | |
1212 | @deftypefun {char *} l64a (long int @var{n}) | |
1213 | This function encodes an input value with 32 bits using characters from | |
1214 | the basic character set. Groups of 6 bits are encoded using the | |
1215 | following table: | |
1216 | ||
1217 | @multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} | |
1218 | @item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7 | |
1219 | @item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1} | |
1220 | @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5} | |
1221 | @item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9} | |
1222 | @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D} | |
1223 | @item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H} | |
1224 | @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L} | |
1225 | @item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P} | |
1226 | @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T} | |
1227 | @item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X} | |
1228 | @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b} | |
1229 | @item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f} | |
1230 | @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j} | |
1231 | @item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n} | |
1232 | @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r} | |
1233 | @item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v} | |
1234 | @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z} | |
1235 | @end multitable | |
1236 | ||
1237 | The function returns a pointer to a static buffer which contains the | |
1238 | string representing the encoding of @var{n}. To encode a series of | |
1239 | bytes the user should append the new string to the destination buffer. | |
1240 | @emph{Warning:} Since a static buffer is used this function should not | |
5649a1d6 | 1241 | be used in multi-threaded programs. There is no thread-safe alternative |
b4012b75 UD |
1242 | to this function in the C library. |
1243 | @end deftypefun | |
1244 | ||
5649a1d6 UD |
1245 | On its own the @code{l64a} function is not very useful. To encode arbitrary |
1246 | sequences of bytes one needs some more code, which could look like | |
1247 | this: | |
1248 | ||
1249 | @smallexample | |
1250 | char * | |
1251 | encode (const void *buf, size_t len) | |
1252 | @{ | |
1253 | /* @r{We know in advance how long the buffer has to be.} */ | |
1254 | unsigned char *in = (unsigned char *) buf; | |
1255 | char *out = malloc (6 + ((len + 3) / 4) * 6 + 1); | |
1256 | char *cp = out; | |
1257 | ||
1258 | /* @r{Encode the length.} */ | |
1259 | memcpy (cp, l64a (len), 6); | |
1260 | cp += 6; | |
1261 | ||
1262 | while (len > 3) | |
1263 | @{ | |
1264 | unsigned long int n = *in++; | |
1265 | n = (n << 8) | *in++; | |
1266 | n = (n << 8) | *in++; | |
1267 | n = (n << 8) | *in++; | |
1268 | len -= 4; | |
1269 | /* @r{Using `htonl' is necessary so that the data can be} | |
1270 | @r{decoded even on machines with different byte order.} */ | |
1271 | memcpy (cp, l64a (htonl (n)), 6); | |
1272 | cp += 6; | |
1273 | @} | |
1274 | if (len > 0) | |
1275 | @{ | |
1276 | unsigned long int n = *in++; | |
1277 | if (--len > 0) | |
1278 | @{ | |
1279 | n = (n << 8) | *in++; | |
1280 | if (--len > 0) | |
1281 | n = (n << 8) | *in; | |
1282 | @} | |
1283 | memcpy (cp, l64a (htonl (n)), 6); | |
1284 | cp += 6; | |
1285 | @} | |
1286 | *cp = '\0'; | |
1287 | return out; | |
1288 | @} | |
1289 | @end smallexample | |
1290 | ||
1291 | The library itself does not provide the complete functionality needed, | |
1292 | so code like the above is required. There are some other encoding | |
1293 | methods which are much more widely used (UU encoding, Base64 encoding). | |
1294 | Generally, it is better to use one of these encodings. | |
1295 | ||
b4012b75 UD |
1296 | To decode data produced with @code{l64a} the following function should be |
1297 | used. | |
1298 | ||
5649a1d6 UD |
1299 | @comment stdlib.h |
1300 | @comment XPG | |
b4012b75 UD |
1301 | @deftypefun {long int} a64l (const char *@var{string}) |
1302 | The parameter @var{string} should contain a string which was produced by | |
1303 | a call to @code{l64a}. The function processes the next 6 characters and | |
1304 | decodes the characters it finds according to the table above. | |
1305 | Characters not in the conversion table are simply ignored. This is | |
1306 | useful for breaking the information into lines, in which case the | |
1307 | end-of-line characters are simply ignored. | |
1308 | ||
1309 | The decoded number is returned at the end as a @code{long int} value. | |
1310 | Consecutive calls to this function are possible but the caller must make | |
1311 | sure the buffer pointer is updated after each call to @code{a64l} since | |
1312 | this function does not modify the buffer pointer. Every call consumes 6 | |
1313 | characters. | |
1314 | @end deftypefun | |
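The interplay of @code{l64a} and @code{a64l} for a single value can be sketched as follows. This is only an illustration, not part of the library: the helper names @code{encode_word} and @code{decode_word} are hypothetical, and the sketch assumes that the result of @code{l64a} never exceeds 6 characters. The result is copied out of the static buffer immediately and padded with @code{'.'} (the encoding of 0) so that every group is exactly 6 characters long.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode VALUE with l64a and copy the result out
   of the static buffer right away, padded with '.' (the encoding of 0)
   to exactly 6 characters.  OUT must have room for 7 bytes.  */
void
encode_word (long int value, char out[7])
{
  const char *s = l64a (value);
  size_t n = strlen (s);

  assert (n <= 6);
  memset (out, '.', 6);
  memcpy (out, s, n);
  out[6] = '\0';
}

/* Hypothetical helper: decode one 6-character group produced by
   encode_word.  The caller advances its own pointer by 6 afterwards,
   since a64l does not do so.  */
long int
decode_word (const char *group)
{
  assert (group != NULL);
  return a64l (group);
}
```

Padding with @code{'.'} does not change the decoded value, because the least significant digit comes first in the encoding.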
b13927da UD |
1315 | |
1316 | @node Argz and Envz Vectors | |
1317 | @section Argz and Envz Vectors | |
1318 | ||
5649a1d6 | 1319 | @cindex argz vectors (string vectors) |
b13927da UD |
1320 | @cindex string vectors, null-character separated |
1321 | @cindex argument vectors, null-character separated | |
1322 | @dfn{Argz vectors} are vectors of strings in a contiguous block of | |
1323 | memory, each element separated from its neighbors by null characters | |
1324 | (@code{'\0'}). | |
1325 | ||
5649a1d6 | 1326 | @cindex envz vectors (environment vectors) |
b13927da UD |
1327 | @cindex environment vectors, null-character separated |
1328 | @dfn{Envz vectors} are an extension of argz vectors where each element is a | |
5649a1d6 | 1329 | name-value pair, separated by a @code{'='} character (as in a Unix |
b13927da UD |
1330 | environment). |
1331 | ||
1332 | @menu | |
1333 | * Argz Functions:: Operations on argz vectors. | |
1334 | * Envz Functions:: Additional operations on environment vectors. | |
1335 | @end menu | |
1336 | ||
1337 | @node Argz Functions, Envz Functions, , Argz and Envz Vectors | |
1338 | @subsection Argz Functions | |
1339 | ||
1340 | Each argz vector is represented by a pointer to the first element, of | |
1341 | type @code{char *}, and a size, of type @code{size_t}, both of which can | |
1342 | be initialized to @code{0} to represent an empty argz vector. All argz | |
1343 | functions accept either a pointer and a size argument, or pointers to | |
1344 | them, if they will be modified. | |
1345 | ||
1346 | The argz functions use @code{malloc}/@code{realloc} to allocate/grow | |
1347 | argz vectors, and so any argz vector created using these functions may | |
1348 | be freed by using @code{free}; conversely, any argz function that may | |
1349 | grow a string expects that string to have been allocated using | |
1350 | @code{malloc} (those argz functions that only examine their arguments or | |
1351 | modify them in place will work on any sort of memory). | |
1352 | @xref{Unconstrained Allocation}. | |
1353 | ||
1354 | All argz functions that do memory allocation have a return type of | |
1355 | @code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an | |
1356 | allocation error occurs. | |
1357 | ||
1358 | @pindex argz.h | |
1359 | These functions are declared in the standard include file @file{argz.h}. | |
1360 | ||
5649a1d6 UD |
1361 | @comment argz.h |
1362 | @comment GNU | |
b13927da | 1363 | @deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len}) |
5649a1d6 | 1364 | The @code{argz_create} function converts the Unix-style argument vector |
b13927da UD |
1365 | @var{argv} (a vector of pointers to normal C strings, terminated by |
1366 | @code{(char *)0}; @pxref{Program Arguments}) into an argz vector with | |
1367 | the same elements, which is returned in @var{argz} and @var{argz_len}. | |
1368 | @end deftypefun | |
1369 | ||
5649a1d6 UD |
1370 | @comment argz.h |
1371 | @comment GNU | |
b13927da UD |
1372 | @deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len}) |
1373 | The @code{argz_create_sep} function converts the null-terminated string | |
1374 | @var{string} into an argz vector (returned in @var{argz} and | |
1375 | @var{argz_len}) by splitting it into elements at every occurrence of the | |
1376 | character @var{sep}. | |
1377 | @end deftypefun | |
1378 | ||
5649a1d6 UD |
1379 | @comment argz.h |
1380 | @comment GNU | |
b13927da UD |
1381 | @deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len}) | |
1382 | Returns the number of elements in the argz vector @var{argz} and | |
1383 | @var{argz_len}. | |
1384 | @end deftypefun | |
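As a minimal sketch of how @code{argz_create_sep} and @code{argz_count} fit together, the following hypothetical helper @code{count_fields} splits a colon-separated list and reports how many elements resulted. The helper name and the choice of @code{(size_t) -1} as an error value are assumptions made for this illustration.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: split LIST at every ':' and return the number
   of elements, or (size_t) -1 if the argz vector could not be
   allocated.  */
size_t
count_fields (const char *list)
{
  char *argz = NULL;
  size_t argz_len = 0;
  size_t count;

  assert (list != NULL);
  if (argz_create_sep (list, ':', &argz, &argz_len) != 0)
    return (size_t) -1;

  count = argz_count (argz, argz_len);
  free (argz);                  /* Argz vectors are malloc'ed.  */
  return count;
}
```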
1385 | ||
5649a1d6 UD |
1386 | @comment argz.h |
1387 | @comment GNU | |
b13927da UD |
1388 | @deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv}) |
1389 | The @code{argz_extract} function converts the argz vector @var{argz} and | |
5649a1d6 | 1390 | @var{argz_len} into a Unix-style argument vector stored in @var{argv}, |
b13927da UD |
1391 | by putting pointers to every element in @var{argz} into successive |
1392 | positions in @var{argv}, followed by a terminator of @code{0}. | |
1393 | @var{Argv} must be pre-allocated with enough space to hold all the | |
1394 | elements in @var{argz} plus the terminating @code{(char *)0} | |
1395 | (@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)} | |
1396 | bytes should be enough). Note that the string pointers stored into | |
1397 | @var{argv} point into @var{argz}---they are not copies---and so | |
1398 | @var{argz} must be copied if it will be changed while @var{argv} is | |
1399 | still active. This function is useful for passing the elements in | |
1400 | @var{argz} to an exec function (@pxref{Executing a File}). | |
1401 | @end deftypefun | |
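The size calculation described for @code{argz_extract} might be wrapped up as in the following sketch. The helper name @code{argz_to_argv} is hypothetical. Only the array of pointers is newly allocated; the strings themselves still live inside @var{argz}.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated, null-terminated
   pointer vector whose entries point into ARGZ, or NULL if malloc
   fails.  The caller frees the vector, not the strings.  */
char **
argz_to_argv (char *argz, size_t argz_len)
{
  size_t n = argz_count (argz, argz_len);
  char **argv = malloc ((n + 1) * sizeof (char *));

  assert (argz != NULL || argz_len == 0);
  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

The returned vector stays valid only as long as @var{argz} is neither freed nor reallocated.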
1402 | ||
5649a1d6 UD |
1403 | @comment argz.h |
1404 | @comment GNU | |
b13927da UD |
1405 | @deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep}) |
1406 | The @code{argz_stringify} function converts @var{argz} into a normal string with | |
1407 | the elements separated by the character @var{sep}, by replacing each | |
1408 | @code{'\0'} inside @var{argz} (except the last one, which terminates the | |
1409 | string) with @var{sep}. This is handy for printing @var{argz} in a | |
1410 | readable manner. | |
1411 | @end deftypefun | |
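Because @code{argz_stringify} modifies its argument in place, a program that still needs the argz vector afterwards can work on a copy. The following sketch shows one way to do that; the helper name @code{argz_join} is hypothetical.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed string holding the elements
   of ARGZ joined by SEP, leaving ARGZ itself untouched.  Returns NULL
   if the allocation fails.  */
char *
argz_join (const char *argz, size_t argz_len, int sep)
{
  char *copy;

  assert (argz != NULL && argz_len > 0);
  copy = malloc (argz_len);
  if (copy == NULL)
    return NULL;

  memcpy (copy, argz, argz_len);
  argz_stringify (copy, argz_len, sep);
  return copy;
}
```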
1412 | ||
5649a1d6 UD |
1413 | @comment argz.h |
1414 | @comment GNU | |
b13927da UD |
1415 | @deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}) |
1416 | The @code{argz_add} function adds the string @var{str} to the end of the | |
1417 | argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and | |
1418 | @code{*@var{argz_len}} accordingly. | |
1419 | @end deftypefun | |
1420 | ||
5649a1d6 UD |
1421 | @comment argz.h |
1422 | @comment GNU | |
b13927da UD |
1423 | @deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim}) |
1424 | The @code{argz_add_sep} function is similar to @code{argz_add}, but | |
1425 | @var{str} is split into separate elements in the result at occurrences of | |
1426 | the character @var{delim}. This is useful, for instance, for | |
5649a1d6 | 1427 | adding the components of a Unix search path to an argz vector, by using |
b13927da UD |
1428 | a value of @code{':'} for @var{delim}. |
1429 | @end deftypefun | |
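The search path use case mentioned above could look roughly like this. The helper name @code{build_search_path} and its cleanup policy on failure are assumptions of this sketch; error values are held in a plain @code{int}.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build an argz vector from the colon-separated
   PATH followed by the single directory EXTRA.  Returns 0 on success
   or ENOMEM; on failure the vector is released and reset to empty.  */
int
build_search_path (const char *path, const char *extra,
                   char **argz, size_t *argz_len)
{
  int err;

  assert (path != NULL && extra != NULL);
  *argz = NULL;
  *argz_len = 0;

  err = argz_add_sep (argz, argz_len, path, ':');
  if (err == 0)
    err = argz_add (argz, argz_len, extra);
  if (err != 0)
    {
      free (*argz);
      *argz = NULL;
      *argz_len = 0;
    }
  return err;
}
```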
1430 | ||
5649a1d6 UD |
1431 | @comment argz.h |
1432 | @comment GNU | |
b13927da UD |
1433 | @deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len}) |
1434 | The @code{argz_append} function appends @var{buf_len} bytes starting at | |
1435 | @var{buf} to the argz vector @code{*@var{argz}}, reallocating | |
1436 | @code{*@var{argz}} to accommodate it, and adding @var{buf_len} to | |
1437 | @code{*@var{argz_len}}. | |
1438 | @end deftypefun | |
1439 | ||
5649a1d6 UD |
1440 | @comment argz.h |
1441 | @comment GNU | |
b13927da UD |
1442 | @deftypefun {error_t} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry}) |
1443 | If @var{entry} points to the beginning of one of the elements in the | |
1444 | argz vector @code{*@var{argz}}, the @code{argz_delete} function will | |
1445 | remove this entry and reallocate @code{*@var{argz}}, modifying | |
1446 | @code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as | |
1447 | destructive argz functions usually reallocate their argz argument, | |
1448 | pointers into argz vectors such as @var{entry} will then become invalid. | |
1449 | @end deftypefun | |
1450 | ||
5649a1d6 UD |
1451 | @comment argz.h |
1452 | @comment GNU | |
b13927da UD |
1453 | @deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry}) |
1454 | The @code{argz_insert} function inserts the string @var{entry} into the | |
1455 | argz vector @code{*@var{argz}} at a point just before the existing | |
1456 | element pointed to by @var{before}, reallocating @code{*@var{argz}} and | |
1457 | updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before} | |
1458 | is @code{0}, @var{entry} is added to the end instead (as if by | |
1459 | @code{argz_add}). Since the first element is in fact the same as | |
1460 | @code{*@var{argz}}, passing in @code{*@var{argz}} as the value of | |
1461 | @var{before} will result in @var{entry} being inserted at the beginning. | |
1462 | @end deftypefun | |
1463 | ||
5649a1d6 UD |
1464 | @comment argz.h |
1465 | @comment GNU | |
b13927da UD |
1466 | @deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry}) |
1467 | The @code{argz_next} function provides a convenient way of iterating | |
1468 | over the elements in the argz vector @var{argz}. It returns a pointer | |
1469 | to the next element in @var{argz} after the element @var{entry}, or | |
1470 | @code{0} if there are no elements following @var{entry}. If @var{entry} | |
1471 | is @code{0}, the first element of @var{argz} is returned. | |
1472 | ||
1473 | This behavior suggests two styles of iteration: | |
1474 | ||
1475 | @smallexample | |
1476 | char *entry = 0; | |
1477 | while ((entry = argz_next (@var{argz}, @var{argz_len}, entry))) | |
1478 | @var{action}; | |
1479 | @end smallexample | |
1480 | ||
1481 | (the double parentheses are necessary to make some C compilers shut up | |
1482 | about what they consider a questionable @code{while}-test) and: | |
1483 | ||
1484 | @smallexample | |
1485 | char *entry; | |
1486 | for (entry = @var{argz}; | |
1487 | entry; | |
1488 | entry = argz_next (@var{argz}, @var{argz_len}, entry)) | |
1489 | @var{action}; | |
1490 | @end smallexample | |
1491 | ||
1492 | Note that the latter depends on @var{argz} having a value of @code{0} if | |
1493 | it is empty (rather than a pointer to an empty block of memory); this | |
1494 | invariant is maintained for argz vectors created by the functions here. | |
1495 | @end deftypefun | |
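The following sketch combines @code{argz_next}, @code{argz_delete} and @code{argz_insert}: it looks up an element by value and moves it to the front of the vector. The helper name @code{move_to_front} is hypothetical. Note how the pointer found by the search is not used again once @code{argz_delete} has been called.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <string.h>

/* Hypothetical helper: move the first element equal to NAME to the
   front of the argz vector.  NAME must not point into *ARGZ, since
   the vector may be reallocated.  Returns 0 on success (including
   when NAME is not present) or ENOMEM.  */
int
move_to_front (char **argz, size_t *argz_len, const char *name)
{
  char *entry = NULL;

  assert (name != NULL);
  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    if (strcmp (entry, name) == 0)
      break;

  if (entry == NULL)
    return 0;                   /* Not found; nothing to do.  */

  argz_delete (argz, argz_len, entry);
  /* ENTRY is invalid from here on.  Inserting before *ARGZ puts the
     new element first; if the vector is now empty, *ARGZ is 0 and the
     element is simply appended.  */
  return argz_insert (argz, argz_len, *argz, name);
}
```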
1496 | ||
d705269e UD |
1497 | @comment argz.h |
1498 | @comment GNU | |
1499 | @deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}}) | |
1500 | Replace any occurrences of the string @var{str} in @var{argz} with | |
1501 | @var{with}, reallocating @var{argz} as necessary. If | |
1502 | @var{replace_count} is non-zero, @code{*@var{replace_count}} will be | |
1503 | incremented by the number of replacements performed. | |
1504 | @end deftypefun | |
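Since @code{argz_replace} increments the counter rather than setting it, a caller that wants a fresh count has to clear it first. A minimal sketch, using the hypothetical wrapper name @code{replace_all}:

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>

/* Hypothetical wrapper: replace FROM by TO throughout the argz vector
   and store a fresh replacement count in *CHANGED.  Returns 0 on
   success or ENOMEM.  */
int
replace_all (char **argz, size_t *argz_len,
             const char *from, const char *to, unsigned *changed)
{
  assert (changed != NULL);
  *changed = 0;                 /* The library only increments it.  */
  return argz_replace (argz, argz_len, from, to, changed);
}
```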
1505 | ||
b13927da UD |
1506 | @node Envz Functions, , Argz Functions, Argz and Envz Vectors |
1507 | @subsection Envz Functions | |
1508 | ||
1509 | Envz vectors are just argz vectors with additional constraints on the form | |
1510 | of each element; as such, argz functions can also be used on them, where it | |
1511 | makes sense. | |
1512 | ||
1513 | Each element in an envz vector is a name-value pair, separated by a @code{'='} | |
1514 | character; if multiple @code{'='} characters are present in an element, those | |
1515 | after the first are considered part of the value, and treated like all other | |
1516 | non-@code{'\0'} characters. | |
1517 | ||
1518 | If @emph{no} @code{'='} characters are present in an element, that element is | |
1519 | considered the name of a ``null'' entry, as distinct from an entry with an | |
1520 | empty value: @code{envz_get} will return @code{0} if given the name of a null | |
1521 | entry, whereas an entry with an empty value would result in a value of | |
1522 | @code{""}; @code{envz_entry} will still find such entries, however. Null | |
1523 | entries can be removed with the @code{envz_strip} function. | |
1524 | ||
1525 | As with argz functions, envz functions that may allocate memory (and thus | |
1526 | fail) have a return type of @code{error_t}, and return either @code{0} or | |
1527 | @code{ENOMEM}. | |
1528 | ||
1529 | @pindex envz.h | |
1530 | These functions are declared in the standard include file @file{envz.h}. | |
1531 | ||
5649a1d6 UD |
1532 | @comment envz.h |
1533 | @comment GNU | |
b13927da UD |
1534 | @deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) |
1535 | The @code{envz_entry} function finds the entry in @var{envz} with the name | |
1536 | @var{name}, and returns a pointer to the whole entry---that is, the argz | |
1537 | element which begins with @var{name} followed by a @code{'='} character. If | |
1538 | there is no entry with that name, @code{0} is returned. | |
1539 | @end deftypefun | |
1540 | ||
5649a1d6 UD |
1541 | @comment envz.h |
1542 | @comment GNU | |
b13927da UD |
1543 | @deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) |
1544 | The @code{envz_get} function finds the entry in @var{envz} with the name | |
1545 | @var{name} (like @code{envz_entry}), and returns a pointer to the value | |
1546 | portion of that entry (following the @code{'='}). If there is no entry with | |
1547 | that name (or only a null entry), @code{0} is returned. | |
1548 | @end deftypefun | |
1549 | ||
5649a1d6 UD |
1550 | @comment envz.h |
1551 | @comment GNU | |
b13927da UD |
1552 | @deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value}) |
1553 | The @code{envz_add} function adds an entry to @code{*@var{envz}} | |
1554 | (updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name | |
1555 | @var{name}, and value @var{value}. If an entry with the same name | |
1556 | already exists in @var{envz}, it is removed first. If @var{value} is | |
1557 | @code{0}, then the new entry will be the special null type of entry | |
1558 | (mentioned above). | |
1559 | @end deftypefun | |
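The difference between a missing entry, a null entry and an entry with an empty value shows up when looking values up. The following sketch, with the hypothetical helper name @code{lookup_or}, substitutes a fallback whenever @code{envz_get} returns @code{0}, which covers both missing and null entries but not empty values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <envz.h>

/* Hypothetical helper: return the value of NAME in the envz vector,
   or FALLBACK if there is no such entry or only a null entry.  An
   entry with an empty value yields "" and not FALLBACK.  */
const char *
lookup_or (const char *envz, size_t envz_len,
           const char *name, const char *fallback)
{
  const char *value;

  assert (name != NULL);
  value = envz_get (envz, envz_len, name);
  return value != NULL ? value : fallback;
}
```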
1560 | ||
5649a1d6 UD |
1561 | @comment envz.h |
1562 | @comment GNU | |
b13927da UD |
1563 | @deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override}) |
1564 | The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz}, | |
1565 | as if with @code{envz_add}, updating @code{*@var{envz}} and | |
1566 | @code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2} | |
1567 | will supersede those with the same name in @var{envz}, otherwise not. | |
1568 | ||
1569 | Null entries are treated just like other entries in this respect, so a null | |
1570 | entry in @var{envz} can prevent an entry of the same name in @var{envz2} from | |
1571 | being added to @var{envz}, if @var{override} is false. | |
1572 | @end deftypefun | |
1573 | ||
5649a1d6 UD |
1574 | @comment envz.h |
1575 | @comment GNU | |
b13927da UD |
1576 | @deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len}) |
1577 | The @code{envz_strip} function removes any null entries from @var{envz}, | |
1578 | updating @code{*@var{envz}} and @code{*@var{envz_len}}. | |
1579 | @end deftypefun |
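One possible use of @code{envz_merge} without overriding is to fill in default settings, with null entries in the target acting as markers for names that must stay unset. The sketch below follows that idea and strips the markers afterwards; the helper name @code{apply_defaults} and this particular policy are assumptions of the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <envz.h>

/* Hypothetical helper: add every entry of DEFAULTS whose name is not
   yet present in *ENVZ, then drop null entries.  A null entry in
   *ENVZ therefore suppresses the default of the same name.  Returns 0
   on success or ENOMEM.  */
int
apply_defaults (char **envz, size_t *envz_len,
                const char *defaults, size_t defaults_len)
{
  int err;

  assert (envz != NULL && envz_len != NULL);
  err = envz_merge (envz, envz_len, defaults, defaults_len, 0);
  if (err == 0)
    envz_strip (envz, envz_len);
  return err;
}
```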