Commit | Line | Data |
---|---|---|
390955cb | 1 | @node String and Array Utilities, Character Set Handling, Character Handling, Top |
7a68c94a | 2 | @c %MENU% Utilities for copying and comparing strings and arrays |
28f540f4 RM |
3 | @chapter String and Array Utilities |
4 | ||
5 | Operations on strings (or arrays of characters) are an important part of | |
1f77f049 | 6 | many programs. @Theglibc{} provides an extensive set of string |
28f540f4 RM |
7 | utility functions, including functions for copying, concatenating, |
8 | comparing, and searching strings. Many of these functions can also | |
9 | operate on arbitrary regions of storage; for example, the @code{memcpy} | |
a5113b14 | 10 | function can be used to copy the contents of any kind of array. |
28f540f4 RM |
11 | |
12 | It's fairly common for beginning C programmers to ``reinvent the wheel'' | |
13 | by duplicating this functionality in their own code, but it pays to | |
14 | become familiar with the library functions and to make use of them, | |
15 | since this offers benefits in maintenance, efficiency, and portability. | |
16 | ||
17 | For instance, you could easily compare one string to another in two | |
18 | lines of C code, but if you use the built-in @code{strcmp} function, | |
19 | you're less likely to make a mistake. And, since these library | |
20 | functions are typically highly optimized, your program may run faster | |
21 | too. | |
22 | ||
23 | @menu | |
24 | * Representation of Strings:: Introduction to basic concepts. | |
25 | * String/Array Conventions:: Whether to use a string function or an | |
26 | arbitrary array function. | |
27 | * String Length:: Determining the length of a string. | |
28 | * Copying and Concatenation:: Functions to copy the contents of strings | |
29 | and arrays. | |
30 | * String/Array Comparison:: Functions for byte-wise and character-wise | |
31 | comparison. | |
32 | * Collation Functions:: Functions for collating strings. | |
33 | * Search Functions:: Searching for a specific element or substring. | |
34 | * Finding Tokens in a String:: Splitting a string into tokens by looking | |
35 | for delimiters. | |
0e4ee106 UD |
36 | * strfry:: Function for flash-cooking a string. |
37 | * Trivial Encryption:: Obscuring data. | |
b4012b75 | 38 | * Encode Binary Data:: Encoding and Decoding of Binary Data. |
b13927da | 39 | * Argz and Envz Vectors:: Null-separated string vectors. |
28f540f4 RM |
40 | @end menu |
41 | ||
b4012b75 | 42 | @node Representation of Strings |
28f540f4 RM |
43 | @section Representation of Strings |
44 | @cindex string, representation of | |
45 | ||
46 | This section is a quick summary of string concepts for beginning C | |
47 | programmers. It describes how character strings are represented in C | |
48 | and some common pitfalls. If you are already familiar with this | |
49 | material, you can skip this section. | |
50 | ||
51 | @cindex string | |
8a2f1f5b | 52 | @cindex multibyte character string |
28f540f4 RM |
53 | A @dfn{string} is an array of @code{char} objects. But string-valued |
54 | variables are usually declared to be pointers of type @code{char *}. | |
55 | Such variables do not include space for the text of a string; that has | |
56 | to be stored somewhere else---in an array variable, a string constant, | |
57 | or dynamically allocated memory (@pxref{Memory Allocation}). It's up to | |
58 | you to store the address of the chosen memory space into the pointer | |
59 | variable. Alternatively you can store a @dfn{null pointer} in the | |
60 | pointer variable. The null pointer does not point anywhere, so | |
61 | attempting to reference the string it points to gets an error. | |
62 | ||
8a2f1f5b UD |
63 | @cindex wide character string |
64 | The term ``string'' normally refers to multibyte character strings as | |
65 | opposed to wide character strings. Wide character strings are arrays of | |
66 | type @code{wchar_t} and, as with multibyte character strings, they are | |
67 | usually handled through pointers, here of type @code{wchar_t *}. | |
68 | ||
69 | @cindex null character | |
70 | @cindex null wide character | |
28f540f4 | 71 | By convention, a @dfn{null character}, @code{'\0'}, marks the end of a |
8a2f1f5b UD |
72 | multibyte character string and the @dfn{null wide character}, |
73 | @code{L'\0'}, marks the end of a wide character string. For example, in | |
74 | testing to see whether the @code{char *} variable @var{p} points to a | |
75 | null character marking the end of a string, you can write | |
76 | @code{!*@var{p}} or @code{*@var{p} == '\0'}. | |
28f540f4 RM |
77 | |
78 | A null character is quite different conceptually from a null pointer, | |
79 | although both are represented by the integer @code{0}. | |
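As a minimal sketch of the end-of-string test described above, the following loop walks a string until it reaches the null character. The helper name @code{count_char} is hypothetical and not part of the library; the @code{assert} only serves to show that a null @emph{pointer} is a separate condition from a null @emph{character}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count how often the character C occurs in the
   null-terminated string S.  The loop stops at the null character;
   the assert rejects a null pointer, which is a different thing.  */
size_t
count_char (const char *s, char c)
{
  size_t n = 0;
  assert (s != NULL);
  for (const char *p = s; *p != '\0'; ++p)
    if (*p == c)
      ++n;
  return n;
}
```

Under these assumptions, @code{count_char ("hello, world", 'l')} yields 3.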
80 | ||
81 | @cindex string literal | |
82 | @dfn{String literals} appear in C program source as strings of | |
8a2f1f5b UD |
83 | characters between double-quote characters (@samp{"}). In a wide |
84 | character string literal, the initial double-quote character is | |
85 | immediately preceded by a capital @samp{L} (ell), as in @code{L"foo"}. | |
86 | In @w{ISO C}, string literals can also be formed by @dfn{string | |
87 | concatenation}: @code{"a" "b"} is the same as @code{"ab"}. For wide | |
88 | character strings one can use either @code{L"a" L"b"} or | |
89 | @code{L"a" "b"}. Modifying a string literal is not allowed by the GNU C | |
90 | compiler, because literals are placed in read-only storage. | |
28f540f4 RM |
91 | |
92 | Character arrays that are declared @code{const} cannot be modified | |
93 | either. It's generally good style to declare non-modifiable string | |
94 | pointers to be of type @code{const char *}, since this often allows the | |
95 | C compiler to detect accidental modifications as well as providing some | |
96 | amount of documentation about what your program intends to do with the | |
97 | string. | |
98 | ||
99 | The amount of memory allocated for the character array may extend past | |
100 | the null character that normally marks the end of the string. In this | |
dd7d45e8 | 101 | document, the term @dfn{allocated size} is always used to refer to the |
28f540f4 RM |
102 | total amount of memory allocated for the string, while the term |
103 | @dfn{length} refers to the number of characters up to (but not | |
104 | including) the terminating null character. | |
105 | @cindex length of string | |
106 | @cindex allocation size of string | |
107 | @cindex size of string | |
108 | @cindex string length | |
109 | @cindex string allocation | |
110 | ||
111 | A notorious source of program bugs is trying to put more characters in a | |
112 | string than fit in its allocated size. When writing code that extends | |
113 | strings or moves characters into a pre-allocated array, you should be | |
114 | very careful to keep track of the length of the text and make explicit | |
115 | checks for overflowing the array. Many of the library functions | |
116 | @emph{do not} do this for you! Remember also that you need to allocate | |
117 | an extra byte to hold the null character that marks the end of the | |
118 | string. | |
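One way to make the explicit overflow check is sketched below. The helper @code{copy_checked} is a hypothetical name introduced for illustration; it compares the length of the text against the allocated size of the destination and reserves the extra byte for the terminating null character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy SRC into the array DST of allocated size
   DSTSIZE.  Return 0 on success, or -1 (leaving DST untouched) if the
   text plus its terminating null character would not fit.  */
int
copy_checked (char *dst, size_t dstsize, const char *src)
{
  size_t len;
  assert (dst != NULL && src != NULL);
  len = strlen (src);
  if (len >= dstsize)           /* len + 1 bytes are needed in all.  */
    return -1;
  memcpy (dst, src, len + 1);   /* Copies the null character too.  */
  return 0;
}
```

With a six-byte array, @code{"hello"} fits exactly, while @code{"hello!"} is rejected.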
119 | ||
8a2f1f5b UD |
120 | @cindex single-byte string |
121 | @cindex multibyte string | |
122 | Originally strings were sequences of bytes where each byte represents a | |
123 | single character. This is still true today if the strings are encoded | |
124 | using a single-byte character encoding. Things are different if the | |
125 | strings are encoded using a multibyte encoding (for more information on | |
126 | encodings see @ref{Extended Char Intro}). There is no difference in | |
127 | the programming interface for these two kinds of strings; the programmer | |
128 | has to be aware of this and interpret the byte sequences accordingly. | |
129 | ||
130 | But since there is no separate interface taking care of these | |
131 | differences, the byte-based string functions are sometimes hard to use. | |
132 | Since the count parameters of these functions specify bytes, a call to | |
133 | @code{strncpy} could cut a multibyte character in the middle and put an | |
134 | incomplete (and therefore unusable) byte sequence in the target buffer. | |
135 | ||
136 | @cindex wide character string | |
137 | To avoid these problems, later versions of the @w{ISO C} standard | |
138 | introduce a second set of functions which operate on @dfn{wide | |
139 | characters} (@pxref{Extended Char Intro}). These functions don't have | |
140 | the problems the single-byte versions have since every wide character is | |
141 | a legal, interpretable value. This does not mean that cutting wide | |
142 | character strings at arbitrary points is without problems. It normally | |
143 | is for alphabet-based languages (except for non-normalized text) but | |
144 | languages based on syllables still have the problem that more than one | |
145 | wide character is necessary to complete a logical unit. This is a | |
146 | higher level problem which the @w{C library} functions are not designed | |
147 | to solve. But it is at least good that no invalid byte sequences can be | |
148 | created. Also, the higher level functions can operate much more easily | |
149 | on wide characters than on multibyte characters, so the general advice | |
150 | is to use wide characters internally whenever text is more than simply | |
151 | copied. | |
152 | ||
153 | The remainder of this chapter will discuss the functions for handling | |
154 | wide character strings in parallel with the discussion of the multibyte | |
155 | character strings since there is almost always an exact equivalent | |
156 | available. | |
157 | ||
b4012b75 | 158 | @node String/Array Conventions |
28f540f4 RM |
159 | @section String and Array Conventions |
160 | ||
161 | This chapter describes both functions that work on arbitrary arrays or | |
162 | blocks of memory, and functions that are specific to null-terminated | |
8a2f1f5b | 163 | arrays of characters and wide characters. |
28f540f4 RM |
164 | |
165 | Functions that operate on arbitrary blocks of memory have names | |
8a2f1f5b UD |
166 | beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and |
167 | @code{wmemcpy}) and invariably take an argument which specifies the size | |
168 | (in bytes and wide characters respectively) of the block of memory to | |
28f540f4 | 169 | operate on. The array arguments and return values for these functions |
8a2f1f5b UD |
170 | have type @code{void *} or @code{wchar_t *}. As a matter of style, the |
171 | elements of the arrays used with the @samp{mem} functions are referred | |
172 | to as ``bytes''. You can pass any kind of pointer to these functions, | |
173 | and the @code{sizeof} operator is useful in computing the value for the | |
174 | size argument. Parameters to the @samp{wmem} functions must be of type | |
175 | @code{wchar_t *}. These functions are not really usable with anything | |
176 | but arrays of this type. | |
177 | ||
178 | In contrast, functions that operate specifically on strings and wide | |
179 | character strings have names beginning with @samp{str} and @samp{wcs} | |
180 | respectively (such as @code{strcpy} and @code{wcscpy}) and look for a | |
181 | null character to terminate the string instead of requiring an explicit | |
182 | size argument to be passed. (Some of these functions accept a specified | |
28f540f4 RM |
183 | maximum length, but they also check for premature termination with a |
184 | null character.) The array arguments and return values for these | |
8a2f1f5b UD |
185 | functions have type @code{char *} and @code{wchar_t *} respectively, and |
186 | the array elements are referred to as ``characters'' and ``wide | |
187 | characters''. | |
188 | ||
189 | In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs} | |
190 | versions of a function. The one that is more appropriate to use depends | |
191 | on the exact situation. When your program is manipulating arbitrary | |
192 | arrays or blocks of storage, then you should always use the @samp{mem} | |
193 | functions. On the other hand, when you are manipulating null-terminated | |
194 | strings it is usually more convenient to use the @samp{str}/@samp{wcs} | |
195 | functions, unless you already know the length of the string in advance. | |
196 | The @samp{wmem} functions should be used for wide character arrays with | |
197 | known size. | |
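The choice between the two families can be sketched as follows. Both helper names are hypothetical. The first copies an array of @code{int} values, which may contain zero bytes anywhere and therefore needs a @samp{mem} function with an explicit size. The second copies a string whose length the caller already knows, a case where @code{memcpy} avoids searching for the null character a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NVALUES 4

/* Copy an array of NVALUES ints.  These are arbitrary bytes, not a
   string, so the size is computed with sizeof and passed explicitly.  */
void
copy_values (int to[NVALUES], const int from[NVALUES])
{
  memcpy (to, from, NVALUES * sizeof (int));
}

/* Copy the string FROM, whose length LEN is already known, together
   with its terminating null character.  */
void
copy_known_string (char *to, const char *from, size_t len)
{
  assert (from[len] == '\0');
  memcpy (to, from, len + 1);
}
```

A @samp{str} function applied to the @code{int} array would stop at the first zero byte it happened to find, which is why it is not an option there.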
198 | ||
199 | @cindex wint_t | |
200 | @cindex parameter promotion | |
201 | Some of the memory and string functions take single characters as | |
202 | arguments. Since a value of type @code{char} is automatically promoted | |
203 | to a value of type @code{int} when used as a parameter, the functions | |
204 | are declared with @code{int} as the type of the parameter in question. | |
205 | For the wide character functions the situation is similar: the | |
206 | parameter type for a single wide character is @code{wint_t} and not | |
207 | @code{wchar_t}. For many implementations this would not be necessary, | |
208 | since @code{wchar_t} is large enough not to be automatically | |
209 | promoted, but since the @w{ISO C} standard does not require such a | |
210 | choice of types, the @code{wint_t} type is used. | |
28f540f4 | 211 | |
b4012b75 | 212 | @node String Length |
28f540f4 RM |
213 | @section String Length |
214 | ||
215 | You can get the length of a string using the @code{strlen} function. | |
216 | This function is declared in the header file @file{string.h}. | |
217 | @pindex string.h | |
218 | ||
219 | @comment string.h | |
f65fd747 | 220 | @comment ISO |
28f540f4 RM |
221 | @deftypefun size_t strlen (const char *@var{s}) |
222 | The @code{strlen} function returns the length of the null-terminated | |
8a2f1f5b UD |
223 | string @var{s} in bytes. (In other words, it returns the offset of the |
224 | terminating null character within the array.) | |
28f540f4 RM |
225 | |
226 | For example, | |
227 | @smallexample | |
228 | strlen ("hello, world") | |
229 | @result{} 12 | |
230 | @end smallexample | |
231 | ||
232 | When applied to a character array, the @code{strlen} function returns | |
dd7d45e8 UD |
233 | the length of the string stored there, not its allocated size. You can |
234 | get the allocated size of the character array that holds a string using | |
28f540f4 RM |
235 | the @code{sizeof} operator: |
236 | ||
237 | @smallexample | |
a5113b14 | 238 | char string[32] = "hello, world"; |
28f540f4 RM |
239 | sizeof (string) |
240 | @result{} 32 | |
241 | strlen (string) | |
242 | @result{} 12 | |
243 | @end smallexample | |
dd7d45e8 UD |
244 | |
245 | But beware, this will not work unless @var{string} is the character | |
246 | array itself, not a pointer to it. For example: | |
247 | ||
248 | @smallexample | |
249 | char string[32] = "hello, world"; | |
250 | char *ptr = string; | |
251 | sizeof (string) | |
252 | @result{} 32 | |
253 | sizeof (ptr) | |
254 | @result{} 4 /* @r{(on a machine with 4 byte pointers)} */ | |
255 | @end smallexample | |
256 | ||
257 | This is an easy mistake to make when you are working with functions that | |
258 | take string arguments; those arguments are always pointers, not arrays. | |
259 | ||
8a2f1f5b UD |
260 | It must also be noted that for multibyte encoded strings the return |
261 | value does not have to correspond to the number of characters in the | |
262 | string. To get this value the string can be converted to wide | |
263 | characters and @code{wcslen} can be used or something like the following | |
264 | code can be used: | |
265 | ||
266 | @smallexample | |
267 | /* @r{The input is in @code{string}.} | |
268 | @r{The length is expected in @code{n}.} */ | |
269 | @{ | |
270 | mbstate_t t; | |
271 | const char *scopy = string; | |
272 | /* In initial state. */ | |
273 | memset (&t, '\0', sizeof (t)); | |
274 | /* Determine number of characters. */ | |
275 | n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t); | |
276 | @} | |
277 | @end smallexample | |
278 | ||
279 | This is cumbersome to do, so if the number of characters (as opposed to | |
280 | bytes) is needed often, it is better to work with wide characters. | |
281 | @end deftypefun | |
282 | ||
283 | The wide character equivalent is declared in @file{wchar.h}. | |
284 | ||
285 | @comment wchar.h | |
286 | @comment ISO | |
287 | @deftypefun size_t wcslen (const wchar_t *@var{ws}) | |
288 | The @code{wcslen} function is the wide character equivalent to | |
289 | @code{strlen}. The return value is the number of wide characters in the | |
290 | wide character string pointed to by @var{ws} (this is also the offset of | |
291 | the terminating null wide character of @var{ws}). | |
292 | ||
293 | Since there are no multi wide character sequences making up one | |
294 | character, the return value is not only the offset in the array, it is | |
295 | also the number of wide characters. | |
296 | ||
297 | This function was introduced in @w{Amendment 1} to @w{ISO C90}. | |
28f540f4 RM |
298 | @end deftypefun |
299 | ||
4547c1a4 UD |
300 | @comment string.h |
301 | @comment GNU | |
302 | @deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen}) | |
8a2f1f5b UD |
303 | The @code{strnlen} function returns the length of the string @var{s} in |
304 | bytes if this length is smaller than @var{maxlen} bytes. Otherwise it | |
305 | returns @var{maxlen}. Therefore this function is equivalent to | |
ebaf36eb JM |
306 | @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} |
307 | but it | |
8a2f1f5b UD |
308 | is more efficient and works even if the string @var{s} is not |
309 | null-terminated. | |
4547c1a4 UD |
310 | |
311 | @smallexample | |
312 | char string[32] = "hello, world"; | |
313 | strnlen (string, 32) | |
314 | @result{} 12 | |
315 | strnlen (string, 5) | |
316 | @result{} 5 | |
317 | @end smallexample | |
318 | ||
8a2f1f5b UD |
319 | This function is a GNU extension and is declared in @file{string.h}. |
320 | @end deftypefun | |
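A typical use of @code{strnlen} is measuring text in a fixed-width field that may lack a terminator. The sketch below assumes a hypothetical @code{struct record} with an eight-byte name field. It also assumes that @code{strnlen} is declared when @code{_POSIX_C_SOURCE} is set to @code{200809L}, since the function was later adopted by POSIX.1-2008.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_FIELD 8

/* Hypothetical fixed-width record: short names are padded with null
   characters, but a name of exactly NAME_FIELD bytes has no
   terminator at all.  */
struct record
{
  char name[NAME_FIELD];
};

/* Return the length of the name without reading past the field.  */
size_t
record_name_length (const struct record *r)
{
  assert (r != NULL);
  return strnlen (r->name, sizeof r->name);
}
```

Calling @code{strlen} on such a field would read beyond the array whenever the name fills it completely.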
321 | ||
322 | @comment wchar.h | |
323 | @comment GNU | |
324 | @deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen}) | |
325 | @code{wcsnlen} is the wide character equivalent to @code{strnlen}. The | |
326 | @var{maxlen} parameter specifies the maximum number of wide characters. | |
327 | ||
328 | This function is a GNU extension and is declared in @file{wchar.h}. | |
4547c1a4 UD |
329 | @end deftypefun |
330 | ||
b4012b75 | 331 | @node Copying and Concatenation |
28f540f4 RM |
332 | @section Copying and Concatenation |
333 | ||
334 | You can use the functions described in this section to copy the contents | |
335 | of strings and arrays, or to append the contents of one string to | |
8a2f1f5b UD |
336 | another. The @samp{str} and @samp{mem} functions are declared in the |
337 | header file @file{string.h} while the @samp{wcs} and @samp{wmem} | |
338 | functions are declared in the file @file{wchar.h}. | |
28f540f4 | 339 | @pindex string.h |
8a2f1f5b | 340 | @pindex wchar.h |
28f540f4 RM |
341 | @cindex copying strings and arrays |
342 | @cindex string copy functions | |
343 | @cindex array copy functions | |
344 | @cindex concatenating strings | |
345 | @cindex string concatenation functions | |
346 | ||
347 | A helpful way to remember the ordering of the arguments to the functions | |
348 | in this section is that it corresponds to an assignment expression, with | |
349 | the destination array specified to the left of the source array. Most | |
350 | of these functions return the address of the destination array. | |
351 | ||
352 | Most of these functions do not work properly if the source and | |
353 | destination arrays overlap. For example, if the beginning of the | |
354 | destination array overlaps the end of the source array, the original | |
355 | contents of that part of the source array may get overwritten before it | |
356 | is copied. Even worse, in the case of the string functions, the null | |
357 | character marking the end of the string may be lost, and the copy | |
358 | function might get stuck in a loop trashing all the memory allocated to | |
359 | your program. | |
360 | ||
361 | All functions that have problems copying between overlapping arrays are | |
362 | explicitly identified in this manual. In addition to functions in this | |
363 | section, there are a few others like @code{sprintf} (@pxref{Formatted | |
364 | Output Functions}) and @code{scanf} (@pxref{Formatted Input | |
365 | Functions}). | |
366 | ||
367 | @comment string.h | |
f65fd747 | 368 | @comment ISO |
8a2f1f5b | 369 | @deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size}) |
28f540f4 RM |
370 | The @code{memcpy} function copies @var{size} bytes from the object |
371 | beginning at @var{from} into the object beginning at @var{to}. The | |
372 | behavior of this function is undefined if the two arrays @var{to} and | |
373 | @var{from} overlap; use @code{memmove} instead if overlapping is possible. | |
374 | ||
375 | The value returned by @code{memcpy} is the value of @var{to}. | |
376 | ||
377 | Here is an example of how you might use @code{memcpy} to copy the | |
378 | contents of an array: | |
379 | ||
380 | @smallexample | |
381 | struct foo *oldarray, *newarray; | |
382 | int arraysize; | |
383 | @dots{} | |
384 | memcpy (newarray, oldarray, arraysize * sizeof (struct foo)); | |
385 | @end smallexample | |
386 | @end deftypefun | |
387 | ||
8a2f1f5b UD |
388 | @comment wchar.h |
389 | @comment ISO | |
79827876 | 390 | @deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) |
8a2f1f5b UD |
391 | The @code{wmemcpy} function copies @var{size} wide characters from the object |
392 | beginning at @var{wfrom} into the object beginning at @var{wto}. The | |
393 | behavior of this function is undefined if the two arrays @var{wto} and | |
394 | @var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible. | |
395 | ||
396 | The following is a possible implementation of @code{wmemcpy} but there | |
397 | are more optimizations possible. | |
398 | ||
399 | @smallexample | |
400 | wchar_t * | |
401 | wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, | |
402 | size_t size) | |
403 | @{ | |
404 | return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t)); | |
405 | @} | |
406 | @end smallexample | |
407 | ||
408 | The value returned by @code{wmemcpy} is the value of @var{wto}. | |
409 | ||
410 | This function was introduced in @w{Amendment 1} to @w{ISO C90}. | |
411 | @end deftypefun | |
412 | ||
4547c1a4 UD |
413 | @comment string.h |
414 | @comment GNU | |
8a2f1f5b | 415 | @deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size}) |
4547c1a4 | 416 | The @code{mempcpy} function is nearly identical to the @code{memcpy} |
f2ea0f5b | 417 | function. It copies @var{size} bytes from the object beginning at |
4547c1a4 | 418 | @var{from} into the object pointed to by @var{to}. But instead of |
976780fd | 419 | returning the value of @var{to} it returns a pointer to the byte |
4547c1a4 UD |
420 | following the last written byte in the object beginning at @var{to}. |
421 | I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}. | |
422 | ||
423 | This function is useful in situations where a number of objects shall be | |
424 | copied to consecutive memory positions. | |
425 | ||
426 | @smallexample | |
427 | void * | |
428 | combine (void *o1, size_t s1, void *o2, size_t s2) | |
429 | @{ | |
430 | void *result = malloc (s1 + s2); | |
431 | if (result != NULL) | |
432 | mempcpy (mempcpy (result, o1, s1), o2, s2); | |
433 | return result; | |
434 | @} | |
435 | @end smallexample | |
436 | ||
437 | This function is a GNU extension. | |
438 | @end deftypefun | |
439 | ||
8a2f1f5b UD |
440 | @comment wchar.h |
441 | @comment GNU | |
442 | @deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) | |
443 | The @code{wmempcpy} function is nearly identical to the @code{wmemcpy} | |
444 | function. It copies @var{size} wide characters from the object | |
445 | beginning at @var{wfrom} into the object pointed to by @var{wto}. But | |
446 | instead of returning the value of @var{wto} it returns a pointer to the | |
447 | wide character following the last written wide character in the object | |
448 | beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}. | |
449 | ||
450 | This function is useful in situations where a number of objects shall be | |
451 | copied to consecutive memory positions. | |
452 | ||
453 | The following is a possible implementation of @code{wmempcpy}, but there | |
454 | are more optimizations possible. | |
455 | ||
456 | @smallexample | |
457 | wchar_t * | |
458 | wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, | |
459 | size_t size) | |
460 | @{ | |
461 | return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t)); | |
462 | @} | |
463 | @end smallexample | |
464 | ||
465 | This function is a GNU extension. | |
466 | @end deftypefun | |
467 | ||
28f540f4 | 468 | @comment string.h |
f65fd747 | 469 | @comment ISO |
28f540f4 RM |
470 | @deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size}) |
471 | @code{memmove} copies the @var{size} bytes at @var{from} into the | |
472 | @var{size} bytes at @var{to}, even if those two blocks of space | |
473 | overlap. In the case of overlap, @code{memmove} is careful to copy the | |
474 | original values of the bytes in the block at @var{from}, including those | |
475 | bytes which also belong to the block at @var{to}. | |
8a2f1f5b UD |
476 | |
477 | The value returned by @code{memmove} is the value of @var{to}. | |
478 | @end deftypefun | |
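Deleting text from the front of a string in place is a common case where the source and destination overlap. The following is a minimal sketch; the helper name @code{drop_prefix} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete the first N characters of the string S
   in place and return S.  The source and destination blocks overlap,
   so memmove is required; memcpy would be undefined here.  */
char *
drop_prefix (char *s, size_t n)
{
  size_t len;
  assert (s != NULL);
  len = strlen (s);
  if (n > len)
    n = len;
  /* Move the tail, including the terminating null character.  */
  memmove (s, s + n, len - n + 1);
  return s;
}
```

Applied to an array holding @code{"hello, world"} with @var{n} equal to 7, this leaves @code{"world"} in the array.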
479 | ||
480 | @comment wchar.h | |
481 | @comment ISO | |
482 | @deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size}) | |
483 | @code{wmemmove} copies the @var{size} wide characters at @var{wfrom} | |
484 | into the @var{size} wide characters at @var{wto}, even if those two | |
485 | blocks of space overlap. In the case of overlap, @code{wmemmove} is | |
486 | careful to copy the original values of the wide characters in the block | |
487 | at @var{wfrom}, including those wide characters which also belong to the | |
488 | block at @var{wto}. | |
489 | ||
490 | The following is a possible implementation of @code{wmemmove} but there | |
491 | are more optimizations possible. | |
492 | ||
493 | @smallexample | |
494 | wchar_t * | |
495 | wmemmove (wchar_t *wto, const wchar_t *wfrom, | |
496 | size_t size) | |
497 | @{ | |
498 | return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t)); | |
499 | @} | |
500 | @end smallexample | |
501 | ||
502 | The value returned by @code{wmemmove} is the value of @var{wto}. | |
503 | ||
504 | This function was introduced in @w{Amendment 1} to @w{ISO C90}. | |
28f540f4 RM |
505 | @end deftypefun |
506 | ||
507 | @comment string.h | |
508 | @comment SVID | |
8a2f1f5b | 509 | @deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size}) |
28f540f4 RM |
510 | This function copies no more than @var{size} bytes from @var{from} to |
511 | @var{to}, stopping if a byte matching @var{c} is found. The return | |
512 | value is a pointer into @var{to} one byte past where @var{c} was copied, | |
513 | or a null pointer if no byte matching @var{c} appeared in the first | |
514 | @var{size} bytes of @var{from}. | |
515 | @end deftypefun | |
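As an illustration of the return value, the sketch below uses @code{memccpy} to extract a colon-terminated field. The helper @code{copy_field} and its error convention are hypothetical, and the feature test macro is an assumption about how the declaration is made visible. The byte count is limited to the length of the input so that @code{memccpy} never reads past the end of the source string.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first colon-terminated field of LINE
   into BUF (of size BUFSIZE) as a string.  Return the field length,
   or (size_t) -1 if no colon was copied, in which case the contents
   of BUF are unspecified.  */
size_t
copy_field (char *buf, size_t bufsize, const char *line)
{
  size_t avail, n;
  char *end;
  assert (buf != NULL && line != NULL);
  avail = strlen (line);
  n = avail < bufsize ? avail : bufsize;
  end = memccpy (buf, line, ':', n);
  if (end == NULL)
    return (size_t) -1;
  end[-1] = '\0';               /* Overwrite the copied colon.  */
  return (size_t) (end - buf) - 1;
}
```

Because the returned pointer is one byte past the copied delimiter, @code{end[-1]} is the delimiter itself, which the sketch replaces with a null character.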
516 | ||
517 | @comment string.h | |
f65fd747 | 518 | @comment ISO |
28f540f4 RM |
519 | @deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size}) |
520 | This function copies the value of @var{c} (converted to an | |
521 | @code{unsigned char}) into each of the first @var{size} bytes of the | |
522 | object beginning at @var{block}. It returns the value of @var{block}. | |
523 | @end deftypefun | |
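The conversion of @var{c} to @code{unsigned char} matters when the value does not fit in a byte. The following sketch uses a hypothetical @code{struct counters}; it shows the common idiom of clearing a whole object and a fill where only the low-order byte of the value is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure used only for this illustration.  */
struct counters
{
  unsigned long hits;
  unsigned long misses;
  unsigned char flags[4];
};

/* Set every byte of *C to zero, which also zeroes the integers.  */
void
counters_reset (struct counters *c)
{
  assert (c != NULL);
  memset (c, 0, sizeof *c);
}

/* Store VALUE, converted to unsigned char, into each flag byte.  */
void
flags_fill (struct counters *c, int value)
{
  assert (c != NULL);
  memset (c->flags, value, sizeof c->flags);
}
```

With 8-bit bytes, passing @code{0x1ab} as @var{value} stores @code{0xab} in each flag byte.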
524 | ||
8a2f1f5b UD |
525 | @comment wchar.h |
526 | @comment ISO | |
527 | @deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) | |
528 | This function copies the value of @var{wc} into each of the first | |
529 | @var{size} wide characters of the object beginning at @var{block}. It | |
530 | returns the value of @var{block}. | |
531 | @end deftypefun | |
532 | ||
28f540f4 | 533 | @comment string.h |
f65fd747 | 534 | @comment ISO |
8a2f1f5b | 535 | @deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from}) |
28f540f4 RM |
536 | This copies characters from the string @var{from} (up to and including |
537 | the terminating null character) into the string @var{to}. Like | |
538 | @code{memcpy}, this function has undefined results if the strings | |
539 | overlap. The return value is the value of @var{to}. | |
540 | @end deftypefun | |
541 | ||
8a2f1f5b UD |
542 | @comment wchar.h |
543 | @comment ISO | |
544 | @deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) | |
545 | This copies wide characters from the string @var{wfrom} (up to and | |
546 | including the terminating null wide character) into the string | |
547 | @var{wto}. Like @code{wmemcpy}, this function has undefined results if | |
548 | the strings overlap. The return value is the value of @var{wto}. | |
549 | @end deftypefun | |
550 | ||
28f540f4 | 551 | @comment string.h |
f65fd747 | 552 | @comment ISO |
8a2f1f5b | 553 | @deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
28f540f4 RM |
554 | This function is similar to @code{strcpy} but always copies exactly |
555 | @var{size} characters into @var{to}. | |
556 | ||
557 | If the length of @var{from} is @var{size} or more, then @code{strncpy} | |
558 | copies just the first @var{size} characters. Note that in this case | |
559 | there is no null terminator written into @var{to}. | |
560 | ||
561 | If the length of @var{from} is less than @var{size}, then @code{strncpy} | |
562 | copies all of @var{from}, followed by enough null characters to add up | |
563 | to @var{size} characters in all. This behavior is rarely useful, but it | |
f65fd747 | 564 | is specified by the @w{ISO C} standard. |
28f540f4 RM |
565 | |
566 | The behavior of @code{strncpy} is undefined if the strings overlap. | |
567 | ||
568 | Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs | |
569 | relating to writing past the end of the allocated space for @var{to}. | |
570 | However, it can also make your program much slower in one common case: | |
571 | copying a string which is probably small into a potentially large buffer. | |
572 | In this case, @var{size} may be large, and when it is, @code{strncpy} will | |
573 | waste a considerable amount of time copying null characters. | |
574 | @end deftypefun | |
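Because @code{strncpy} may leave the destination without a null terminator, callers usually add one themselves. The following is a minimal sketch of that idiom; the helper name @code{copy_truncating} and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most SIZE - 1 characters of FROM into
   the SIZE-byte array TO and always terminate the result.  Return 1
   if FROM had to be truncated, 0 otherwise.  */
int
copy_truncating (char *to, size_t size, const char *from)
{
  assert (to != NULL && from != NULL && size > 0);
  strncpy (to, from, size - 1);
  to[size - 1] = '\0';          /* strncpy may not have written one.  */
  return strlen (from) >= size;
}
```

Note that this truncates at a byte boundary, so with a multibyte encoding the last character of the result may be incomplete.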
575 | ||
8a2f1f5b UD |
576 | @comment wchar.h |
577 | @comment ISO | |
578 | @deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) | |
579 | This function is similar to @code{wcscpy} but always copies exactly | |
580 | @var{size} wide characters into @var{wto}. | |
581 | ||
582 | If the length of @var{wfrom} is @var{size} or more, then | |
583 | @code{wcsncpy} copies just the first @var{size} wide characters. Note | |
584 | that in this case there is no null terminator written into @var{wto}. | |
585 | ||
586 | If the length of @var{wfrom} is less than @var{size}, then | |
587 | @code{wcsncpy} copies all of @var{wfrom}, followed by enough null wide | |
588 | characters to add up to @var{size} wide characters in all. This | |
589 | behavior is rarely useful, but it is specified by the @w{ISO C} | |
590 | standard. | |
591 | ||
592 | The behavior of @code{wcsncpy} is undefined if the strings overlap. | |
593 | ||
594 | Using @code{wcsncpy} as opposed to @code{wcscpy} is a way to avoid bugs | |
595 | relating to writing past the end of the allocated space for @var{wto}. | |
596 | However, it can also make your program much slower in one common case: | |
597 | copying a string which is probably small into a potentially large buffer. | |
598 | In this case, @var{size} may be large, and when it is, @code{wcsncpy} will | |
599 | waste a considerable amount of time copying null wide characters. | |
600 | @end deftypefun | |
601 | ||
28f540f4 RM |
602 | @comment string.h |
603 | @comment SVID | |
604 | @deftypefun {char *} strdup (const char *@var{s}) | |
605 | This function copies the null-terminated string @var{s} into a newly | |
606 | allocated string. The string is allocated using @code{malloc}; see | |
607 | @ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space | |
608 | for the new string, @code{strdup} returns a null pointer. Otherwise it | |
609 | returns a pointer to the new string. | |
610 | @end deftypefun | |
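
A typical use of @code{strdup} is to obtain a private copy that can be
modified without touching the original.  The sketch below illustrates
this under a few assumptions: the helper name @code{upcase_copy} is
invented for the example, and the feature-test macro on the first line
is only one possible way to request the declaration.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the strdup declaration.  */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd upper-case copy of S, or a
   null pointer if the allocation fails.  S itself is not modified.  */
char *
upcase_copy (const char *s)
{
  char *copy = strdup (s);
  if (copy == NULL)
    return NULL;
  for (char *p = copy; *p != '\0'; ++p)
    *p = toupper ((unsigned char) *p);
  return copy;
}
```

The caller owns the returned string and should release it with
@code{free} when it is no longer needed.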
611 | ||
8a2f1f5b UD |
612 | @comment wchar.h |
613 | @comment GNU | |
614 | @deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws}) | |
615 | This function copies the null-terminated wide character string @var{ws} | |
616 | into a newly allocated string. The string is allocated using | |
617 | @code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc} | |
618 | cannot allocate space for the new string, @code{wcsdup} returns a null | |
619 | pointer. Otherwise it returns a pointer to the new wide character | |
620 | string. | |
621 | ||
622 | This function is a GNU extension. | |
623 | @end deftypefun | |
624 | ||
706074a5 UD |
625 | @comment string.h |
626 | @comment GNU | |
627 | @deftypefun {char *} strndup (const char *@var{s}, size_t @var{size}) | |
628 | This function is similar to @code{strdup} but always copies at most | |
629 | @var{size} characters into the newly allocated string. | |
630 | ||
631 | If the length of @var{s} is more than @var{size}, then @code{strndup} | |
632 | copies just the first @var{size} characters and adds a closing null | |
633 | terminator. Otherwise all characters are copied and the string is | |
634 | terminated. | |
635 | ||
636 | This function differs from @code{strncpy} in that it always | |
637 | terminates the destination string. | |
738d1a5a UD |
638 | |
639 | @code{strndup} is a GNU extension. | |
706074a5 UD |
640 | @end deftypefun |
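
Since @code{strndup} always terminates its result, it is convenient for
extracting a leading piece of a longer string.  Here is a small sketch;
@code{first_word} is a hypothetical name, and the feature-test macro is
an assumption about how the declaration is requested.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the strndup declaration.  */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the first word of S,
   that is, everything before the first space.  Returns a null
   pointer if the allocation fails.  */
char *
first_word (const char *s)
{
  /* strcspn gives the length of the initial run without a space.  */
  size_t len = strcspn (s, " ");
  /* strndup copies at most LEN characters and then terminates.  */
  return strndup (s, len);
}
```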
641 | ||
28f540f4 RM |
642 | @comment string.h |
643 | @comment Unknown origin | |
8a2f1f5b | 644 | @deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from}) |
28f540f4 RM |
645 | This function is like @code{strcpy}, except that it returns a pointer to |
646 | the end of the string @var{to} (that is, the address of the terminating | |
8a2f1f5b | 647 | null character @code{to + strlen (from)}) rather than the beginning. |
28f540f4 RM |
648 | |
649 | For example, this program uses @code{stpcpy} to concatenate @samp{foo} | |
650 | and @samp{bar} to produce @samp{foobar}, which it then prints. | |
651 | ||
652 | @smallexample | |
653 | @include stpcpy.c.texi | |
654 | @end smallexample | |
655 | ||
f65fd747 | 656 | This function is not part of the ISO or POSIX standards, and is not |
28f540f4 RM |
657 | customary on Unix systems, but we did not invent it either. Perhaps it |
658 | comes from MS-DOG. | |
659 | ||
8a2f1f5b UD |
660 | Its behavior is undefined if the strings overlap. The function is |
661 | declared in @file{string.h}. | |
662 | @end deftypefun | |
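
Because @code{stpcpy} returns the end of what it wrote, several pieces
can be chained without rescanning the destination.  The following
sketch shows the pattern; @code{join_path} is a hypothetical helper,
and it assumes the caller has provided a large enough buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the stpcpy declaration.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write DIR, a slash, and NAME into BUF, which
   the caller guarantees to hold strlen (DIR) + strlen (NAME) + 2
   bytes.  Returns the length of the resulting string.  */
size_t
join_path (char *buf, const char *dir, const char *name)
{
  char *end = stpcpy (buf, dir);   /* END points at the null byte.  */
  end = stpcpy (end, "/");         /* Append without rescanning BUF.  */
  end = stpcpy (end, name);
  return (size_t) (end - buf);
}
```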
663 | ||
664 | @comment wchar.h | |
665 | @comment GNU | |
666 | @deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) | |
667 | This function is like @code{wcscpy}, except that it returns a pointer to | |
668 | the end of the string @var{wto} (that is, the address of the terminating | |
669 | null wide character @code{wto + wcslen (wfrom)}) rather than the beginning. | |
670 | ||
671 | This function is not part of ISO or POSIX but was found useful while | |
1f77f049 | 672 | developing @theglibc{} itself. |
8a2f1f5b UD |
673 | |
674 | The behavior of @code{wcpcpy} is undefined if the strings overlap. | |
675 | ||
676 | @code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}. | |
28f540f4 RM |
677 | @end deftypefun |
678 | ||
706074a5 UD |
679 | @comment string.h |
680 | @comment GNU | |
8a2f1f5b | 681 | @deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
706074a5 UD |
682 | This function is similar to @code{stpcpy} but always copies exactly |
683 | @var{size} characters into @var{to}. | |
684 | ||
685 | If the length of @var{from} is more than @var{size}, then @code{stpncpy} | |
686 | copies just the first @var{size} characters and returns a pointer to the | |
687 | character directly following the one which was copied last. Note that in | |
688 | this case there is no null terminator written into @var{to}. | |
689 | ||
690 | If the length of @var{from} is less than @var{size}, then @code{stpncpy} | |
691 | copies all of @var{from}, followed by enough null characters to add up | |
0bc93a2f AJ |
692 | to @var{size} characters in all. This behavior is rarely useful, but it |
693 | is provided so that @code{stpncpy} can be used wherever the padding done by | |
706074a5 UD |
694 | @code{strncpy} is relied upon. In this case @code{stpncpy} returns a pointer to the |
695 | @emph{first} null character written. | |
696 | ||
f65fd747 | 697 | This function is not part of ISO or POSIX but was found useful while |
1f77f049 | 698 | developing @theglibc{} itself. |
706074a5 | 699 | |
0bc93a2f | 700 | Its behavior is undefined if the strings overlap. The function is |
8a2f1f5b UD |
701 | declared in @file{string.h}. |
702 | @end deftypefun | |
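
The two possible return values of @code{stpncpy} can be told apart by
their distance from the start of the destination.  The sketch below
makes this visible; @code{copied_length} is a hypothetical name used
only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the stpncpy declaration.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte array TO with
   stpncpy and report how many non-null characters were stored.  The
   result equals SIZE exactly when no terminator was written.  */
size_t
copied_length (char *to, const char *from, size_t size)
{
  char *end = stpncpy (to, from, size);
  return (size_t) (end - to);
}
```

A result smaller than @var{size} therefore means that the destination
holds a complete, null-terminated copy.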
703 | ||
704 | @comment wchar.h | |
705 | @comment GNU | |
706 | @deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) | |
707 | This function is similar to @code{wcpcpy} but always copies exactly | |
708 | @var{size} wide characters into @var{wto}. | |
709 | ||
710 | If the length of @var{wfrom} is more than @var{size}, then | |
711 | @code{wcpncpy} copies just the first @var{size} wide characters and | |
80b54217 UD |
712 | returns a pointer to the wide character directly following the last |
713 | wide character that was copied. Note that in this case | |
714 | there is no null terminator written into @var{wto}. | |
8a2f1f5b UD |
715 | |
716 | If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy} | |
717 | copies all of @var{wfrom}, followed by enough null wide characters to add up | |
0bc93a2f AJ |
718 | to @var{size} wide characters in all. This behavior is rarely useful, but it |
719 | is provided so that @code{wcpncpy} can be used wherever the padding done by | |
8a2f1f5b UD |
720 | @code{wcsncpy} is relied upon. In this case @code{wcpncpy} returns a pointer to the |
721 | @emph{first} null wide character written. | |
722 | ||
723 | This function is not part of ISO or POSIX but was found useful while | |
1f77f049 | 724 | developing @theglibc{} itself. |
8a2f1f5b | 725 | |
0bc93a2f | 726 | Its behavior is undefined if the strings overlap. |
8a2f1f5b UD |
727 | |
728 | @code{wcpncpy} is a GNU extension and is declared in @file{wchar.h}. | |
706074a5 UD |
729 | @end deftypefun |
730 | ||
731 | @comment string.h | |
732 | @comment GNU | |
26b4d766 | 733 | @deftypefn {Macro} {char *} strdupa (const char *@var{s}) |
976780fd | 734 | This macro is similar to @code{strdup} but allocates the new string |
dd7d45e8 UD |
735 | using @code{alloca} instead of @code{malloc} (@pxref{Variable Size |
736 | Automatic}). This means of course the returned string has the same | |
737 | limitations as any block of memory allocated using @code{alloca}. | |
706074a5 | 738 | |
dd7d45e8 | 739 | Because the space must come from the caller's own stack frame, @code{strdupa} is implemented only as a macro; |
40a55d20 | 740 | you cannot take its address. Despite this limitation |
706074a5 UD |
741 | it is useful. The following code shows a situation where |
742 | using @code{malloc} would be a lot more expensive. | |
743 | ||
744 | @smallexample | |
745 | @include strdupa.c.texi | |
746 | @end smallexample | |
747 | ||
748 | Please note that calling @code{strtok} using @var{path} directly is | |
8a2f1f5b UD |
749 | invalid. It is also not allowed to call @code{strdupa} in the argument |
750 | list of @code{strtok} since @code{strdupa} uses @code{alloca} | |
751 | (@pxref{Variable Size Automatic}), which can interfere with the parameter | |
752 | passing. | |
706074a5 UD |
753 | |
754 | This macro is only available if GNU CC is used. | |
26b4d766 | 755 | @end deftypefn |
706074a5 UD |
756 | |
757 | @comment string.h | |
758 | @comment GNU | |
26b4d766 | 759 | @deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size}) |
706074a5 UD |
760 | This macro is similar to @code{strndup} but like @code{strdupa} it |
761 | allocates the new string using @code{alloca} | |
762 | (@pxref{Variable Size Automatic}). The same advantages and limitations | |
763 | of @code{strdupa} are valid for @code{strndupa}, too. | |
764 | ||
dd7d45e8 | 765 | Like @code{strdupa}, @code{strndupa} is implemented only as a macro. |
8a2f1f5b UD |
766 | Just like @code{strdupa}, this macro must not be used inside the |
767 | argument list of a function call. | |
706074a5 UD |
768 | |
769 | @code{strndupa} is only available if GNU CC is used. | |
26b4d766 | 770 | @end deftypefn |
706074a5 | 771 | |
28f540f4 | 772 | @comment string.h |
f65fd747 | 773 | @comment ISO |
8a2f1f5b | 774 | @deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from}) |
28f540f4 RM |
775 | The @code{strcat} function is similar to @code{strcpy}, except that the |
776 | characters from @var{from} are concatenated or appended to the end of | |
777 | @var{to}, instead of overwriting it. That is, the first character from | |
778 | @var{from} overwrites the null character marking the end of @var{to}. | |
779 | ||
780 | An equivalent definition for @code{strcat} would be: | |
781 | ||
782 | @smallexample | |
783 | char * | |
8a2f1f5b | 784 | strcat (char *restrict to, const char *restrict from) |
28f540f4 RM |
785 | @{ |
786 | strcpy (to + strlen (to), from); | |
787 | return to; | |
788 | @} | |
789 | @end smallexample | |
790 | ||
791 | This function has undefined results if the strings overlap. | |
792 | @end deftypefun | |
793 | ||
8a2f1f5b UD |
794 | @comment wchar.h |
795 | @comment ISO | |
796 | @deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) | |
797 | The @code{wcscat} function is similar to @code{wcscpy}, except that the | |
798 | characters from @var{wfrom} are concatenated or appended to the end of | |
799 | @var{wto}, instead of overwriting it. That is, the first character from | |
800 | @var{wfrom} overwrites the null character marking the end of @var{wto}. | |
801 | ||
802 | An equivalent definition for @code{wcscat} would be: | |
803 | ||
804 | @smallexample | |
805 | wchar_t * | |
806 | wcscat (wchar_t *wto, const wchar_t *wfrom) | |
807 | @{ | |
808 | wcscpy (wto + wcslen (wto), wfrom); | |
809 | return wto; | |
810 | @} | |
811 | @end smallexample | |
812 | ||
813 | This function has undefined results if the strings overlap. | |
814 | @end deftypefun | |
815 | ||
816 | Programmers using the @code{strcat} or @code{wcscat} function (or the | |
817 | following @code{strncat} or @code{wcsncat} functions for that matter) | |
818 | can easily be recognized as lazy and reckless. In almost all situations | |
819 | the lengths of the participating strings are known (they had better be, | |
820 | since how else can one ensure that the allocated size of the buffer is | |
821 | sufficient?). Or at least, one could know them if one keeps track of the | |
ee2752ea | 822 | results of the various function calls. But then it is very inefficient |
8a2f1f5b UD |
823 | to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the |
824 | end of the destination string so that the actual copying can start. | |
825 | This is a common example: | |
ee2752ea UD |
826 | |
827 | @cindex __va_copy | |
828 | @cindex va_copy | |
829 | @smallexample | |
49c091e5 | 830 | /* @r{This function concatenates arbitrarily many strings. The last} |
ee2752ea UD |
831 | @r{parameter must be @code{NULL}.} */ |
832 | char * | |
8a2f1f5b | 833 | concat (const char *str, @dots{}) |
ee2752ea UD |
834 | @{ |
835 | va_list ap, ap2; | |
836 | size_t total = 1; | |
837 | const char *s; | |
838 | char *result; | |
839 | ||
840 | va_start (ap, str); | |
841 | /* @r{Actually @code{va_copy}, but this is the name more gcc versions} | |
842 | @r{understand.} */ | |
843 | __va_copy (ap2, ap); | |
844 | ||
845 | /* @r{Determine how much space we need.} */ | |
846 | for (s = str; s != NULL; s = va_arg (ap, const char *)) | |
847 | total += strlen (s); | |
848 | ||
849 | va_end (ap); | |
850 | ||
851 | result = (char *) malloc (total); | |
852 | if (result != NULL) | |
853 | @{ | |
854 | result[0] = '\0'; | |
855 | ||
856 | /* @r{Copy the strings.} */ | |
857 | for (s = str; s != NULL; s = va_arg (ap2, const char *)) | |
858 | strcat (result, s); | |
859 | @} | |
860 | ||
861 | va_end (ap2); | |
862 | ||
863 | return result; | |
864 | @} | |
865 | @end smallexample | |
866 | ||
867 | This looks quite simple, especially the second loop where the strings | |
868 | are actually copied. But these innocent lines hide a major performance | |
869 | penalty. Just imagine that ten strings of 100 bytes each have to be | |
870 | concatenated. For the second string we search the already stored 100 | |
871 | bytes for the end of the string so that we can append the next string. | |
872 | Over all ten strings, the comparisons necessary to find the end of | |
873 | the intermediate results add up to more than 4500! If we combine the copying | |
874 | with the allocation, we can write this function more | |
49c091e5 | 875 | efficiently: |
ee2752ea UD |
876 | |
877 | @smallexample | |
878 | char * | |
8a2f1f5b | 879 | concat (const char *str, @dots{}) |
ee2752ea UD |
880 | @{ |
881 | va_list ap; | |
882 | size_t allocated = 100; | |
883 | char *result = (char *) malloc (allocated); | |
ee2752ea | 884 | |
623281e0 | 885 | if (result != NULL) |
ee2752ea UD |
886 | @{ |
887 | char *newp; | |
623281e0 | 888 | char *wp; |
ee2752ea | 889 | |
623281e0 | 890 | va_start (ap, str); |
ee2752ea UD |
891 | |
892 | wp = result; | |
893 | for (const char *s = str; s != NULL; s = va_arg (ap, const char *)) | |
894 | @{ | |
895 | size_t len = strlen (s); | |
896 | ||
897 | /* @r{Resize the allocated memory if necessary.} */ | |
898 | if (wp + len + 1 > result + allocated) | |
899 | @{ | |
900 | allocated = (allocated + len) * 2; | |
901 | newp = (char *) realloc (result, allocated); | |
902 | if (newp == NULL) | |
903 | @{ | |
904 | free (result); | |
905 | return NULL; | |
906 | @} | |
907 | wp = newp + (wp - result); | |
908 | result = newp; | |
909 | @} | |
910 | ||
911 | wp = mempcpy (wp, s, len); | |
912 | @} | |
913 | ||
914 | /* @r{Terminate the result string.} */ | |
915 | *wp++ = '\0'; | |
916 | ||
917 | /* @r{Resize memory to the optimal size.} */ | |
918 | newp = realloc (result, wp - result); | |
919 | if (newp != NULL) | |
920 | result = newp; | |
921 | ||
922 | va_end (ap); | |
923 | @} | |
924 | ||
925 | return result; | |
926 | @} | |
927 | @end smallexample | |
928 | ||
929 | With a bit more knowledge about the input strings one could fine-tune | |
930 | the memory allocation. The point here is that | |
931 | we don't use @code{strcat} anymore. We always keep track of the length | |
932 | of the current intermediate result so we can save ourselves the search for the | |
933 | end of the string and use @code{mempcpy}. Please note that we also | |
934 | don't use @code{stpcpy}, which might seem more natural since we are handling | |
935 | strings. But this is not necessary since we already know the | |
936 | length of the string and therefore can use the faster memory copying | |
8a2f1f5b | 937 | function. The example would work for wide characters the same way. |
ee2752ea UD |
938 | |
939 | Whenever a programmer feels the need to use @code{strcat} she or he | |
940 | should think twice and check whether the code can | |
941 | be rewritten to take advantage of already calculated results. Again: it | |
942 | is almost always unnecessary to use @code{strcat}. | |
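
The cost of the repeated search can be estimated with a little
arithmetic.  The function below is only a counting sketch, not library
code: @code{strcat_scan_cost} is a hypothetical name, and it assumes
that all strings have the same length.

```c
#include <stddef.h>

/* Hypothetical helper: total number of characters that repeated
   strcat calls step over while looking for the end of the result,
   when COUNT strings of LEN characters each are appended in turn.
   The comparison against each terminating null is not counted.  */
size_t
strcat_scan_cost (size_t count, size_t len)
{
  size_t cost = 0;
  /* Before the K-th append (counting from zero) the result already
     holds K * LEN characters, all of which must be skipped.  */
  for (size_t k = 0; k < count; ++k)
    cost += k * len;
  return cost;
}
```

For instance, three strings of four characters each give
0 + 4 + 8 = 12 skipped characters.  The cost grows quadratically with
the number of strings, while a version that remembers the end of the
result skips none.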
943 | ||
28f540f4 | 944 | @comment string.h |
f65fd747 | 945 | @comment ISO |
8a2f1f5b | 946 | @deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
28f540f4 RM |
947 | This function is like @code{strcat} except that not more than @var{size} |
948 | characters from @var{from} are appended to the end of @var{to}. A | |
949 | single null character is also always appended to @var{to}, so the total | |
950 | allocated size of @var{to} must be at least @code{@var{size} + 1} bytes | |
951 | longer than its initial length. | |
952 | ||
953 | The @code{strncat} function could be implemented like this: | |
954 | ||
955 | @smallexample | |
956 | @group | |
957 | char * | |
958 | strncat (char *to, const char *from, size_t size) | |
959 | @{ | |
8a2f1f5b | 960 | to[strlen (to) + size] = '\0'; |
28f540f4 RM |
961 | strncpy (to + strlen (to), from, size); |
962 | return to; | |
963 | @} | |
964 | @end group | |
965 | @end smallexample | |
966 | ||
967 | The behavior of @code{strncat} is undefined if the strings overlap. | |
968 | @end deftypefun | |
969 | ||
8a2f1f5b UD |
970 | @comment wchar.h |
971 | @comment ISO | |
972 | @deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) | |
973 | This function is like @code{wcscat} except that not more than @var{size} | |
974 | wide characters from @var{wfrom} are appended to the end of @var{wto}. A | |
975 | single null wide character is also always appended to @var{wto}, so the total | |
976 | allocated size of @var{wto} must be at least @code{@var{size} + 1} wide characters | |
977 | longer than its initial length. | |
978 | ||
979 | The @code{wcsncat} function could be implemented like this: | |
980 | ||
981 | @smallexample | |
982 | @group | |
983 | wchar_t * | |
984 | wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom, | |
985 | size_t size) | |
986 | @{ | |
987 | wto[wcslen (wto) + size] = L'\0'; | |
988 | wcsncpy (wto + wcslen (wto), wfrom, size); | |
989 | return wto; | |
990 | @} | |
991 | @end group | |
992 | @end smallexample | |
993 | ||
994 | The behavior of @code{wcsncat} is undefined if the strings overlap. | |
995 | @end deftypefun | |
996 | ||
997 | Here is an example showing the use of @code{strncpy} and @code{strncat} | |
998 | (the wide character version is equivalent). Notice how, in the call to | |
999 | @code{strncat}, the @var{size} parameter is computed to avoid | |
1000 | overflowing the character array @code{buffer}. | |
28f540f4 RM |
1001 | |
1002 | @smallexample | |
1003 | @include strncat.c.texi | |
1004 | @end smallexample | |
1005 | ||
1006 | @noindent | |
1007 | The output produced by this program looks like: | |
1008 | ||
1009 | @smallexample | |
1010 | hello | |
1011 | hello, wo | |
1012 | @end smallexample | |
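
The size computation for @code{strncat} can be wrapped in a small
helper.  This is a sketch under the assumption that the destination
already holds a null-terminated string; the name @code{append_bounded}
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append as much of FROM as fits to the string
   in BUF, an array of BUFSIZE bytes that already holds a
   null-terminated string.  */
void
append_bounded (char *buf, size_t bufsize, const char *from)
{
  size_t used = strlen (buf);
  assert (used < bufsize);
  /* Leave one byte for the null character that strncat always
     appends after the copied characters.  */
  strncat (buf, from, bufsize - used - 1);
}
```

Starting from a 10-byte array that holds @samp{hello}, appending
@samp{, world} with this helper leaves @samp{hello, wo} in the array.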
1013 | ||
1014 | @comment string.h | |
1015 | @comment BSD | |
af6f3906 | 1016 | @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size}) |
28f540f4 RM |
1017 | This is a partially obsolete alternative for @code{memmove}, derived from |
1018 | BSD. Note that it is not quite equivalent to @code{memmove}, because the | |
af6f3906 | 1019 | arguments are not in the same order and there is no return value. |
28f540f4 RM |
1020 | @end deftypefun |
1021 | ||
1022 | @comment string.h | |
1023 | @comment BSD | |
af6f3906 | 1024 | @deftypefun void bzero (void *@var{block}, size_t @var{size}) |
28f540f4 RM |
1025 | This is a partially obsolete alternative for @code{memset}, derived from |
1026 | BSD. Note that it is not as general as @code{memset}, because the only | |
1027 | value it can store is zero. | |
1028 | @end deftypefun | |
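
The relationship between these BSD functions and their ISO C
counterparts can be written down directly.  The two wrappers below are
illustrative stand-ins with hypothetical names, not the library's own
definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for bcopy: note that the source comes first,
   the opposite of memmove, and that nothing is returned.  */
void
bcopy_equiv (const void *from, void *to, size_t size)
{
  memmove (to, from, size);
}

/* Hypothetical stand-in for bzero: memset with the value fixed at 0.  */
void
bzero_equiv (void *block, size_t size)
{
  memset (block, 0, size);
}
```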
1029 | ||
b4012b75 | 1030 | @node String/Array Comparison |
28f540f4 RM |
1031 | @section String/Array Comparison |
1032 | @cindex comparing strings and arrays | |
1033 | @cindex string comparison functions | |
1034 | @cindex array comparison functions | |
1035 | @cindex predicates on strings | |
1036 | @cindex predicates on arrays | |
1037 | ||
1038 | You can use the functions in this section to perform comparisons on the | |
1039 | contents of strings and arrays. As well as checking for equality, these | |
1040 | functions can also be used as the ordering functions for sorting | |
1041 | operations. @xref{Searching and Sorting}, for an example of this. | |
1042 | ||
1043 | Unlike most comparison operations in C, the string comparison functions | |
1044 | return a nonzero value if the strings are @emph{not} equivalent rather | |
1045 | than if they are. The sign of the value indicates the relative ordering | |
1046 | of the first characters in the strings that are not equivalent: a | |
1047 | negative value indicates that the first string is ``less'' than the | |
a5113b14 | 1048 | second, while a positive value indicates that the first string is |
28f540f4 RM |
1049 | ``greater''. |
1050 | ||
1051 | The most common use of these functions is to check only for equality. | |
1052 | This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}. | |
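
Both uses, ordering and equality, can be sketched in a few lines.  The
helper names @code{compare_strings}, @code{sort_strings} and
@code{strings_equal} are hypothetical and exist only for this
illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort on an array of string pointers.
   qsort passes pointers to the array elements, so each argument
   really points to a `const char *'.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Hypothetical helper: sort COUNT strings into strcmp order.  */
void
sort_strings (const char **strings, size_t count)
{
  qsort (strings, count, sizeof *strings, compare_strings);
}

/* Hypothetical helper: nonzero if S1 and S2 have equal contents.  */
int
strings_equal (const char *s1, const char *s2)
{
  return !strcmp (s1, s2);
}
```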
1053 | ||
1054 | All of these functions are declared in the header file @file{string.h}. | |
1055 | @pindex string.h | |
1056 | ||
1057 | @comment string.h | |
f65fd747 | 1058 | @comment ISO |
28f540f4 RM |
1059 | @deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) |
1060 | The function @code{memcmp} compares the @var{size} bytes of memory | |
1061 | beginning at @var{a1} against the @var{size} bytes of memory beginning | |
1062 | at @var{a2}. The value returned has the same sign as the difference | |
1063 | between the first differing pair of bytes (interpreted as @code{unsigned | |
1064 | char} objects, then promoted to @code{int}). | |
1065 | ||
1066 | If the contents of the two blocks are equal, @code{memcmp} returns | |
1067 | @code{0}. | |
1068 | @end deftypefun | |
1069 | ||
8a2f1f5b UD |
1070 | @comment wchar.h |
1071 | @comment ISO | |
1072 | @deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size}) | |
1073 | The function @code{wmemcmp} compares the @var{size} wide characters | |
1074 | beginning at @var{a1} against the @var{size} wide characters beginning | |
1075 | at @var{a2}. The value returned is smaller than or larger than zero | |
1076 | depending on whether the first differing wide character in @var{a1} is | |
1077 | smaller or larger than the corresponding character in @var{a2}. | |
1078 | ||
1079 | If the contents of the two blocks are equal, @code{wmemcmp} returns | |
1080 | @code{0}. | |
1081 | @end deftypefun | |
1082 | ||
28f540f4 RM |
1083 | On arbitrary arrays, the @code{memcmp} function is mostly useful for |
1084 | testing equality. It usually isn't meaningful to do byte-wise ordering | |
1085 | comparisons on arrays of things other than bytes. For example, a | |
1086 | byte-wise comparison on the bytes that make up floating-point numbers | |
1087 | isn't likely to tell you anything about the relationship between the | |
1088 | values of the floating-point numbers. | |
1089 | ||
8a2f1f5b UD |
1090 | @code{wmemcmp} is really only useful to compare arrays of type |
1091 | @code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes | |
1092 | at a time and this number of bytes is system dependent. | |
1093 | ||
28f540f4 RM |
1094 | You should also be careful about using @code{memcmp} to compare objects |
1095 | that can contain ``holes'', such as the padding inserted into structure | |
1096 | objects to enforce alignment requirements, extra space at the end of | |
1097 | unions, and extra characters at the ends of strings whose length is less | |
1098 | than their allocated size. The contents of these ``holes'' are | |
1099 | indeterminate and may cause strange behavior when performing byte-wise | |
1100 | comparisons. For more predictable results, perform an explicit | |
1101 | component-wise comparison. | |
1102 | ||
1103 | For example, given a structure type definition like: | |
1104 | ||
1105 | @smallexample | |
1106 | struct foo | |
1107 | @{ | |
1108 | unsigned char tag; | |
1109 | union | |
1110 | @{ | |
1111 | double f; | |
1112 | long i; | |
1113 | char *p; | |
1114 | @} value; | |
1115 | @}; | |
1116 | @end smallexample | |
1117 | ||
1118 | @noindent | |
1119 | you are better off writing a specialized comparison function to compare | |
1120 | @code{struct foo} objects instead of comparing them with @code{memcmp}. | |
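
Such a specialized comparison might look like the sketch below.  The
manual does not say what the @code{tag} member means, so the three tag
values and the name @code{foo_equal} are assumptions made purely for
the sake of the example.

```c
#include <string.h>

struct foo
{
  unsigned char tag;
  union
  {
    double f;
    long i;
    char *p;
  } value;
};

/* Assumed tag values, invented for this sketch.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

/* Hypothetical helper: nonzero if A and B hold the same tag and the
   same active member.  Padding bytes are never examined.  */
int
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return 0;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      /* Compare the strings pointed to, not the pointers.  */
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    }
}
```

Only the member selected by the tag is inspected, so neither the
padding after @code{tag} nor the unused bytes of the union can
influence the result.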
1121 | ||
1122 | @comment string.h | |
f65fd747 | 1123 | @comment ISO |
28f540f4 RM |
1124 | @deftypefun int strcmp (const char *@var{s1}, const char *@var{s2}) |
1125 | The @code{strcmp} function compares the string @var{s1} against | |
1126 | @var{s2}, returning a value that has the same sign as the difference | |
1127 | between the first differing pair of characters (interpreted as | |
1128 | @code{unsigned char} objects, then promoted to @code{int}). | |
1129 | ||
1130 | If the two strings are equal, @code{strcmp} returns @code{0}. | |
1131 | ||
1132 | A consequence of the ordering used by @code{strcmp} is that if @var{s1} | |
1133 | is an initial substring of @var{s2}, then @var{s1} is considered to be | |
1134 | ``less than'' @var{s2}. | |
8a2f1f5b UD |
1135 | |
1136 | @code{strcmp} does not take into account the sorting conventions of the | |
1137 | language the strings are written in. For locale-aware ordering, use | |
1138 | @code{strcoll}. | |
1139 | @end deftypefun | |
1140 | ||
1141 | @comment wchar.h | |
1142 | @comment ISO | |
1143 | @deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) | |
1144 | ||
1145 | The @code{wcscmp} function compares the wide character string @var{ws1} | |
1146 | against @var{ws2}. The value returned is smaller than or larger than zero | |
1147 | depending on whether the first differing wide character in @var{ws1} is | |
1148 | smaller or larger than the corresponding character in @var{ws2}. | |
1149 | ||
1150 | If the two strings are equal, @code{wcscmp} returns @code{0}. | |
1151 | ||
1152 | A consequence of the ordering used by @code{wcscmp} is that if @var{ws1} | |
1153 | is an initial substring of @var{ws2}, then @var{ws1} is considered to be | |
1154 | ``less than'' @var{ws2}. | |
1155 | ||
1156 | @code{wcscmp} does not take into account the sorting conventions of the | |
1157 | language the strings are written in. For locale-aware ordering, use | |
1158 | @code{wcscoll}. | |
28f540f4 RM |
1159 | @end deftypefun |
1160 | ||
1161 | @comment string.h | |
1162 | @comment BSD | |
1163 | @deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2}) | |
4547c1a4 | 1164 | This function is like @code{strcmp}, except that differences in case are |
dd7d45e8 | 1165 | ignored. How uppercase and lowercase characters are related is |
4547c1a4 UD |
1166 | determined by the currently selected locale. In the standard @code{"C"} |
1167 | locale the characters @"A and @"a do not match but in a locale which | |
dd7d45e8 | 1168 | regards these characters as parts of the alphabet they do match. |
28f540f4 | 1169 | |
85c165be | 1170 | @noindent |
28f540f4 RM |
1171 | @code{strcasecmp} is derived from BSD. |
1172 | @end deftypefun | |
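
A case-insensitive equality test is the most common use of
@code{strcasecmp}.  The sketch below assumes the POSIX convention of
taking the declaration from @file{strings.h}; the helper name
@code{equal_ignoring_case} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the strcasecmp declaration.  */
#include <strings.h>

/* Hypothetical helper: nonzero if S1 and S2 are equal when case is
   ignored according to the current locale.  */
int
equal_ignoring_case (const char *s1, const char *s2)
{
  return strcasecmp (s1, s2) == 0;
}
```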
1173 | ||
8a2f1f5b UD |
1174 | @comment wchar.h |
1175 | @comment GNU | |
1176 | @deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) | |
1177 | This function is like @code{wcscmp}, except that differences in case are | |
1178 | ignored. How uppercase and lowercase characters are related is | |
1179 | determined by the currently selected locale. In the standard @code{"C"} | |
1180 | locale the characters @"A and @"a do not match but in a locale which | |
1181 | regards these characters as parts of the alphabet they do match. | |
1182 | ||
1183 | @noindent | |
1184 | @code{wcscasecmp} is a GNU extension. | |
1185 | @end deftypefun | |
1186 | ||
1187 | @comment string.h | |
1188 | @comment ISO | |
1189 | @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size}) | |
1190 | This function is similar to @code{strcmp}, except that no more than | |
11bf311e UD |
1191 | @var{size} characters are compared. In other words, if the two |
1192 | strings are the same in their first @var{size} characters, the | |
8a2f1f5b UD |
1193 | return value is zero. |
1194 | @end deftypefun | |
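
One frequent application of @code{strncmp} is testing whether a string
starts with a given prefix.  The following is a minimal sketch;
@code{has_prefix} is a hypothetical helper, not a library function.

```c
#include <string.h>

/* Hypothetical helper: nonzero if S begins with PREFIX.  */
int
has_prefix (const char *s, const char *prefix)
{
  /* Comparing only strlen (PREFIX) characters ignores whatever
     follows the prefix in S.  If S is shorter than PREFIX, the
     comparison stops at the null character in S and reports a
     difference.  */
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```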
1195 | ||
1196 | @comment wchar.h | |
1197 | @comment ISO | |
1198 | @deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size}) | |
1199 | This function is similar to @code{wcscmp}, except that no more than | |
1200 | @var{size} wide characters are compared. In other words, if the two | |
1201 | strings are the same in their first @var{size} wide characters, the | |
1202 | return value is zero. | |
1203 | @end deftypefun | |
1204 | ||
28f540f4 RM |
1205 | @comment string.h |
1206 | @comment BSD | |
1207 | @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n}) | |
1208 | This function is like @code{strncmp}, except that differences in case | |
dd7d45e8 UD |
1209 | are ignored. Like @code{strcasecmp}, it is locale dependent how |
1210 | uppercase and lowercase characters are related. | |
28f540f4 | 1211 | |
85c165be | 1212 | @noindent |
28f540f4 RM |
1213 | @code{strncasecmp} is a GNU extension. |
1214 | @end deftypefun | |
1215 | ||
8a2f1f5b UD |
1216 | @comment wchar.h |
1217 | @comment GNU | |
1218 | @deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n}) | |
1219 | This function is like @code{wcsncmp}, except that differences in case | |
1220 | are ignored. Like @code{wcscasecmp}, it is locale dependent how | |
1221 | uppercase and lowercase characters are related. | |
1222 | ||
1223 | @noindent | |
1224 | @code{wcsncasecmp} is a GNU extension. | |
28f540f4 RM |
1225 | @end deftypefun |
1226 | ||
8a2f1f5b UD |
1227 | Here are some examples showing the use of @code{strcmp} and |
1228 | @code{strncmp} (equivalent examples can be constructed for the wide | |
1229 | character functions). These examples assume the use of the ASCII | |
1230 | character set. (If some other character set---say, EBCDIC---is used | |
1231 | instead, then the glyphs are associated with different numeric codes, | |
1232 | and the return values and ordering may differ.) | |
28f540f4 RM |
1233 | |
1234 | @smallexample | |
1235 | strcmp ("hello", "hello") | |
1236 | @result{} 0 /* @r{These two strings are the same.} */ | |
1237 | strcmp ("hello", "Hello") | |
1238 | @result{} 32 /* @r{Comparisons are case-sensitive.} */ | |
1239 | strcmp ("hello", "world") | |
1240 | @result{} -15 /* @r{The character @code{'h'} comes before @code{'w'}.} */ | |
1241 | strcmp ("hello", "hello, world") | |
1242 | @result{} -44 /* @r{Comparing a null character against a comma.} */ | |
6952e59e | 1243 | strncmp ("hello", "hello, world", 5) |
28f540f4 RM |
1244 | @result{} 0 /* @r{The initial 5 characters are the same.} */ |
1245 | strncmp ("hello, world", "hello, stupid world!!!", 5) | |
1246 | @result{} 0 /* @r{The initial 5 characters are the same.} */ | |
1247 | @end smallexample | |
1248 | ||
1f205a47 UD |
1249 | @comment string.h |
1250 | @comment GNU | |
1251 | @deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2}) | |
1252 | The @code{strverscmp} function compares the string @var{s1} against | |
f2282d42 RM |
1253 | @var{s2}, considering them as holding indices/version numbers. The |
1254 | return value follows the same conventions as found in the | |
1255 | @code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no | |
1256 | digits, @code{strverscmp} behaves like @code{strcmp}. | |
1f205a47 | 1257 | |
f2ea0f5b | 1258 | Basically, we compare strings normally (character by character), until |
1f205a47 | 1259 | we find a digit in each string; then we enter a special comparison |
dd7d45e8 | 1260 | mode, where each sequence of digits is taken as a whole. If we reach the |
1f205a47 UD |
1261 | end of these two parts without noticing a difference, we return to the |
1262 | standard comparison mode. There are two types of numeric parts: | |
f2ea0f5b | 1263 | ``integral'' and ``fractional'' (the latter begin with a @samp{0}). The types |
1f205a47 UD |
1264 | of the numeric parts affect the way we sort them: |
1265 | ||
1266 | @itemize @bullet | |
1267 | @item | |
1268 | integral/integral: we compare values as you would expect. | |
1269 | ||
1270 | @item | |
f2ea0f5b | 1271 | fractional/integral: the fractional part is less than the integral one. |
1f205a47 UD |
1272 | Again, no surprise. |
1273 | ||
1274 | @item | |
f2ea0f5b UD |
1275 | fractional/fractional: things become a bit more complex. |
1276 | If the common prefix contains only leading zeroes, the longer part is less | |
1277 | than the other one; otherwise the comparison behaves normally. | |
1f205a47 UD |
1278 | @end itemize |
1279 | ||
1280 | @smallexample | |
1281 | strverscmp ("no digit", "no digit") | |
0bc93a2f | 1282 | @result{} 0 /* @r{same behavior as strcmp.} */ |
1f205a47 UD |
1283 | strverscmp ("item#99", "item#100") |
1284 | @result{} <0 /* @r{same prefix, but 99 < 100.} */ | |
1285 | strverscmp ("alpha1", "alpha001") | |
f2ea0f5b | 1286 | @result{} >0 /* @r{fractional part is less than integral one.} */ |
1f205a47 | 1287 | strverscmp ("part1_f012", "part1_f01") |
f2ea0f5b | 1288 | @result{} >0 /* @r{two fractional parts.} */ |
1f205a47 UD |
1289 | strverscmp ("foo.009", "foo.0") |
1290 | @result{} <0 /* @r{likewise, but with leading zeroes only.} */ | |
1291 | @end smallexample | |
1292 | ||
f2ea0f5b | 1293 | This function is especially useful when dealing with filename sorting, |
1f205a47 UD |
1294 | because filenames frequently hold indices/version numbers. |
1295 | ||
1296 | @code{strverscmp} is a GNU extension. | |
1297 | @end deftypefun | |
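As a sketch of the file name sorting use mentioned above, @code{strverscmp} can serve as the comparison step for @code{qsort}. The names @code{compare_versions} and @code{sort_by_version} are hypothetical, chosen only for this illustration.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: order two strings with strverscmp.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical helper: sort NAMES, an array of COUNT strings,
   so that embedded numbers are ordered by value.  */
void
sort_by_version (const char **names, size_t count)
{
  qsort (names, count, sizeof (const char *), compare_versions);
}
```

Sorting the names @code{"file10"}, @code{"file9"} and @code{"file1"} this way places @code{"file9"} before @code{"file10"}, whereas plain @code{strcmp} would place it last.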
1298 | ||
28f540f4 RM |
1299 | @comment string.h |
1300 | @comment BSD | |
1301 | @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) | |
1302 | This is an obsolete alias for @code{memcmp}, derived from BSD. | |
1303 | @end deftypefun | |
1304 | ||
b4012b75 | 1305 | @node Collation Functions |
28f540f4 RM |
1306 | @section Collation Functions |
1307 | ||
1308 | @cindex collating strings | |
1309 | @cindex string collation functions | |
1310 | ||
1311 | In some locales, the conventions for lexicographic ordering differ from | |
1312 | the strict numeric ordering of character codes. For example, in Spanish | |
1313 | most glyphs with diacritical marks such as accents are not considered | |
1314 | distinct letters for the purposes of collation. On the other hand, the | |
1315 | two-character sequence @samp{ll} is treated as a single letter that is | |
1316 | collated immediately after @samp{l}. | |
1317 | ||
1318 | You can use the functions @code{strcoll} and @code{strxfrm} (declared in | |
8a2f1f5b UD |
1319 | the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm} |
1320 | (declared in the header file @file{wchar.h}) to compare strings using a | |
1321 | collation ordering appropriate for the current locale. The locale used | |
1322 | by these functions in particular can be specified by setting the locale | |
1323 | for the @code{LC_COLLATE} category; see @ref{Locales}. | |
28f540f4 | 1324 | @pindex string.h |
8a2f1f5b | 1325 | @pindex wchar.h |
28f540f4 RM |
1326 | |
1327 | In the standard C locale, the collation sequence for @code{strcoll} is | |
8a2f1f5b UD |
1328 | the same as that for @code{strcmp}. Similarly, @code{wcscoll} and |
1329 | @code{wcscmp} are the same in this situation. | |
28f540f4 RM |
1330 | |
1331 | Effectively, the way these functions work is by applying a mapping to | |
1332 | transform the characters in a string to a byte sequence that represents | |
1333 | the string's position in the collating sequence of the current locale. | |
1334 | Comparing two such byte sequences in a simple fashion is equivalent to | |
1335 | comparing the strings with the locale's collating sequence. | |
1336 | ||
8a2f1f5b UD |
1337 | The functions @code{strcoll} and @code{wcscoll} perform this translation |
1338 | implicitly, in order to do one comparison. By contrast, @code{strxfrm} | |
1339 | and @code{wcsxfrm} perform the mapping explicitly. If you are making | |
1340 | multiple comparisons using the same string or set of strings, it is | |
1341 | likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to | |
1342 | transform all the strings just once, and subsequently compare the | |
1343 | transformed strings with @code{strcmp} or @code{wcscmp}. | |
28f540f4 RM |
1344 | |
1345 | @comment string.h | |
f65fd747 | 1346 | @comment ISO |
28f540f4 RM |
1347 | @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2}) |
1348 | The @code{strcoll} function is similar to @code{strcmp} but uses the | |
1349 | collating sequence of the current locale for collation (the | |
1350 | @code{LC_COLLATE} locale). | |
1351 | @end deftypefun | |
1352 | ||
8a2f1f5b UD |
1353 | @comment wchar.h |
1354 | @comment ISO | |
1355 | @deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) | |
1356 | The @code{wcscoll} function is similar to @code{wcscmp} but uses the | |
1357 | collating sequence of the current locale for collation (the | |
1358 | @code{LC_COLLATE} locale). | |
1359 | @end deftypefun | |
1360 | ||
28f540f4 RM |
1361 | Here is an example of sorting an array of strings, using @code{strcoll} |
1362 | to compare them. The actual sort algorithm is not written here; it | |
1363 | comes from @code{qsort} (@pxref{Array Sort Function}). The job of the | |
1364 | code shown here is to say how to compare the strings while sorting them. | |
1365 | (Later on in this section, we will show a way to do this more | |
1366 | efficiently using @code{strxfrm}.) | |
1367 | ||
1368 | @smallexample | |
1369 | /* @r{This is the comparison function used with @code{qsort}.} */ | |
1370 | ||
1371 | int | |
1372 | compare_elements (const void *v1, const void *v2) | |
1373 | @{ | |
1374 | return strcoll (*(char *const *) v1, *(char *const *) v2); | |
1375 | @} | |
1376 | ||
1377 | /* @r{This is the entry point---the function to sort} | |
1378 | @r{strings using the locale's collating sequence.} */ | |
1379 | ||
1380 | void | |
1381 | sort_strings (char **array, int nstrings) | |
1382 | @{ | |
1383 | /* @r{Sort @code{array} by comparing the strings.} */ | |
9fc19e48 UD |
1384 | qsort (array, nstrings, |
1385 | sizeof (char *), compare_elements); | |
28f540f4 RM |
1386 | @} |
1387 | @end smallexample | |
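The same approach carries over to wide character strings by using @code{wcscoll} in the comparison function. The following is only a sketch; the names @code{compare_wide_elements} and @code{sort_wide_strings} are hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Comparison function for qsort: each argument points to an
   array element, which is itself a pointer to a wide string.  */
static int
compare_wide_elements (const void *v1, const void *v2)
{
  wchar_t *const *p1 = v1;
  wchar_t *const *p2 = v2;

  return wcscoll (*p1, *p2);
}

/* Hypothetical helper: sort NSTRINGS wide character strings using
   the collating sequence of the current locale.  */
void
sort_wide_strings (wchar_t **array, size_t nstrings)
{
  qsort (array, nstrings, sizeof (wchar_t *), compare_wide_elements);
}
```

In the standard C locale this gives the same order as sorting with @code{wcscmp}; in other locales the order follows the @code{LC_COLLATE} category.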
1388 | ||
1389 | @cindex converting string to collation order | |
1390 | @comment string.h | |
f65fd747 | 1391 | @comment ISO |
8a2f1f5b UD |
1392 | @deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
1393 | The function @code{strxfrm} transforms the string @var{from} using the | |
1394 | collation transformation determined by the locale currently selected for | |
28f540f4 RM |
1395 | collation, and stores the transformed string in the array @var{to}. Up |
1396 | to @var{size} characters (including a terminating null character) are | |
1397 | stored. | |
1398 | ||
1399 | The behavior is undefined if the strings @var{to} and @var{from} | |
1400 | overlap; see @ref{Copying and Concatenation}. | |
1401 | ||
1402 | The return value is the length of the entire transformed string. This | |
1403 | value is not affected by the value of @var{size}, but if it is greater | |
a5113b14 UD |
1404 | than or equal to @var{size}, it means that the transformed string did not |
1405 | entirely fit in the array @var{to}. In this case, only as much of the | |
1406 | string as actually fits was stored. To get the whole transformed | |
1407 | string, call @code{strxfrm} again with a bigger output array. | |
28f540f4 RM |
1408 | |
1409 | The transformed string may be longer than the original string, and it | |
1410 | may also be shorter. | |
1411 | ||
1412 | If @var{size} is zero, no characters are stored in @var{to}. In this | |
1413 | case, @code{strxfrm} simply returns the number of characters that would | |
1414 | be the length of the transformed string. This is useful for determining | |
8a2f1f5b UD |
1415 | what size the allocated array should be. It does not matter what |
1416 | @var{to} is if @var{size} is zero; @var{to} may even be a null pointer. | |
1417 | @end deftypefun | |
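The size-zero query described above can be combined with an allocation so that the buffer is exactly large enough on the first real call. This is a minimal sketch, assuming a hypothetical helper named @code{transform_string} and plain @code{malloc} for storage.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated buffer holding the
   transformed form of FROM, or a null pointer if allocation fails.
   The caller is responsible for freeing the result.  */
char *
transform_string (const char *from)
{
  /* With a size of zero nothing is stored; the return value is the
     length of the transformed string, not counting the null.  */
  size_t needed = strxfrm (NULL, from, 0);
  char *to = malloc (needed + 1);

  if (to != NULL)
    strxfrm (to, from, needed + 1);
  return to;
}
```

Two buffers obtained this way can then be compared with @code{strcmp}, and the sign of the result agrees with what @code{strcoll} reports for the original strings.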
1418 | ||
1419 | @comment wchar.h | |
1420 | @comment ISO | |
1421 | @deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) | |
1422 | The function @code{wcsxfrm} transforms wide character string @var{wfrom} | |
1423 | using the collation transformation determined by the locale currently | |
1424 | selected for collation, and stores the transformed string in the array | |
1425 | @var{wto}. Up to @var{size} wide characters (including a terminating null | |
1426 | character) are stored. | |
1427 | ||
1428 | The behavior is undefined if the strings @var{wto} and @var{wfrom} | |
1429 | overlap; see @ref{Copying and Concatenation}. | |
1430 | ||
1431 | The return value is the length of the entire transformed wide character | |
1432 | string. This value is not affected by the value of @var{size}, but if | |
1433 | it is greater than or equal to @var{size}, it means that the transformed | |
1434 | wide character string did not entirely fit in the array @var{wto}. In | |
1435 | this case, only as much of the wide character string as actually fits | |
1436 | was stored. To get the whole transformed wide character string, call | |
1437 | @code{wcsxfrm} again with a bigger output array. | |
1438 | ||
1439 | The transformed wide character string may be longer than the original | |
1440 | wide character string, and it may also be shorter. | |
1441 | ||
1442 | If @var{size} is zero, no wide characters are stored in @var{wto}. In this | |
1443 | case, @code{wcsxfrm} simply returns the number of wide characters that | |
1444 | would be the length of the transformed wide character string. This is | |
1445 | useful for determining what size the allocated array should be (remember | |
1446 | to multiply by @code{sizeof (wchar_t)}). It does not matter what | |
1447 | @var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer. | |
28f540f4 RM |
1448 | @end deftypefun |
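For wide character strings, the same query-then-allocate pattern applies, with the element count converted to a byte count for the allocator. This is a sketch only; the helper name @code{transform_wide_string} is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated buffer holding the
   transformed form of WFROM, or a null pointer if allocation fails.
   The caller is responsible for freeing the result.  */
wchar_t *
transform_wide_string (const wchar_t *wfrom)
{
  /* The return value counts wide characters, not bytes, and does
     not include the terminating null wide character.  */
  size_t needed = wcsxfrm (NULL, wfrom, 0);
  wchar_t *wto = malloc ((needed + 1) * sizeof (wchar_t));

  if (wto != NULL)
    wcsxfrm (wto, wfrom, needed + 1);
  return wto;
}
```

Note that the third argument of @code{wcsxfrm} stays in units of wide characters; only the @code{malloc} argument is scaled by @code{sizeof (wchar_t)}.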
1449 | ||
1450 | Here is an example of how you can use @code{strxfrm} when | |
1451 | you plan to do many comparisons. It does the same thing as the previous | |
1452 | example, but much faster, because it has to transform each string only | |
1453 | once, no matter how many times it is compared with other strings. Even | |
1454 | the time needed to allocate and free storage is much less than the time | |
1455 | we save, when there are many strings. | |
1456 | ||
1457 | @smallexample | |
1458 | struct sorter @{ char *input; char *transformed; @}; | |
1459 | ||
1460 | /* @r{This is the comparison function used with @code{qsort}} | |
1461 | @r{to sort an array of @code{struct sorter}.} */ | |
1462 | int | |
1463 | compare_elements (const void *v1, const void *v2) | |
1464 | @{ | |
1465 | const struct sorter *p1 = v1, *p2 = v2; | |
1466 | return strcmp (p1->transformed, p2->transformed); | |
1467 | @} | |
1468 | ||
1469 | /* @r{This is the entry point---the function to sort} | |
1470 | @r{strings using the locale's collating sequence.} */ | |
1471 | ||
1472 | void | |
1473 | sort_strings_fast (char **array, int nstrings) | |
1474 | @{ | |
1475 | struct sorter temp_array[nstrings]; | |
1476 | int i; | |
1477 | ||
1478 | /* @r{Set up @code{temp_array}. Each element contains} | |
1479 | @r{one input string and its transformed string.} */ | |
1480 | for (i = 0; i < nstrings; i++) | |
1481 | @{ | |
1482 | size_t length = strlen (array[i]) * 2; | |
a5113b14 | 1483 | char *transformed; |
f2ea0f5b | 1484 | size_t transformed_length; |
28f540f4 RM |
1485 | |
1486 | temp_array[i].input = array[i]; | |
1487 | ||
a5113b14 UD |
1488 | /* @r{First try a buffer perhaps big enough.} */ |
1489 | transformed = (char *) xmalloc (length); | |
1490 | ||
1491 | /* @r{Transform @code{array[i]}.} */ | |
1492 | transformed_length = strxfrm (transformed, array[i], length); | |
1493 | ||
1494 | /* @r{If the buffer was not large enough, resize it} | |
1495 | @r{and try again.} */ | |
1496 | if (transformed_length >= length) | |
28f540f4 | 1497 | @{ |
a5113b14 UD |
1498 | /* @r{Allocate the needed space. +1 for terminating} |
1499 | @r{@code{NUL} character.} */ | |
1500 | transformed = (char *) xrealloc (transformed, | |
1501 | transformed_length + 1); | |
1502 | ||
1503 | /* @r{The return value is not interesting because we know} | |
1504 | @r{how long the transformed string is.} */ | |
dd7d45e8 UD |
1505 | (void) strxfrm (transformed, array[i], |
1506 | transformed_length + 1); | |
28f540f4 | 1507 | @} |
a5113b14 UD |
1508 | |
1509 | temp_array[i].transformed = transformed; | |
28f540f4 RM |
1510 | @} |
1511 | ||
1512 | /* @r{Sort @code{temp_array} by comparing transformed strings.} */ | |
1513 | qsort (temp_array, nstrings, | |
1514 | sizeof (struct sorter), compare_elements); | |
1515 | ||
1516 | /* @r{Put the elements back in the permanent array} | |
1517 | @r{in their sorted order.} */ | |
1518 | for (i = 0; i < nstrings; i++) | |
1519 | array[i] = temp_array[i].input; | |
1520 | ||
1521 | /* @r{Free the strings we allocated.} */ | |
1522 | for (i = 0; i < nstrings; i++) | |
1523 | free (temp_array[i].transformed); | |
1524 | @} | |
1525 | @end smallexample | |
1526 | ||
8a2f1f5b UD |
1527 | The interesting part of this code for the wide character version would |
1528 | look like this: | |
1529 | ||
1530 | @smallexample | |
1531 | void | |
1532 | sort_strings_fast (wchar_t **array, int nstrings) | |
1533 | @{ | |
1534 | @dots{} | |
1535 | /* @r{Transform @code{array[i]}.} */ | |
1536 | transformed_length = wcsxfrm (transformed, array[i], length); | |
1537 | ||
1538 | /* @r{If the buffer was not large enough, resize it} | |
1539 | @r{and try again.} */ | |
1540 | if (transformed_length >= length) | |
1541 | @{ | |
1542 | /* @r{Allocate the needed space. +1 for terminating} | |
1543 | @r{@code{NUL} character.} */ | |
1544 | transformed = (wchar_t *) xrealloc (transformed, | |
1545 | (transformed_length + 1) | |
1546 | * sizeof (wchar_t)); | |
1547 | ||
1548 | /* @r{The return value is not interesting because we know} | |
1549 | @r{how long the transformed string is.} */ | |
1550 | (void) wcsxfrm (transformed, array[i], | |
1551 | transformed_length + 1); | |
1552 | @} | |
1553 | @dots{} | |
1554 | @end smallexample | |
1555 | ||
1556 | @noindent | |
1557 | Note the additional multiplication by @code{sizeof (wchar_t)} in the | |
1558 | @code{xrealloc} call. | |
1559 | ||
1560 | @strong{Compatibility Note:} The string collation functions are a new | |
976780fd | 1561 | feature of @w{ISO C90}. Older C dialects have no equivalent feature. |
8a2f1f5b UD |
1562 | The wide character versions were introduced in @w{Amendment 1} to @w{ISO |
1563 | C90}. | |
28f540f4 | 1564 | |
b4012b75 | 1565 | @node Search Functions |
28f540f4 RM |
1566 | @section Search Functions |
1567 | ||
1568 | This section describes library functions which perform various kinds | |
1569 | of searching operations on strings and arrays. These functions are | |
1570 | declared in the header file @file{string.h}. | |
1571 | @pindex string.h | |
1572 | @cindex search functions (for strings) | |
1573 | @cindex string search functions | |
1574 | ||
1575 | @comment string.h | |
f65fd747 | 1576 | @comment ISO |
28f540f4 RM |
1577 | @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
1578 | This function finds the first occurrence of the byte @var{c} (converted | |
1579 | to an @code{unsigned char}) in the initial @var{size} bytes of the | |
1580 | object beginning at @var{block}. The return value is a pointer to the | |
1581 | located byte, or a null pointer if no match was found. | |
1582 | @end deftypefun | |
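Because @code{memchr} takes an explicit size, it can search data that contains embedded null bytes. The sketch below turns the returned pointer into an offset; the helper name @code{byte_offset} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return the offset of the first byte equal to C
   in the SIZE bytes starting at BLOCK, or SIZE if there is none.  */
size_t
byte_offset (const void *block, int c, size_t size)
{
  const unsigned char *start = block;
  const unsigned char *found = memchr (block, c, size);

  return found != NULL ? (size_t) (found - start) : size;
}
```

Returning @var{size} for ``not found'' is only a convention of this sketch; callers that prefer a null pointer can use the result of @code{memchr} directly.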
1583 | ||
8a2f1f5b UD |
1584 | @comment wchar.h |
1585 | @comment ISO | |
1586 | @deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) | |
1587 | This function finds the first occurrence of the wide character @var{wc} | |
1588 | in the initial @var{size} wide characters of the object beginning at | |
1589 | @var{block}. The return value is a pointer to the located wide | |
1590 | character, or a null pointer if no match was found. | |
1591 | @end deftypefun | |
1592 | ||
87b56f36 UD |
1593 | @comment string.h |
1594 | @comment GNU | |
1595 | @deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c}) | |
1596 | Often the @code{memchr} function is used with the knowledge that the | |
1597 | byte @var{c} is available in the memory block specified by the | |
1598 | parameters. But this means that the @var{size} parameter is not really | |
1599 | needed and that the tests performed with it at runtime (to check whether | |
1600 | the end of the block is reached) are not needed. | |
1601 | ||
1602 | The @code{rawmemchr} function exists for just this situation which is | |
1603 | surprisingly frequent. The interface is similar to @code{memchr} except | |
1604 | that the @var{size} parameter is missing. The function will look beyond | |
1605 | the end of the block pointed to by @var{block} in case the programmer | |
6be569a4 | 1606 | made an error in assuming that the byte @var{c} is present in the block. |
87b56f36 UD |
1607 | In this case the result is unspecified. Otherwise the return value is a |
1608 | pointer to the located byte. | |
1609 | ||
1610 | This function is of special interest when looking for the end of a | |
1611 | string. Since all strings are terminated by a null byte, a call like | |
1612 | ||
1613 | @smallexample | |
1614 | rawmemchr (str, '\0') | |
1615 | @end smallexample | |
1616 | ||
8a2f1f5b | 1617 | @noindent |
87b56f36 UD |
1618 | will never go beyond the end of the string. |
1619 | ||
1620 | This function is a GNU extension. | |
1621 | @end deftypefun | |
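The end-of-string use of @code{rawmemchr} can be wrapped as shown in this sketch. The helper name @code{end_of_string} is hypothetical, and the code is safe only because every string is guaranteed to contain the null byte being searched for.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: return a pointer to the null byte that
   terminates STR.  STR must be a properly terminated string.  */
char *
end_of_string (char *str)
{
  return rawmemchr (str, '\0');
}
```

For any other byte value, the caller must already know that the byte is present; otherwise @code{memchr} with an explicit size is the appropriate function.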
1622 | ||
ca747856 RM |
1623 | @comment string.h |
1624 | @comment GNU | |
1625 | @deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size}) | |
1626 | The function @code{memrchr} is like @code{memchr}, except that it searches | |
1627 | backwards from the end of the block defined by @var{block} and @var{size} | |
1628 | (instead of forwards from the front). | |
4efcb713 UD |
1629 | |
1630 | This function is a GNU extension. | |
a2d63612 | 1631 | @end deftypefun |
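One possible use of @code{memrchr} is finding the last separator in a buffer whose length is already known. The following sketch assumes a hypothetical helper named @code{directory_length} and treats @samp{/} as the separator.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: return the offset of the last slash in the
   LEN bytes at PATH, or 0 if those bytes contain no slash.  A result
   of 0 therefore also covers a path whose only slash is the first
   byte.  */
size_t
directory_length (const char *path, size_t len)
{
  const char *slash = memrchr (path, '/', len);

  return slash != NULL ? (size_t) (slash - path) : 0;
}
```

Because the length is passed explicitly, the buffer does not need to be null-terminated.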
ca747856 | 1632 | |
28f540f4 | 1633 | @comment string.h |
f65fd747 | 1634 | @comment ISO |
28f540f4 RM |
1635 | @deftypefun {char *} strchr (const char *@var{string}, int @var{c}) |
1636 | The @code{strchr} function finds the first occurrence of the character | |
1637 | @var{c} (converted to a @code{char}) in the null-terminated string | |
1638 | beginning at @var{string}. The return value is a pointer to the located | |
1639 | character, or a null pointer if no match was found. | |
1640 | ||
1641 | For example, | |
1642 | @smallexample | |
1643 | strchr ("hello, world", 'l') | |
1644 | @result{} "llo, world" | |
1645 | strchr ("hello, world", '?') | |
1646 | @result{} NULL | |
a5113b14 | 1647 | @end smallexample |
28f540f4 RM |
1648 | |
1649 | The terminating null character is considered to be part of the string, | |
1650 | so you can use this function to get a pointer to the end of a string by | |
0520adde FB |
1651 | specifying a null character as the value of the @var{c} argument. |
1652 | ||
1653 | When @code{strchr} returns a null pointer, it does not let you know | |
1654 | the position of the terminating null character it has found. If you | |
1655 | need that information, it is better (but less portable) to use | |
1656 | @code{strchrnul} than to search for it a second time. | |
8a2f1f5b UD |
1657 | @end deftypefun |
1658 | ||
1659 | @comment wchar.h | |
1660 | @comment ISO | |
1661 | @deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc}) | |
1662 | The @code{wcschr} function finds the first occurrence of the wide | |
1663 | character @var{wc} in the null-terminated wide character string | |
1664 | beginning at @var{wstring}. The return value is a pointer to the | |
1665 | located wide character, or a null pointer if no match was found. | |
1666 | ||
1667 | The terminating null character is considered to be part of the wide | |
1668 | character string, so you can use this function to get a pointer to the end | |
1669 | of a wide character string by specifying a null wide character as the | |
1670 | value of the @var{wc} argument. It would be better (but less portable) | |
1671 | to use @code{wcschrnul} in this case, though. | |
28f540f4 RM |
1672 | @end deftypefun |
1673 | ||
1674 | @comment string.h | |
87b56f36 | 1675 | @comment GNU |
0e4ee106 UD |
1676 | @deftypefun {char *} strchrnul (const char *@var{string}, int @var{c}) |
1677 | @code{strchrnul} is the same as @code{strchr} except that if it does | |
ec28fc7c | 1678 | not find the character, it returns a pointer to the string's terminating |
0e4ee106 | 1679 | null character rather than a null pointer. |
8a2f1f5b UD |
1680 | |
1681 | This function is a GNU extension. | |
1682 | @end deftypefun | |
1683 | ||
1684 | @comment wchar.h | |
1685 | @comment GNU | |
1686 | @deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc}) | |
1687 | @code{wcschrnul} is the same as @code{wcschr} except that if it does not | |
1688 | find the wide character, it returns a pointer to the wide character string's | |
1689 | terminating null wide character rather than a null pointer. | |
1690 | ||
1691 | This function is a GNU extension. | |
28f540f4 RM |
1692 | @end deftypefun |
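Because @code{strchrnul} never returns a null pointer, its result can be used in pointer arithmetic without a separate check. The sketch below measures the first delimited field of a string; the helper name @code{first_field_length} is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: return the length of the first field of
   STRING, where fields are separated by DELIM.  If DELIM does not
   occur, the whole string counts as one field.  */
size_t
first_field_length (const char *string, int delim)
{
  return (size_t) (strchrnul (string, delim) - string);
}
```

With @code{strchr}, the same function would need an extra branch that calls @code{strlen} when no delimiter is found.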
1693 | ||
ec28fc7c | 1694 | One useful, but unusual, use of the @code{strchr} |
ee2752ea UD |
1695 | function is when one wants to have a pointer pointing to the NUL byte |
1696 | terminating a string. This is often written in this way: | |
1697 | ||
1698 | @smallexample | |
1699 | s += strlen (s); | |
1700 | @end smallexample | |
1701 | ||
1702 | @noindent | |
1703 | This is almost optimal, but the addition operation duplicates a bit of | |
1704 | the work already done in the @code{strlen} function. A better solution | |
1705 | is this: | |
1706 | ||
1707 | @smallexample | |
1708 | s = strchr (s, '\0'); | |
1709 | @end smallexample | |
1710 | ||
1711 | There is no restriction on the second parameter of @code{strchr} so it | |
1712 | could very well also be the NUL character. Those readers thinking very | |
1713 | hard about this might now point out that the @code{strchr} function is | |
8c474db5 | 1714 | more expensive than the @code{strlen} function since we have two abort |
1f77f049 | 1715 | criteria. This is right. But in @theglibc{} the implementation of |
0e4ee106 | 1716 | @code{strchr} is optimized in a special way so that @code{strchr} |
8c474db5 | 1717 | actually is faster. |
ee2752ea | 1718 | |
28f540f4 | 1719 | @comment string.h |
f65fd747 | 1720 | @comment ISO |
28f540f4 RM |
1721 | @deftypefun {char *} strrchr (const char *@var{string}, int @var{c}) |
1722 | The function @code{strrchr} is like @code{strchr}, except that it searches | |
1723 | backwards from the end of the string @var{string} (instead of forwards | |
1724 | from the front). | |
1725 | ||
1726 | For example, | |
1727 | @smallexample | |
1728 | strrchr ("hello, world", 'l') | |
1729 | @result{} "ld" | |
1730 | @end smallexample | |
1731 | @end deftypefun | |
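A common use of @code{strrchr} is to pick out the last component of a file name. The following is a minimal sketch with a hypothetical helper named @code{last_component}; it does not handle trailing slashes specially.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the part of PATH that
   follows the last slash, or PATH itself if it contains no slash.  */
const char *
last_component (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```

The returned pointer refers into the original string, so no storage is allocated and nothing needs to be freed.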
1732 | ||
8a2f1f5b UD |
1733 | @comment wchar.h |
1734 | @comment ISO | |
1735 | @deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{c}) | |
1736 | The function @code{wcsrchr} is like @code{wcschr}, except that it searches | |
1737 | backwards from the end of the string @var{wstring} (instead of forwards | |
1738 | from the front). | |
1739 | @end deftypefun | |
1740 | ||
28f540f4 | 1741 | @comment string.h |
f65fd747 | 1742 | @comment ISO |
28f540f4 RM |
1743 | @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle}) |
1744 | This is like @code{strchr}, except that it searches @var{haystack} for a | |
1745 | substring @var{needle} rather than just a single character. It | |
1746 | returns a pointer into the string @var{haystack} that is the first | |
1747 | character of the substring, or a null pointer if no match was found. If | |
1748 | @var{needle} is an empty string, the function returns @var{haystack}. | |
1749 | ||
1750 | For example, | |
1751 | @smallexample | |
1752 | strstr ("hello, world", "l") | |
1753 | @result{} "llo, world" | |
1754 | strstr ("hello, world", "wo") | |
1755 | @result{} "world" | |
1756 | @end smallexample | |
1757 | @end deftypefun | |
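Since @code{strstr} returns a pointer into @var{haystack}, it can be called repeatedly to visit every match. The sketch below counts non-overlapping matches; the helper name @code{count_occurrences} and its treatment of an empty @var{needle} are choices made for this illustration.

```c
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of
   NEEDLE in HAYSTACK.  An empty NEEDLE is reported as zero
   occurrences, because strstr would match it at every position.  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;

  if (needle_len == 0)
    return 0;

  while ((haystack = strstr (haystack, needle)) != NULL)
    {
      count++;
      /* Resume the search just past the match found.  */
      haystack += needle_len;
    }
  return count;
}
```

Advancing by one byte instead of @code{needle_len} would count overlapping matches as well.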
1758 | ||
8a2f1f5b UD |
1759 | @comment wchar.h |
1760 | @comment ISO | |
1761 | @deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) | |
1762 | This is like @code{wcschr}, except that it searches @var{haystack} for a | |
1763 | substring @var{needle} rather than just a single wide character. It | |
1764 | returns a pointer into the string @var{haystack} that is the first wide | |
1765 | character of the substring, or a null pointer if no match was found. If | |
1766 | @var{needle} is an empty string, the function returns @var{haystack}. | |
1767 | @end deftypefun | |
1768 | ||
1769 | @comment wchar.h | |
1770 | @comment XPG | |
1771 | @deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) | |
5bd4d368 | 1772 | @code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the |
8a2f1f5b UD |
1773 | name originally used in the X/Open Portability Guide before the |
1774 | @w{Amendment 1} to @w{ISO C90} was published. | |
1775 | @end deftypefun | |
1776 | ||
28f540f4 | 1777 | |
0e4ee106 | 1778 | @comment string.h |
8a2f1f5b | 1779 | @comment GNU |
0e4ee106 UD |
1780 | @deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle}) |
1781 | This is like @code{strstr}, except that it ignores case in searching for | |
1782 | the substring. Like @code{strcasecmp}, it is locale dependent how | |
1783 | uppercase and lowercase characters are related. | |
1784 | ||
1785 | ||
1786 | For example, | |
1787 | @smallexample | |
d6868416 | 1788 | strcasestr ("hello, world", "L") |
0e4ee106 | 1789 | @result{} "llo, world" |
d6868416 | 1790 | strcasestr ("hello, World", "wo") |
0e4ee106 UD |
1791 | @result{} "World" |
1792 | @end smallexample | |
1793 | @end deftypefun | |
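When only the presence of a substring matters, the result of @code{strcasestr} can simply be tested against a null pointer. This is a sketch with a hypothetical helper named @code{contains_ignoring_case}; how letters are matched still depends on the current locale.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: report whether NEEDLE occurs anywhere in
   HAYSTACK, ignoring differences in case.  */
bool
contains_ignoring_case (const char *haystack, const char *needle)
{
  return strcasestr (haystack, needle) != NULL;
}
```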
1794 | ||
1795 | ||
28f540f4 RM |
1796 | @comment string.h |
1797 | @comment GNU | |
63551311 | 1798 | @deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len}) |
28f540f4 RM |
1799 | This is like @code{strstr}, but @var{needle} and @var{haystack} are byte |
1800 | arrays rather than null-terminated strings. @var{needle-len} is the | |
1801 | length of @var{needle} and @var{haystack-len} is the length of | |
1802 | @var{haystack}. | |
1803 | ||
1804 | This function is a GNU extension. | |
1805 | @end deftypefun | |
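Because both arguments of @code{memmem} carry explicit lengths, it can locate a byte pattern in binary data that contains null bytes. The following sketch reports the match as an offset; the helper name @code{find_bytes} is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
   the NEEDLE_LEN bytes at NEEDLE within the HAYSTACK_LEN bytes at
   HAYSTACK, or HAYSTACK_LEN if there is no occurrence.  */
size_t
find_bytes (const void *haystack, size_t haystack_len,
            const void *needle, size_t needle_len)
{
  const unsigned char *start = haystack;
  const unsigned char *found = memmem (haystack, haystack_len,
                                       needle, needle_len);

  return found != NULL ? (size_t) (found - start) : haystack_len;
}
```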
1806 | ||
1807 | @comment string.h | |
f65fd747 | 1808 | @comment ISO |
28f540f4 RM |
1809 | @deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset}) |
1810 | The @code{strspn} (``string span'') function returns the length of the | |
1811 | initial substring of @var{string} that consists entirely of characters that | |
1812 | are members of the set specified by the string @var{skipset}. The order | |
1813 | of the characters in @var{skipset} is not important. | |
1814 | ||
1815 | For example, | |
1816 | @smallexample | |
1817 | strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz") | |
1818 | @result{} 5 | |
1819 | @end smallexample | |
8a2f1f5b UD |
1820 | |
1821 | Note that ``character'' is here used in the sense of byte. In a string | |
1822 | using a multibyte character encoding, an (abstract) character consisting of | |
1823 | more than one byte is not treated as a single entity. Each byte is treated | |
1824 | separately. The function is not locale-dependent. | |
1825 | @end deftypefun | |
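A typical use of @code{strspn} is to skip over leading characters drawn from a small set. The sketch below skips spaces and tabs; the helper name @code{skip_leading_blanks} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the first character of
   STRING that is neither a space nor a tab.  If STRING consists only
   of such characters, the result points to its terminating null.  */
const char *
skip_leading_blanks (const char *string)
{
  return string + strspn (string, " \t");
}
```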
1826 | ||
1827 | @comment wchar.h | |
1828 | @comment ISO | |
1829 | @deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset}) | |
1830 | The @code{wcsspn} (``wide character string span'') function returns the | |
1831 | length of the initial substring of @var{wstring} that consists entirely | |
1832 | of wide characters that are members of the set specified by the string | |
1833 | @var{skipset}. The order of the wide characters in @var{skipset} is not | |
1834 | important. | |
28f540f4 RM |
1835 | @end deftypefun |
1836 | ||
1837 | @comment string.h | |
f65fd747 | 1838 | @comment ISO |
28f540f4 RM |
1839 | @deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset}) |
1840 | The @code{strcspn} (``string complement span'') function returns the length | |
1841 | of the initial substring of @var{string} that consists entirely of characters | |
1842 | that are @emph{not} members of the set specified by the string @var{stopset}. | |
1843 | (In other words, it returns the offset of the first character in @var{string} | |
1844 | that is a member of the set @var{stopset}.) | |
1845 | ||
1846 | For example, | |
1847 | @smallexample | |
1848 | strcspn ("hello, world", " \t\n,.;!?") | |
1849 | @result{} 5 | |
1850 | @end smallexample | |
8a2f1f5b UD |
1851 | |
1852 | Note that ``character'' is here used in the sense of byte. In a string | |
1853 | using a multibyte character encoding, an (abstract) character consisting of | |
1854 | more than one byte is not treated as a single entity. Each byte is treated | |
1855 | separately. The function is not locale-dependent. | |
1856 | @end deftypefun | |
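Because @code{strcspn} returns the length of the whole string when no member of @var{stopset} is present, its result is always a valid index into the string. The sketch below uses this to cut a line at its first line terminator; the helper name @code{chomp} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: truncate LINE at its first carriage return
   or newline.  If neither is present, the index is that of the
   terminating null, so LINE is left unchanged.  */
void
chomp (char *line)
{
  line[strcspn (line, "\r\n")] = '\0';
}
```

No separate check for a missing newline is needed, which is the usual reason for preferring this form over @code{strchr}.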
1857 | ||
1858 | @comment wchar.h | |
1859 | @comment ISO | |
1860 | @deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset}) | |
1861 | The @code{wcscspn} (``wide character string complement span'') function | |
1862 | returns the length of the initial substring of @var{wstring} that | |
1863 | consists entirely of wide characters that are @emph{not} members of the | |
1864 | set specified by the string @var{stopset}. (In other words, it returns | |
1865 | the offset of the first wide character in @var{wstring} that is a member of | |
1866 | the set @var{stopset}.) | |
28f540f4 RM |
1867 | @end deftypefun |
1868 | ||
1869 | @comment string.h | |
f65fd747 | 1870 | @comment ISO |
28f540f4 RM |
1871 | @deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset}) |
1872 | The @code{strpbrk} (``string pointer break'') function is similar to | |
1873 | @code{strcspn}, except that it returns a pointer to the first character | |
1874 | in @var{string} that is a member of the set @var{stopset} instead of the | |
1875 | length of the initial substring. It returns a null pointer if no such | |
1876 | character from @var{stopset} is found. | |
1877 | ||
1878 | @c @group Invalid outside the example. | |
1879 | For example, | |
1880 | ||
1881 | @smallexample | |
1882 | strpbrk ("hello, world", " \t\n,.;!?") | |
1883 | @result{} ", world" | |
1884 | @end smallexample | |
1885 | @c @end group | |
8a2f1f5b UD |
1886 | |
1887 | Note that ``character'' is here used in the sense of byte. In a string | |
1888 | using a multibyte character encoding, an (abstract) character consisting of | |
1889 | more than one byte is not treated as a single entity. Each byte is treated | |
1890 | separately. The function is not locale-dependent. | |
1891 | @end deftypefun | |
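Since @code{strpbrk} returns a pointer into the string, it can be called repeatedly to visit every member of the set. The following sketch counts those members; the function name @code{count_members} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many bytes of TEXT belong to the set
   STOPSET by calling strpbrk repeatedly, resuming just past each
   match.  */
size_t
count_members (const char *text, const char *stopset)
{
  size_t count = 0;
  const char *p = text;

  while ((p = strpbrk (p, stopset)) != NULL)
    {
      ++count;
      ++p;   /* Step past the match so the next search makes progress.  */
    }
  return count;
}
```

The increment is safe because a match is never the terminating null byte, so @code{p + 1} still points inside the string.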
1892 | ||
1893 | @comment wchar.h | |
1894 | @comment ISO | |
1895 | @deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset}) | |
1896 | The @code{wcspbrk} (``wide character string pointer break'') function is | |
1897 | related to @code{wcscspn}, except that it returns a pointer to the first | |
1898 | wide character in @var{wstring} that is a member of the set | |
1899 | @var{stopset} instead of the length of the initial substring. It | |
1900 | returns a null pointer if no such wide character from @var{stopset} is found. | |
28f540f4 RM |
1901 | @end deftypefun |
1902 | ||
0e4ee106 UD |
1903 | |
1904 | @subsection Compatibility String Search Functions | |
1905 | ||
1906 | @comment string.h | |
1907 | @comment BSD | |
1908 | @deftypefun {char *} index (const char *@var{string}, int @var{c}) | |
1909 | @code{index} is another name for @code{strchr}; they are exactly the same. | |
1910 | New code should always use @code{strchr} since this name is defined in | |
1911 | @w{ISO C} while @code{index} is a BSD invention that was never available | |
1912 | on @w{System V} derived systems. | |
1913 | @end deftypefun | |
1914 | ||
1915 | @comment string.h | |
1916 | @comment BSD | |
1917 | @deftypefun {char *} rindex (const char *@var{string}, int @var{c}) | |
1918 | @code{rindex} is another name for @code{strrchr}; they are exactly the same. | |
1919 | New code should always use @code{strrchr} since this name is defined in | |
1920 | @w{ISO C} while @code{rindex} is a BSD invention that was never available | |
1921 | on @w{System V} derived systems. | |
1922 | @end deftypefun | |
1923 | ||
b4012b75 | 1924 | @node Finding Tokens in a String |
28f540f4 RM |
1925 | @section Finding Tokens in a String |
1926 | ||
28f540f4 RM |
1927 | @cindex tokenizing strings |
1928 | @cindex breaking a string into tokens | |
1929 | @cindex parsing tokens from a string | |
1930 | Programs commonly need to do some simple kinds | |
1931 | of lexical analysis and parsing, such as splitting a command string up | |
1932 | into tokens. You can do this with the @code{strtok} function, declared | |
1933 | in the header file @file{string.h}. | |
1934 | @pindex string.h | |
1935 | ||
1936 | @comment string.h | |
f65fd747 | 1937 | @comment ISO |
8a2f1f5b | 1938 | @deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters}) |
28f540f4 RM |
1939 | A string can be split into tokens by making a series of calls to the |
1940 | function @code{strtok}. | |
1941 | ||
1942 | The string to be split up is passed as the @var{newstring} argument on | |
1943 | the first call only. The @code{strtok} function uses this to set up | |
1944 | some internal state information. Subsequent calls to get additional | |
1945 | tokens from the same string are indicated by passing a null pointer as | |
1946 | the @var{newstring} argument. Calling @code{strtok} with another | |
1947 | non-null @var{newstring} argument reinitializes the state information. | |
1948 | It is guaranteed that no other library function ever calls @code{strtok} | |
1949 | behind your back (which would mess up this internal state information). | |
1950 | ||
1951 | The @var{delimiters} argument is a string that specifies a set of delimiters | |
1952 | that may surround the token being extracted. All the initial characters | |
1953 | that are members of this set are discarded. The first character that is | |
1954 | @emph{not} a member of this set of delimiters marks the beginning of the | |
1955 | next token. The end of the token is found by looking for the next | |
1956 | character that is a member of the delimiter set. This character in the | |
1957 | original string @var{newstring} is overwritten by a null character, and the | |
1958 | pointer to the beginning of the token in @var{newstring} is returned. | |
1959 | ||
1960 | On the next call to @code{strtok}, the searching begins at the next | |
1961 | character beyond the one that marked the end of the previous token. | |
1962 | Note that the set of delimiters @var{delimiters} does not have to be the | |
1963 | same on every call in a series of calls to @code{strtok}. | |
1964 | ||
1965 | If the end of the string @var{newstring} is reached, or if the remainder of | |
1966 | the string consists only of delimiter characters, @code{strtok} returns | |
1967 | a null pointer. | |
8a2f1f5b | 1968 | |
8a2f1f5b UD |
1969 | Note that ``character'' is here used in the sense of byte. In a string |
1970 | using a multibyte character encoding, an (abstract) character consisting of | |
1971 | more than one byte is not treated as a single entity. Each byte is treated | |
1972 | separately. The function is not locale-dependent. | |
1973 | @end deftypefun | |
1974 | ||
1975 | @comment wchar.h | |
1976 | @comment ISO | |
1977 | @deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr}) | |
1978 | A string can be split into tokens by making a series of calls to the | |
1979 | function @code{wcstok}. | |
1980 | ||
1981 | The string to be split up is passed as the @var{newstring} argument on | |
1982 | the first call only. The @code{wcstok} function uses this to set up | |
1983 | state information in the pointer that @var{save_ptr} points to. | |
1984 | Subsequent calls to get additional tokens from the same wide character | |
1985 | string are indicated by passing a null pointer as @var{newstring} and | |
1986 | the same @var{save_ptr}. Calling @code{wcstok} with another non-null | |
1987 | @var{newstring} argument reinitializes the state information. Because | |
1988 | the caller supplies the storage for this state, separate tokenizing | |
1989 | sequences with separate @var{save_ptr} objects do not interfere. | |
1990 | ||
1991 | The @var{delimiters} argument is a wide character string that specifies | |
1992 | a set of delimiters that may surround the token being extracted. All | |
1993 | the initial wide characters that are members of this set are discarded. | |
1994 | The first wide character that is @emph{not} a member of this set of | |
1995 | delimiters marks the beginning of the next token. The end of the token | |
1996 | is found by looking for the next wide character that is a member of the | |
1997 | delimiter set. This wide character in the original wide character | |
1998 | string @var{newstring} is overwritten by a null wide character, and the | |
1999 | pointer to the beginning of the token in @var{newstring} is returned. | |
2000 | ||
2001 | On the next call to @code{wcstok}, the searching begins at the next | |
2002 | wide character beyond the one that marked the end of the previous token. | |
2003 | Note that the set of delimiters @var{delimiters} does not have to be the | |
2004 | same on every call in a series of calls to @code{wcstok}. | |
2005 | ||
2006 | If the end of the wide character string @var{newstring} is reached, or | |
2007 | if the remainder of the string consists only of delimiter wide characters, | |
2008 | @code{wcstok} returns a null pointer. | |
2009 | ||
2010 | Note that ``character'' is here used in the sense of wide character, so | |
2011 | each element of the string is treated as a complete character and the | |
2012 | caveat about multibyte encodings that applies to @code{strtok} does not | |
2013 | arise. The function is not locale-dependent. | |
28f540f4 RM |
2014 | @end deftypefun |
2015 | ||
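@w{ISO C} declares @code{wcstok} with a third argument, a pointer to a @code{wchar_t *} object in which the scanning position is kept between calls. The sketch below assumes that three-argument form; the helper name @code{count_wide_tokens} is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the tokens in TEXT, which is modified in
   place.  STATE holds the scanning position between calls, as the
   three-argument ISO C form of wcstok requires.  */
size_t
count_wide_tokens (wchar_t *text, const wchar_t *delimiters)
{
  size_t count = 0;
  wchar_t *state;

  for (wchar_t *token = wcstok (text, delimiters, &state);
       token != NULL;
       token = wcstok (NULL, delimiters, &state))
    ++count;
  return count;
}
```

The argument must be a writable array, for example one declared as @code{wchar_t buf[] = L"alpha, beta;gamma"}, not a wide string literal.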
8a2f1f5b UD |
2016 | @strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string |
2017 | they are parsing, you should always copy the string to a temporary buffer | |
2018 | before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying and | |
2019 | Concatenation}). If you allow @code{strtok} or @code{wcstok} to modify | |
2020 | a string that came from another part of your program, you are asking for | |
2021 | trouble; that string might be used for other purposes after | |
2022 | @code{strtok} or @code{wcstok} has modified it, and it would not have | |
2023 | the expected value. | |
28f540f4 RM |
2024 | |
2025 | The string that you are operating on might even be a constant. Then | |
8a2f1f5b UD |
2026 | when @code{strtok} or @code{wcstok} tries to modify it, your program |
2027 | will get a fatal signal for writing in read-only memory. @xref{Program | |
2028 | Error Signals}. Even if the operation of @code{strtok} or @code{wcstok} | |
2029 | would not require a modification of the string (e.g., if there is | |
1f77f049 | 2030 | exactly one token) the string can (and in the @glibcadj{} case will) be |
8a2f1f5b | 2031 | modified. |
28f540f4 RM |
2032 | |
2033 | This is a special case of a general principle: if a part of a program | |
2034 | does not have as its purpose the modification of a certain data | |
2035 | structure, then it is error-prone to modify the data structure | |
2036 | temporarily. | |
2037 | ||
8a2f1f5b UD |
2038 | The function @code{strtok} is not reentrant. |
2039 | @xref{Nonreentrancy}, for a discussion of where and why reentrancy is | |
2040 | important. | |
28f540f4 RM |
2041 | |
2042 | Here is a simple example showing the use of @code{strtok}. | |
2043 | ||
2044 | @comment Yes, this example has been tested. | |
2045 | @smallexample | |
2046 | #include <string.h> | |
2047 | #include <stddef.h> | |
2048 | ||
2049 | @dots{} | |
2050 | ||
5649a1d6 | 2051 | const char string[] = "words separated by spaces -- and, punctuation!"; |
28f540f4 | 2052 | const char delimiters[] = " .,;:!-"; |
5649a1d6 | 2053 | char *token, *cp; |
28f540f4 RM |
2054 | |
2055 | @dots{} | |
2056 | ||
5649a1d6 UD |
2057 | cp = strdupa (string); /* Make writable copy. */ |
2058 | token = strtok (cp, delimiters); /* token => "words" */ | |
28f540f4 RM |
2059 | token = strtok (NULL, delimiters); /* token => "separated" */ |
2060 | token = strtok (NULL, delimiters); /* token => "by" */ | |
2061 | token = strtok (NULL, delimiters); /* token => "spaces" */ | |
2062 | token = strtok (NULL, delimiters); /* token => "and" */ | |
2063 | token = strtok (NULL, delimiters); /* token => "punctuation" */ | |
2064 | token = strtok (NULL, delimiters); /* token => NULL */ | |
2065 | @end smallexample | |
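In practice the calls are usually written as a loop that runs until @code{strtok} returns a null pointer. The following sketch also makes its own heap copy of the input, so that the caller's string is left untouched; the name @code{count_tokens} and the @code{-1} error convention are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the tokens in INPUT without modifying it.
   Returns -1 if the working copy cannot be allocated.  */
long
count_tokens (const char *input, const char *delimiters)
{
  size_t size = strlen (input) + 1;
  char *copy = malloc (size);
  long count = 0;

  if (copy == NULL)
    return -1;
  memcpy (copy, input, size);

  /* The first call passes the buffer; later calls pass NULL.  */
  for (char *token = strtok (copy, delimiters);
       token != NULL;
       token = strtok (NULL, delimiters))
    ++count;

  free (copy);
  return count;
}
```

Using @code{malloc} instead of @code{strdupa} keeps the sketch within @w{ISO C}, at the cost of an explicit @code{free}.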
a5113b14 | 2066 | |
1f77f049 | 2067 | @Theglibc{} contains two more functions for tokenizing a string |
8a2f1f5b UD |
2068 | which overcome the limitation of non-reentrancy. They are only |
2069 | available for multibyte character strings. | |
a5113b14 UD |
2070 | |
2071 | @comment string.h | |
2072 | @comment POSIX | |
2073 | @deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr}) | |
dd7d45e8 UD |
2074 | Just like @code{strtok}, this function splits the string into several |
2075 | tokens which can be accessed by successive calls to @code{strtok_r}. | |
2076 | The difference is that the information about the next token is stored in | |
2077 | the space pointed to by the third argument, @var{save_ptr}, which is a | |
2078 | pointer to a string pointer. Calling @code{strtok_r} with a null | |
2079 | pointer for @var{newstring} and leaving @var{save_ptr} between the calls | |
2080 | unchanged does the job without hindering reentrancy. | |
a5113b14 | 2081 | |
976780fd | 2082 | This function is defined in POSIX.1 and can be found on many systems |
a5113b14 UD |
2083 | which support multi-threading. |
2084 | @end deftypefun | |
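One case where the separate state matters is nested tokenizing, where an inner loop splits each token produced by an outer loop. The sketch below gives each loop its own save pointer; the function name @code{count_words} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split BUFFER into lines at '\n' and each line
   into words at spaces, returning the total number of words.  BUFFER
   is modified in place.  */
size_t
count_words (char *buffer)
{
  size_t words = 0;
  char *line_state, *word_state;

  for (char *line = strtok_r (buffer, "\n", &line_state);
       line != NULL;
       line = strtok_r (NULL, "\n", &line_state))
    for (char *word = strtok_r (line, " ", &word_state);
         word != NULL;
         word = strtok_r (NULL, " ", &word_state))
      ++words;

  return words;
}
```

Writing the same loops with @code{strtok} would not work, because the inner calls would overwrite the state that the outer loop depends on.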
2085 | ||
2086 | @comment string.h | |
2087 | @comment BSD | |
2088 | @deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter}) | |
0050ad5f UD |
2089 | This function has functionality similar to that of @code{strtok_r}, with the | |
2090 | @var{newstring} argument replaced by the @var{save_ptr} argument. The | |
2091 | initialization of the moving pointer has to be done by the user. | |
2092 | Successive calls to @code{strsep} move the pointer along the tokens | |
2093 | separated by @var{delimiter}, returning the address of the next token | |
2094 | and updating @var{string_ptr} to point to the beginning of the next | |
2095 | token. | |
2096 | ||
2097 | One difference between @code{strsep} and @code{strtok_r} is that if the | |
2098 | input string contains more than one character from @var{delimiter} in a | |
2099 | row, @code{strsep} returns an empty string for each pair of adjacent characters | |
2100 | from @var{delimiter}. This means that a program normally should test | |
2101 | for @code{strsep} returning an empty string before processing it. | |
9afc8a59 | 2102 | |
a5113b14 UD |
2103 | This function was introduced in 4.3BSD and therefore is widely available. |
2104 | @end deftypefun | |
2105 | ||
2106 | Here is how the above example looks when @code{strsep} is used. | |
2107 | ||
2108 | @comment Yes, this example has been tested. | |
2109 | @smallexample | |
2110 | #include <string.h> | |
2111 | #include <stddef.h> | |
2112 | ||
2113 | @dots{} | |
2114 | ||
5649a1d6 | 2115 | const char string[] = "words separated by spaces -- and, punctuation!"; |
a5113b14 UD |
2116 | const char delimiters[] = " .,;:!-"; |
2117 | char *running; | |
2118 | char *token; | |
2119 | ||
2120 | @dots{} | |
2121 | ||
5649a1d6 | 2122 | running = strdupa (string); |
a5113b14 UD |
2123 | token = strsep (&running, delimiters); /* token => "words" */ |
2124 | token = strsep (&running, delimiters); /* token => "separated" */ | |
2125 | token = strsep (&running, delimiters); /* token => "by" */ | |
2126 | token = strsep (&running, delimiters); /* token => "spaces" */ | |
9afc8a59 UD |
2127 | token = strsep (&running, delimiters); /* token => "" */ |
2128 | token = strsep (&running, delimiters); /* token => "" */ | |
2129 | token = strsep (&running, delimiters); /* token => "" */ | |
a5113b14 | 2130 | token = strsep (&running, delimiters); /* token => "and" */ |
9afc8a59 | 2131 | token = strsep (&running, delimiters); /* token => "" */ |
a5113b14 | 2132 | token = strsep (&running, delimiters); /* token => "punctuation" */ |
9afc8a59 | 2133 | token = strsep (&running, delimiters); /* token => "" */ |
a5113b14 UD |
2134 | token = strsep (&running, delimiters); /* token => NULL */ |
2135 | @end smallexample | |
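A program that does not want the empty tokens can test for them inside a loop. This is a minimal sketch with a hypothetical helper name, @code{count_nonempty_fields}.

```c
#define _DEFAULT_SOURCE   /* strsep is a BSD extension, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-empty fields of TEXT, which is
   modified in place.  Empty fields arise between adjacent delimiters
   and are skipped.  */
size_t
count_nonempty_fields (char *text, const char *delimiters)
{
  size_t count = 0;
  char *running = text;
  char *token;

  while ((token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')
      ++count;
  return count;
}
```

Dropping the test for @code{'\0'} makes the loop count every field, empty ones included, which is what is wanted for formats where an empty field is meaningful.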
b4012b75 | 2136 | |
ec28fc7c UD |
2137 | @comment string.h |
2138 | @comment GNU | |
2139 | @deftypefun {char *} basename (const char *@var{filename}) | |
2140 | The GNU version of the @code{basename} function returns the last | |
9442cd75 | 2141 | component of the path in @var{filename}. This function is the preferred |
ec28fc7c UD |
2142 | usage, since it does not modify the argument, @var{filename}, and |
2143 | respects trailing slashes. The prototype for @code{basename} can be | |
2144 | found in @file{string.h}. Note that this function is overridden by the XPG | |
2145 | version, if @file{libgen.h} is included. | |
2146 | ||
2147 | Example of using GNU @code{basename}: | |
2148 | ||
2149 | @smallexample | |
 | #include <stdio.h> | |
 | #include <stdlib.h> | |
2150 | #include <string.h> | |
2151 | ||
2152 | int | |
2153 | main (int argc, char *argv[]) | |
2154 | @{ | |
2155 | char *prog = basename (argv[0]); | |
2156 | ||
2157 | if (argc < 2) | |
2158 | @{ | |
2159 | fprintf (stderr, "Usage %s <arg>\n", prog); | |
2160 | exit (1); | |
2161 | @} | |
2162 | ||
2163 | @dots{} | |
2164 | @} | |
2165 | @end smallexample | |
2166 | ||
2167 | @strong{Portability Note:} This function may produce different results | |
2168 | on different systems. | |
2169 | ||
2170 | @end deftypefun | |
2171 | ||
2172 | @comment libgen.h | |
2173 | @comment XPG | |
2174 | @deftypefun {char *} basename (char *@var{path}) | |
2175 | This is the standard XPG defined @code{basename}. It is similar in | |
2176 | spirit to the GNU version, but may modify the @var{path} by removing | |
2177 | trailing '/' characters. If the @var{path} is made up entirely of '/' | |
2178 | characters, then "/" will be returned. Also, if @var{path} is | |
2179 | @code{NULL} or an empty string, then "." is returned. The prototype for | |
e4a5f77d | 2180 | the XPG version can be found in @file{libgen.h}. |
ec28fc7c UD |
2181 | |
2182 | Example of using XPG @code{basename}: | |
2183 | ||
2184 | @smallexample | |
2185 | #include <libgen.h> | |
 | #include <stdio.h> | |
 | #include <stdlib.h> | |
 | #include <string.h> | |
2186 | ||
2187 | int | |
2188 | main (int argc, char *argv[]) | |
2189 | @{ | |
2190 | char *prog; | |
2191 | char *path = strdupa (argv[0]); | |
2192 | ||
2193 | prog = basename (path); | |
2194 | ||
2195 | if (argc < 2) | |
2196 | @{ | |
2197 | fprintf (stderr, "Usage %s <arg>\n", prog); | |
2198 | exit (1); | |
2199 | @} | |
2200 | ||
2201 | @dots{} | |
2202 | ||
2203 | @} | |
2204 | @end smallexample | |
2205 | @end deftypefun | |
2206 | ||
2207 | @comment libgen.h | |
2208 | @comment XPG | |
2209 | @deftypefun {char *} dirname (char *@var{path}) | |
2210 | The @code{dirname} function is the complement to the XPG version of | |
2211 | @code{basename}. It returns the parent directory of the file specified | |
2212 | by @var{path}. If @var{path} is @code{NULL}, an empty string, or | |
2213 | contains no '/' characters, then "." is returned. The prototype for this | |
2214 | function can be found in @file{libgen.h}. | |
2215 | @end deftypefun | |
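Because the XPG @code{basename} and @code{dirname} may both modify their argument, a cautious caller gives each of them its own copy of the path. The sketch below does this; the function name @code{split_path}, its buffer-based interface, and the @code{-1} error convention are assumptions for illustration.

```c
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store the directory part of PATH in DIR and its
   last component in BASE, each a buffer of SIZE bytes.  Returns 0 on
   success, -1 if a working copy cannot be allocated or a result does
   not fit.  PATH itself is never modified.  */
int
split_path (const char *path, char *dir, char *base, size_t size)
{
  size_t n = strlen (path) + 1;
  char *copy1 = malloc (n);
  char *copy2 = malloc (n);
  int result = -1;

  if (copy1 != NULL && copy2 != NULL)
    {
      memcpy (copy1, path, n);
      memcpy (copy2, path, n);

      /* Each function gets a private copy it is free to alter.  */
      const char *d = dirname (copy1);
      const char *b = basename (copy2);

      if (strlen (d) < size && strlen (b) < size)
        {
          strcpy (dir, d);
          strcpy (base, b);
          result = 0;
        }
    }
  free (copy1);
  free (copy2);
  return result;
}
```

The results are copied out before the working copies are freed, since the returned pointers may point into those copies.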
0e4ee106 UD |
2216 | |
2217 | @node strfry | |
2218 | @section strfry | |
2219 | ||
2220 | The function below addresses the perennial programming quandary: ``How do | |
2221 | I take good data in string form and painlessly turn it into garbage?'' | |
2222 | This is actually a fairly simple task for C programmers who do not use | |
1f77f049 JM |
2223 | @theglibc{} string functions, but for programs based on @theglibc{}, |
2224 | the @code{strfry} function is the preferred method for | |
0e4ee106 UD |
2225 | destroying string data. |
2226 | ||
2227 | The prototype for this function is in @file{string.h}. | |
2228 | ||
2229 | @comment string.h | |
2230 | @comment GNU | |
ec28fc7c | 2231 | @deftypefun {char *} strfry (char *@var{string}) |
0e4ee106 UD |
2232 | |
2233 | @code{strfry} creates a pseudorandom anagram of a string, replacing the | |
2234 | input with the anagram in place. For each position in the string, | |
2235 | @code{strfry} swaps it with a position in the string selected at random | |
2236 | (from a uniform distribution). The two positions may be the same. | |
2237 | ||
2238 | The return value of @code{strfry} is always @var{string}. | |
2239 | ||
1f77f049 | 2240 | @strong{Portability Note:} This function is unique to @theglibc{}. |
0e4ee106 UD |
2241 | |
2242 | @end deftypefun | |
2243 | ||
2244 | ||
2245 | @node Trivial Encryption | |
2246 | @section Trivial Encryption | |
2247 | @cindex encryption | |
2248 | ||
2249 | ||
2250 | The @code{memfrob} function converts an array of data to something | |
2251 | unrecognizable and back again. It is not encryption in its usual sense | |
2252 | since it is easy for someone to convert the encrypted data back to clear | |
2253 | text. The transformation is analogous to Usenet's ``Rot13'' encryption | |
2254 | method for obscuring offensive jokes from sensitive eyes and such. | |
2255 | Unlike Rot13, @code{memfrob} works on arbitrary binary data, not just | |
2256 | text. | |
2257 | @cindex Rot13 | |
2258 | ||
2259 | For true encryption, see @ref{Cryptographic Functions}. | |
2260 | ||
2261 | This function is declared in @file{string.h}. | |
2262 | @pindex string.h | |
2263 | ||
2264 | @comment string.h | |
2265 | @comment GNU | |
2266 | @deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length}) | |
2267 | ||
2268 | @code{memfrob} transforms (frobnicates) each byte of the data structure | |
2269 | at @var{mem}, which is @var{length} bytes long, by bitwise exclusive | |
2270 | oring it with binary 00101010. It does the transformation in place and | |
2271 | its return value is always @var{mem}. | |
2272 | ||
2273 | Note that applying @code{memfrob} a second time to the same data structure | |
2274 | returns it to its original state. | |
2275 | ||
2276 | This is a good function for hiding information from someone who doesn't | |
2277 | want to see it or doesn't want to see it very much. To really prevent | |
2278 | people from retrieving the information, use stronger encryption such as | |
2279 | that described in @ref{Cryptographic Functions}. | |
2280 | ||
1f77f049 | 2281 | @strong{Portability Note:} This function is unique to @theglibc{}. |
0e4ee106 UD |
2282 | |
2283 | @end deftypefun | |
2284 | ||
b4012b75 UD |
2285 | @node Encode Binary Data |
2286 | @section Encode Binary Data | |
2287 | ||
2288 | To store or transfer binary data in environments which only support text | |
2289 | one has to encode the binary data by mapping the input bytes to | |
2290 | characters in the range allowed for storing or transferring. SVID | |
dd7d45e8 UD |
2291 | systems (and nowadays XPG compliant systems) provide minimal support for |
2292 | this task. | |
b4012b75 UD |
2293 | |
2294 | @comment stdlib.h | |
2295 | @comment XPG | |
2296 | @deftypefun {char *} l64a (long int @var{n}) | |
dd7d45e8 | 2297 | This function encodes a 32-bit input value using characters from the |
290639c3 | 2298 | basic character set. It returns a pointer to a 7 character buffer which |
dd7d45e8 UD |
2299 | contains an encoded version of @var{n}. To encode a series of bytes the |
2300 | user must copy the returned string to a destination buffer. It returns | |
2301 | the empty string if @var{n} is zero, which is somewhat bizarre but | |
2302 | mandated by the standard.@* | |
2303 | @strong{Warning:} Since a static buffer is used this function should not | |
5649a1d6 | 2304 | be used in multi-threaded programs. There is no thread-safe alternative |
dd7d45e8 UD |
2305 | to this function in the C library.@* |
2306 | @strong{Compatibility Note:} The XPG standard states that the return | |
2307 | value of @code{l64a} is undefined if @var{n} is negative. In the GNU | |
2308 | implementation, @code{l64a} treats its argument as unsigned, so it will | |
2309 | return a sensible encoding for any nonzero @var{n}; however, portable | |
2310 | programs should not rely on this. | |
b4012b75 | 2311 | |
dd7d45e8 UD |
2312 | To encode a large buffer @code{l64a} must be called in a loop, once for |
2313 | each 32-bit word of the buffer. For example, one could do something | |
2314 | like this: | |
5649a1d6 UD |
2315 | |
2316 | @smallexample | |
2317 | char * | |
2318 | encode (const void *buf, size_t len) | |
2319 | @{ | |
2320 | /* @r{We know in advance how long the buffer has to be.} */ | |
2321 | unsigned char *in = (unsigned char *) buf; | |
2322 | char *out = malloc (6 + ((len + 3) / 4) * 6 + 1); | |
290639c3 | 2323 | char *cp = out, *p; |
5649a1d6 UD |
2324 | |
2325 | /* @r{Encode the length.} */ | |
dd7d45e8 | 2326 | /* @r{Using `htonl' is necessary so that the data can be} |
290639c3 UD |
2327 | @r{decoded even on machines with different byte order.} |
2328 | @r{`l64a' can return a string shorter than 6 bytes, so } | |
2329 | @r{we pad it with encoding of 0 (}'.'@r{) at the end by } | |
2330 | @r{hand.} */ | |
dd7d45e8 | 2331 | |
290639c3 UD |
2332 | p = stpcpy (cp, l64a (htonl (len))); |
2333 | cp = mempcpy (p, "......", 6 - (p - cp)); | |
5649a1d6 UD |
2334 | |
2335 | while (len > 3) | |
2336 | @{ | |
2337 | unsigned long int n = *in++; | |
2338 | n = (n << 8) | *in++; | |
2339 | n = (n << 8) | *in++; | |
2340 | n = (n << 8) | *in++; | |
2341 | len -= 4; | |
290639c3 UD |
2342 | p = stpcpy (cp, l64a (htonl (n))); |
2343 | cp = mempcpy (p, "......", 6 - (p - cp)); | |
5649a1d6 UD |
2344 | @} |
2345 | if (len > 0) | |
2346 | @{ | |
2347 | unsigned long int n = *in++; | |
2348 | if (--len > 0) | |
2349 | @{ | |
2350 | n = (n << 8) | *in++; | |
2351 | if (--len > 0) | |
2352 | n = (n << 8) | *in; | |
2353 | @} | |
290639c3 | 2354 | cp = stpcpy (cp, l64a (htonl (n))); |
5649a1d6 UD |
2355 | @} |
2356 | *cp = '\0'; | |
2357 | return out; | |
2358 | @} | |
2359 | @end smallexample | |
2360 | ||
2361 | It is strange that the library does not provide the complete | |
dd7d45e8 UD |
2362 | functionality needed but so be it. |
2363 | ||
2364 | @end deftypefun | |
5649a1d6 | 2365 | |
b4012b75 UD |
2366 | To decode data produced with @code{l64a} the following function should be |
2367 | used. | |
2368 | ||
5649a1d6 UD |
2369 | @comment stdlib.h |
2370 | @comment XPG | |
b4012b75 UD |
2371 | @deftypefun {long int} a64l (const char *@var{string}) |
2372 | The parameter @var{string} should contain a string which was produced by | |
dd7d45e8 UD |
2373 | a call to @code{l64a}. The function processes at most 6 characters of |
2374 | this string, and decodes the characters it finds according to the table | |
2375 | below. It stops decoding when it finds a character not in the table, | |
2376 | rather like @code{atoi}; if you have a buffer which has been broken into | |
2377 | lines, you must be careful to skip over the end-of-line characters. | |
2378 | ||
2379 | The decoded number is returned as a @code{long int} value. | |
b4012b75 | 2380 | @end deftypefun |
b13927da | 2381 | |
dd7d45e8 UD |
2382 | The @code{l64a} and @code{a64l} functions use a base 64 encoding, in |
2383 | which each character of an encoded string represents six bits of an | |
2384 | input word. These symbols are used for the base 64 digits: | |
2385 | ||
2386 | @multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} | |
2387 | @item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7 | |
2388 | @item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1} | |
2389 | @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5} | |
2390 | @item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9} | |
2391 | @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D} | |
2392 | @item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H} | |
2393 | @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L} | |
2394 | @item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P} | |
2395 | @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T} | |
2396 | @item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X} | |
2397 | @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b} | |
2398 | @item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f} | |
2399 | @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j} | |
2400 | @item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n} | |
2401 | @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r} | |
2402 | @item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v} | |
2403 | @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z} | |
2404 | @end multitable | |
2405 | ||
2406 | This encoding scheme is not standard. There are some other encoding | |
2407 | methods which are much more widely used (UU encoding, MIME encoding). | |
2408 | Generally, it is better to use one of these encodings. | |
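As a small illustration of the pair, the sketch below encodes one value and copies the result out of the static buffer so that it survives later calls to @code{l64a}. The helper name @code{encode_word} and its 7-byte buffer parameter are assumptions made for this example.

```c
#define _XOPEN_SOURCE 700   /* l64a and a64l are XPG functions */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode VALUE with l64a and copy the result into
   OUT, which must hold at least 7 bytes (6 digits plus the terminating
   null byte).  The copy is needed because l64a returns a pointer to a
   static buffer that the next call overwrites.  Returns OUT.  */
char *
encode_word (long int value, char out[7])
{
  strcpy (out, l64a (value));
  return out;
}
```

Passing the encoded string to @code{a64l} yields the original value again for nonnegative values that fit in 32 bits; encoding zero yields the empty string, as noted above.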
2409 | ||
b13927da UD |
2410 | @node Argz and Envz Vectors |
2411 | @section Argz and Envz Vectors | |
2412 | ||
5649a1d6 | 2413 | @cindex argz vectors (string vectors) |
b13927da UD |
2414 | @cindex string vectors, null-character separated |
2415 | @cindex argument vectors, null-character separated | |
2416 | @dfn{Argz vectors} are vectors of strings in a contiguous block of | |
2417 | memory, each element separated from its neighbors by null characters | |
2418 | (@code{'\0'}). | |
2419 | ||
5649a1d6 | 2420 | @cindex envz vectors (environment vectors) |
b13927da UD |
2421 | @cindex environment vectors, null-character separated |
2422 | @dfn{Envz vectors} are an extension of argz vectors where each element is a | |
5649a1d6 | 2423 | name-value pair, separated by a @code{'='} character (as in a Unix |
b13927da UD |
2424 | environment). |
2425 | ||
2426 | @menu | |
2427 | * Argz Functions:: Operations on argz vectors. | |
2428 | * Envz Functions:: Additional operations on environment vectors. | |
2429 | @end menu | |
2430 | ||
2431 | @node Argz Functions, Envz Functions, , Argz and Envz Vectors | |
2432 | @subsection Argz Functions | |
2433 | ||
2434 | Each argz vector is represented by a pointer to the first element, of | |
2435 | type @code{char *}, and a size, of type @code{size_t}, both of which can | |
2436 | be initialized to @code{0} to represent an empty argz vector. All argz | |
2437 | functions accept either a pointer and a size argument, or pointers to | |
2438 | them, if they will be modified. | |
2439 | ||
2440 | The argz functions use @code{malloc}/@code{realloc} to allocate/grow | |
2441 | argz vectors, and so any argz vector created using these functions may | |
2442 | be freed by using @code{free}; conversely, any argz function that may | |
2443 | grow a string expects that string to have been allocated using | |
2444 | @code{malloc} (those argz functions that only examine their arguments or | |
2445 | modify them in place will work on any sort of memory). | |
2446 | @xref{Unconstrained Allocation}. | |
2447 | ||
2448 | All argz functions that do memory allocation have a return type of | |
2449 | @code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an | |
2450 | allocation error occurs. | |
2451 | ||
2452 | @pindex argz.h | |
2453 | These functions are declared in the standard include file @file{argz.h}. | |
2454 | ||
5649a1d6 UD |
2455 | @comment argz.h |
2456 | @comment GNU | |
b13927da | 2457 | @deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len}) |
5649a1d6 | 2458 | The @code{argz_create} function converts the Unix-style argument vector |
b13927da UD |
2459 | @var{argv} (a vector of pointers to normal C strings, terminated by |
2460 | @code{(char *)0}; @pxref{Program Arguments}) into an argz vector with | |
2461 | the same elements, which is returned in @var{argz} and @var{argz_len}. | |
2462 | @end deftypefun | |
2463 | ||
5649a1d6 UD |
2464 | @comment argz.h |
2465 | @comment GNU | |
b13927da UD |
2466 | @deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len}) |
2467 | The @code{argz_create_sep} function converts the null-terminated string | |
2468 | @var{string} into an argz vector (returned in @var{argz} and | |
49c091e5 | 2469 | @var{argz_len}) by splitting it into elements at every occurrence of the |
b13927da UD |
2470 | character @var{sep}. |
2471 | @end deftypefun | |
2472 | ||
5649a1d6 UD |
2473 | @comment argz.h |
2474 | @comment GNU | |
b13927da UD |
2475 | @deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len}) |
2476 | Returns the number of elements in the argz vector @var{argz} and | |
2477 | @var{argz_len}. | |
2478 | @end deftypefun | |
2479 | ||
5649a1d6 UD |
2480 | @comment argz.h |
2481 | @comment GNU | |
b13927da UD |
2482 | @deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv}) |
2483 | The @code{argz_extract} function converts the argz vector @var{argz} and | |
5649a1d6 | 2484 | @var{argz_len} into a Unix-style argument vector stored in @var{argv}, |
b13927da UD |
2485 | by putting pointers to every element in @var{argz} into successive |
2486 | positions in @var{argv}, followed by a terminator of @code{0}. | |
2487 | The @var{argv} array must be preallocated with enough space to hold all the | |
2488 | elements in @var{argz} plus the terminating @code{(char *)0} | |
2489 | (@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)} | |
2490 | bytes should be enough). Note that the string pointers stored into | |
2491 | @var{argv} point into @var{argz}---they are not copies---and so | |
2492 | @var{argz} must be copied if it will be changed while @var{argv} is | |
2493 | still active. This function is useful for passing the elements in | |
2494 | @var{argz} to an exec function (@pxref{Executing a File}). | |
2495 | @end deftypefun | |
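The size calculation given above can be wrapped in a small helper that allocates the pointer array before calling @code{argz_extract}. This is a sketch only; the name @code{argz_to_argv} is hypothetical.

```c
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: build a null-terminated argv array whose
   entries point into ARGZ.  Returns NULL if the array cannot be
   allocated.  The caller frees the array itself, but the strings
   still belong to ARGZ.  */
char **
argz_to_argv (char *argz, size_t argz_len)
{
  size_t count = argz_count (argz, argz_len);
  char **argv = malloc ((count + 1) * sizeof (char *));

  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

Freeing or changing the argz vector while the returned array is still in use leaves its entries dangling, for the reason given above.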
2496 | ||
5649a1d6 UD |
2497 | @comment argz.h |
2498 | @comment GNU | |
b13927da UD |
2499 | @deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep}) |
2500 | The @code{argz_stringify} function converts @var{argz} into a normal string with | |
2501 | the elements separated by the character @var{sep}, by replacing each | |
2502 | @code{'\0'} inside @var{argz} (except the last one, which terminates the | |
2503 | string) with @var{sep}. This is handy for printing @var{argz} in a | |
2504 | readable manner. | |
2505 | @end deftypefun | |
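Since @code{argz_stringify} modifies its argument in place, one way to print a vector without destroying it is to stringify a copy. The following is a minimal sketch under that assumption; @code{argz_join_copy} is a hypothetical name introduced here for illustration.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* argz_join_copy is a hypothetical helper, not part of argz.h.
   It returns a malloc'd string holding the elements of ARGZ joined
   by SEP, leaving ARGZ itself untouched.  It returns NULL if ARGZ is
   empty or if the allocation fails.  */
char *
argz_join_copy (const char *argz, size_t argz_len, int sep)
{
  assert (argz != NULL || argz_len == 0);
  if (argz_len == 0)
    return NULL;

  char *copy = malloc (argz_len);
  if (copy == NULL)
    return NULL;

  /* Copy every byte, including the embedded and final null bytes.  */
  memcpy (copy, argz, argz_len);

  /* Replace each separating null byte except the last with SEP.  */
  argz_stringify (copy, argz_len, sep);
  return copy;
}
```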
2506 | ||
5649a1d6 UD |
2507 | @comment argz.h |
2508 | @comment GNU | |
b13927da UD |
2509 | @deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}) |
2510 | The @code{argz_add} function adds the string @var{str} to the end of the | |
2511 | argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and | |
2512 | @code{*@var{argz_len}} accordingly. | |
2513 | @end deftypefun | |
2514 | ||
5649a1d6 UD |
2515 | @comment argz.h |
2516 | @comment GNU | |
b13927da UD |
2517 | @deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim}) |
2518 | The @code{argz_add_sep} function is similar to @code{argz_add}, but | |
49c091e5 | 2519 | @var{str} is split into separate elements in the result at occurrences of |
b13927da | 2520 | the character @var{delim}. This is useful, for instance, for |
5649a1d6 | 2521 | adding the components of a Unix search path to an argz vector, by using |
b13927da UD |
2522 | a value of @code{':'} for @var{delim}. |
2523 | @end deftypefun | |
2524 | ||
5649a1d6 UD |
2525 | @comment argz.h |
2526 | @comment GNU | |
b13927da UD |
2527 | @deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len}) |
2528 | The @code{argz_append} function appends @var{buf_len} bytes starting at | |
2529 | @var{buf} to the argz vector @code{*@var{argz}}, reallocating | |
2530 | @code{*@var{argz}} to accommodate it, and adding @var{buf_len} to | |
2531 | @code{*@var{argz_len}}. | |
2532 | @end deftypefun | |
2533 | ||
5649a1d6 UD |
2534 | @comment argz.h |
2535 | @comment GNU | |
30aa5785 | 2536 | @deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry}) |
b13927da UD |
2537 | If @var{entry} points to the beginning of one of the elements in the |
2538 | argz vector @code{*@var{argz}}, the @code{argz_delete} function will | |
2539 | remove this entry and reallocate @code{*@var{argz}}, modifying | |
2540 | @code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as | |
2541 | destructive argz functions usually reallocate their argz argument, | |
2542 | pointers into argz vectors such as @var{entry} will then become invalid. | |
2543 | @end deftypefun | |
2544 | ||
5649a1d6 UD |
2545 | @comment argz.h |
2546 | @comment GNU | |
b13927da UD |
2547 | @deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry}) |
2548 | The @code{argz_insert} function inserts the string @var{entry} into the | |
2549 | argz vector @code{*@var{argz}} at a point just before the existing | |
2550 | element pointed to by @var{before}, reallocating @code{*@var{argz}} and | |
2551 | updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before} | |
2552 | is @code{0}, @var{entry} is added to the end instead (as if by | |
2553 | @code{argz_add}). Since the first element is in fact the same as | |
2554 | @code{*@var{argz}}, passing in @code{*@var{argz}} as the value of | |
2555 | @var{before} will result in @var{entry} being inserted at the beginning. | |
2556 | @end deftypefun | |
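The remark about inserting at the beginning can be illustrated with a one-line wrapper. This is a sketch only; @code{argz_prepend} is a hypothetical name, and the wrapper returns @code{int} on the assumption that @code{error_t} is an integer type, as it is in @theglibc{}.

```c
#include <argz.h>
#include <assert.h>
#include <stddef.h>

/* argz_prepend is a hypothetical helper, not part of argz.h.
   It inserts ENTRY before the first element of *ARGZ and returns
   0 on success or ENOMEM on allocation failure.  If the vector is
   empty, *ARGZ is a null pointer, so argz_insert appends instead,
   which gives the same result.  */
int
argz_prepend (char **argz, size_t *argz_len, const char *entry)
{
  assert (argz != NULL && argz_len != NULL && entry != NULL);
  return argz_insert (argz, argz_len, *argz, entry);
}
```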
2557 | ||
5649a1d6 UD |
2558 | @comment argz.h |
2559 | @comment GNU | |
b13927da UD |
2560 | @deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry}) |
2561 | The @code{argz_next} function provides a convenient way of iterating | |
2562 | over the elements in the argz vector @var{argz}. It returns a pointer | |
2563 | to the next element in @var{argz} after the element @var{entry}, or | |
2564 | @code{0} if there are no elements following @var{entry}. If @var{entry} | |
2565 | is @code{0}, the first element of @var{argz} is returned. | |
2566 | ||
2567 | This behavior suggests two styles of iteration: | |
2568 | ||
2569 | @smallexample | |
2570 | char *entry = 0; | |
2571 | while ((entry = argz_next (@var{argz}, @var{argz_len}, entry))) | |
2572 | @var{action}; | |
2573 | @end smallexample | |
2574 | ||
2575 | (the double parentheses are needed to silence warnings from some C compilers | |
2576 | about what they consider a questionable @code{while}-test) and: | |
2577 | ||
2578 | @smallexample | |
2579 | char *entry; | |
2580 | for (entry = @var{argz}; | |
2581 | entry; | |
2582 | entry = argz_next (@var{argz}, @var{argz_len}, entry)) | |
2583 | @var{action}; | |
2584 | @end smallexample | |
2585 | ||
2586 | Note that the latter depends on @var{argz} having a value of @code{0} if | |
2587 | it is empty (rather than a pointer to an empty block of memory); this | |
2588 | invariant is maintained for argz vectors created by the functions here. | |
2589 | @end deftypefun | |
2590 | ||
d705269e UD |
2591 | @comment argz.h |
2592 | @comment GNU | |
2593 | @deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}}) | |
49c091e5 | 2594 | Replace any occurrences of the string @var{str} in @var{argz} with |
d705269e UD |
2595 | @var{with}, reallocating @var{argz} as necessary. If |
2596 | @var{replace_count} is non-zero, @code{*@var{replace_count}} will be | |
2597 | incremented by the number of replacements performed. | |
2598 | @end deftypefun | |
2599 | ||
b13927da UD |
2600 | @node Envz Functions, , Argz Functions, Argz and Envz Vectors |
2601 | @subsection Envz Functions | |
2602 | ||
2603 | Envz vectors are just argz vectors with additional constraints on the form | |
2604 | of each element; as such, argz functions can also be used on them, where it | |
2605 | makes sense. | |
2606 | ||
2607 | Each element in an envz vector is a name-value pair, separated by a @code{'='} | |
2608 | character; if multiple @code{'='} characters are present in an element, those | |
2609 | after the first are considered part of the value, and treated like all other | |
2610 | non-@code{'\0'} characters. | |
2611 | ||
2612 | If @emph{no} @code{'='} characters are present in an element, that element is | |
2613 | considered the name of a ``null'' entry, as distinct from an entry with an | |
2614 | empty value: @code{envz_get} will return @code{0} if given the name of a null | |
2615 | entry, whereas an entry with an empty value would result in a value of | |
2616 | @code{""}; @code{envz_entry} will still find such entries, however. Null | |
2617 | entries can be removed with the @code{envz_strip} function. | |
2618 | ||
2619 | As with argz functions, envz functions that may allocate memory (and thus | |
2620 | fail) have a return type of @code{error_t}, and return either @code{0} or | |
2621 | @code{ENOMEM}. | |
2622 | ||
2623 | @pindex envz.h | |
2624 | These functions are declared in the standard include file @file{envz.h}. | |
2625 | ||
5649a1d6 UD |
2626 | @comment envz.h |
2627 | @comment GNU | |
b13927da UD |
2628 | @deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) |
2629 | The @code{envz_entry} function finds the entry in @var{envz} with the name | |
2630 | @var{name}, and returns a pointer to the whole entry---that is, the argz | |
2631 | element which begins with @var{name} followed by a @code{'='} character. If | |
2632 | there is no entry with that name, @code{0} is returned. | |
2633 | @end deftypefun | |
2634 | ||
5649a1d6 UD |
2635 | @comment envz.h |
2636 | @comment GNU | |
b13927da UD |
2637 | @deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) |
2638 | The @code{envz_get} function finds the entry in @var{envz} with the name | |
2639 | @var{name} (like @code{envz_entry}), and returns a pointer to the value | |
2640 | portion of that entry (following the @code{'='}). If there is no entry with | |
2641 | that name (or only a null entry), @code{0} is returned. | |
2642 | @end deftypefun | |
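The difference between @code{envz_entry} and @code{envz_get} is what makes it possible to tell a missing name from a null entry. The sketch below shows one way to combine them; the @code{enum} and the function name @code{envz_classify} are hypothetical and introduced only for this example.

```c
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for envz_classify.  */
enum envz_kind
{
  ENVZ_KIND_ABSENT,   /* No entry with that name.  */
  ENVZ_KIND_NULL,     /* Name present, but without any '='.  */
  ENVZ_KIND_VALUE     /* Name present with a value, possibly empty.  */
};

/* envz_classify is a hypothetical helper, not part of envz.h.  */
enum envz_kind
envz_classify (const char *envz, size_t envz_len, const char *name)
{
  assert (name != NULL);

  /* envz_entry finds null entries as well as ordinary ones.  */
  if (envz_entry (envz, envz_len, name) == NULL)
    return ENVZ_KIND_ABSENT;

  /* envz_get returns 0 for a null entry, "" for an empty value.  */
  if (envz_get (envz, envz_len, name) == NULL)
    return ENVZ_KIND_NULL;

  return ENVZ_KIND_VALUE;
}
```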
2643 | ||
5649a1d6 UD |
2644 | @comment envz.h |
2645 | @comment GNU | |
b13927da UD |
2646 | @deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value}) |
2647 | The @code{envz_add} function adds an entry to @code{*@var{envz}} | |
2648 | (updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name | |
2649 | @var{name}, and value @var{value}. If an entry with the same name | |
2650 | already exists in @var{envz}, it is removed first. If @var{value} is | |
2651 | @code{0}, then the new entry will be the special null type of entry | |
2652 | (mentioned above). | |
2653 | @end deftypefun | |
2654 | ||
5649a1d6 UD |
2655 | @comment envz.h |
2656 | @comment GNU | |
b13927da UD |
2657 | @deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override}) |
2658 | The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz}, | |
2659 | as if with @code{envz_add}, updating @code{*@var{envz}} and | |
2660 | @code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2} | |
2661 | will supersede those with the same name in @var{envz}, otherwise not. | |
2662 | ||
2663 | Null entries are treated just like other entries in this respect, so a null | |
2664 | entry in @var{envz} can prevent an entry of the same name in @var{envz2} from | |
2665 | being added to @var{envz}, if @var{override} is false. | |
2666 | @end deftypefun | |
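With @var{override} set to false, @code{envz_merge} can be used to fill in defaults without disturbing settings that are already present. The following is a minimal sketch of that use; @code{envz_apply_defaults} is a hypothetical name, and the @code{int} return type assumes @code{error_t} is an integer type.

```c
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* envz_apply_defaults is a hypothetical helper, not part of envz.h.
   It adds to *ENVZ every entry of DEFAULTS whose name is not already
   present in *ENVZ.  Existing entries, including null entries, are
   kept as they are.  Returns 0 on success or ENOMEM.  */
int
envz_apply_defaults (char **envz, size_t *envz_len,
                     const char *defaults, size_t defaults_len)
{
  assert (envz != NULL && envz_len != NULL);

  /* An override argument of 0 means existing names win.  */
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```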
2667 | ||
5649a1d6 UD |
2668 | @comment envz.h |
2669 | @comment GNU | |
b13927da UD |
2670 | @deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len}) |
2671 | The @code{envz_strip} function removes any null entries from @var{envz}, | |
2672 | updating @code{*@var{envz}} and @code{*@var{envz_len}}. | |
2673 | @end deftypefun |