@node String and Array Utilities, Character Set Handling, Character Handling, Top
@c %MENU% Utilities for copying and comparing strings and arrays
@chapter String and Array Utilities
4
5Operations on strings (or arrays of characters) are an important part of
1f77f049 6many programs. @Theglibc{} provides an extensive set of string
7utility functions, including functions for copying, concatenating,
8comparing, and searching strings. Many of these functions can also
9operate on arbitrary regions of storage; for example, the @code{memcpy}
a5113b14 10function can be used to copy the contents of any kind of array.
11
12It's fairly common for beginning C programmers to ``reinvent the wheel''
13by duplicating this functionality in their own code, but it pays to
14become familiar with the library functions and to make use of them,
15since this offers benefits in maintenance, efficiency, and portability.
16
17For instance, you could easily compare one string to another in two
18lines of C code, but if you use the built-in @code{strcmp} function,
19you're less likely to make a mistake. And, since these library
20functions are typically highly optimized, your program may run faster
21too.
22
23@menu
24* Representation of Strings:: Introduction to basic concepts.
25* String/Array Conventions:: Whether to use a string function or an
26 arbitrary array function.
27* String Length:: Determining the length of a string.
28* Copying and Concatenation:: Functions to copy the contents of strings
29 and arrays.
30* String/Array Comparison:: Functions for byte-wise and character-wise
31 comparison.
32* Collation Functions:: Functions for collating strings.
33* Search Functions:: Searching for a specific element or substring.
34* Finding Tokens in a String:: Splitting a string into tokens by looking
35 for delimiters.
36* strfry:: Function for flash-cooking a string.
37* Trivial Encryption:: Obscuring data.
b4012b75 38* Encode Binary Data:: Encoding and Decoding of Binary Data.
b13927da 39* Argz and Envz Vectors:: Null-separated string vectors.
40@end menu
41
b4012b75 42@node Representation of Strings
43@section Representation of Strings
44@cindex string, representation of
45
46This section is a quick summary of string concepts for beginning C
47programmers. It describes how character strings are represented in C
48and some common pitfalls. If you are already familiar with this
49material, you can skip this section.
50
51@cindex string
8a2f1f5b 52@cindex multibyte character string
53A @dfn{string} is an array of @code{char} objects. But string-valued
54variables are usually declared to be pointers of type @code{char *}.
55Such variables do not include space for the text of a string; that has
56to be stored somewhere else---in an array variable, a string constant,
57or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
58you to store the address of the chosen memory space into the pointer
59variable. Alternatively you can store a @dfn{null pointer} in the
60pointer variable. The null pointer does not point anywhere, so
61attempting to reference the string it points to gets an error.
62
63@cindex wide character string
``String'' normally refers to multibyte character strings as opposed to
wide character strings.  Wide character strings are arrays of type
@code{wchar_t}; as with multibyte character strings, they are usually
handled through pointers, here of type @code{wchar_t *}.
68
69@cindex null character
70@cindex null wide character
28f540f4 71By convention, a @dfn{null character}, @code{'\0'}, marks the end of a
72multibyte character string and the @dfn{null wide character},
73@code{L'\0'}, marks the end of a wide character string. For example, in
74testing to see whether the @code{char *} variable @var{p} points to a
75null character marking the end of a string, you can write
76@code{!*@var{p}} or @code{*@var{p} == '\0'}.
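
As a small illustration of the null-character test described above, the
following sketch counts the bytes of a string by hand.  The name
@code{count_bytes} is made up for this example; in real code you would
simply call @code{strlen}.

```c
#include <stddef.h>

/* Hypothetical helper: count the bytes that precede the terminating
   null character, as strlen does.  */
size_t
count_bytes (const char *p)
{
  size_t n = 0;

  /* *p != '\0' is the negation of the shorter test !*p.  */
  while (*p != '\0')
    {
      ++p;
      ++n;
    }
  return n;
}
```

Note that passing a null pointer to such a function is an error; only
the null character may be used to mark the end of the text.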
77
78A null character is quite different conceptually from a null pointer,
79although both are represented by the integer @code{0}.
80
81@cindex string literal
@dfn{String literals} appear in C program source as strings of
characters between double-quote characters (@samp{"}).  A wide character
string literal is written the same way, except that the initial
double-quote character is immediately preceded by a capital @samp{L}
(ell) character (as in @code{L"foo"}).  In @w{ISO C}, string literals
can also be formed by @dfn{string concatenation}: @code{"a" "b"} is the
same as @code{"ab"}.  For wide character strings one can either use
@code{L"a" L"b"} or @code{L"a" "b"}.  Modification of string literals is
not allowed by the GNU C compiler, because literals are placed in
read-only storage.
91
92Character arrays that are declared @code{const} cannot be modified
93either. It's generally good style to declare non-modifiable string
94pointers to be of type @code{const char *}, since this often allows the
95C compiler to detect accidental modifications as well as providing some
96amount of documentation about what your program intends to do with the
97string.
98
99The amount of memory allocated for the character array may extend past
100the null character that normally marks the end of the string. In this
dd7d45e8 101document, the term @dfn{allocated size} is always used to refer to the
102total amount of memory allocated for the string, while the term
103@dfn{length} refers to the number of characters up to (but not
104including) the terminating null character.
105@cindex length of string
106@cindex allocation size of string
107@cindex size of string
108@cindex string length
109@cindex string allocation
110
111A notorious source of program bugs is trying to put more characters in a
112string than fit in its allocated size. When writing code that extends
113strings or moves characters into a pre-allocated array, you should be
114very careful to keep track of the length of the text and make explicit
115checks for overflowing the array. Many of the library functions
116@emph{do not} do this for you! Remember also that you need to allocate
117an extra byte to hold the null character that marks the end of the
118string.
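
The kind of explicit check recommended above might look like the
following sketch.  The helper name @code{copy_if_fits} and its
interface are assumptions made for this illustration; they are not part
of the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST only if the text and its
   terminating null character fit in DST_SIZE bytes.  */
bool
copy_if_fits (char *dst, size_t dst_size, const char *src)
{
  size_t len = strlen (src);

  /* LEN bytes of text need LEN + 1 bytes of storage.  */
  if (len >= dst_size)
    return false;
  memcpy (dst, src, len + 1);
  return true;
}
```

A string of length 5 therefore needs a destination whose allocated size
is at least 6.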
119
120@cindex single-byte string
121@cindex multibyte string
122Originally strings were sequences of bytes where each byte represents a
123single character. This is still true today if the strings are encoded
124using a single-byte character encoding. Things are different if the
125strings are encoded using a multibyte encoding (for more information on
126encodings see @ref{Extended Char Intro}). There is no difference in
the programming interface for these two kinds of strings; the programmer
128has to be aware of this and interpret the byte sequences accordingly.
129
130But since there is no separate interface taking care of these
131differences the byte-based string functions are sometimes hard to use.
132Since the count parameters of these functions specify bytes a call to
133@code{strncpy} could cut a multibyte character in the middle and put an
134incomplete (and therefore unusable) byte sequence in the target buffer.
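
One way to avoid cutting a multibyte character in the middle is to
compute a byte count that ends on a character boundary before copying.
The following is only a sketch: the helper name
@code{whole_char_prefix} is hypothetical, and the result depends on the
encoding of the current locale, so a program would normally call
@code{setlocale} first.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the largest count not exceeding MAX such
   that the first COUNT bytes of S end on a character boundary in the
   current locale's encoding, or (size_t) -1 if S contains an invalid
   byte sequence.  */
size_t
whole_char_prefix (const char *s, size_t max)
{
  mbstate_t state;
  size_t len = strlen (s);
  size_t done = 0;

  if (max > len)
    max = len;
  /* Start in the initial conversion state.  */
  memset (&state, '\0', sizeof (state));
  while (done < max)
    {
      size_t n = mbrlen (s + done, max - done, &state);
      if (n == (size_t) -1)
        return (size_t) -1;
      /* Stop at a character that would be cut in two (or,
         defensively, at a null byte).  */
      if (n == (size_t) -2 || n == 0)
        break;
      done += n;
    }
  return done;
}
```

The returned count can then be used as the size argument of a
byte-based copy such as @code{memcpy}.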
135
136@cindex wide character string
To avoid these problems, later versions of the @w{ISO C} standard
introduce a second set of functions that operate on @dfn{wide
characters} (@pxref{Extended Char Intro}).  These functions don't have
the problems the single-byte versions have, since every wide character is
a legal, interpretable value.  This does not mean that cutting wide
character strings at arbitrary points is without problems.  It normally
is for alphabet-based languages (except for non-normalized text), but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit.  This is a
higher-level problem which the @w{C library} functions are not designed
to solve.  But it is at least good that no invalid byte sequences can be
created.  Also, higher-level functions can operate much more easily
on wide characters than on multibyte characters, so the general advice
is to use wide characters internally whenever text is more than simply
copied.
152
The remainder of this chapter will discuss the functions for handling
154wide character strings in parallel with the discussion of the multibyte
155character strings since there is almost always an exact equivalent
156available.
157
b4012b75 158@node String/Array Conventions
159@section String and Array Conventions
160
161This chapter describes both functions that work on arbitrary arrays or
162blocks of memory, and functions that are specific to null-terminated
8a2f1f5b 163arrays of characters and wide characters.
164
Functions that operate on arbitrary blocks of memory have names
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
@code{wmemcpy}) and invariably take an argument which specifies the size
(in bytes and wide characters respectively) of the block of memory to
operate on.  The array arguments and return values for these functions
have type @code{void *} or @code{wchar_t *}.  As a matter of style, the
elements of the arrays used with the @samp{mem} functions are referred
to as ``bytes''.  You can pass any kind of pointer to these functions,
and the @code{sizeof} operator is useful in computing the value for the
size argument.  Parameters to the @samp{wmem} functions must be of type
@code{wchar_t *}.  These functions are not really usable with anything
but arrays of this type.
177
178In contrast, functions that operate specifically on strings and wide
179character strings have names beginning with @samp{str} and @samp{wcs}
180respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
181null character to terminate the string instead of requiring an explicit
182size argument to be passed. (Some of these functions accept a specified
183maximum length, but they also check for premature termination with a
184null character.) The array arguments and return values for these
185functions have type @code{char *} and @code{wchar_t *} respectively, and
186the array elements are referred to as ``characters'' and ``wide
187characters''.
188
189In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
190versions of a function. The one that is more appropriate to use depends
191on the exact situation. When your program is manipulating arbitrary
192arrays or blocks of storage, then you should always use the @samp{mem}
193functions. On the other hand, when you are manipulating null-terminated
194strings it is usually more convenient to use the @samp{str}/@samp{wcs}
195functions, unless you already know the length of the string in advance.
196The @samp{wmem} functions should be used for wide character arrays with
197known size.
198
199@cindex wint_t
200@cindex parameter promotion
Some of the memory and string functions take single characters as
arguments.  Since a value of type @code{char} is automatically promoted
to a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
In the case of the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}.  For many implementations this would not be necessary,
since @code{wchar_t} is large enough not to be automatically
promoted, but since the @w{ISO C} standard does not require such a
choice of types the @code{wint_t} type is used.
28f540f4 211
b4012b75 212@node String Length
213@section String Length
214
215You can get the length of a string using the @code{strlen} function.
216This function is declared in the header file @file{string.h}.
217@pindex string.h
218
219@comment string.h
f65fd747 220@comment ISO
221@deftypefun size_t strlen (const char *@var{s})
222The @code{strlen} function returns the length of the null-terminated
223string @var{s} in bytes. (In other words, it returns the offset of the
224terminating null character within the array.)
225
226For example,
227@smallexample
228strlen ("hello, world")
229 @result{} 12
230@end smallexample
231
232When applied to a character array, the @code{strlen} function returns
233the length of the string stored there, not its allocated size. You can
234get the allocated size of the character array that holds a string using
235the @code{sizeof} operator:
236
237@smallexample
a5113b14 238char string[32] = "hello, world";
239sizeof (string)
240 @result{} 32
241strlen (string)
242 @result{} 12
243@end smallexample
244
245But beware, this will not work unless @var{string} is the character
246array itself, not a pointer to it. For example:
247
248@smallexample
249char string[32] = "hello, world";
250char *ptr = string;
251sizeof (string)
252 @result{} 32
253sizeof (ptr)
254 @result{} 4 /* @r{(on a machine with 4 byte pointers)} */
255@end smallexample
256
257This is an easy mistake to make when you are working with functions that
258take string arguments; those arguments are always pointers, not arrays.
259
260It must also be noted that for multibyte encoded strings the return
261value does not have to correspond to the number of characters in the
262string. To get this value the string can be converted to wide
263characters and @code{wcslen} can be used or something like the following
264code can be used:
265
266@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.}  */
@{
  mbstate_t t;
  /* @r{@code{mbsrtowcs} expects a pointer to a @code{const char *}.}  */
  const char *scopy = string;
  /* In initial state.  */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters.  */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
277@end smallexample
278
This is cumbersome to do, so if the number of characters (as opposed to
bytes) is needed often, it is better to work with wide characters.
281@end deftypefun
282
283The wide character equivalent is declared in @file{wchar.h}.
284
285@comment wchar.h
286@comment ISO
287@deftypefun size_t wcslen (const wchar_t *@var{ws})
288The @code{wcslen} function is the wide character equivalent to
289@code{strlen}. The return value is the number of wide characters in the
290wide character string pointed to by @var{ws} (this is also the offset of
291the terminating null wide character of @var{ws}).
292
293Since there are no multi wide character sequences making up one
294character the return value is not only the offset in the array, it is
295also the number of wide characters.
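
Because @code{wcslen} counts wide characters rather than bytes, the
result must be scaled by @code{sizeof (wchar_t)} whenever a size in
bytes is needed.  The helper below is a sketch with a made-up name,
@code{wide_storage_size}.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes needed to store the wide
   character string WS, including its terminating null wide
   character.  */
size_t
wide_storage_size (const wchar_t *ws)
{
  return (wcslen (ws) + 1) * sizeof (wchar_t);
}
```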
296
297This function was introduced in @w{Amendment 1} to @w{ISO C90}.
298@end deftypefun
299
300@comment string.h
301@comment GNU
302@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
The @code{strnlen} function returns the length of the string @var{s} in
bytes if this length is smaller than @var{maxlen} bytes.  Otherwise it
returns @var{maxlen}.  Therefore this function is equivalent to
@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
but it is more efficient and works even if the string @var{s} is not
null-terminated.
310
311@smallexample
312char string[32] = "hello, world";
313strnlen (string, 32)
314 @result{} 12
315strnlen (string, 5)
316 @result{} 5
317@end smallexample
318
319This function is a GNU extension and is declared in @file{string.h}.
320@end deftypefun
321
322@comment wchar.h
323@comment GNU
324@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
325@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
326@var{maxlen} parameter specifies the maximum number of wide characters.
327
328This function is a GNU extension and is declared in @file{wchar.h}.
329@end deftypefun
330
b4012b75 331@node Copying and Concatenation
332@section Copying and Concatenation
333
334You can use the functions described in this section to copy the contents
335of strings and arrays, or to append the contents of one string to
another.  The @samp{str} and @samp{mem} functions are declared in the
header file @file{string.h} while the @samp{wcs} and @samp{wmem}
functions are declared in the file @file{wchar.h}.
@pindex string.h
@pindex wchar.h
341@cindex copying strings and arrays
342@cindex string copy functions
343@cindex array copy functions
344@cindex concatenating strings
345@cindex string concatenation functions
346
347A helpful way to remember the ordering of the arguments to the functions
348in this section is that it corresponds to an assignment expression, with
the destination array specified to the left of the source array.  Most
of these functions return the address of the destination array.
351
352Most of these functions do not work properly if the source and
353destination arrays overlap. For example, if the beginning of the
354destination array overlaps the end of the source array, the original
355contents of that part of the source array may get overwritten before it
356is copied. Even worse, in the case of the string functions, the null
357character marking the end of the string may be lost, and the copy
358function might get stuck in a loop trashing all the memory allocated to
359your program.
360
361All functions that have problems copying between overlapping arrays are
362explicitly identified in this manual. In addition to functions in this
363section, there are a few others like @code{sprintf} (@pxref{Formatted
364Output Functions}) and @code{scanf} (@pxref{Formatted Input
365Functions}).
366
367@comment string.h
f65fd747 368@comment ISO
8a2f1f5b 369@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
370The @code{memcpy} function copies @var{size} bytes from the object
371beginning at @var{from} into the object beginning at @var{to}. The
372behavior of this function is undefined if the two arrays @var{to} and
373@var{from} overlap; use @code{memmove} instead if overlapping is possible.
374
375The value returned by @code{memcpy} is the value of @var{to}.
376
377Here is an example of how you might use @code{memcpy} to copy the
378contents of an array:
379
380@smallexample
381struct foo *oldarray, *newarray;
382int arraysize;
383@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
385@end smallexample
386@end deftypefun
387
388@comment wchar.h
389@comment ISO
79827876 390@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
391The @code{wmemcpy} function copies @var{size} wide characters from the object
392beginning at @var{wfrom} into the object beginning at @var{wto}. The
393behavior of this function is undefined if the two arrays @var{wto} and
394@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.
395
396The following is a possible implementation of @code{wmemcpy} but there
397are more optimizations possible.
398
399@smallexample
400wchar_t *
401wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
402 size_t size)
403@{
404 return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
405@}
406@end smallexample
407
408The value returned by @code{wmemcpy} is the value of @var{wto}.
409
410This function was introduced in @w{Amendment 1} to @w{ISO C90}.
411@end deftypefun
412
413@comment string.h
414@comment GNU
8a2f1f5b 415@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
The @code{mempcpy} function is nearly identical to the @code{memcpy}
function.  It copies @var{size} bytes from the object beginning at
@var{from} into the object pointed to by @var{to}.  But instead of
returning the value of @var{to} it returns a pointer to the byte
following the last written byte in the object beginning at @var{to}.
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.
422
423This function is useful in situations where a number of objects shall be
424copied to consecutive memory positions.
425
426@smallexample
427void *
428combine (void *o1, size_t s1, void *o2, size_t s2)
429@{
430 void *result = malloc (s1 + s2);
431 if (result != NULL)
432 mempcpy (mempcpy (result, o1, s1), o2, s2);
433 return result;
434@}
435@end smallexample
436
437This function is a GNU extension.
438@end deftypefun
439
440@comment wchar.h
441@comment GNU
442@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
443The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
444function. It copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object pointed to by @var{wto}.  But
446instead of returning the value of @var{wto} it returns a pointer to the
447wide character following the last written wide character in the object
448beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}.
449
450This function is useful in situations where a number of objects shall be
451copied to consecutive memory positions.
452
The following is a possible implementation of @code{wmempcpy} but there
454are more optimizations possible.
455
456@smallexample
457wchar_t *
458wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
459 size_t size)
460@{
461 return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
462@}
463@end smallexample
464
465This function is a GNU extension.
466@end deftypefun
467
28f540f4 468@comment string.h
f65fd747 469@comment ISO
470@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
471@code{memmove} copies the @var{size} bytes at @var{from} into the
472@var{size} bytes at @var{to}, even if those two blocks of space
473overlap. In the case of overlap, @code{memmove} is careful to copy the
474original values of the bytes in the block at @var{from}, including those
475bytes which also belong to the block at @var{to}.
476
477The value returned by @code{memmove} is the value of @var{to}.
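
A typical case of overlapping blocks is shifting text within a single
buffer.  The following sketch uses a made-up helper,
@code{drop_prefix}, to delete the beginning of a string in place;
@code{memcpy} could not be used here because source and destination
overlap.

```c
#include <string.h>

/* Hypothetical helper: delete the first N bytes of the string S in
   place.  The caller must ensure that N <= strlen (S).  */
char *
drop_prefix (char *s, size_t n)
{
  /* Move the rest of the text, including the null character.  */
  size_t rest = strlen (s + n) + 1;

  return memmove (s, s + n, rest);
}
```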
478@end deftypefun
479
480@comment wchar.h
481@comment ISO
@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
into the @var{size} wide characters at @var{wto}, even if those two
blocks of space overlap.  In the case of overlap, @code{wmemmove} is
486careful to copy the original values of the wide characters in the block
487at @var{wfrom}, including those wide characters which also belong to the
488block at @var{wto}.
489
The following is a possible implementation of @code{wmemmove} but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample
501
502The value returned by @code{wmemmove} is the value of @var{wto}.
503
This function was introduced in @w{Amendment 1} to @w{ISO C90}.
505@end deftypefun
506
507@comment string.h
508@comment SVID
8a2f1f5b 509@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
510This function copies no more than @var{size} bytes from @var{from} to
511@var{to}, stopping if a byte matching @var{c} is found. The return
512value is a pointer into @var{to} one byte past where @var{c} was copied,
513or a null pointer if no byte matching @var{c} appeared in the first
514@var{size} bytes of @var{from}.
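
As an illustration, @code{memccpy} can serve as a bounded string copy
when the stop byte is the null character.  The helper name
@code{copy_string} and its return convention are assumptions made for
this sketch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string SRC into DST, which has room
   for DST_SIZE bytes (at least 1).  Returns 1 if the whole string
   fit and 0 if it was truncated; DST is null-terminated either way.  */
int
copy_string (char *dst, size_t dst_size, const char *src)
{
  /* A non-null result means the null character was found and copied.  */
  if (memccpy (dst, src, '\0', dst_size) != NULL)
    return 1;
  dst[dst_size - 1] = '\0';
  return 0;
}
```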
515@end deftypefun
516
517@comment string.h
f65fd747 518@comment ISO
519@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
520This function copies the value of @var{c} (converted to an
521@code{unsigned char}) into each of the first @var{size} bytes of the
522object beginning at @var{block}. It returns the value of @var{block}.
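
A common use of @code{memset} is clearing an array before use.  In the
sketch below, @code{struct point} and @code{clear_points} are made up
for the example; note that the size argument is a byte count, so the
element count is multiplied by the element size.

```c
#include <string.h>

struct point
{
  int x;
  int y;
};

/* Hypothetical helper: set every byte of the N-element array PTS to
   zero, which makes all of its integer members zero.  */
void
clear_points (struct point *pts, size_t n)
{
  memset (pts, 0, n * sizeof *pts);
}
```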
523@end deftypefun
524
525@comment wchar.h
526@comment ISO
527@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
528This function copies the value of @var{wc} into each of the first
529@var{size} wide characters of the object beginning at @var{block}. It
530returns the value of @var{block}.
531@end deftypefun
532
28f540f4 533@comment string.h
f65fd747 534@comment ISO
8a2f1f5b 535@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
536This copies characters from the string @var{from} (up to and including
537the terminating null character) into the string @var{to}. Like
538@code{memcpy}, this function has undefined results if the strings
539overlap. The return value is the value of @var{to}.
540@end deftypefun
541
542@comment wchar.h
543@comment ISO
544@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
545This copies wide characters from the string @var{wfrom} (up to and
546including the terminating null wide character) into the string
547@var{wto}. Like @code{wmemcpy}, this function has undefined results if
548the strings overlap. The return value is the value of @var{wto}.
549@end deftypefun
550
28f540f4 551@comment string.h
f65fd747 552@comment ISO
8a2f1f5b 553@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
554This function is similar to @code{strcpy} but always copies exactly
555@var{size} characters into @var{to}.
556
If the length of @var{from} is @var{size} or more, then @code{strncpy}
copies just the first @var{size} characters.  Note that in this case
there is no null terminator written into @var{to}.
560
561If the length of @var{from} is less than @var{size}, then @code{strncpy}
562copies all of @var{from}, followed by enough null characters to add up
563to @var{size} characters in all. This behavior is rarely useful, but it
f65fd747 564is specified by the @w{ISO C} standard.
28f540f4
RM
565
566The behavior of @code{strncpy} is undefined if the strings overlap.
567
568Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs
569relating to writing past the end of the allocated space for @var{to}.
570However, it can also make your program much slower in one common case:
571copying a string which is probably small into a potentially large buffer.
572In this case, @var{size} may be large, and when it is, @code{strncpy} will
573waste a considerable amount of time copying null characters.
574@end deftypefun
575
576@comment wchar.h
577@comment ISO
578@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
579This function is similar to @code{wcscpy} but always copies exactly
580@var{size} wide characters into @var{wto}.
581
If the length of @var{wfrom} is @var{size} or more, then
@code{wcsncpy} copies just the first @var{size} wide characters.  Note
that in this case there is no null terminator written into @var{wto}.
585
586If the length of @var{wfrom} is less than @var{size}, then
587@code{wcsncpy} copies all of @var{wfrom}, followed by enough null wide
588characters to add up to @var{size} wide characters in all. This
589behavior is rarely useful, but it is specified by the @w{ISO C}
590standard.
591
592The behavior of @code{wcsncpy} is undefined if the strings overlap.
593
594Using @code{wcsncpy} as opposed to @code{wcscpy} is a way to avoid bugs
595relating to writing past the end of the allocated space for @var{wto}.
596However, it can also make your program much slower in one common case:
597copying a string which is probably small into a potentially large buffer.
598In this case, @var{size} may be large, and when it is, @code{wcsncpy} will
599waste a considerable amount of time copying null wide characters.
600@end deftypefun
601
602@comment string.h
603@comment SVID
604@deftypefun {char *} strdup (const char *@var{s})
605This function copies the null-terminated string @var{s} into a newly
606allocated string. The string is allocated using @code{malloc}; see
607@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
608for the new string, @code{strdup} returns a null pointer. Otherwise it
609returns a pointer to the new string.
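
The copy returned by @code{strdup} is ordinary modifiable storage that
the caller must eventually release with @code{free}.  The sketch below
uses a made-up helper, @code{upcase_copy}, to show both points.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated upper-case copy of S,
   or a null pointer if the allocation fails.  The caller frees it.  */
char *
upcase_copy (const char *s)
{
  char *copy = strdup (s);

  if (copy == NULL)
    return NULL;
  for (char *p = copy; *p != '\0'; ++p)
    *p = (char) toupper ((unsigned char) *p);
  return copy;
}
```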
610@end deftypefun
611
612@comment wchar.h
613@comment GNU
614@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
615This function copies the null-terminated wide character string @var{ws}
616into a newly allocated string. The string is allocated using
617@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
618cannot allocate space for the new string, @code{wcsdup} returns a null
619pointer. Otherwise it returns a pointer to the new wide character
620string.
621
622This function is a GNU extension.
623@end deftypefun
624
625@comment string.h
626@comment GNU
627@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
628This function is similar to @code{strdup} but always copies at most
629@var{size} characters into the newly allocated string.
630
631If the length of @var{s} is more than @var{size}, then @code{strndup}
632copies just the first @var{size} characters and adds a closing null
633terminator. Otherwise all characters are copied and the string is
634terminated.
635
This function differs from @code{strncpy} in that it always
terminates the destination string.
638
639@code{strndup} is a GNU extension.
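
For instance, @code{strndup} makes it easy to extract a leading part of
a string into separately allocated storage.  The helper name
@code{first_word} is made up for this sketch.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the
   characters of S that precede the first space (all of S if it
   contains no space), or a null pointer if the allocation fails.  */
char *
first_word (const char *s)
{
  return strndup (s, strcspn (s, " "));
}
```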
640@end deftypefun
641
642@comment string.h
643@comment Unknown origin
8a2f1f5b 644@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
645This function is like @code{strcpy}, except that it returns a pointer to
646the end of the string @var{to} (that is, the address of the terminating
8a2f1f5b 647null character @code{to + strlen (from)}) rather than the beginning.
648
649For example, this program uses @code{stpcpy} to concatenate @samp{foo}
650and @samp{bar} to produce @samp{foobar}, which it then prints.
651
652@smallexample
653@include stpcpy.c.texi
654@end smallexample
655
f65fd747 656This function is not part of the ISO or POSIX standards, and is not
657customary on Unix systems, but we did not invent it either. Perhaps it
658comes from MS-DOG.
659
660Its behavior is undefined if the strings overlap. The function is
661declared in @file{string.h}.
662@end deftypefun
663
664@comment wchar.h
665@comment GNU
666@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
This function is like @code{wcscpy}, except that it returns a pointer to
the end of the string @var{wto} (that is, the address of the terminating
null wide character @code{wto + wcslen (wfrom)}) rather than the beginning.
670
671This function is not part of ISO or POSIX but was found useful while
1f77f049 672developing @theglibc{} itself.
673
674The behavior of @code{wcpcpy} is undefined if the strings overlap.
675
676@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}.
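
Since @code{wcpcpy} returns the end of the copied string, successive
calls can build up a result without rescanning it.  The following is a
sketch only; the helper name @code{wide_concat} is an assumption.

```c
#define _GNU_SOURCE
#include <wchar.h>

/* Hypothetical helper: store A followed by B in DST.  The caller must
   make DST large enough for both strings and the terminating null
   wide character.  */
wchar_t *
wide_concat (wchar_t *dst, const wchar_t *a, const wchar_t *b)
{
  /* END points at the null wide character that terminates the copy
     of A, which is where the copy of B must start.  */
  wchar_t *end = wcpcpy (dst, a);

  wcpcpy (end, b);
  return dst;
}
```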
677@end deftypefun
678
@comment string.h
@comment GNU
@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
This function is similar to @code{stpcpy} but always copies exactly
@var{size} characters into @var{to}.

If the length of @var{from} is greater than or equal to @var{size}, then
@code{stpncpy} copies just the first @var{size} characters and returns a
pointer to the character directly following the last one copied. Note
that in this case no null terminator is written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null characters to add up
to @var{size} characters in all. This behavior is rarely useful, but it
is provided for contexts that rely on the corresponding behavior of
@code{strncpy}. In this case @code{stpncpy} returns a pointer to the
@emph{first} written null character.

This function is not part of ISO or POSIX but was found useful while
developing @theglibc{} itself.

Its behavior is undefined if the strings overlap. The function is
declared in @file{string.h}.
@end deftypefun
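As a minimal sketch of how the return value of @code{stpncpy} can be
used, the following helper copies a string into a fixed-size buffer,
truncates it if necessary, and always terminates the result. The name
@code{copy_truncated} is hypothetical and not part of the library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* Make sure stpncpy is declared.  */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): copy SRC into the
   SIZE-byte buffer DST, truncating if necessary.  The result is always
   null-terminated when SIZE is nonzero.  Returns the stored length.  */
size_t
copy_truncated (char *dst, size_t size, const char *src)
{
  char *end;

  if (size == 0)
    return 0;

  /* Copy at most SIZE - 1 characters.  stpncpy returns a pointer just
     past the last character copied, or to the first null it wrote.  */
  end = stpncpy (dst, src, size - 1);
  *end = '\0';
  return (size_t) (end - dst);
}
```

Because the returned pointer marks the end of the copied text in both
cases, no separate call to @code{strlen} is needed to find out how much
was stored.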

@comment wchar.h
@comment GNU
@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If the length of @var{wfrom} is greater than or equal to @var{size}, then
@code{wcpncpy} copies just the first @var{size} wide characters and
returns a pointer to the wide character directly following the last
wide character copied. Note that in this case no null terminator is
written into @var{wto}.

If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
copies all of @var{wfrom}, followed by enough null wide characters to add
up to @var{size} wide characters in all. This behavior is rarely useful,
but it is provided for contexts that rely on the corresponding behavior
of @code{wcsncpy}. In this case @code{wcpncpy} returns a pointer to the
@emph{first} written null wide character.

This function is not part of ISO or POSIX but was found useful while
developing @theglibc{} itself.

Its behavior is undefined if the strings overlap.

@code{wcpncpy} is a GNU extension and is declared in @file{wchar.h}.
@end deftypefun

@comment string.h
@comment GNU
@deftypefn {Macro} {char *} strdupa (const char *@var{s})
This macro is similar to @code{strdup} but allocates the new string
using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
Automatic}). This means of course the returned string has the same
limitations as any block of memory allocated using @code{alloca}.

Because the memory must come from the caller's stack frame,
@code{strdupa} is implemented only as a macro; you cannot take its
address. Despite this limitation it is a useful macro. The following
code shows a situation where using @code{malloc} would be a lot more
expensive.

@smallexample
@include strdupa.c.texi
@end smallexample

Please note that calling @code{strtok} using @var{path} directly is
invalid. It is also not allowed to call @code{strdupa} in the argument
list of @code{strtok}, since @code{strdupa} uses @code{alloca}
(@pxref{Variable Size Automatic}), which can interfere with the parameter
passing.

This macro is only available if GNU CC is used.
@end deftypefn

@comment string.h
@comment GNU
@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
This macro is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca}
(@pxref{Variable Size Automatic}). The same advantages and limitations
of @code{strdupa} are valid for @code{strndupa}, too.

This function is implemented only as a macro, just like @code{strdupa}.
Like @code{strdupa}, this macro must not be used inside the argument
list of a function call.

@code{strndupa} is only available if GNU CC is used.
@end deftypefn

@comment string.h
@comment ISO
@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
The @code{strcat} function is similar to @code{strcpy}, except that the
characters from @var{from} are concatenated or appended to the end of
@var{to}, instead of overwriting it. That is, the first character from
@var{from} overwrites the null character marking the end of @var{to}.

An equivalent definition for @code{strcat} would be:

@smallexample
char *
strcat (char *restrict to, const char *restrict from)
@{
  strcpy (to + strlen (to), from);
  return to;
@}
@end smallexample

This function has undefined results if the strings overlap.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
The @code{wcscat} function is similar to @code{wcscpy}, except that the
characters from @var{wfrom} are concatenated or appended to the end of
@var{wto}, instead of overwriting it. That is, the first character from
@var{wfrom} overwrites the null character marking the end of @var{wto}.

An equivalent definition for @code{wcscat} would be:

@smallexample
wchar_t *
wcscat (wchar_t *wto, const wchar_t *wfrom)
@{
  wcscpy (wto + wcslen (wto), wfrom);
  return wto;
@}
@end smallexample

This function has undefined results if the strings overlap.
@end deftypefun

Programmers using the @code{strcat} or @code{wcscat} function (or the
following @code{strncat} or @code{wcsncat} functions for that matter)
can easily be recognized as lazy and reckless. In almost all situations
the lengths of the participating strings are known (they had better be,
since how else can one ensure that the allocated buffer is large
enough?). Or at least, one could know them by keeping track of the
results of the various function calls. But then it is very inefficient
to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the
end of the destination string so that the actual copying can start.
This is a common example:

@cindex __va_copy
@cindex va_copy
@smallexample
/* @r{This function concatenates arbitrarily many strings. The last}
   @r{parameter must be @code{NULL}.} */
char *
concat (const char *str, @dots{})
@{
  va_list ap, ap2;
  size_t total = 1;
  const char *s;
  char *result;

  va_start (ap, str);
  /* @r{Actually @code{va_copy}, but this is the name more gcc versions}
     @r{understand.} */
  __va_copy (ap2, ap);

  /* @r{Determine how much space we need.} */
  for (s = str; s != NULL; s = va_arg (ap, const char *))
    total += strlen (s);

  va_end (ap);

  result = (char *) malloc (total);
  if (result != NULL)
    @{
      result[0] = '\0';

      /* @r{Copy the strings.} */
      for (s = str; s != NULL; s = va_arg (ap2, const char *))
        strcat (result, s);
    @}

  va_end (ap2);

  return result;
@}
@end smallexample

This looks quite simple, especially the second loop where the strings
are actually copied. But these innocent lines hide a major performance
penalty. Just imagine that ten strings of 100 bytes each have to be
concatenated. For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string.
For the third string we search 200 bytes, and so on. For all strings in
total, the comparisons necessary to find the end of the intermediate
results sum up to about 4500! If we combine the copying with the
allocation we can write this function more efficiently:

@smallexample
char *
concat (const char *str, @dots{})
@{
  va_list ap;
  size_t allocated = 100;
  char *result = (char *) malloc (allocated);

  if (result != NULL)
    @{
      const char *s;
      char *newp;
      char *wp;

      va_start (ap, str);

      wp = result;
      for (s = str; s != NULL; s = va_arg (ap, const char *))
        @{
          size_t len = strlen (s);

          /* @r{Resize the allocated memory if necessary.} */
          if (wp + len + 1 > result + allocated)
            @{
              allocated = (allocated + len) * 2;
              newp = (char *) realloc (result, allocated);
              if (newp == NULL)
                @{
                  free (result);
                  va_end (ap);
                  return NULL;
                @}
              wp = newp + (wp - result);
              result = newp;
            @}

          wp = mempcpy (wp, s, len);
        @}

      /* @r{Terminate the result string.} */
      *wp++ = '\0';

      /* @r{Resize memory to the optimal size.} */
      newp = realloc (result, wp - result);
      if (newp != NULL)
        result = newp;

      va_end (ap);
    @}

  return result;
@}
@end smallexample

With a bit more knowledge about the input strings one could fine-tune
the memory allocation. The difference we are pointing to here is that
we don't use @code{strcat} anymore. We always keep track of the length
of the current intermediate result so we can save ourselves the search
for the end of the string and use @code{mempcpy}. Please note that we
also don't use @code{stpcpy}, which might seem more natural since we are
handling strings. But this is not necessary since we already know the
length of the string and therefore can use the faster memory copying
function. The example would work for wide characters the same way.

Whenever a programmer feels the need to use @code{strcat} she or he
should think twice and check whether the code can be rewritten to take
advantage of already calculated results. Again: it
is almost always unnecessary to use @code{strcat}.
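The same idea applies on a smaller scale. As a sketch, the hypothetical
helper @code{join_path} below (its name and the @samp{/} separator are
assumptions made for illustration) measures each string once and then
copies with the standard @code{memcpy} function instead of calling
@code{strcat}:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string "DIR/NAME",
   or NULL if memory is exhausted.  The caller frees the result.  */
char *
join_path (const char *dir, const char *name)
{
  size_t dirlen = strlen (dir);
  size_t namelen = strlen (name);
  char *result = malloc (dirlen + 1 + namelen + 1);
  char *wp = result;

  if (result == NULL)
    return NULL;

  /* Each length is already known, so nothing is scanned twice.  */
  memcpy (wp, dir, dirlen);
  wp += dirlen;
  *wp++ = '/';
  memcpy (wp, name, namelen + 1);   /* Also copies the terminator.  */
  return result;
}
```

The write pointer @code{wp} plays the role that the repeated search for
the terminating null character plays in @code{strcat}.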

@comment string.h
@comment ISO
@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
This function is like @code{strcat} except that not more than @var{size}
characters from @var{from} are appended to the end of @var{to}. A
single null character is also always appended to @var{to}, so the total
allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
longer than its initial length.

The @code{strncat} function could be implemented like this:

@smallexample
@group
char *
strncat (char *to, const char *from, size_t size)
@{
  to[strlen (to) + size] = '\0';
  strncpy (to + strlen (to), from, size);
  return to;
@}
@end group
@end smallexample

The behavior of @code{strncat} is undefined if the strings overlap.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto}. A
single null wide character is also always appended to @var{wto}, so the
total allocated size of @var{wto} must be at least
@code{@var{size} + 1} wide characters longer than its initial length.

The @code{wcsncat} function could be implemented like this:

@smallexample
@group
wchar_t *
wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  wto[wcslen (wto) + size] = L'\0';
  wcsncpy (wto + wcslen (wto), wfrom, size);
  return wto;
@}
@end group
@end smallexample

The behavior of @code{wcsncat} is undefined if the strings overlap.
@end deftypefun

Here is an example showing the use of @code{strncpy} and @code{strncat}
(the wide character version is equivalent). Notice how, in the call to
@code{strncat}, the @var{size} parameter is computed to avoid
overflowing the character array @code{buffer}.

@smallexample
@include strncat.c.texi
@end smallexample

@noindent
The output produced by this program looks like:

@smallexample
hello
hello, wo
@end smallexample

@comment string.h
@comment BSD
@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
This is a partially obsolete alternative for @code{memmove}, derived from
BSD. Note that it is not quite equivalent to @code{memmove}, because the
arguments are not in the same order and there is no return value.
@end deftypefun

@comment string.h
@comment BSD
@deftypefun void bzero (void *@var{block}, size_t @var{size})
This is a partially obsolete alternative for @code{memset}, derived from
BSD. Note that it is not as general as @code{memset}, because the only
value it can store is zero.
@end deftypefun
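New code can usually call the ISO functions directly. As a sketch, the
hypothetical wrappers below (their names are illustrative only) show the
correspondence, including the swapped argument order of @code{bcopy}:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for bcopy.  Note that the source comes first,
   unlike memmove, and that nothing is returned.  */
void
copy_bytes (const void *from, void *to, size_t size)
{
  memmove (to, from, size);
}

/* Hypothetical stand-in for bzero: memset with a fixed value of zero.  */
void
zero_bytes (void *block, size_t size)
{
  memset (block, 0, size);
}
```

Like @code{memmove}, @code{copy_bytes} in this sketch handles
overlapping regions correctly.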

@node String/Array Comparison
@section String/Array Comparison
@cindex comparing strings and arrays
@cindex string comparison functions
@cindex array comparison functions
@cindex predicates on strings
@cindex predicates on arrays

You can use the functions in this section to perform comparisons on the
contents of strings and arrays. As well as checking for equality, these
functions can also be used as the ordering functions for sorting
operations. @xref{Searching and Sorting}, for an example of this.

Unlike most comparison operations in C, the string comparison functions
return a nonzero value if the strings are @emph{not} equivalent rather
than if they are. The sign of the value indicates the relative ordering
of the first characters in the strings that are not equivalent: a
negative value indicates that the first string is ``less'' than the
second, while a positive value indicates that the first string is
``greater''.

The most common use of these functions is to check only for equality.
This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.

All of these functions are declared in the header file @file{string.h}.
@pindex string.h

@comment string.h
@comment ISO
@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
The function @code{memcmp} compares the @var{size} bytes of memory
beginning at @var{a1} against the @var{size} bytes of memory beginning
at @var{a2}. The value returned has the same sign as the difference
between the first differing pair of bytes (interpreted as @code{unsigned
char} objects, then promoted to @code{int}).

If the contents of the two blocks are equal, @code{memcmp} returns
@code{0}.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
The function @code{wmemcmp} compares the @var{size} wide characters
beginning at @var{a1} against the @var{size} wide characters beginning
at @var{a2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{a1} is
smaller or larger than the corresponding character in @var{a2}.

If the contents of the two blocks are equal, @code{wmemcmp} returns
@code{0}.
@end deftypefun

On arbitrary arrays, the @code{memcmp} function is mostly useful for
testing equality. It usually isn't meaningful to do byte-wise ordering
comparisons on arrays of things other than bytes. For example, a
byte-wise comparison on the bytes that make up floating-point numbers
isn't likely to tell you anything about the relationship between the
values of the floating-point numbers.

@code{wmemcmp} is really only useful to compare arrays of type
@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
at a time and this number of bytes is system dependent.

You should also be careful about using @code{memcmp} to compare objects
that can contain ``holes'', such as the padding inserted into structure
objects to enforce alignment requirements, extra space at the end of
unions, and extra characters at the ends of strings whose length is less
than their allocated size. The contents of these ``holes'' are
indeterminate and may cause strange behavior when performing byte-wise
comparisons. For more predictable results, perform an explicit
component-wise comparison.

For example, given a structure type definition like:

@smallexample
struct foo
  @{
    unsigned char tag;
    union
      @{
        double f;
        long i;
        char *p;
      @} value;
  @};
@end smallexample

@noindent
you are better off writing a specialized comparison function to compare
@code{struct foo} objects instead of comparing them with @code{memcmp}.
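A minimal sketch of such a function follows. The structure definition
does not say what the values of @code{tag} mean, so the enumeration
constants below, and the name @code{foo_equal}, are assumptions made
purely for illustration.

```c
#include <stdbool.h>

/* Assumed meaning of the tag field; the original definition does not
   specify it.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_POINTER };

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Hypothetical comparison: two objects are equal if their tags match
   and the union member selected by the tag matches.  Padding bytes and
   unused union bytes are never examined.  */
bool
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return false;

  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_POINTER:
      return a->value.p == b->value.p;
    default:
      return false;
    }
}
```

In this sketch the pointer member is compared by address; a real program
might instead want to compare the strings the pointers refer to.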

@comment string.h
@comment ISO
@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
The @code{strcmp} function compares the string @var{s1} against
@var{s2}, returning a value that has the same sign as the difference
between the first differing pair of characters (interpreted as
@code{unsigned char} objects, then promoted to @code{int}).

If the two strings are equal, @code{strcmp} returns @code{0}.

A consequence of the ordering used by @code{strcmp} is that if @var{s1}
is an initial substring of @var{s2}, then @var{s1} is considered to be
``less than'' @var{s2}.

@code{strcmp} does not take into account the sorting conventions of the
language the strings are written in. For that, use @code{strcoll}.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
The @code{wcscmp} function compares the wide character string @var{ws1}
against @var{ws2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{ws1} is
smaller or larger than the corresponding character in @var{ws2}.

If the two strings are equal, @code{wcscmp} returns @code{0}.

A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
is an initial substring of @var{ws2}, then @var{ws1} is considered to be
``less than'' @var{ws2}.

@code{wcscmp} does not take into account the sorting conventions of the
language the strings are written in. For that, use @code{wcscoll}.
@end deftypefun

@comment string.h
@comment BSD
@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
This function is like @code{strcmp}, except that differences in case are
ignored. How uppercase and lowercase characters are related is
determined by the currently selected locale. In the standard @code{"C"}
locale the characters @"A and @"a do not match but in a locale which
regards these characters as parts of the alphabet they do match.

@noindent
@code{strcasecmp} is derived from BSD.
@end deftypefun
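A typical use is looking up a keyword without regard to case. The
sketch below uses a hypothetical helper, @code{find_keyword}; it
includes @file{strings.h}, which is where POSIX places the declaration
of @code{strcasecmp}.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: return the index of WORD in the NULL-terminated
   array TABLE, ignoring differences in case, or -1 if it is absent.  */
int
find_keyword (const char *const *table, const char *word)
{
  int i;

  for (i = 0; table[i] != NULL; ++i)
    if (strcasecmp (table[i], word) == 0)
      return i;

  return -1;
}
```

Since the case mapping depends on the current locale, the result for
non-ASCII characters can differ from one locale to another.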

@comment wchar.h
@comment GNU
@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
This function is like @code{wcscmp}, except that differences in case are
ignored. How uppercase and lowercase characters are related is
determined by the currently selected locale. In the standard @code{"C"}
locale the characters @"A and @"a do not match but in a locale which
regards these characters as parts of the alphabet they do match.

@noindent
@code{wcscasecmp} is a GNU extension.
@end deftypefun

@comment string.h
@comment ISO
@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
This function is similar to @code{strcmp}, except that no more than
@var{size} characters are compared. In other words, if the two
strings are the same in their first @var{size} characters, the
return value is zero.
@end deftypefun
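One common application is testing whether a string begins with a given
prefix. A minimal sketch, using a hypothetical helper named
@code{has_prefix}:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if S begins with PREFIX.  Only the
   first strlen (PREFIX) characters of S are examined, and the
   comparison stops early if S is shorter than PREFIX.  */
bool
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```

With an empty prefix no characters are compared, so the helper returns
true for every string.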

@comment wchar.h
@comment ISO
@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
This function is similar to @code{wcscmp}, except that no more than
@var{size} wide characters are compared. In other words, if the two
strings are the same in their first @var{size} wide characters, the
return value is zero.
@end deftypefun

@comment string.h
@comment BSD
@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
This function is like @code{strncmp}, except that differences in case
are ignored. As with @code{strcasecmp}, how uppercase and lowercase
characters are related depends on the locale.

@noindent
@code{strncasecmp} is derived from BSD.
@end deftypefun

@comment wchar.h
@comment GNU
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
This function is like @code{wcsncmp}, except that differences in case
are ignored. As with @code{wcscasecmp}, how uppercase and lowercase
characters are related depends on the locale.

@noindent
@code{wcsncasecmp} is a GNU extension.
@end deftypefun

Here are some examples showing the use of @code{strcmp} and
@code{strncmp} (equivalent examples can be constructed for the wide
character functions). These examples assume the use of the ASCII
character set. (If some other character set---say, EBCDIC---is used
instead, then the glyphs are associated with different numeric codes,
and the return values and ordering may differ.)

@smallexample
strcmp ("hello", "hello")
    @result{} 0    /* @r{These two strings are the same.} */
strcmp ("hello", "Hello")
    @result{} 32   /* @r{Comparisons are case-sensitive.} */
strcmp ("hello", "world")
    @result{} -15  /* @r{The character @code{'h'} comes before @code{'w'}.} */
strcmp ("hello", "hello, world")
    @result{} -44  /* @r{Comparing a null character against a comma.} */
strncmp ("hello", "hello, world", 5)
    @result{} 0    /* @r{The initial 5 characters are the same.} */
strncmp ("hello, world", "hello, stupid world!!!", 5)
    @result{} 0    /* @r{The initial 5 characters are the same.} */
@end smallexample

@comment string.h
@comment GNU
@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
The @code{strverscmp} function compares the string @var{s1} against
@var{s2}, considering them as holding indices/version numbers. The
return value follows the same conventions as found in the
@code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no
digits, @code{strverscmp} behaves like @code{strcmp}.

Basically, we compare strings normally (character by character), until
we find a digit in each string; then we enter a special comparison
mode, where each sequence of digits is taken as a whole. If we reach the
end of these two parts without noticing a difference, we return to the
standard comparison mode. There are two types of numeric parts:
``integral'' and ``fractional'' (those that begin with a @samp{0}). The
types of the numeric parts affect the way we sort them:

@itemize @bullet
@item
integral/integral: we compare values as you would expect.

@item
fractional/integral: the fractional part is less than the integral one.
Again, no surprise.

@item
fractional/fractional: things become a bit more complex.
If the common prefix contains only leading zeroes, the longer part is less
than the other one; else the comparison behaves normally.
@end itemize

@smallexample
strverscmp ("no digit", "no digit")
    @result{} 0    /* @r{same behavior as strcmp.} */
strverscmp ("item#99", "item#100")
    @result{} <0   /* @r{same prefix, but 99 < 100.} */
strverscmp ("alpha1", "alpha001")
    @result{} >0   /* @r{fractional part inferior to integral one.} */
strverscmp ("part1_f012", "part1_f01")
    @result{} >0   /* @r{two fractional parts.} */
strverscmp ("foo.009", "foo.0")
    @result{} <0   /* @r{likewise, but with leading zeroes only.} */
@end smallexample

This function is especially useful when dealing with filename sorting,
because filenames frequently hold indices/version numbers.

@code{strverscmp} is a GNU extension.
@end deftypefun
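For filename sorting, @code{strverscmp} can be wrapped in a comparison
function for @code{qsort}. The following is only a sketch; the names
@code{compare_versions} and @code{sort_names} are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* strverscmp is a GNU extension.  */
#endif
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: each argument points to an array
   element, that is, to a pointer to a string.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical entry point: sort COUNT names in version order.  */
void
sort_names (const char **names, size_t count)
{
  qsort (names, count, sizeof names[0], compare_versions);
}
```

With this ordering @samp{file9} sorts before @samp{file10}, whereas
@code{strcmp} would place @samp{file10} first.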

@comment string.h
@comment BSD
@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
This is an obsolete alias for @code{memcmp}, derived from BSD.
@end deftypefun

@node Collation Functions
@section Collation Functions

@cindex collating strings
@cindex string collation functions

In some locales, the conventions for lexicographic ordering differ from
the strict numeric ordering of character codes. For example, in Spanish
most glyphs with diacritical marks such as accents are not considered
distinct letters for the purposes of collation. On the other hand, the
two-character sequence @samp{ll} is treated as a single letter that is
collated immediately after @samp{l}.

You can use the functions @code{strcoll} and @code{strxfrm} (declared in
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
collation ordering appropriate for the current locale. The locale used
by these functions in particular can be specified by setting the locale
for the @code{LC_COLLATE} category; see @ref{Locales}.
@pindex string.h
@pindex wchar.h

In the standard C locale, the collation sequence for @code{strcoll} is
the same as that for @code{strcmp}. Similarly, @code{wcscoll} and
@code{wcscmp} are the same in this situation.

Effectively, the way these functions work is by applying a mapping to
transform the characters in a string to a byte sequence that represents
the string's position in the collating sequence of the current locale.
Comparing two such byte sequences in a simple fashion is equivalent to
comparing the strings with the locale's collating sequence.

The functions @code{strcoll} and @code{wcscoll} perform this translation
implicitly, in order to do one comparison. By contrast, @code{strxfrm}
and @code{wcsxfrm} perform the mapping explicitly. If you are making
multiple comparisons using the same string or set of strings, it is
likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to
transform all the strings just once, and subsequently compare the
transformed strings with @code{strcmp} or @code{wcscmp}.

@comment string.h
@comment ISO
@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
The @code{strcoll} function is similar to @code{strcmp} but uses the
collating sequence of the current locale for collation (the
@code{LC_COLLATE} locale).
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
The @code{wcscoll} function is similar to @code{wcscmp} but uses the
collating sequence of the current locale for collation (the
@code{LC_COLLATE} locale).
@end deftypefun

Here is an example of sorting an array of strings, using @code{strcoll}
to compare them. The actual sort algorithm is not written here; it
comes from @code{qsort} (@pxref{Array Sort Function}). The job of the
code shown here is to say how to compare the strings while sorting them.
(Later on in this section, we will show a way to do this more
efficiently using @code{strxfrm}.)

@smallexample
/* @r{This is the comparison function used with @code{qsort}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strcoll (*p1, *p2);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings (char **array, int nstrings)
@{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
@}
@end smallexample

@cindex converting string to collation order
@comment string.h
@comment ISO
@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
The function @code{strxfrm} transforms the string @var{from} using the
collation transformation determined by the locale currently selected for
collation, and stores the transformed string in the array @var{to}. Up
to @var{size} characters (including a terminating null character) are
stored.

The behavior is undefined if the strings @var{to} and @var{from}
overlap; see @ref{Copying and Concatenation}.

The return value is the length of the entire transformed string. This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
entirely fit in the array @var{to}. In this case, only as much of the
string as actually fits was stored. To get the whole transformed
string, call @code{strxfrm} again with a bigger output array.

The transformed string may be longer than the original string, and it
may also be shorter.

If @var{size} is zero, no characters are stored in @var{to}. In this
case, @code{strxfrm} simply returns the number of characters that would
be the length of the transformed string. This is useful for determining
what size the allocated array should be. It does not matter what
@var{to} is if @var{size} is zero; @var{to} may even be a null pointer.
@end deftypefun
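The zero-size call can be combined with @code{malloc} to obtain a
transformed copy of exactly the right size. The following is a sketch
only; the name @code{xfrm_dup} is hypothetical, and the code assumes the
locale does not change between the two calls to @code{strxfrm}.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy of S,
   or NULL if memory is exhausted.  The caller frees the result.  */
char *
xfrm_dup (const char *s)
{
  /* The first call stores nothing and only reports the length needed;
     add one for the terminating null character.  */
  size_t needed = strxfrm (NULL, s, 0) + 1;
  char *result = malloc (needed);

  if (result != NULL)
    strxfrm (result, s, needed);

  return result;
}
```

Two strings transformed this way can then be compared repeatedly with
@code{strcmp}.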
1418
1419@comment wchar.h
1420@comment ISO
1421@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
1422The function @code{wcsxfrm} transforms wide character string @var{wfrom}
1423using the collation transformation determined by the locale currently
1424selected for collation, and stores the transformed string in the array
1425@var{wto}. Up to @var{size} wide characters (including a terminating null
1426character) are stored.
1427
1428The behavior is undefined if the strings @var{wto} and @var{wfrom}
1429overlap; see @ref{Copying and Concatenation}.
1430
The return value is the length of the entire transformed wide character
string. This value is not affected by the value of @var{size}, but if
it is greater than or equal to @var{size}, it means that the transformed
wide character string did not entirely fit in the array @var{wto}. In
this case, only as much of the wide character string as actually fits
was stored. To get the whole transformed wide character string, call
@code{wcsxfrm} again with a bigger output array.
1438
1439The transformed wide character string may be longer than the original
1440wide character string, and it may also be shorter.
1441
If @var{size} is zero, no wide characters are stored in @var{wto}. In
this case, @code{wcsxfrm} simply returns the number of wide characters
that would be the length of the transformed wide character string. This
is useful for determining what size the allocated array should be
(remember to multiply by @code{sizeof (wchar_t)}). It does not matter
what @var{wto} is if @var{size} is zero; @var{wto} may even be a null
pointer.
1448@end deftypefun
1449
1450Here is an example of how you can use @code{strxfrm} when
1451you plan to do many comparisons. It does the same thing as the previous
1452example, but much faster, because it has to transform each string only
1453once, no matter how many times it is compared with other strings. Even
1454the time needed to allocate and free storage is much less than the time
1455we save, when there are many strings.
1456
1457@smallexample
struct sorter @{ char *input; char *transformed; @};

/* @r{This is the comparison function used with @code{qsort}}
   @r{to sort an array of @code{struct sorter}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings_fast (char **array, int nstrings)
@{
  struct sorter temp_array[nstrings];
  int i;

  /* @r{Set up @code{temp_array}. Each element contains}
     @r{one input string and its transformed string.} */
  for (i = 0; i < nstrings; i++)
    @{
      size_t length = strlen (array[i]) * 2;
      char *transformed;
      size_t transformed_length;

      temp_array[i].input = array[i];

      /* @r{First try a buffer perhaps big enough.} */
      transformed = (char *) xmalloc (length);

      /* @r{Transform @code{array[i]}.} */
      transformed_length = strxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{NUL} character.} */
          transformed = (char *) xrealloc (transformed,
                                           transformed_length + 1);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) strxfrm (transformed, array[i],
                          transformed_length + 1);
        @}

      temp_array[i].transformed = transformed;
    @}

  /* @r{Sort @code{temp_array} by comparing transformed strings.} */
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);

  /* @r{Put the elements back in the permanent array}
     @r{in their sorted order.} */
  for (i = 0; i < nstrings; i++)
    array[i] = temp_array[i].input;

  /* @r{Free the strings we allocated.} */
  for (i = 0; i < nstrings; i++)
    free (temp_array[i].transformed);
@}
1525@end smallexample
1526
1527The interesting part of this code for the wide character version would
1528look like this:
1529
1530@smallexample
1531void
1532sort_strings_fast (wchar_t **array, int nstrings)
1533@{
1534 @dots{}
1535 /* @r{Transform @code{array[i]}.} */
1536 transformed_length = wcsxfrm (transformed, array[i], length);
1537
1538 /* @r{If the buffer was not large enough, resize it}
1539 @r{and try again.} */
1540 if (transformed_length >= length)
1541 @{
1542 /* @r{Allocate the needed space. +1 for terminating}
1543 @r{@code{NUL} character.} */
1544 transformed = (wchar_t *) xrealloc (transformed,
1545 (transformed_length + 1)
1546 * sizeof (wchar_t));
1547
1548 /* @r{The return value is not interesting because we know}
1549 @r{how long the transformed string is.} */
1550 (void) wcsxfrm (transformed, array[i],
1551 transformed_length + 1);
1552 @}
1553 @dots{}
1554@end smallexample
1555
1556@noindent
Note the additional multiplication by @code{sizeof (wchar_t)} in the
@code{xrealloc} call.
1559
1560@strong{Compatibility Note:} The string collation functions are a new
976780fd 1561feature of @w{ISO C90}. Older C dialects have no equivalent feature.
1562The wide character versions were introduced in @w{Amendment 1} to @w{ISO
1563C90}.
28f540f4 1564
b4012b75 1565@node Search Functions
1566@section Search Functions
1567
1568This section describes library functions which perform various kinds
1569of searching operations on strings and arrays. These functions are
1570declared in the header file @file{string.h}.
1571@pindex string.h
1572@cindex search functions (for strings)
1573@cindex string search functions
1574
1575@comment string.h
f65fd747 1576@comment ISO
1577@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
1578This function finds the first occurrence of the byte @var{c} (converted
1579to an @code{unsigned char}) in the initial @var{size} bytes of the
1580object beginning at @var{block}. The return value is a pointer to the
1581located byte, or a null pointer if no match was found.
1582@end deftypefun
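
Because @code{memchr} takes an explicit size, it can search data that
contains embedded null bytes. The following sketch counts the
occurrences of a byte in a block by calling @code{memchr} repeatedly;
the function name @code{count_byte} is an assumption made for this
illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the occurrences of byte C in the SIZE
   bytes starting at BLOCK.  */
size_t
count_byte (const void *block, int c, size_t size)
{
  const unsigned char *p = block;
  const unsigned char *end = p + size;
  size_t count = 0;

  while (p < end && (p = memchr (p, c, (size_t) (end - p))) != NULL)
    {
      count++;
      p++;   /* Resume just past the byte that was found.  */
    }
  return count;
}
```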
1583
1584@comment wchar.h
1585@comment ISO
1586@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
1587This function finds the first occurrence of the wide character @var{wc}
1588in the initial @var{size} wide characters of the object beginning at
1589@var{block}. The return value is a pointer to the located wide
1590character, or a null pointer if no match was found.
1591@end deftypefun
1592
1593@comment string.h
1594@comment GNU
1595@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c})
1596Often the @code{memchr} function is used with the knowledge that the
1597byte @var{c} is available in the memory block specified by the
1598parameters. But this means that the @var{size} parameter is not really
1599needed and that the tests performed with it at runtime (to check whether
1600the end of the block is reached) are not needed.
1601
1602The @code{rawmemchr} function exists for just this situation which is
1603surprisingly frequent. The interface is similar to @code{memchr} except
1604that the @var{size} parameter is missing. The function will look beyond
1605the end of the block pointed to by @var{block} in case the programmer
6be569a4 1606made an error in assuming that the byte @var{c} is present in the block.
1607In this case the result is unspecified. Otherwise the return value is a
1608pointer to the located byte.
1609
1610This function is of special interest when looking for the end of a
1611string. Since all strings are terminated by a null byte a call like
1612
1613@smallexample
1614 rawmemchr (str, '\0')
1615@end smallexample
1616
8a2f1f5b 1617@noindent
1618will never go beyond the end of the string.
1619
1620This function is a GNU extension.
1621@end deftypefun
1622
1623@comment string.h
1624@comment GNU
1625@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size})
1626The function @code{memrchr} is like @code{memchr}, except that it searches
1627backwards from the end of the block defined by @var{block} and @var{size}
1628(instead of forwards from the front).
1629
1630This function is a GNU extension.
a2d63612 1631@end deftypefun
ca747856 1632
28f540f4 1633@comment string.h
f65fd747 1634@comment ISO
1635@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
1636The @code{strchr} function finds the first occurrence of the character
1637@var{c} (converted to a @code{char}) in the null-terminated string
1638beginning at @var{string}. The return value is a pointer to the located
1639character, or a null pointer if no match was found.
1640
1641For example,
1642@smallexample
1643strchr ("hello, world", 'l')
1644 @result{} "llo, world"
1645strchr ("hello, world", '?')
1646 @result{} NULL
a5113b14 1647@end smallexample
1648
The terminating null character is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying a null character as the value of the @var{c} argument.
1652
1653When @code{strchr} returns a null pointer, it does not let you know
1654the position of the terminating null character it has found. If you
1655need that information, it is better (but less portable) to use
1656@code{strchrnul} than to search for it a second time.
1657@end deftypefun
1658
1659@comment wchar.h
1660@comment ISO
@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
1662The @code{wcschr} function finds the first occurrence of the wide
1663character @var{wc} in the null-terminated wide character string
1664beginning at @var{wstring}. The return value is a pointer to the
1665located wide character, or a null pointer if no match was found.
1666
The terminating null wide character is considered to be part of the wide
character string, so you can use this function to get a pointer to the
end of a wide character string by specifying a null wide character as
the value of the @var{wc} argument. It would be better (but less
portable) to use @code{wcschrnul} in this case, though.
1672@end deftypefun
1673
1674@comment string.h
87b56f36 1675@comment GNU
1676@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c})
@code{strchrnul} is the same as @code{strchr} except that if it does
not find the character, it returns a pointer to the string's terminating
null character rather than a null pointer.
1680
1681This function is a GNU extension.
1682@end deftypefun
1683
1684@comment wchar.h
1685@comment GNU
1686@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc})
@code{wcschrnul} is the same as @code{wcschr} except that if it does not
find the wide character, it returns a pointer to the wide character
string's terminating null wide character rather than a null pointer.
1690
1691This function is a GNU extension.
1692@end deftypefun
1693
One useful, but unusual, use of the @code{strchr} function is to obtain
a pointer to the null byte terminating a string. This is often written
in this way:
1697
1698@smallexample
1699 s += strlen (s);
1700@end smallexample
1701
1702@noindent
This is almost optimal, but the addition operation duplicates a bit of
the work already done in the @code{strlen} function. A better solution
is this:
1706
1707@smallexample
1708 s = strchr (s, '\0');
1709@end smallexample
1710
There is no restriction on the second parameter of @code{strchr}, so it
can just as well be the null character. Readers thinking hard about
this might point out that the @code{strchr} function is more expensive
than the @code{strlen} function, since it has two termination
conditions: it must stop either at a match or at the terminating null
byte. This is right. But in @theglibc{} the implementation of
@code{strchr} is optimized in a special way so that @code{strchr}
actually is faster.
ee2752ea 1718
28f540f4 1719@comment string.h
f65fd747 1720@comment ISO
1721@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
1722The function @code{strrchr} is like @code{strchr}, except that it searches
1723backwards from the end of the string @var{string} (instead of forwards
1724from the front).
1725
1726For example,
1727@smallexample
1728strrchr ("hello, world", 'l')
1729 @result{} "ld"
1730@end smallexample
1731@end deftypefun
1732
1733@comment wchar.h
1734@comment ISO
1735@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{c})
1736The function @code{wcsrchr} is like @code{wcschr}, except that it searches
1737backwards from the end of the string @var{wstring} (instead of forwards
1738from the front).
1739@end deftypefun
1740
28f540f4 1741@comment string.h
f65fd747 1742@comment ISO
1743@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
1744This is like @code{strchr}, except that it searches @var{haystack} for a
1745substring @var{needle} rather than just a single character. It
1746returns a pointer into the string @var{haystack} that is the first
1747character of the substring, or a null pointer if no match was found. If
1748@var{needle} is an empty string, the function returns @var{haystack}.
1749
1750For example,
1751@smallexample
1752strstr ("hello, world", "l")
1753 @result{} "llo, world"
1754strstr ("hello, world", "wo")
1755 @result{} "world"
1756@end smallexample
1757@end deftypefun
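
Since @code{strstr} returns a pointer into @var{haystack}, it can be
called again from just past the previous match to find later
occurrences. The sketch below counts non-overlapping matches; the name
@code{count_substring} and the choice to report zero for an empty
@var{needle} are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of NEEDLE
   in HAYSTACK.  An empty NEEDLE is reported as zero occurrences, since
   strstr would otherwise match at every position.  */
size_t
count_substring (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;

  if (needle_len == 0)
    return 0;

  while ((haystack = strstr (haystack, needle)) != NULL)
    {
      count++;
      haystack += needle_len;   /* Skip past this match.  */
    }
  return count;
}
```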
1758
1759@comment wchar.h
1760@comment ISO
1761@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
1762This is like @code{wcschr}, except that it searches @var{haystack} for a
1763substring @var{needle} rather than just a single wide character. It
1764returns a pointer into the string @var{haystack} that is the first wide
1765character of the substring, or a null pointer if no match was found. If
1766@var{needle} is an empty string, the function returns @var{haystack}.
1767@end deftypefun
1768
1769@comment wchar.h
1770@comment XPG
1771@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
@code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the
name originally used in the X/Open Portability Guide before the
@w{Amendment 1} to @w{ISO C90} was published.
1775@end deftypefun
1776
28f540f4 1777
0e4ee106 1778@comment string.h
8a2f1f5b 1779@comment GNU
1780@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
1781This is like @code{strstr}, except that it ignores case in searching for
1782the substring. Like @code{strcasecmp}, it is locale dependent how
1783uppercase and lowercase characters are related.
1784
1785
1786For example,
1787@smallexample
d6868416 1788strcasestr ("hello, world", "L")
0e4ee106 1789 @result{} "llo, world"
d6868416 1790strcasestr ("hello, World", "wo")
1791 @result{} "World"
1792@end smallexample
1793@end deftypefun
1794
1795
1796@comment string.h
1797@comment GNU
63551311 1798@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
1799This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
1800arrays rather than null-terminated strings. @var{needle-len} is the
1801length of @var{needle} and @var{haystack-len} is the length of
1802@var{haystack}.@refill
1803
1804This function is a GNU extension.
1805@end deftypefun
1806
1807@comment string.h
f65fd747 1808@comment ISO
1809@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
1810The @code{strspn} (``string span'') function returns the length of the
1811initial substring of @var{string} that consists entirely of characters that
1812are members of the set specified by the string @var{skipset}. The order
1813of the characters in @var{skipset} is not important.
1814
1815For example,
1816@smallexample
1817strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
1818 @result{} 5
1819@end smallexample
1820
Note that ``character'' is used here in the sense of byte. In a string
using a multibyte character encoding, an (abstract) character consisting
of more than one byte is not treated as a single entity; each byte is
treated separately. The function is not locale-dependent.
1825@end deftypefun
1826
1827@comment wchar.h
1828@comment ISO
1829@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
1830The @code{wcsspn} (``wide character string span'') function returns the
1831length of the initial substring of @var{wstring} that consists entirely
1832of wide characters that are members of the set specified by the string
1833@var{skipset}. The order of the wide characters in @var{skipset} is not
1834important.
1835@end deftypefun
1836
1837@comment string.h
f65fd747 1838@comment ISO
1839@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
1840The @code{strcspn} (``string complement span'') function returns the length
1841of the initial substring of @var{string} that consists entirely of characters
1842that are @emph{not} members of the set specified by the string @var{stopset}.
1843(In other words, it returns the offset of the first character in @var{string}
1844that is a member of the set @var{stopset}.)
1845
1846For example,
1847@smallexample
1848strcspn ("hello, world", " \t\n,.;!?")
1849 @result{} 5
1850@end smallexample
1851
Note that ``character'' is used here in the sense of byte. In a string
using a multibyte character encoding, an (abstract) character consisting
of more than one byte is not treated as a single entity; each byte is
treated separately. The function is not locale-dependent.
1856@end deftypefun
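
The two span functions combine naturally: @code{strspn} skips a run of
delimiters and @code{strcspn} then measures the run of non-delimiters
that follows. The following sketch locates the first word of a string
without modifying it; the name @code{first_word} and its interface are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first word in STRING
   and store its length in *LEN.  A word is a maximal run of bytes
   that are not in DELIMS.  STRING is not modified.  */
const char *
first_word (const char *string, const char *delims, size_t *len)
{
  string += strspn (string, delims);   /* Skip leading delimiters.  */
  *len = strcspn (string, delims);     /* Measure up to the next one.  */
  return string;
}
```

If the string contains no word at all, the returned pointer addresses
the terminating null byte and the stored length is zero.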
1857
1858@comment wchar.h
1859@comment ISO
1860@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
The @code{wcscspn} (``wide character string complement span'') function
returns the length of the initial substring of @var{wstring} that
consists entirely of wide characters that are @emph{not} members of the
set specified by the string @var{stopset}. (In other words, it returns
the offset of the first wide character in @var{wstring} that is a member
of the set @var{stopset}.)
1867@end deftypefun
1868
1869@comment string.h
f65fd747 1870@comment ISO
1871@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
1872The @code{strpbrk} (``string pointer break'') function is related to
1873@code{strcspn}, except that it returns a pointer to the first character
1874in @var{string} that is a member of the set @var{stopset} instead of the
1875length of the initial substring. It returns a null pointer if no such
1876character from @var{stopset} is found.
1877
1878@c @group Invalid outside the example.
1879For example,
1880
1881@smallexample
1882strpbrk ("hello, world", " \t\n,.;!?")
1883 @result{} ", world"
1884@end smallexample
1885@c @end group
1886
Note that ``character'' is used here in the sense of byte. In a string
using a multibyte character encoding, an (abstract) character consisting
of more than one byte is not treated as a single entity; each byte is
treated separately. The function is not locale-dependent.
1891@end deftypefun
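
Because @code{strpbrk} returns a pointer rather than an offset, it is
convenient for visiting every member of @var{stopset} in a writable
string. The following sketch replaces each such byte; the name
@code{replace_any} is an assumption made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace every byte of STRING that appears in
   STOPSET with REPLACEMENT and return the number of bytes replaced.
   STRING must be writable.  */
size_t
replace_any (char *string, const char *stopset, char replacement)
{
  size_t count = 0;
  char *p = string;

  while ((p = strpbrk (p, stopset)) != NULL)
    {
      *p++ = replacement;   /* Overwrite, then continue past it.  */
      count++;
    }
  return count;
}
```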
1892
1893@comment wchar.h
1894@comment ISO
1895@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
1896The @code{wcspbrk} (``wide character string pointer break'') function is
1897related to @code{wcscspn}, except that it returns a pointer to the first
1898wide character in @var{wstring} that is a member of the set
1899@var{stopset} instead of the length of the initial substring. It
1900returns a null pointer if no such character from @var{stopset} is found.
1901@end deftypefun
1902
1903
1904@subsection Compatibility String Search Functions
1905
1906@comment string.h
1907@comment BSD
1908@deftypefun {char *} index (const char *@var{string}, int @var{c})
1909@code{index} is another name for @code{strchr}; they are exactly the same.
1910New code should always use @code{strchr} since this name is defined in
1911@w{ISO C} while @code{index} is a BSD invention which never was available
1912on @w{System V} derived systems.
1913@end deftypefun
1914
1915@comment string.h
1916@comment BSD
1917@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
1918@code{rindex} is another name for @code{strrchr}; they are exactly the same.
1919New code should always use @code{strrchr} since this name is defined in
1920@w{ISO C} while @code{rindex} is a BSD invention which never was available
1921on @w{System V} derived systems.
1922@end deftypefun
1923
b4012b75 1924@node Finding Tokens in a String
1925@section Finding Tokens in a String
1926
1927@cindex tokenizing strings
1928@cindex breaking a string into tokens
1929@cindex parsing tokens from a string
1930It's fairly common for programs to have a need to do some simple kinds
1931of lexical analysis and parsing, such as splitting a command string up
1932into tokens. You can do this with the @code{strtok} function, declared
1933in the header file @file{string.h}.
1934@pindex string.h
1935
1936@comment string.h
f65fd747 1937@comment ISO
8a2f1f5b 1938@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
1939A string can be split into tokens by making a series of calls to the
1940function @code{strtok}.
1941
1942The string to be split up is passed as the @var{newstring} argument on
1943the first call only. The @code{strtok} function uses this to set up
1944some internal state information. Subsequent calls to get additional
1945tokens from the same string are indicated by passing a null pointer as
1946the @var{newstring} argument. Calling @code{strtok} with another
1947non-null @var{newstring} argument reinitializes the state information.
1948It is guaranteed that no other library function ever calls @code{strtok}
1949behind your back (which would mess up this internal state information).
1950
1951The @var{delimiters} argument is a string that specifies a set of delimiters
1952that may surround the token being extracted. All the initial characters
1953that are members of this set are discarded. The first character that is
1954@emph{not} a member of this set of delimiters marks the beginning of the
1955next token. The end of the token is found by looking for the next
1956character that is a member of the delimiter set. This character in the
1957original string @var{newstring} is overwritten by a null character, and the
1958pointer to the beginning of the token in @var{newstring} is returned.
1959
On the next call to @code{strtok}, the searching begins at the next
character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder
of the string consists only of delimiter characters, @code{strtok}
returns a null pointer.
8a2f1f5b 1968
Note that ``character'' is used here in the sense of byte. In a string
using a multibyte character encoding, an (abstract) character consisting
of more than one byte is not treated as a single entity; each byte is
treated separately. The function is not locale-dependent.
1973@end deftypefun
1974
1975@comment wchar.h
1976@comment ISO
@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters})
1978A string can be split into tokens by making a series of calls to the
1979function @code{wcstok}.
1980
1981The string to be split up is passed as the @var{newstring} argument on
1982the first call only. The @code{wcstok} function uses this to set up
1983some internal state information. Subsequent calls to get additional
1984tokens from the same wide character string are indicated by passing a
1985null pointer as the @var{newstring} argument. Calling @code{wcstok}
1986with another non-null @var{newstring} argument reinitializes the state
1987information. It is guaranteed that no other library function ever calls
1988@code{wcstok} behind your back (which would mess up this internal state
1989information).
1990
1991The @var{delimiters} argument is a wide character string that specifies
1992a set of delimiters that may surround the token being extracted. All
1993the initial wide characters that are members of this set are discarded.
1994The first wide character that is @emph{not} a member of this set of
1995delimiters marks the beginning of the next token. The end of the token
1996is found by looking for the next wide character that is a member of the
1997delimiter set. This wide character in the original wide character
1998string @var{newstring} is overwritten by a null wide character, and the
1999pointer to the beginning of the token in @var{newstring} is returned.
2000
On the next call to @code{wcstok}, the searching begins at the next
wide character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.

If the end of the wide character string @var{newstring} is reached, or
if the remainder of the string consists only of delimiter wide
characters, @code{wcstok} returns a null pointer.
2009
Note that ``character'' is used here in the sense of wide character:
each element of the wide character string is examined as a whole, never
byte by byte. The function is not locale-dependent.
2014@end deftypefun
2015
@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying and
Concatenation}). If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.
2024
2025The string that you are operating on might even be a constant. Then
2026when @code{strtok} or @code{wcstok} tries to modify it, your program
2027will get a fatal signal for writing in read-only memory. @xref{Program
2028Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
2029would not require a modification of the string (e.g., if there is
1f77f049 2030exactly one token) the string can (and in the @glibcadj{} case will) be
8a2f1f5b 2031modified.
2032
2033This is a special case of a general principle: if a part of a program
2034does not have as its purpose the modification of a certain data
2035structure, then it is error-prone to modify the data structure
2036temporarily.
2037
2038The functions @code{strtok} and @code{wcstok} are not reentrant.
2039@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
2040important.
2041
2042Here is a simple example showing the use of @code{strtok}.
2043
2044@comment Yes, this example has been tested.
2045@smallexample
2046#include <string.h>
2047#include <stddef.h>
2048
2049@dots{}
2050
5649a1d6 2051const char string[] = "words separated by spaces -- and, punctuation!";
28f540f4 2052const char delimiters[] = " .,;:!-";
5649a1d6 2053char *token, *cp;
2054
2055@dots{}
2056
2057cp = strdupa (string); /* Make writable copy. */
2058token = strtok (cp, delimiters); /* token => "words" */
2059token = strtok (NULL, delimiters); /* token => "separated" */
2060token = strtok (NULL, delimiters); /* token => "by" */
2061token = strtok (NULL, delimiters); /* token => "spaces" */
2062token = strtok (NULL, delimiters); /* token => "and" */
2063token = strtok (NULL, delimiters); /* token => "punctuation" */
2064token = strtok (NULL, delimiters); /* token => NULL */
2065@end smallexample
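
In practice the repeated calls shown above are usually written as a
loop that runs until @code{strtok} returns a null pointer. The
following sketch collects the tokens into an array; the name
@code{split_tokens} and its interface are hypothetical, and the caller
is assumed to pass a writable buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split BUFFER in place and store up to MAX token
   pointers in TOKENS.  Return the number of tokens stored.  BUFFER
   must be writable because strtok overwrites delimiters in it.  */
size_t
split_tokens (char *buffer, const char *delimiters,
              char **tokens, size_t max)
{
  size_t n = 0;
  char *token;

  for (token = strtok (buffer, delimiters);
       token != NULL && n < max;
       token = strtok (NULL, delimiters))
    tokens[n++] = token;

  return n;
}
```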
a5113b14 2066
1f77f049 2067@Theglibc{} contains two more functions for tokenizing a string
8a2f1f5b
UD
2068which overcome the limitation of non-reentrancy. They are only
2069available for multibyte character strings.
2070
2071@comment string.h
2072@comment POSIX
2073@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
2074Just like @code{strtok}, this function splits the string into several
2075tokens which can be accessed by successive calls to @code{strtok_r}.
2076The difference is that the information about the next token is stored in
2077the space pointed to by the third argument, @var{save_ptr}, which is a
2078pointer to a string pointer. Calling @code{strtok_r} with a null
2079pointer for @var{newstring} and leaving @var{save_ptr} between the calls
2080unchanged does the job without hindering reentrancy.
a5113b14 2081
976780fd 2082This function is defined in POSIX.1 and can be found on many systems
2083which support multi-threading.
2084@end deftypefun
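
Keeping the state in a caller-supplied pointer also makes it possible
to tokenize one string while in the middle of tokenizing another, which
@code{strtok} cannot do. The sketch below splits text into lines and
each line into words; the function name @code{count_words_by_line} is
an assumption, and the feature test macro is included only so that the
declaration is visible under strict compiler modes.

```c
#define _POSIX_C_SOURCE 200809L   /* Make strtok_r visible.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the words in TEXT, where lines are
   separated by newlines and words within a line by spaces.  TEXT is
   modified in place.  */
size_t
count_words_by_line (char *text)
{
  size_t words = 0;
  char *line_state, *word_state;
  char *line, *word;

  for (line = strtok_r (text, "\n", &line_state);
       line != NULL;
       line = strtok_r (NULL, "\n", &line_state))
    /* The inner loop keeps its position in WORD_STATE, so it does
       not disturb the outer loop's position in LINE_STATE.  */
    for (word = strtok_r (line, " ", &word_state);
         word != NULL;
         word = strtok_r (NULL, " ", &word_state))
      words++;

  return words;
}
```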
2085
2086@comment string.h
2087@comment BSD
2088@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
This function has functionality similar to @code{strtok_r}, with the
@var{newstring} argument replaced by the @var{save_ptr} argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token.

One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one character from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of characters
from @var{delimiter}. This means that a program normally should test
for @code{strsep} returning an empty string before processing it.
9afc8a59 2102
2103This function was introduced in 4.3BSD and therefore is widely available.
2104@end deftypefun
2105
Here is how the above example looks when @code{strsep} is used.
2107
2108@comment Yes, this example has been tested.
2109@smallexample
2110#include <string.h>
2111#include <stddef.h>
2112
2113@dots{}
2114
5649a1d6 2115const char string[] = "words separated by spaces -- and, punctuation!";
2116const char delimiters[] = " .,;:!-";
2117char *running;
2118char *token;
2119
2120@dots{}
2121
5649a1d6 2122running = strdupa (string);
2123token = strsep (&running, delimiters); /* token => "words" */
2124token = strsep (&running, delimiters); /* token => "separated" */
2125token = strsep (&running, delimiters); /* token => "by" */
2126token = strsep (&running, delimiters); /* token => "spaces" */
2127token = strsep (&running, delimiters); /* token => "" */
2128token = strsep (&running, delimiters); /* token => "" */
2129token = strsep (&running, delimiters); /* token => "" */
a5113b14 2130token = strsep (&running, delimiters); /* token => "and" */
9afc8a59 2131token = strsep (&running, delimiters); /* token => "" */
a5113b14 2132token = strsep (&running, delimiters); /* token => "punctuation" */
9afc8a59 2133token = strsep (&running, delimiters); /* token => "" */
a5113b14
UD
2134token = strsep (&running, delimiters); /* token => NULL */
2135@end smallexample
b4012b75 2136
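Because @code{strsep} reports an empty token for each pair of adjacent delimiters, callers usually want a loop that skips them. The following is a minimal sketch of such a loop; the helper name @code{split_nonempty} and its interface are hypothetical and are not part of the library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split TEXT in place at any character from
   DELIMITERS and store up to MAX_TOKENS pointers to the non-empty
   tokens in TOKENS.  Returns the number of tokens stored.  TEXT is
   modified, because strsep overwrites delimiters with '\0'.  */
size_t
split_nonempty (char *text, const char *delimiters,
                char **tokens, size_t max_tokens)
{
  size_t count = 0;
  char *running = text;
  char *token;

  assert (delimiters != NULL && tokens != NULL);

  while (count < max_tokens
         && (token = strsep (&running, delimiters)) != NULL)
    {
      /* Adjacent delimiters produce empty tokens; skip them.  */
      if (*token == '\0')
        continue;
      tokens[count++] = token;
    }
  return count;
}
```

Applied to the string from the example above, this sketch would keep only the six words and drop the empty tokens.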
ec28fc7c
UD
2137@comment string.h
2138@comment GNU
2139@deftypefun {char *} basename (const char *@var{filename})
The GNU version of the @code{basename} function returns the last
component of the path in @var{filename}. This function is the preferred
usage, since it does not modify the argument, @var{filename}, and
respects trailing slashes. The prototype for @code{basename} can be
found in @file{string.h}. Note, this function is overridden by the XPG
version, if @file{libgen.h} is included.
2146
Example of using GNU @code{basename}:

@smallexample
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
@end smallexample
2166
2167@strong{Portability Note:} This function may produce different results
2168on different systems.
2169
2170@end deftypefun
2171
2172@comment libgen.h
2173@comment XPG
2174@deftypefun {char *} basename (char *@var{path})
This is the standard XPG-defined @code{basename}. It is similar in
spirit to the GNU version, but may modify the @var{path} by removing
trailing @samp{/} characters. If the @var{path} is made up entirely of
@samp{/} characters, then @code{"/"} is returned. Also, if @var{path} is
@code{NULL} or an empty string, then @code{"."} is returned. The
prototype for the XPG version can be found in @file{libgen.h}.
2181
Example of using XPG @code{basename}:

@smallexample
#define _GNU_SOURCE   /* @r{for} strdupa */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libgen.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  char *path = strdupa (argv[0]);

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}

@}
@end smallexample
2205@end deftypefun
2206
2207@comment libgen.h
2208@comment XPG
2209@deftypefun {char *} dirname (char *@var{path})
The @code{dirname} function is the complement to the XPG version of
@code{basename}. It returns the parent directory of the file specified
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
contains no @samp{/} characters, then @code{"."} is returned. The
prototype for this function can be found in @file{libgen.h}.
2215@end deftypefun
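Since the XPG @code{basename} and @code{dirname} may both modify their argument, a cautious caller gives each of them its own private copy of the path. The sketch below illustrates this; the helper @code{split_path} and its buffer-based interface are hypothetical and only serve the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the directory part of PATH into DIR and
   the last component into BASE, each a buffer of SIZE bytes.
   Returns 0 on success, -1 if a copy could not be made or a result
   does not fit.  */
int
split_path (const char *path, char *dir, char *base, size_t size)
{
  char *copy1, *copy2;
  int result = -1;

  assert (path != NULL && dir != NULL && base != NULL);

  /* Each function gets its own copy, since either may modify it.  */
  copy1 = strdup (path);
  copy2 = strdup (path);
  if (copy1 != NULL && copy2 != NULL)
    {
      const char *d = dirname (copy1);
      const char *b = basename (copy2);   /* XPG version via libgen.h */
      if (strlen (d) < size && strlen (b) < size)
        {
          strcpy (dir, d);
          strcpy (base, b);
          result = 0;
        }
    }
  free (copy1);
  free (copy2);
  return result;
}
```

The results are copied out before the temporary copies are freed, because the returned pointers may refer into the argument.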
0e4ee106
UD
2216
2217@node strfry
2218@section strfry
2219
The function below addresses the perennial programming quandary: ``How do
I take good data in string form and painlessly turn it into garbage?''
This is actually a fairly simple task for C programmers who do not use
@theglibc{} string functions, but for programs based on @theglibc{},
the @code{strfry} function is the preferred method for
destroying string data.
2226
2227The prototype for this function is in @file{string.h}.
2228
2229@comment string.h
2230@comment GNU
ec28fc7c 2231@deftypefun {char *} strfry (char *@var{string})
0e4ee106
UD
2232
2233@code{strfry} creates a pseudorandom anagram of a string, replacing the
2234input with the anagram in place. For each position in the string,
2235@code{strfry} swaps it with a position in the string selected at random
2236(from a uniform distribution). The two positions may be the same.
2237
2238The return value of @code{strfry} is always @var{string}.
2239
1f77f049 2240@strong{Portability Note:} This function is unique to @theglibc{}.
0e4ee106
UD
2241
2242@end deftypefun
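Because @code{strfry} works in place, a program that still needs the original text has to shuffle a copy. The following sketch shows one way to do that, together with a check that the result really is an anagram; @code{shuffled_copy} and @code{is_anagram} are hypothetical helper names, not library functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated, shuffled copy of
   TEXT, or NULL if memory is exhausted.  TEXT is left untouched.  */
char *
shuffled_copy (const char *text)
{
  char *copy = strdup (text);
  if (copy != NULL)
    strfry (copy);              /* shuffles COPY in place */
  return copy;
}

/* Hypothetical helper: return nonzero if A and B contain the same
   bytes, in any order.  */
int
is_anagram (const char *a, const char *b)
{
  size_t counts[UCHAR_MAX + 1] = { 0 };
  size_t i;

  assert (a != NULL && b != NULL);
  for (; *a != '\0'; a++)
    counts[(unsigned char) *a]++;
  for (; *b != '\0'; b++)
    {
      if (counts[(unsigned char) *b] == 0)
        return 0;
      counts[(unsigned char) *b]--;
    }
  for (i = 0; i <= UCHAR_MAX; i++)
    if (counts[i] != 0)
      return 0;
  return 1;
}
```

The order of the shuffled bytes is pseudorandom, so only the anagram property can be relied upon.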
2243
2244
2245@node Trivial Encryption
2246@section Trivial Encryption
2247@cindex encryption
2248
2249
2250The @code{memfrob} function converts an array of data to something
2251unrecognizable and back again. It is not encryption in its usual sense
2252since it is easy for someone to convert the encrypted data back to clear
2253text. The transformation is analogous to Usenet's ``Rot13'' encryption
2254method for obscuring offensive jokes from sensitive eyes and such.
2255Unlike Rot13, @code{memfrob} works on arbitrary binary data, not just
2256text.
2257@cindex Rot13
2258
For true encryption, see @ref{Cryptographic Functions}.
2260
2261This function is declared in @file{string.h}.
2262@pindex string.h
2263
2264@comment string.h
2265@comment GNU
2266@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
2267
@code{memfrob} transforms (frobnicates) each byte of the data structure
at @var{mem}, which is @var{length} bytes long, by bitwise exclusive
ORing it with binary 00101010 (decimal 42). It does the transformation
in place and its return value is always @var{mem}.

Note that applying @code{memfrob} a second time to the same data
structure returns it to its original state.

This is a good function for hiding information from someone who doesn't
want to see it or doesn't want to see it very much. To really prevent
people from retrieving the information, use stronger encryption such as
that described in @ref{Cryptographic Functions}.

@strong{Portability Note:} This function is unique to @theglibc{}.
2282
2283@end deftypefun
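The round-trip property can be checked directly. This is a minimal sketch; the helper @code{frob_round_trip} and its fixed 64-byte scratch buffer are assumptions made for the illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>

/* Hypothetical helper: obscure LENGTH bytes at DATA, then restore
   them, checking that two applications of memfrob cancel out.
   Returns nonzero on success.  LENGTH must not exceed 64.  */
int
frob_round_trip (void *data, size_t length)
{
  unsigned char saved[64];

  assert (length <= sizeof saved);
  memcpy (saved, data, length);

  memfrob (data, length);       /* each byte is XORed with 42 */
  if (length > 0 && memcmp (data, saved, length) == 0)
    return 0;                   /* XOR with 42 must change every byte */

  memfrob (data, length);       /* the second pass undoes the first */
  return memcmp (data, saved, length) == 0;
}
```

In ASCII, for instance, the byte @code{'A'} (0x41) becomes @code{'k'} (0x6B) after one pass and @code{'A'} again after the second.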
2284
b4012b75
UD
2285@node Encode Binary Data
2286@section Encode Binary Data
2287
To store or transfer binary data in environments which only support text
one has to encode the binary data by mapping the input bytes to
characters in the range allowed for storing or transferring. SVID
systems (and nowadays XPG compliant systems) provide minimal support for
this task.
2293
2294@comment stdlib.h
2295@comment XPG
2296@deftypefun {char *} l64a (long int @var{n})
This function encodes a 32-bit input value using characters from the
basic character set. It returns a pointer to a 7-character buffer which
contains an encoded version of @var{n}. To encode a series of bytes the
user must copy the returned string to a destination buffer. It returns
the empty string if @var{n} is zero, which is somewhat bizarre but
mandated by the standard.@*
@strong{Warning:} Since a static buffer is used this function should not
be used in multi-threaded programs. There is no thread-safe alternative
to this function in the C library.@*
@strong{Compatibility Note:} The XPG standard states that the return
value of @code{l64a} is undefined if @var{n} is negative. In the GNU
implementation, @code{l64a} treats its argument as unsigned, so it will
return a sensible encoding for any nonzero @var{n}; however, portable
programs should not rely on this.

To encode a large buffer @code{l64a} must be called in a loop, once for
each 32-bit word of the buffer. For example, one could do something
like this:

@smallexample
char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  const unsigned char *in = buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  if (out == NULL)
    return NULL;

  /* @r{Encode the length.} */
  /* @r{Using `htonl' is necessary so that the data can be}
     @r{decoded even on machines with different byte order.}
     @r{`l64a' can return a string shorter than 6 bytes, so }
     @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
     @r{hand.} */

  p = stpcpy (cp, l64a (htonl (len)));
  cp = mempcpy (p, "......", 6 - (p - cp));

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      p = stpcpy (cp, l64a (htonl (n)));
      cp = mempcpy (p, "......", 6 - (p - cp));
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      cp = stpcpy (cp, l64a (htonl (n)));
    @}
  *cp = '\0';
  return out;
@}
@end smallexample
2360
It is strange that the library does not provide the complete
functionality needed, but so be it.
2363
2364@end deftypefun
5649a1d6 2365
b4012b75
UD
To decode data produced with @code{l64a} the following function should be
used.

@comment stdlib.h
@comment XPG
@deftypefun {long int} a64l (const char *@var{string})
The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}. The function processes at most 6 characters of
this string, and decodes the characters it finds according to the table
below. It stops decoding when it finds a character not in the table,
rather like @code{atoi}; if you have a buffer which has been broken into
lines, you must be careful to skip over the end-of-line characters.

The decoded number is returned as a @code{long int} value.
b4012b75 2380@end deftypefun
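A round trip through @code{l64a} and @code{a64l} might look like the following sketch. The helper name @code{radix64_round_trip} is hypothetical; the point it illustrates is that the result of @code{l64a} is copied immediately, because it lives in a static buffer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode VALUE with l64a into OUT (at least 7
   bytes) and return the number decoded back from OUT by a64l.  */
long int
radix64_round_trip (long int value, char out[7])
{
  const char *encoded = l64a (value);

  /* At most 6 digits plus the terminating null byte.  */
  assert (strlen (encoded) <= 6);
  strcpy (out, encoded);        /* copy out of the static buffer */
  return a64l (out);
}
```

For positive values that fit in 32 bits, the decoded number should equal the original; zero encodes to the empty string, which decodes back to zero.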
b13927da 2381
dd7d45e8
UD
2382The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
2383which each character of an encoded string represents six bits of an
2384input word. These symbols are used for the base 64 digits:
2385
2386@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
2387@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
2388@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
2389 @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
2390@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
2391 @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
2392@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
2393 @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
2394@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
2395 @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
2396@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
2397 @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
2398@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
2399 @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
2400@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
2401 @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
2402@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
2403 @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
2404@end multitable
2405
2406This encoding scheme is not standard. There are some other encoding
2407methods which are much more widely used (UU encoding, MIME encoding).
2408Generally, it is better to use one of these encodings.
2409
b13927da
UD
2410@node Argz and Envz Vectors
2411@section Argz and Envz Vectors
2412
@cindex argz vectors (string vectors)
@cindex string vectors, null-character separated
@cindex argument vectors, null-character separated
@dfn{Argz vectors} are vectors of strings in a contiguous block of
memory, each element separated from its neighbors by null characters
(@code{'\0'}).

@cindex envz vectors (environment vectors)
@cindex environment vectors, null-character separated
@dfn{Envz vectors} are an extension of argz vectors where each element is a
name-value pair, separated by a @code{'='} character (as in a Unix
environment).
2425
2426@menu
2427* Argz Functions:: Operations on argz vectors.
2428* Envz Functions:: Additional operations on environment vectors.
2429@end menu
2430
2431@node Argz Functions, Envz Functions, , Argz and Envz Vectors
2432@subsection Argz Functions
2433
2434Each argz vector is represented by a pointer to the first element, of
2435type @code{char *}, and a size, of type @code{size_t}, both of which can
2436be initialized to @code{0} to represent an empty argz vector. All argz
2437functions accept either a pointer and a size argument, or pointers to
2438them, if they will be modified.
2439
The argz functions use @code{malloc}/@code{realloc} to allocate/grow
argz vectors, and so any argz vector created using these functions may
be freed by using @code{free}; conversely, any argz function that may
grow a string expects that string to have been allocated using
@code{malloc} (those argz functions that only examine their arguments or
modify them in place will work on any sort of memory).
@xref{Unconstrained Allocation}.
2447
2448All argz functions that do memory allocation have a return type of
2449@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
2450allocation error occurs.
2451
2452@pindex argz.h
2453These functions are declared in the standard include file @file{argz.h}.
2454
5649a1d6
UD
2455@comment argz.h
2456@comment GNU
b13927da 2457@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
5649a1d6 2458The @code{argz_create} function converts the Unix-style argument vector
b13927da
UD
2459@var{argv} (a vector of pointers to normal C strings, terminated by
2460@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
2461the same elements, which is returned in @var{argz} and @var{argz_len}.
2462@end deftypefun
2463
5649a1d6
UD
2464@comment argz.h
2465@comment GNU
b13927da
UD
2466@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
2467The @code{argz_create_sep} function converts the null-terminated string
2468@var{string} into an argz vector (returned in @var{argz} and
49c091e5 2469@var{argz_len}) by splitting it into elements at every occurrence of the
b13927da
UD
2470character @var{sep}.
2471@end deftypefun
2472
5649a1d6
UD
2473@comment argz.h
2474@comment GNU
b13927da
UD
@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
Returns the number of elements in the argz vector @var{argz} and
@var{argz_len}.
2478@end deftypefun
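As a small illustration of @code{argz_create_sep} and @code{argz_count} working together, the following sketch counts the elements of a colon-separated list. The helper name @code{count_path_elements} is hypothetical.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: split the colon-separated LIST into an argz
   vector and return the number of elements, or (size_t) -1 if the
   vector could not be allocated.  */
size_t
count_path_elements (const char *list)
{
  char *argz = NULL;
  size_t argz_len = 0;
  size_t count;

  assert (list != NULL);
  if (argz_create_sep (list, ':', &argz, &argz_len) != 0)
    return (size_t) -1;

  count = argz_count (argz, argz_len);
  free (argz);                  /* argz vectors are malloc'ed */
  return count;
}
```

An empty input string yields an empty argz vector (a null pointer with length zero), for which the count is zero.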
2479
5649a1d6
UD
2480@comment argz.h
2481@comment GNU
b13927da
UD
2482@deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
The @code{argz_extract} function converts the argz vector @var{argz} and
@var{argz_len} into a Unix-style argument vector stored in @var{argv},
by putting pointers to every element in @var{argz} into successive
positions in @var{argv}, followed by a terminator of @code{0}.
The @var{argv} array must be pre-allocated with enough space to hold all
the elements in @var{argz} plus the terminating @code{(char *)0}
(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
bytes should be enough). Note that the string pointers stored into
@var{argv} point into @var{argz} (they are not copies), and so
@var{argz} must be copied if it will be changed while @var{argv} is
still active. This function is useful for passing the elements in
@var{argz} to an exec function (@pxref{Executing a File}).
2495@end deftypefun
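The sizing rule for @var{argv} can be sketched as follows. The helper @code{argz_to_argv} is a hypothetical name; it allocates the pointer array with the size suggested above and lets @code{argz_extract} fill it in.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a null-terminated argv-style array
   whose pointers refer into ARGZ.  The caller frees the returned
   array with free, but not the strings, which belong to ARGZ.
   Returns NULL if the array cannot be allocated.  */
char **
argz_to_argv (char *argz, size_t argz_len)
{
  size_t count;
  char **argv;

  assert (argz != NULL || argz_len == 0);
  count = argz_count (argz, argz_len);

  /* One slot per element plus one for the terminating null pointer.  */
  argv = malloc ((count + 1) * sizeof (char *));
  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

The first pointer in the resulting array is the argz vector itself, since the first element starts at the beginning of the block.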
2496
5649a1d6
UD
2497@comment argz.h
2498@comment GNU
b13927da
UD
2499@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
The @code{argz_stringify} function converts @var{argz} into a normal
string with the elements separated by the character @var{sep}, by
replacing each @code{'\0'} inside @var{argz} (except the last one, which
terminates the string) with @var{sep}. This is handy for printing
@var{argz} in a readable manner.
2505@end deftypefun
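One possible use of @code{argz_stringify} is joining an argument vector into a single printable line. In this sketch the helper name @code{join_arguments} is hypothetical, and returning an empty string for an empty vector is a design choice of the example rather than library behavior.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join the strings in the null-terminated
   vector ARGV with SEP between them.  Returns a malloc'ed string,
   or NULL if memory runs out.  */
char *
join_arguments (char *const argv[], int sep)
{
  char *argz = NULL;
  size_t argz_len = 0;

  assert (argv != NULL);
  if (argz_create (argv, &argz, &argz_len) != 0)
    return NULL;

  /* An empty vector is represented by a null pointer; hand back an
     empty string instead so callers always get a string.  */
  if (argz == NULL)
    return strdup ("");

  argz_stringify (argz, argz_len, sep);
  return argz;                  /* now an ordinary C string */
}
```

After the call the block is no longer a usable argz vector with several elements; it is one string ending at the original final null byte.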
2506
5649a1d6
UD
2507@comment argz.h
2508@comment GNU
b13927da
UD
2509@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
2510The @code{argz_add} function adds the string @var{str} to the end of the
2511argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
2512@code{*@var{argz_len}} accordingly.
2513@end deftypefun
2514
5649a1d6
UD
2515@comment argz.h
2516@comment GNU
b13927da
UD
2517@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
2518The @code{argz_add_sep} function is similar to @code{argz_add}, but
49c091e5 2519@var{str} is split into separate elements in the result at occurrences of
b13927da 2520the character @var{delim}. This is useful, for instance, for
5649a1d6 2521adding the components of a Unix search path to an argz vector, by using
b13927da
UD
2522a value of @code{':'} for @var{delim}.
2523@end deftypefun
2524
5649a1d6
UD
2525@comment argz.h
2526@comment GNU
b13927da
UD
2527@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
2528The @code{argz_append} function appends @var{buf_len} bytes starting at
2529@var{buf} to the argz vector @code{*@var{argz}}, reallocating
2530@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
2531@code{*@var{argz_len}}.
2532@end deftypefun
2533
5649a1d6
UD
2534@comment argz.h
2535@comment GNU
30aa5785 2536@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
b13927da
UD
2537If @var{entry} points to the beginning of one of the elements in the
2538argz vector @code{*@var{argz}}, the @code{argz_delete} function will
2539remove this entry and reallocate @code{*@var{argz}}, modifying
2540@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
2541destructive argz functions usually reallocate their argz argument,
2542pointers into argz vectors such as @var{entry} will then become invalid.
2543@end deftypefun
2544
5649a1d6
UD
2545@comment argz.h
2546@comment GNU
b13927da
UD
2547@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
2548The @code{argz_insert} function inserts the string @var{entry} into the
2549argz vector @code{*@var{argz}} at a point just before the existing
2550element pointed to by @var{before}, reallocating @code{*@var{argz}} and
2551updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
2552is @code{0}, @var{entry} is added to the end instead (as if by
2553@code{argz_add}). Since the first element is in fact the same as
2554@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
2555@var{before} will result in @var{entry} being inserted at the beginning.
2556@end deftypefun
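The following sketch combines @code{argz_insert}, @code{argz_next} and @code{argz_delete}. Both helper names are hypothetical. Note how the pointer @code{entry} is not used again after @code{argz_delete}, since the vector may have been reallocated.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: put ENTRY at the front of the argz vector.
   Returns 0 on success or an error code from argz_insert.  */
int
argz_prepend (char **argz, size_t *argz_len, const char *entry)
{
  assert (argz != NULL && argz_len != NULL && entry != NULL);
  /* *ARGZ is the first element, so inserting before it prepends.
     For an empty vector *ARGZ is null and ENTRY is simply added.  */
  return argz_insert (argz, argz_len, *argz, entry);
}

/* Hypothetical helper: delete the first element equal to NAME.
   Returns 1 if an element was deleted, 0 if none matched.  */
int
argz_remove_first (char **argz, size_t *argz_len, const char *name)
{
  char *entry = NULL;

  assert (argz != NULL && argz_len != NULL && name != NULL);
  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    if (strcmp (entry, name) == 0)
      {
        /* ENTRY must not be used after this call.  */
        argz_delete (argz, argz_len, entry);
        return 1;
      }
  return 0;
}
```

Returning immediately after the deletion avoids continuing an iteration with a stale pointer.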
2557
5649a1d6
UD
2558@comment argz.h
2559@comment GNU
b13927da
UD
2560@deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
2561The @code{argz_next} function provides a convenient way of iterating
2562over the elements in the argz vector @var{argz}. It returns a pointer
2563to the next element in @var{argz} after the element @var{entry}, or
2564@code{0} if there are no elements following @var{entry}. If @var{entry}
2565is @code{0}, the first element of @var{argz} is returned.
2566
2567This behavior suggests two styles of iteration:
2568
@smallexample
    char *entry = 0;
    while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
      @var{action};
@end smallexample

(the double parentheses are necessary to make some C compilers shut up
about what they consider a questionable @code{while}-test) and:

@smallexample
    char *entry;
    for (entry = @var{argz};
         entry;
         entry = argz_next (@var{argz}, @var{argz_len}, entry))
      @var{action};
@end smallexample
2585
2586Note that the latter depends on @var{argz} having a value of @code{0} if
2587it is empty (rather than a pointer to an empty block of memory); this
2588invariant is maintained for argz vectors created by the functions here.
2589@end deftypefun
2590
d705269e
UD
2591@comment argz.h
2592@comment GNU
2593@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
Replace any occurrences of the string @var{str} in @var{argz} with
@var{with}, reallocating @var{argz} as necessary. If
@var{replace_count} is non-zero, @code{*@var{replace_count}} will be
incremented by the number of replacements performed.
2598@end deftypefun
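A call to @code{argz_replace} might be wrapped as in the sketch below. The helper name @code{replace_all} is hypothetical, and the value it returns is simply whatever count @code{argz_replace} reports. The example deliberately uses elements that contain the search string at most once.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace OLD_TEXT with NEW_TEXT throughout
   the argz vector.  Returns the count reported by argz_replace, or
   -1 if an allocation failed.  */
long int
replace_all (char **argz, size_t *argz_len,
             const char *old_text, const char *new_text)
{
  unsigned count = 0;           /* argz_replace only increments it */

  assert (argz != NULL && argz_len != NULL);
  assert (old_text != NULL && new_text != NULL);

  if (argz_replace (argz, argz_len, old_text, new_text, &count) != 0)
    return -1;
  return (long int) count;
}
```

The counter is initialized to zero by the caller because the function adds to it rather than setting it.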
2599
b13927da
UD
2600@node Envz Functions, , Argz Functions, Argz and Envz Vectors
2601@subsection Envz Functions
2602
2603Envz vectors are just argz vectors with additional constraints on the form
2604of each element; as such, argz functions can also be used on them, where it
2605makes sense.
2606
2607Each element in an envz vector is a name-value pair, separated by a @code{'='}
2608character; if multiple @code{'='} characters are present in an element, those
2609after the first are considered part of the value, and treated like all other
2610non-@code{'\0'} characters.
2611
If @emph{no} @code{'='} characters are present in an element, that element is
considered the name of a ``null'' entry, as distinct from an entry with an
empty value: @code{envz_get} will return @code{0} if given the name of a null
entry, whereas an entry with an empty value would result in a value of
@code{""}; @code{envz_entry} will still find such entries, however. Null
entries can be removed with the @code{envz_strip} function.
2618
2619As with argz functions, envz functions that may allocate memory (and thus
2620fail) have a return type of @code{error_t}, and return either @code{0} or
2621@code{ENOMEM}.
2622
2623@pindex envz.h
2624These functions are declared in the standard include file @file{envz.h}.
2625
5649a1d6
UD
2626@comment envz.h
2627@comment GNU
b13927da
UD
2628@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
2629The @code{envz_entry} function finds the entry in @var{envz} with the name
2630@var{name}, and returns a pointer to the whole entry---that is, the argz
2631element which begins with @var{name} followed by a @code{'='} character. If
2632there is no entry with that name, @code{0} is returned.
2633@end deftypefun
2634
5649a1d6
UD
2635@comment envz.h
2636@comment GNU
b13927da
UD
2637@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
2638The @code{envz_get} function finds the entry in @var{envz} with the name
2639@var{name} (like @code{envz_entry}), and returns a pointer to the value
2640portion of that entry (following the @code{'='}). If there is no entry with
2641that name (or only a null entry), @code{0} is returned.
2642@end deftypefun
2643
5649a1d6
UD
2644@comment envz.h
2645@comment GNU
b13927da
UD
2646@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
The @code{envz_add} function adds an entry to @code{*@var{envz}}
(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
@var{name}, and value @var{value}. If an entry with the same name
already exists in @var{envz}, it is removed first. If @var{value} is
@code{0}, then the new entry will be the special null type of entry
(mentioned above).
2653@end deftypefun
2654
5649a1d6
UD
2655@comment envz.h
2656@comment GNU
b13927da
UD
2657@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
2658The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
2659as if with @code{envz_add}, updating @code{*@var{envz}} and
2660@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
2661will supersede those with the same name in @var{envz}, otherwise not.
2662
2663Null entries are treated just like other entries in this respect, so a null
2664entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
2665being added to @var{envz}, if @var{override} is false.
2666@end deftypefun
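The effect of the @var{override} argument can be sketched with two thin wrappers. The names @code{apply_defaults} and @code{apply_overrides} are hypothetical; each one just fixes the value of @var{override}.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <envz.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: add the entries of DEFAULTS to *ENVZ, keeping
   any entry that *ENVZ already has under the same name.  */
int
apply_defaults (char **envz, size_t *envz_len,
                const char *defaults, size_t defaults_len)
{
  assert (envz != NULL && envz_len != NULL);
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}

/* Hypothetical helper: add the entries of OVERRIDES to *ENVZ,
   replacing any entry that *ENVZ already has under the same name.  */
int
apply_overrides (char **envz, size_t *envz_len,
                 const char *overrides, size_t overrides_len)
{
  assert (envz != NULL && envz_len != NULL);
  return envz_merge (envz, envz_len, overrides, overrides_len, 1);
}
```

With @code{apply_defaults}, a user's own setting survives and only missing names are filled in; with @code{apply_overrides}, the second vector wins.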
2667
5649a1d6
UD
2668@comment envz.h
2669@comment GNU
b13927da
UD
2670@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
2671The @code{envz_strip} function removes any null entries from @var{envz},
2672updating @code{*@var{envz}} and @code{*@var{envz_len}}.
2673@end deftypefun
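The distinction between a missing entry, a null entry, and an entry with a value can be made explicit by combining @code{envz_get} and @code{envz_entry}. The following is a sketch only; the enumeration and the helper @code{classify_entry} are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <envz.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of a name within an envz vector.  */
enum entry_kind
{
  ENTRY_MISSING,                /* no element with that name */
  ENTRY_NULL,                   /* element present, but without '=' */
  ENTRY_VALUE                   /* element has a (possibly empty) value */
};

/* Hypothetical helper: classify NAME in the vector ENVZ.  */
enum entry_kind
classify_entry (const char *envz, size_t envz_len, const char *name)
{
  assert (name != NULL);
  /* envz_get returns a value pointer only for name=value entries.  */
  if (envz_get (envz, envz_len, name) != NULL)
    return ENTRY_VALUE;
  /* envz_entry also finds null entries.  */
  if (envz_entry (envz, envz_len, name) != NULL)
    return ENTRY_NULL;
  return ENTRY_MISSING;
}
```

After a call to @code{envz_strip}, names that were classified as @code{ENTRY_NULL} would be reported as @code{ENTRY_MISSING}.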