Bug in mbsrtowcs?
Jeff Johnston
jjohnstn@redhat.com
Fri Feb 13 20:54:00 GMT 2009
Corinna Vinschen wrote:
> Hi,
>
> while I'm looking into implementing the new SUSv4 functions wcsnrtombs
> and mbsnrtowcs, I started puzzling over a strange piece of code in
> mbsrtowcs:
>
>   while (n > 0)
>     {
>       bytes = _mbrtowc_r (r, ptr, *src, nms, ps);
>       [...]
>       else if (bytes == -2)
>         *src += MB_CUR_MAX;
>       else [...]
>     }
>
> So, if the byte sequence starting at *src is an incomplete multibyte
> char, *src is skipped by MB_CUR_MAX and the loop continues.
>
> Hang on. If _mbrtowc_r encounters an incomplete MB char then it does
> not form an invalid character so there's no reason to return with -1 and
> set errno to EILSEQ. However, it also doesn't form a *valid* character,
> it's just incomplete. Thus it must be the start of the last character
> at the end of the input string.
>
>
This code is there because a return value of -2 means the input contains
redundant shift sequences. From the mbrtowc specification:

    (size_t)-2
        If the next n bytes contribute to an incomplete but potentially
        valid character, and all n bytes have been processed (no value is
        stored). When n has at least the value of the {MB_CUR_MAX} macro,
        this case can only occur if s points at a sequence of redundant
        shift sequences (for implementations with state-dependent encodings).

In our case, n is MB_CUR_MAX, so it must be a redundant shift sequence.
The conversion state is stored in ps, so if we advance the src pointer,
the next call should continue where it left off.
-- Jeff J.
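
To make the argument above concrete, here is a minimal sketch of how a
saved conversion state lets a loop step over redundant shift sequences.
It does not use newlib's code or any real locale. The encoding, the
toy_state type, TOY_MB_CUR_MAX, and the toy_* functions are all
hypothetical and exist only for illustration. They assume a two-byte
shift sequence (ESC '0' or ESC '1') and one-byte characters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical toy encoding, not a real locale:
   ESC '0' selects mode 0, ESC '1' selects mode 1.
   Any other byte is one character; in mode 1 it maps to 0x100 + byte. */
#define TOY_ESC 0x1B
#define TOY_MB_CUR_MAX 3 /* one 2-byte shift sequence plus one character byte */

typedef struct {
    int mode;        /* currently selected shift mode: 0 or 1 */
    int pending_esc; /* 1 if ESC was consumed but its selector byte was not */
} toy_state;

/* Mimics the mbrtowc return convention: 0 for NUL, the byte count for a
   complete character, (size_t)-1 on error, (size_t)-2 if all n bytes were
   consumed without completing a character. */
size_t toy_mbrtowc(wchar_t *pwc, const char *s, size_t n, toy_state *ps)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = (unsigned char)s[i++];
        if (ps->pending_esc) {
            if (c != '0' && c != '1') {
                errno = EILSEQ;
                return (size_t)-1;
            }
            ps->mode = c - '0';
            ps->pending_esc = 0;
        } else if (c == TOY_ESC) {
            ps->pending_esc = 1;
        } else if (c == '\0') {
            *pwc = L'\0';
            ps->mode = 0; /* back to the initial state */
            return 0;
        } else {
            *pwc = (wchar_t)(ps->mode ? 0x100 + c : c);
            return i;
        }
    }
    return (size_t)-2; /* only shift bytes seen; progress is kept in *ps */
}

/* Loop shaped like the quoted one: on -2, skip TOY_MB_CUR_MAX bytes and
   rely on *ps to remember where the decoder stopped. */
size_t toy_mbsrtowcs(wchar_t *dst, const char *src, size_t len, toy_state *ps)
{
    size_t count = 0;
    while (count < len) {
        size_t bytes = toy_mbrtowc(&dst[count], src, TOY_MB_CUR_MAX, ps);
        if (bytes == (size_t)-1)
            return (size_t)-1;
        if (bytes == (size_t)-2) {
            src += TOY_MB_CUR_MAX;
            continue;
        }
        if (bytes == 0)
            break;
        src += bytes;
        count++;
    }
    return count;
}

/* Three redundant "ESC 1" shifts, then 'A', then "ESC 0" and 'B'. */
void toy_demo(void)
{
    wchar_t out[8];
    toy_state st = {0, 0};
    size_t n = toy_mbsrtowcs(out, "\0331\0331\0331A\0330B", 8, &st);
    assert(n == 2);
    assert(out[0] == 0x100 + 'A');
    assert(out[1] == L'B');
}
```

In this sketch the first two calls each consume three bytes of shift
sequences and return -2, with the second one resuming mid-sequence from
the pending ESC recorded in the state. Whether a real state-dependent
locale behaves the same way depends on its mbrtowc implementation.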