Bug in mbsrtowcs?

Jeff Johnston jjohnstn@redhat.com
Fri Feb 13 20:54:00 GMT 2009


Corinna Vinschen wrote:
> Hi,
>
> while I'm looking into implementing the new SUSv4 functions wcsnrtombs
> and mbsnrtowcs, I started puzzling over a strange piece of code in
> mbsrtowcs:
>
>   while (n > 0)
>     {
>       bytes = _mbrtowc_r (r, ptr, *src, nms, ps);
>       [...]
>       else if (bytes == -2)
>         *src += MB_CUR_MAX;
>       else [...]
>     }
>
> So, if the byte sequence starting at *src is an incomplete multibyte
> char, *src is skipped by MB_CUR_MAX and the loop continues.
>
> Hang on.  If _mbrtowc_r encounters an incomplete MB char then it does
> not form an invalid character so there's no reason to return with -1 and
> set errno to EILSEQ.  However, it also doesn't form a *valid* character,
> it's just incomplete.  Thus it must be the start of the last character
> at the end of the input string.
>
>   
This code is there because a return value of -2 at this point means the
input consists of redundant shift sequences, not a truncated character.
From the mbrtowc specification:

(*size_t*)-2
    If the next /n/ bytes contribute to an incomplete but potentially
    valid character, and all /n/ bytes have been processed (no value is
    stored). When /n/ has at least the value of the {MB_CUR_MAX} macro,
    this case can only occur if /s/ points at a sequence of redundant
    shift sequences (for implementations with state-dependent encodings).

In our case, n is MB_CUR_MAX, so it must be a redundant shift sequence.
The conversion state is stored in ps, so if we advance the src pointer,
the next call should continue where it left off.
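
The point about the stored state can be checked outside newlib with a
small standalone test.  What follows is only a sketch: the helper name
is hypothetical, the locale names are assumptions that may not exist on
a given system, and it exercises the ordinary -2 case (n smaller than
MB_CUR_MAX, so the character is simply split), not the redundant shift
case, which needs a state-dependent encoding.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of newlib.
 * Returns  1 if a split character was completed through the saved state,
 *          0 if mbrtowc behaved differently,
 *         -1 if no UTF-8 locale could be selected (nothing was tested). */
int demo_resume_after_incomplete(void)
{
    /* Assumed locale names; availability differs between systems. */
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8" };
    const char input[] = "\xC3\xA9";   /* U+00E9 as two UTF-8 bytes */
    mbstate_t ps;
    wchar_t wc = 0;
    size_t i, r;
    int have_locale = 0;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        /* Note: this changes the process-wide locale. */
        if (setlocale(LC_ALL, names[i]) != NULL) {
            have_locale = 1;
            break;
        }
    }
    if (!have_locale)
        return -1;

    memset(&ps, 0, sizeof ps);         /* zero-filled means initial state */
    assert(mbsinit(&ps));

    /* Offer only the first byte: incomplete but potentially valid. */
    r = mbrtowc(&wc, input, 1, &ps);
    if (r != (size_t)-2)
        return 0;

    /* Offer the remaining byte; ps still remembers the first one. */
    r = mbrtowc(&wc, input + 1, 1, &ps);
    return r == 1;
}
```

Calling demo_resume_after_incomplete() from main and checking for a
result of 1 shows whether the second call picked up from the state left
by the first; a result of -1 only means no UTF-8 locale was found.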

-- Jeff J.



