[PATCH] Reset converter state after second wchar_t output (Bug 25734)

Carlos O'Donell carlos@redhat.com
Mon Mar 30 14:34:25 GMT 2020

On 3/30/20 8:11 AM, Andreas Schwab wrote:
> On Mär 27 2020, Carlos O'Donell wrote:
>> An input BIG5-HKSCS character may be converted into at most 2 wchar_t
>> characters.
> Could someone please file an interpretation request for POSIX, what
> should happen in that two wchar_t case?  I think a case could be made
> that mbrtowc should return -1/EILSEQ, but that also has implications for
> mbsrtowcs, since that is defined in terms of repeated application of
> mbrtowc.

Tom had already reached out to ISO C WG14 to discuss this, since C and
POSIX are harmonized in this area.

The current thinking in WG14 is that BIG5-HKSCS violates the definition
of "wide character" because the wchar_t value cannot represent the
original character in the locale and so the semantics underlying
BIG5-HKSCS will not map to the current API designs.

Florian raised a similar issue in May of 2019 and the general feedback
at that time was that BIG5-HKSCS is simply not supported by ISO C.
I expect the same answer from POSIX, which is harmonized with ISO C in
this case.

If BIG5-HKSCS is not supported, then the standard will have nothing to
say about which values can be returned after the first or second input
bytes are read.


That leaves three options:

(a) Do not change the current converter. We return 2 consumed bytes in
    the first conversion, and 0 in the second.

(b) Look ahead and split the conversion. We return 1 consumed byte in
    the first conversion, and 1 in the second. This prevents us from
    returning 0, which may be interpreted as L'\0'. This leads to the false
    assumption that the user could stop the conversion at this point and
    modify the
(c) Design something new. Return (size_t) -3 indicating a One-to-Many
    conversion is happening and that there is more output to be generated.
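
For illustration, here is a minimal sketch of the kind of caller loop that
makes a 0 return value a problem. The function name convert_typical and its
parameters are hypothetical and not taken from glibc or from the patch; the
sketch only assumes the standard mbrtowc contract, in which a return of 0
means the null wide character was converted.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only: convert up to n bytes of s
   into at most outlen wide characters.  Returns the number of wide
   characters stored, or (size_t) -1 on an invalid sequence.  */
size_t
convert_typical (const char *s, size_t n, wchar_t *out, size_t outlen)
{
  mbstate_t st;
  size_t produced = 0;

  memset (&st, 0, sizeof st);   /* Start in the initial conversion state.  */

  while (n > 0 && produced < outlen)
    {
      wchar_t wc;
      size_t r = mbrtowc (&wc, s, n, &st);

      if (r == (size_t) -1)
        return (size_t) -1;     /* Invalid sequence (EILSEQ).  */
      if (r == (size_t) -2)
        break;                  /* Incomplete character; need more input.  */
      if (r == 0)
        break;                  /* Caller assumes L'\0' was converted.  */

      out[produced++] = wc;
      s += r;
      n -= r;
    }

  return produced;
}
```

A loop written like this treats a return of 0 as end of string. With
option (a), the second call returns 0 while storing a non-null wchar_t,
so a caller of this shape would stop early and drop that second wide
character. For comparison, C11 mbrtoc16 already uses (size_t) -3 to
report that a further output unit was stored without consuming input,
which is close in spirit to option (c).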

Do you still want me to file an interpretation request with POSIX?


