[PATCH] Reset converter state after second wchar_t output (Bug 25734)

Carlos O'Donell carlos@redhat.com
Mon Mar 30 14:34:25 GMT 2020


On 3/30/20 8:11 AM, Andreas Schwab wrote:
> On Mar 27 2020, Carlos O'Donell wrote:
> 
>> An input BIG5-HKSCS character may be converted into at most 2 wchar_t
>> characters.
> 
> Could someone please file an interpretation request with POSIX asking
> what should happen in the two-wchar_t case?  I think a case could be
> made that mbrtowc should return -1/EILSEQ, but that also has
> implications for mbsrtowcs, since that is defined in terms of repeated
> application of mbrtowc.

Tom has already reached out to ISO C WG14 to discuss this, since C and
POSIX are harmonized in this area.

The current thinking in WG14 is that BIG5-HKSCS violates the definition
of "wide character", because a single wchar_t value cannot represent
every character in the locale. As a result, the semantics underlying
BIG5-HKSCS do not map onto the current API designs.

Florian raised a similar issue in May 2019, and the general feedback at
that time was that BIG5-HKSCS is simply not supported by ISO C. I
expect the same answer from POSIX, which is harmonized with ISO C in
this case.

If BIG5-HKSCS is not supported, then the standard will have nothing to
say about which values can be returned after the first or second input
byte is read.

Options:

(a) Do not change the current converter. We return 2 (bytes consumed)
    from the first conversion, and 0 from the second.

(b) Look ahead and split the conversion. We return 1 (byte consumed)
    from the first conversion, and 1 from the second. This avoids
    returning 0, which a caller may interpret as L'\0'. However, it
    leads to the false assumption that the user could stop the
    conversion after the first call and modify the remaining input.

(c) Design something new. Return (size_t) -3 to indicate that a
    one-to-many conversion is in progress and that more output remains
    to be generated.

Do you still want me to file an interpretation request with POSIX?

-- 
Cheers,
Carlos.



More information about the Libc-alpha mailing list