This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)
- From: Florian Weimer <fweimer at redhat dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: <libc-alpha at sourceware dot org>
- Date: Fri, 19 Oct 2018 07:42:48 +0200
- Subject: Re: Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)
- References: <alpine.DEB.2.21.1810182138270.4093@digraph.polyomino.org.uk>
* Joseph Myers:
> + /* This is not a low surrogate; ensure an EILSEQ error by
> + trying to decode the high surrogate as a wide character on
> + its own. */
> + wc = ps->__value.__wch;
Is this an attempt to support CESU-8 in the future? CESU-8 is a
UTF-8-style multi-byte encoding for UCS-2, so it can encode lone
surrogates. Or is this just something to reduce code size?
> + /* Test errors for invalid conversions. */
> + static const char16_t err_cases[][2] =
> + {
> + /* High surrogate followed by non-surrogate. */
> + { 0xd800, 0x1 },
> + /* High surrogate followed by another high surrogate. */
> + { 0xd800, 0xd800 },
> + /* Low surrogate not following high surrogate. */
> + { 0xdc00, 0 }
You could add a test for a low surrogate/high surrogate sequence.
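The suggested extra case would be a fourth entry, { 0xdc00, 0xd800 }, in the
quoted err_cases table. The sketch below is a standalone model of the
pairing rule and not the glibc test itself: first_error_index and the
surrogate predicates are hypothetical names, uint16_t stands in for
char16_t, and the function only models which code unit of a pair would be
the one to trigger the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool
is_high_surrogate (uint16_t c)
{
  return c >= 0xd800 && c <= 0xdbff;
}

static bool
is_low_surrogate (uint16_t c)
{
  return c >= 0xdc00 && c <= 0xdfff;
}

/* Error cases from the quoted patch, plus the suggested addition.  */
static const uint16_t err_cases[][2] =
  {
    /* High surrogate followed by non-surrogate.  */
    { 0xd800, 0x1 },
    /* High surrogate followed by another high surrogate.  */
    { 0xd800, 0xd800 },
    /* Low surrogate not following high surrogate.  */
    { 0xdc00, 0 },
    /* Suggested addition: low surrogate followed by high surrogate.  */
    { 0xdc00, 0xd800 }
  };

/* Hypothetical model: return the index (0 or 1) of the code unit at
   which the pair becomes invalid, or -1 if no error is detectable
   within these two units.  */
static int
first_error_index (const uint16_t pair[2])
{
  assert (pair != NULL);
  /* A low surrogate with no pending high surrogate is invalid at once.  */
  if (is_low_surrogate (pair[0]))
    return 0;
  /* A pending high surrogate must be completed by a low surrogate.  */
  if (is_high_surrogate (pair[0]) && !is_low_surrogate (pair[1]))
    return 1;
  return -1;
}
```

Under this model the new case fails on its first code unit, just like the
existing lone low surrogate case, so it mainly guards against an
implementation that accepts the two halves of a pair in swapped order.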
Most of the TEST_VERIFY comparisons could use TEST_COMPARE for improved
error diagnostics.
Looks good otherwise. Thanks.
Florian