This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)


* Joseph Myers:

> +	/* This is not a low surrogate; ensure an EILSEQ error by
> +	   trying to decode the high surrogate as a wide character on
> +	   its own.  */
> +	wc = ps->__value.__wch;

Is this an attempt to support CESU-8 in the future?  CESU-8 is a
UTF-8-style multi-byte encoding for UCS-2, so it can encode lone
surrogates.  Or is it just something to reduce code size?

> +  /* Test errors for invalid conversions.  */
> +  static const char16_t err_cases[][2] =
> +    {
> +      /* High surrogate followed by non-surrogate.  */
> +      { 0xd800, 0x1 },
> +      /* High surrogate followed by another high surrogate.  */
> +      { 0xd800, 0xd800 },
> +      /* Low surrogate not following high surrogate.  */
> +      { 0xdc00, 0 }

You could add a test for a low surrogate/high surrogate sequence.
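
A minimal sketch of such a check, as a standalone fragment rather than
in the style of the patch's test.  The helper names
sequence_is_rejected and check_low_then_high are hypothetical.  The
sketch assumes that a low surrogate with no pending high surrogate
already fails with EILSEQ on the first call, as in the existing
{ 0xdc00, 0 } case, so the trailing high surrogate is never consumed:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper (not part of the patch): feed two UTF-16 code
   units to c16rtomb, starting from an initial conversion state, and
   report whether either call fails with EILSEQ.  */
bool
sequence_is_rejected (char16_t first, char16_t second)
{
  char buf[MB_LEN_MAX];
  mbstate_t state;
  memset (&state, 0, sizeof state);

  errno = 0;
  if (c16rtomb (buf, first, &state) == (size_t) -1)
    return errno == EILSEQ;

  errno = 0;
  if (c16rtomb (buf, second, &state) == (size_t) -1)
    return errno == EILSEQ;

  return false;
}

/* Usage: a low surrogate followed by a high surrogate must be
   rejected.  */
void
check_low_then_high (void)
{
  assert (sequence_is_rejected (0xdc00, 0xd800));
}
```

In the patch itself this would probably be one more row in err_cases,
{ 0xdc00, 0xd800 }, rather than a separate helper.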

Most of the TEST_VERIFY comparisons could use TEST_COMPARE for improved
error diagnostics.
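
To illustrate the difference, here is a rough sketch using simplified
stand-in macros.  SKETCH_VERIFY, SKETCH_COMPARE, sketch_failures and
sketch_demo are hypothetical names, and these are not the definitions
from support/check.h.  They only mimic the idea that a VERIFY-style
check can report just the failed expression, while a COMPARE-style
check can also report the two values that differed:

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; NOT the glibc
   definitions of TEST_VERIFY and TEST_COMPARE.  */
int sketch_failures;

/* Reports only the text of the expression that was false.  */
#define SKETCH_VERIFY(expr)                                         \
  do                                                                \
    {                                                               \
      if (!(expr))                                                  \
        {                                                           \
          ++sketch_failures;                                        \
          printf ("%s:%d: not true: %s\n",                          \
                  __FILE__, __LINE__, #expr);                       \
        }                                                           \
    }                                                               \
  while (0)

/* Reports both operands and their values on mismatch.  */
#define SKETCH_COMPARE(left, right)                                 \
  do                                                                \
    {                                                               \
      long long sketch_l = (left);                                  \
      long long sketch_r = (right);                                 \
      if (sketch_l != sketch_r)                                     \
        {                                                           \
          ++sketch_failures;                                        \
          printf ("%s:%d: %s (%lld) != %s (%lld)\n",                \
                  __FILE__, __LINE__,                               \
                  #left, sketch_l, #right, sketch_r);               \
        }                                                           \
    }                                                               \
  while (0)

/* Usage sketch: run both checks against a deliberately wrong value
   and return how many failures were recorded.  */
int
sketch_demo (void)
{
  int converted = 1;               /* Pretend a call returned 1.  */
  SKETCH_VERIFY (converted == 4);  /* Shows only the expression.  */
  SKETCH_COMPARE (converted, 4);   /* Also shows the values 1 and 4.  */
  return sketch_failures;
}
```

With the second form, a failing test log already says what value the
call produced, which often saves a debugging round trip.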

Looks good otherwise.  Thanks.

Florian

