This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)

From: Florian Weimer <fweimer at redhat dot com>
To: Joseph Myers <joseph at codesourcery dot com>
Cc: <libc-alpha at sourceware dot org>
Date: Fri, 19 Oct 2018 07:42:48 +0200
Subject: Re: Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)
References: <alpine.DEB.2.21.1810182138270.4093@digraph.polyomino.org.uk>

* Joseph Myers:

> +	/* This is not a low surrogate; ensure an EILSEQ error by
> +	   trying to decode the high surrogate as a wide character on
> +	   its own.  */
> +	wc = ps->__value.__wch;

Is this an attempt to support CESU-8 in the future, a UTF-8-style
multi-byte encoding for UCS-2 that can encode lone surrogates?  Or is
it just something to reduce code size?

> +  /* Test errors for invalid conversions.  */
> +  static const char16_t err_cases[][2] =
> +    {
> +      /* High surrogate followed by non-surrogate.  */
> +      { 0xd800, 0x1 },
> +      /* High surrogate followed by another high surrogate.  */
> +      { 0xd800, 0xd800 },
> +      /* Low surrogate not following high surrogate.  */
> +      { 0xdc00, 0 }

You could add a test for a low surrogate/high surrogate sequence.
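
As a rough illustration of the suggestion above, the sketch below extends
the quoted err_cases table with one low surrogate/high surrogate entry and
checks each entry with a small classifier.  The helpers is_high_surrogate,
is_low_surrogate, pair_is_valid and check_err_cases are hypothetical names
made up for this sketch; they are not part of the patch and do not call
c16rtomb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helpers for illustration only; not taken from the patch.  */
static bool
is_high_surrogate (char16_t c)
{
  return c >= 0xd800 && c <= 0xdbff;
}

static bool
is_low_surrogate (char16_t c)
{
  return c >= 0xdc00 && c <= 0xdfff;
}

/* Return true if the two-unit sequence starts validly: either a
   non-surrogate first unit, or a high surrogate followed by a low
   surrogate.  */
static bool
pair_is_valid (const char16_t p[2])
{
  if (is_high_surrogate (p[0]))
    return is_low_surrogate (p[1]);
  return !is_low_surrogate (p[0]);
}

static const char16_t err_cases[][2] =
  {
    /* High surrogate followed by non-surrogate.  */
    { 0xd800, 0x1 },
    /* High surrogate followed by another high surrogate.  */
    { 0xd800, 0xd800 },
    /* Low surrogate not following high surrogate.  */
    { 0xdc00, 0 },
    /* Suggested addition: low surrogate followed by high surrogate.  */
    { 0xdc00, 0xd800 }
  };

/* Assert that every entry is rejected and return the number of entries.  */
size_t
check_err_cases (void)
{
  size_t n = sizeof err_cases / sizeof err_cases[0];
  for (size_t i = 0; i < n; i++)
    assert (!pair_is_valid (err_cases[i]));
  return n;
}
```

In the real test the new entry would presumably be fed through c16rtomb
like the other cases, expecting an EILSEQ error; the classifier here only
stands in for that call.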

Most of the TEST_VERIFY comparisons could use TEST_COMPARE for improved
error diagnostics.
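
To illustrate why the comparison form gives better diagnostics, here is a
minimal sketch with simplified stand-in macros.  SKETCH_VERIFY and
SKETCH_COMPARE are hypothetical names invented for this example; the real
TEST_VERIFY and TEST_COMPARE are defined in glibc's support/check.h and
their output format differs.  The point is only that a comparison macro
sees both operands and can print their values, while a verify-style macro
sees a single boolean.

```c
#include <assert.h>
#include <stdio.h>

/* Failure counter for this sketch only.  */
static int failures;

/* Simplified stand-in for a verify-style check: it can only report
   that the expression was false.  */
#define SKETCH_VERIFY(expr)                                             \
  do                                                                    \
    {                                                                   \
      if (!(expr))                                                      \
        {                                                               \
          failures++;                                                   \
          printf ("%s:%d: not true: %s\n", __FILE__, __LINE__, #expr);  \
        }                                                               \
    }                                                                   \
  while (0)

/* Simplified stand-in for a compare-style check: it evaluates both
   sides once and prints their values on mismatch.  */
#define SKETCH_COMPARE(left, right)                                     \
  do                                                                    \
    {                                                                   \
      long long l_ = (left);                                            \
      long long r_ = (right);                                           \
      if (l_ != r_)                                                     \
        {                                                               \
          failures++;                                                   \
          printf ("%s:%d: values differ\n", __FILE__, __LINE__);        \
          printf ("  left:  %lld (%s)\n", l_, #left);                   \
          printf ("  right: %lld (%s)\n", r_, #right);                  \
        }                                                               \
    }                                                                   \
  while (0)

/* Run both styles on the same mismatch and return the failure count.  */
int
demo (void)
{
  long got = 3;   /* Made-up result value.  */
  long want = 4;  /* Made-up expected value.  */
  failures = 0;
  SKETCH_VERIFY (got == want);
  SKETCH_COMPARE (got, want);
  return failures;
}
```

Both checks fail on the same mismatch, but only the second one shows that
the result was 3 where 4 was expected.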

Looks good otherwise.  Thanks.

Florian

Follow-Ups:
- Re: Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)
  - From: Joseph Myers

References:
- Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X)
  - From: Joseph Myers
