[RFC] Add new C.UTF-8 locale.

Florian Weimer fweimer@redhat.com
Mon Jun 29 08:36:35 GMT 2020


* Andreas Schwab:

> On Jun 21 2020, Carlos O'Donell via Libc-alpha wrote:
>
>> +  /* Three byte range.  */
>> +  if (cp >= 0x800 && cp <= 0xffff)
>
> Should that exclude the surrogate area?

I don't think so, for consistency with:

+    Note that old glibc UTF-8 charmap left the surrogates commented out.
+    We keep the surrogate entries because we want to be able to sort the
+    invalid values into a consistent location.

That comment refers to the charmap entries for <UD800>, not to the
multibyte sequences themselves.  But I think we should aim for
consistency between strcoll and wcscoll, even for invalid sequences.
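
A minimal sketch of what keeping the surrogates in the three byte range
means, assuming the goal is plain code point order.  The names
encode_utf8 and compare_encoded are hypothetical helpers written for
illustration; they are not the functions from the patch.  The encoder
does not special-case U+D800..U+DFFF, so a surrogate gets the ordinary
three byte form and a bytewise comparison of the encoded forms agrees
with a comparison of the code points:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: encode CP into BUF
   (at least 4 bytes) and return the number of bytes written, or 0 if
   CP is above U+10FFFF.  Surrogates are deliberately not rejected.  */
static size_t
encode_utf8 (uint32_t cp, unsigned char *buf)
{
  /* One byte range.  */
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      return 1;
    }
  /* Two byte range.  */
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xc0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3f));
      return 2;
    }
  /* Three byte range, surrogate area U+D800..U+DFFF included.  */
  if (cp >= 0x800 && cp <= 0xffff)
    {
      buf[0] = (unsigned char) (0xe0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3f));
      return 3;
    }
  /* Four byte range.  */
  if (cp <= 0x10ffff)
    {
      buf[0] = (unsigned char) (0xf0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3f));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3f));
      return 4;
    }
  return 0;
}

/* Hypothetical helper: compare the encoded forms of A and B bytewise,
   the way a strcmp-like ordering would see them.  The result is
   negative, zero or positive.  */
static int
compare_encoded (uint32_t a, uint32_t b)
{
  unsigned char ba[4], bb[4];
  size_t la = encode_utf8 (a, ba);
  size_t lb = encode_utf8 (b, bb);
  assert (la != 0 && lb != 0);

  size_t n = la < lb ? la : lb;
  int r = memcmp (ba, bb, n);
  if (r != 0)
    return r;
  return (la > lb) - (la < lb);
}
```

With this sketch, U+D800 encodes as ED A0 80, which sorts bytewise after
U+D7FF (ED 9F BF) and before U+E000 (EE 80 80), the same relative order
the code points have as wide characters.  Excluding the surrogate area
from the three byte branch would leave those values with no multibyte
form to compare.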

Thanks,
Florian


