[RFC] Add new C.UTF-8 locale.

Florian Weimer fweimer@redhat.com
Mon Jun 22 21:33:16 GMT 2020

* Carlos O'Donell:

> However, after considering this more deeply I think we can actually
> handle this differently.
> Consider the following:
> (a) Currently the full collation with weights is 28MiB of data.
>     This is too big for most container deployments of C.UTF-8.
> (b) If we agree that surrogate pairs would be invalid UTF-8 anyway,
>     then we can use the equivalent of LC_COLLATE set to C to get code
>     point ordering, with the understanding that surrogate pairs, if
>     present, would sort into their code point ordering.
> In general this would allow a full C.UTF-8 with code point ordering
> that doesn't take up 28MiB with weight data that isn't really required.
> This suggestion was made by Rich Felker (musl) and Peter
> Eisentraut (postgresql).
> I'm going to see if I can hack up a C.UTF-8 that uses only sorting of
> the first byte to get full code point sorting.
> Thoughts?
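Plan (b) depends on a property of UTF-8: comparing encoded strings byte by byte as unsigned bytes (what strcmp/memcmp do, and what LC_COLLATE=C gives) produces the same order as comparing the decoded code points. Because the encoding is prefix-free, distinct code points never compare equal. The C sketch below checks that property on boundary code points. The helper names (encode_utf8, byte_cmp, utf8_order_matches_code_points) are hypothetical and are not glibc functions. The encoder deliberately accepts surrogates (U+D800..U+DFFF), which strict UTF-8 forbids, so you can see that they would land in code point order between U+D7FF and U+E000.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode cp as UTF-8 into buf and return the length.
   Surrogates are NOT rejected on purpose, so their sort position is visible. */
static size_t encode_utf8(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char) (0xF0 | (cp >> 18));
    buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Byte-wise comparison as memcmp does it (unsigned bytes); a proper
   prefix sorts first. */
static int byte_cmp(const unsigned char *a, size_t la,
                    const unsigned char *b, size_t lb)
{
    size_t n = la < lb ? la : lb;
    int r = memcmp(a, b, n);
    if (r != 0)
        return r;
    return (la > lb) - (la < lb);
}

/* Returns 1 if byte order agrees with code point order for every ordered
   pair of the sample code points below, 0 otherwise. */
static int utf8_order_matches_code_points(void)
{
    static const uint32_t cps[] = {
        0x01, 0x7F, 0x80, 0x7FF, 0x800, 0xD7FF,
        0xD800, 0xDFFF, /* surrogates: invalid UTF-8, encoded anyway */
        0xE000, 0xFFFF, 0x10000, 0x10FFFF
    };
    size_t n = sizeof cps / sizeof cps[0];

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            unsigned char a[4], b[4];
            size_t la = encode_utf8(cps[i], a);
            size_t lb = encode_utf8(cps[j], b);
            int by_bytes = byte_cmp(a, la, b, lb);
            int by_cp = (cps[i] > cps[j]) - (cps[i] < cps[j]);
            if ((by_bytes > 0) - (by_bytes < 0) != by_cp)
                return 0;
        }
    }
    return 1;
}
```

The check works because UTF-8 lead bytes grow with sequence length (00..7F, C2..DF, E0..EF, F0..F4), and within one length the bytes carry the code point bits most-significant first.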

I'm worried you still need tables to get a working wcscoll.  But
otherwise, the plan sounds fine.
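One way to probe the wcscoll concern is to compare wcscoll() against plain wcscmp() under a given LC_COLLATE. On glibc, wchar_t holds code points, so wcscmp already gives code point order. In the "C" locale, both glibc and musl are expected to reduce wcscoll to wcscmp. The function name wcscoll_matches_wcscmp is a hypothetical test helper. A candidate C.UTF-8 build could be checked the same way by passing "C.UTF-8", if that locale is installed.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

static int sign_of(int x) { return (x > 0) - (x < 0); }

/* Returns 1 if wcscoll() and wcscmp() order a and b the same way under
   LC_COLLATE=locale_name, 0 if they disagree, and -1 if the locale is not
   available. The previous LC_COLLATE setting is restored before returning. */
static int wcscoll_matches_wcscmp(const char *locale_name,
                                  const wchar_t *a, const wchar_t *b)
{
    /* setlocale's return value may be overwritten later, so copy it. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    int result = -1;
    if (setlocale(LC_COLLATE, locale_name) != NULL)
        result = sign_of(wcscoll(a, b)) == sign_of(wcscmp(a, b));

    setlocale(LC_COLLATE, saved);
    free(saved);
    return result;
}
```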


More information about the Libc-alpha mailing list