[RFC] Add new C.UTF-8 locale.
Mon Jun 22 21:33:16 GMT 2020
* Carlos O'Donell:
> However, after considering this more deeply I think we can actually
> handle this differently.
> Consider the following:
> (a) Currently the full collation with weights is 28MiB of data.
> This is too big for most container deployments of C.UTF-8.
> (b) If we agree that surrogate pairs would be invalid UTF-8 anyway,
> then we can use the equivalent of LC_COLLATE set to C to get code
> point ordering, with the understanding that surrogate pairs, if
> present, would sort in code point order.
> In general this would allow a full C.UTF-8 with code point ordering
> that doesn't take up 28MiB with weight data that isn't really required.
> This suggestion was made by Rich Felker (musl) and Peter
> Eisentraut (postgresql).
> I'm going to see if I can hack up a C.UTF-8 that uses only sorting of
> the first byte to get full code point sorting.
I'm worried you still need tables to get a working wcscoll. But
otherwise, the plan sounds fine.
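
For reference, the property the plan relies on is that a plain byte-wise
comparison of well-formed UTF-8 already yields code point order. The sketch
below is only an illustration of that property, not code from glibc or from
any proposed patch: utf8_decode, cmp_codepoints, orders_agree and
check_examples are hypothetical names, and the decoder assumes valid input
with no error handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one well-formed UTF-8 sequence at s, store
   the code point in *cp, and return the number of bytes consumed.
   Assumes valid input; there is no error handling. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (s[0] < 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if (s[0] < 0xF0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18)
        | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6)
        | (s[3] & 0x3F);
    return 4;
}

/* Compare two UTF-8 strings code point by code point.
   Returns -1, 0 or 1. */
static int cmp_codepoints(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    while (*pa && *pb) {
        uint32_t ca, cb;
        pa += utf8_decode(pa, &ca);
        pb += utf8_decode(pb, &cb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* The shorter string sorts first when one is a prefix of the other. */
    return (*pa != 0) - (*pb != 0);
}

static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Returns 1 if strcmp (byte order) and code point order agree. */
static int orders_agree(const char *a, const char *b)
{
    return sign(strcmp(a, b)) == cmp_codepoints(a, b);
}

/* A few spot checks across 1-, 2-, 3- and 4-byte sequences. */
static void check_examples(void)
{
    assert(orders_agree("a", "\xC3\xA9"));                    /* U+0061 vs U+00E9 */
    assert(orders_agree("\xC3\xA9", "\xE2\x82\xAC"));         /* U+00E9 vs U+20AC */
    assert(orders_agree("\xEF\xBF\xBD", "\xF0\x9F\x98\x80")); /* U+FFFD vs U+1F600 */
    assert(orders_agree("abc", "ab"));                        /* prefix case */
}
```

Calling check_examples() from a test program should pass silently. This
only covers the strcoll side on multibyte strings; it says nothing about
whether wcscoll can work without tables, which is the open question above.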