[PATCH 2/2] Add new C.UTF-8 locale (Bug 17318)

Carlos O'Donell carlos@redhat.com
Mon Jun 29 04:22:48 GMT 2020

The patch is an xz-compressed attachment because otherwise it is ~15MiB
of test data that contains almost all valid UTF-8 characters.

8< --- 8< --- 8<
We add a new C.UTF-8 locale.  This locale is not built into glibc,
but is provided as a distinct locale.  The locale provides full
support for UTF-8 and this includes full code point sorting via
collation.  Unfortunately, given the present implementation in glibc,
this results in 28MiB of LC_COLLATE data for all possible Unicode
code points.  Future improvements may reduce this size.  Such
improvements likely require a shortcut for the collation data that
relies on C.UTF-8 single-byte sorting being equivalent to strcmp.
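
To illustrate the property such a shortcut would rely on, here is a
minimal Python sketch (not part of the patch; the function name is
hypothetical).  It checks that sorting characters by their UTF-8 byte
sequences, which is what a bytewise strcmp-style comparison does, gives
the same order as sorting by code point.  Surrogates are left out of the
sample because they cannot be encoded as UTF-8.

```python
def utf8_order_matches_code_point_order(code_points):
    """Return True if bytewise UTF-8 order equals code point order.

    `code_points` is an iterable of distinct integers; surrogates
    (U+D800..U+DFFF) must be excluded because they have no UTF-8 encoding.
    """
    chars = [chr(cp) for cp in code_points]
    by_code_point = sorted(chars, key=ord)
    # Comparing the encoded bytes mimics a bytewise (strcmp-like) comparison.
    by_utf8_bytes = sorted(chars, key=lambda c: c.encode("utf-8"))
    return by_code_point == by_utf8_bytes


# A small sample that crosses every UTF-8 length boundary (1 to 4 bytes).
SAMPLE = [0x10FFFF, 0x41, 0x20AC, 0x7F, 0x10000, 0x80, 0xFFFF, 0x7FF, 0x800, 0xE9]

if __name__ == "__main__":
    print(utf8_order_matches_code_point_order(SAMPLE))
```

This only demonstrates the ordering property on a sample; it says
nothing about how glibc would actually encode such a shortcut in
LC_COLLATE.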

The new locale is NOT added to SUPPORTED.  Test data for almost all
code points (minus those not supported by collate-test) is provided
in C.UTF-8.in, and this verifies full code point sorting is working.
The next step is to reduce LC_COLLATE to a manageable size before we
enable the locale in SUPPORTED.  Currently the C.UTF-8 testing can
add ~5-7 minutes to the locale testing (collate-test, and xfrm-test
twice), so we don't enable it either until we can parallelize
sort-test.  Testing sort-test with C.UTF-8 passes cleanly.
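
As a rough illustration of the kind of check involved, the following
Python sketch finds the first line of a list that is out of collation
order.  It assumes the tests amount to verifying that already-sorted
input stays sorted under the locale's collation, which is an assumption
about collate-test and xfrm-test rather than something stated here; the
function name is hypothetical.

```python
import locale


def first_out_of_order(lines, compare=locale.strcoll):
    """Return the index of the first line that sorts before its
    predecessor under `compare`, or None if the lines are in order.

    `compare` defaults to locale.strcoll, which follows the LC_COLLATE
    setting currently in effect for the process.
    """
    for i in range(1, len(lines)):
        if compare(lines[i - 1], lines[i]) > 0:
            return i
    return None


if __name__ == "__main__":
    # The "C" locale always exists; a C.UTF-8 locale may not be installed.
    locale.setlocale(locale.LC_COLLATE, "C")
    print(first_out_of_order(["A", "B", "a", "b"]))
```

A linear pass like this is cheap; the minutes quoted above come from
the real tests, not from this sketch.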

No regressions on x86_64 or i686.
 locale/programs/charmap.c              |    170 +-
 localedata/C.UTF-8.in                  | 852388 ++++++++++++++++++++++
 localedata/charmaps/UTF-8              |   4396 +-
 localedata/locales/C                   |    192 +
 localedata/locales/i18n_ctype          |      2 +-
 localedata/locales/tr_TR               |      2 +-
 localedata/locales/translit_circle     |      2 +-
 localedata/locales/translit_cjk_compat |      2 +-
 localedata/locales/translit_combining  |      2 +-
 localedata/locales/translit_compat     |      2 +-
 localedata/locales/translit_font       |      2 +-
 localedata/locales/translit_fraction   |      2 +-
 localedata/unicode-gen/utf8_gen.py     |    174 +-
 13 files changed, 853557 insertions(+), 3779 deletions(-)
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-new-C.UTF-8-locale-Bug-17318.patch.xz
Type: application/x-xz
Size: 815252 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200629/2ae059a1/attachment-0001.xz>
