[PATCH 2/2] Add new C.UTF-8 locale (Bug 17318)
Carlos O'Donell
carlos@redhat.com
Mon Jun 29 04:22:48 GMT 2020
The patch is an xz-compressed attachment because otherwise it is ~15MiB
of test data that contains almost all valid UTF-8 characters.
8< --- 8< --- 8<
We add a new C.UTF-8 locale. This locale is not built into glibc,
but is provided as a distinct locale. The locale provides full
support for UTF-8 and this includes full code point sorting via
collation. Unfortunately, given the present implementation in glibc
this results in 28MiB of LC_COLLATE data for all possible Unicode
code points. Future improvements may reduce this size. Such
improvements likely require a shortcut for the collation data that
relies on C.UTF-8 single-byte sorting being equivalent to strcmp.
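
The strcmp equivalence mentioned above rests on a property of UTF-8:
comparing well-formed UTF-8 strings byte by byte (as unsigned char, which
is how strcmp compares) gives the same order as comparing their code
points. The following is only a sketch to illustrate that property; the
helper names encode_utf8 and count_order_mismatches are hypothetical and
are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one Unicode scalar value as NUL-terminated UTF-8 in BUF, which
   must hold at least 5 bytes.  Returns the number of bytes written (not
   counting the NUL), or 0 for surrogates and values above U+10FFFF.
   Hypothetical helper for illustration only.  */
size_t
encode_utf8 (uint32_t cp, char *buf)
{
  unsigned char *p = (unsigned char *) buf;
  size_t n;

  assert (buf != NULL);
  if (cp < 0x80)
    {
      p[0] = (unsigned char) cp;
      n = 1;
    }
  else if (cp < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (cp >> 6));
      p[1] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 2;
    }
  else if (cp < 0x10000)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* Surrogates are not encodable.  */
      p[0] = (unsigned char) (0xE0 | (cp >> 12));
      p[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 3;
    }
  else if (cp <= 0x10FFFF)
    {
      p[0] = (unsigned char) (0xF0 | (cp >> 18));
      p[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      p[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      p[3] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 4;
    }
  else
    return 0;
  p[n] = '\0';
  return n;
}

/* Walk the code points FIRST..LAST (LAST must be below UINT32_MAX) in
   increasing order and count how often strcmp fails to put an encodable
   code point strictly after the previous encodable one.  */
unsigned long
count_order_mismatches (uint32_t first, uint32_t last)
{
  char prev[5] = "";
  char cur[5];
  int have_prev = 0;
  unsigned long mismatches = 0;

  for (uint32_t cp = first; cp <= last; cp++)
    {
      if (encode_utf8 (cp, cur) == 0)
        continue;               /* Skip surrogates.  */
      if (have_prev && strcmp (prev, cur) >= 0)
        mismatches++;
      memcpy (prev, cur, sizeof prev);
      have_prev = 1;
    }
  return mismatches;
}
```

Calling count_order_mismatches (1, 0x10FFFF) from a small main walks every
encodable code point except U+0000, whose encoding is the empty C string.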
The new locale is NOT added to SUPPORTED. Test data for almost all
code points (minus those not supported by collate-test) is provided
in C.UTF-8.in, and this verifies full code point sorting is working.
The next step is to reduce LC_COLLATE to a manageable size before we
enable the locale in SUPPORTED. Currently the C.UTF-8 testing can
add ~5-7 minutes to the locale testing (collate-test, and xfrm-test
twice) so we don't enable this either until we can parallelize the
sort-test test. Testing sort-test with C.UTF-8 passes cleanly.
No regressions on x86_64 or i686.
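
For readers who want to spot-check the collation behaviour by hand, a
minimal sketch follows. It assumes the C.UTF-8 locale from this patch has
been built and installed (it is not in SUPPORTED, so setlocale may fail),
and the function names select_c_utf8 and collation_agrees_with_strcmp are
hypothetical. It is not a substitute for collate-test or xfrm-test.

```c
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1.  */
int
sign_of (int x)
{
  return (x > 0) - (x < 0);
}

/* Try to select C.UTF-8 for collation.  Returns 1 on success and 0 if
   the locale is not available on this system.  */
int
select_c_utf8 (void)
{
  return setlocale (LC_COLLATE, "C.UTF-8") != NULL;
}

/* Returns 1 if strcoll, under the current LC_COLLATE locale, orders A
   and B the same way strcmp does, and 0 otherwise.  */
int
collation_agrees_with_strcmp (const char *a, const char *b)
{
  return sign_of (strcoll (a, b)) == sign_of (strcmp (a, b));
}
```

A caller would first check the result of select_c_utf8 and, if it
succeeded, pass pairs of UTF-8 strings to collation_agrees_with_strcmp;
with full code point sorting every pair is expected to agree.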
---
locale/programs/charmap.c | 170 +-
localedata/C.UTF-8.in | 852388 ++++++++++++++++++++++
localedata/charmaps/UTF-8 | 4396 +-
localedata/locales/C | 192 +
localedata/locales/i18n_ctype | 2 +-
localedata/locales/tr_TR | 2 +-
localedata/locales/translit_circle | 2 +-
localedata/locales/translit_cjk_compat | 2 +-
localedata/locales/translit_combining | 2 +-
localedata/locales/translit_compat | 2 +-
localedata/locales/translit_font | 2 +-
localedata/locales/translit_fraction | 2 +-
localedata/unicode-gen/utf8_gen.py | 174 +-
13 files changed, 853557 insertions(+), 3779 deletions(-)
create mode 100644 localedata/C.UTF-8.in
create mode 100644 localedata/locales/C
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-new-C.UTF-8-locale-Bug-17318.patch.xz
Type: application/x-xz
Size: 815252 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200629/2ae059a1/attachment-0001.xz>