[PATCH v4 0/4] Add new C.UTF-8 locale (Bug 17318)
Carlos O'Donell
carlos@redhat.com
Wed Apr 28 13:00:29 GMT 2021
To make the C.UTF-8 locale easier to implement, several steps
should be taken before the locale is added:
1) Implement wide ellipsis range handling for UTF-8 to simplify
the LC_COLLATE description in the locale.
2) Update the UTF-8 charmap processing to include all code points
(excluding surrogates) and make use of the wide ellipsis ranges.
3) Regenerate the UTF-8 character map with the new characters
for full code point coverage.
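As a rough illustration of what "all code points (excluding
surrogates)" means for step 2, the sketch below computes the two
contiguous ranges involved and prints their end points together with
their UTF-8 bytes written as /xNN escapes. It is not taken from
utf8_gen.py; the function names are hypothetical and the printed
layout is only for illustration, not the charmap syntax.

```python
# Illustrative sketch only; names are hypothetical, not from utf8_gen.py.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def code_point_ranges():
    """Return inclusive (first, last) ranges covering every code point
    except the surrogates U+D800..U+DFFF."""
    return [(0x0000, SURROGATE_FIRST - 1),
            (SURROGATE_LAST + 1, MAX_CODE_POINT)]


def utf8_hex(code_point):
    """Format the UTF-8 encoding of a code point as /xNN byte escapes."""
    return "".join("/x%02x" % b for b in chr(code_point).encode("utf-8"))


def total_code_points(ranges):
    """Count the code points covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


if __name__ == "__main__":
    for first, last in code_point_ranges():
        print("U+%04X..U+%04X  %s .. %s"
              % (first, last, utf8_hex(first), utf8_hex(last)))
    print("total:", total_code_points(code_point_ranges()))
```

Describing each range by its end points, rather than listing every
code point, is the kind of saving the ellipsis ranges are meant to
provide.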
The new C.UTF-8 locale is not added to SUPPORTED because it is
28MiB in size due to the size of the weights array in LC_COLLATE
for the full set of code points. Before we can make C.UTF-8
supported we must simplify the weights processing to use strcmp
and remove the weights array from the binary data. To some extent
this is a reference implementation against which we can test a newer
version or a builtin version that has the size and performance
we expect.
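The reason strcmp is a plausible replacement for the weights array
is a property of UTF-8 itself: comparing encoded strings byte by
byte (as unsigned bytes) yields the same order as comparing them
code point by code point. The sketch below checks that on a few
sample strings. It is only a demonstration of the property, not the
planned implementation; the helper names and sample data are
hypothetical.

```python
# Illustrative sketch only; helper names and samples are hypothetical.

def sort_by_code_point(strings):
    """Sort strings by their sequence of Unicode code points."""
    return sorted(strings, key=lambda s: [ord(c) for c in s])


def sort_by_utf8_bytes(strings):
    """Sort strings by their UTF-8 bytes, compared as unsigned values,
    which is what strcmp does on the encoded data."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def same_order(strings):
    """Return True if both orderings agree for the given strings."""
    return sort_by_code_point(strings) == sort_by_utf8_bytes(strings)


if __name__ == "__main__":
    # Samples with 1-, 2-, 3- and 4-byte encodings.
    samples = ["z", "a", "ab", "\u00e9", "\u4e2d", "\uffff",
               "\U00010000", "\U0001F600", "a\u00e9", "a\U0001F600"]
    print(same_order(samples))
```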
Carlos O'Donell (4):
Add support for processing wide ellipsis ranges in UTF-8.
Update UTF-8 charmap processing.
Regenerate localedata files.
Add generic C.UTF-8 locale (Bug 17318)
locale/programs/charmap.c | 174 +-
localedata/C.UTF-8.in | 156 +
localedata/Makefile | 2 +
localedata/charmaps/UTF-8 | 4396 ++++--------------------
localedata/locales/C | 188 +
localedata/locales/i18n_ctype | 2 +-
localedata/locales/tr_TR | 2 +-
localedata/locales/translit_circle | 2 +-
localedata/locales/translit_cjk_compat | 2 +-
localedata/locales/translit_combining | 2 +-
localedata/locales/translit_compat | 2 +-
localedata/locales/translit_font | 2 +-
localedata/locales/translit_fraction | 2 +-
localedata/unicode-gen/utf8_gen.py | 133 +-
14 files changed, 1288 insertions(+), 3777 deletions(-)
create mode 100644 localedata/C.UTF-8.in
create mode 100644 localedata/locales/C
--
2.26.3