[PATCH v4 0/4] Add new C.UTF-8 locale (Bug 17318)

Carlos O'Donell carlos@redhat.com
Wed Apr 28 13:00:29 GMT 2021


To make the C.UTF-8 locale easier to implement, several steps
should be taken before the locale is added:

1) Implement wide ellipsis range handling for UTF-8 to simplify
   the LC_COLLATE description in the locale.
2) Update the UTF-8 charmap processing to include all code points
   (excluding surrogates) and make use of the wide ellipsis ranges.
3) Regenerate the UTF-8 character map with the new characters
   for full code point coverage.
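
As a rough illustration of what "all code points (excluding
surrogates)" means for steps 2 and 3, here is a minimal Python sketch.
It is not the code in utf8_gen.py: the helper names and the exact
layout of the range line are assumptions made for this example. Only
the surrogate boundaries and the UTF-8 byte values are fixed by
Unicode.

```python
# Hypothetical helpers for illustration; not taken from utf8_gen.py.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def code_point_ranges():
    """Inclusive (first, last) ranges covering every non-surrogate code point."""
    return [(0x0000, SURROGATE_FIRST - 1), (SURROGATE_LAST + 1, MAX_CODE_POINT)]


def utf8_hex(cp):
    """UTF-8 encoding of one code point written as /xNN escapes."""
    return "".join("/x%02x" % byte for byte in chr(cp).encode("utf-8"))


def range_line(first, last):
    """One ellipsis-style range line (assumed layout, for illustration only)."""
    return "<U%04X>..<U%04X> %s" % (first, last, utf8_hex(first))


# Number of code points the two ranges cover together.
total = sum(last - first + 1 for first, last in code_point_ranges())

for first, last in code_point_ranges():
    print(range_line(first, last))
print("code points covered:", total)
```

Two ranges are enough to describe the full space here, which is the
kind of compaction that wide ellipsis handling is meant to allow in
the charmap and in LC_COLLATE.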

The new C.UTF-8 locale is not added to SUPPORTED because it is
28MiB in size, due to the weights array in LC_COLLATE for the full
set of code points. Before we can make C.UTF-8 supported we must
simplify the weights processing to use strcmp and remove the
weights array from the binary data. To some extent this series is
a reference implementation against which we can test a newer
version or a builtin version that has the size and performance
we expect.
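
The strcmp idea rests on a property of UTF-8: comparing encoded
strings byte by byte, as unsigned bytes, gives the same order as
comparing them code point by code point. The sketch below checks that
property on random strings. It assumes the collation order wanted for
C.UTF-8 is plain code point order, and it models strcmp with Python's
bytes comparison; it says nothing about how glibc would implement
this. U+0000 is left out because strcmp stops at a NUL byte.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_string(max_len=4):
    """Random string of non-surrogate, non-NUL code points."""
    chars = []
    for _ in range(rng.randint(0, max_len)):
        cp = rng.randint(1, 0x10FFFF)
        while 0xD800 <= cp <= 0xDFFF:  # surrogates cannot be encoded in UTF-8
            cp = rng.randint(1, 0x10FFFF)
        chars.append(chr(cp))
    return "".join(chars)


def strcmp_like(a, b):
    """Sign of an unsigned byte-wise comparison of two UTF-8 encoded strings."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


samples = [random_string() for _ in range(1000)]

# Python compares str values by code point and bytes values by unsigned byte.
by_code_point = sorted(samples)
by_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))
orders_match = by_code_point == by_bytes
print("byte order matches code point order:", orders_match)
```

If the two orders always agree, a weights table with one entry per
code point carries no information that the encoded bytes do not
already provide.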

Carlos O'Donell (4):
  Add support for processing wide ellipsis ranges in UTF-8.
  Update UTF-8 charmap processing.
  Regenerate localedata files.
  Add generic C.UTF-8 locale (Bug 17318)

 locale/programs/charmap.c              |  174 +-
 localedata/C.UTF-8.in                  |  156 +
 localedata/Makefile                    |    2 +
 localedata/charmaps/UTF-8              | 4396 ++++--------------------
 localedata/locales/C                   |  188 +
 localedata/locales/i18n_ctype          |    2 +-
 localedata/locales/tr_TR               |    2 +-
 localedata/locales/translit_circle     |    2 +-
 localedata/locales/translit_cjk_compat |    2 +-
 localedata/locales/translit_combining  |    2 +-
 localedata/locales/translit_compat     |    2 +-
 localedata/locales/translit_font       |    2 +-
 localedata/locales/translit_fraction   |    2 +-
 localedata/unicode-gen/utf8_gen.py     |  133 +-
 14 files changed, 1288 insertions(+), 3777 deletions(-)
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C

-- 
2.26.3


