This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: Should glibc provide a builtin C.UTF-8 locale?
- From: Paul Eggert <eggert at cs dot ucla dot edu>
- To: Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Wed, 11 Feb 2015 16:57:49 -0800
- Subject: Re: Should glibc provide a builtin C.UTF-8 locale?
- References: <54DB8243 dot 3050903 at redhat dot com>
Carlos O'Donell wrote:
> Is anyone opposed to having glibc contain a builtin C.UTF-8 locale?
> This locale would have the same rules as the C locale when set for
In reading the followups, it seems this point wasn't entirely clear. I took it to
mean that "C.utf8" is like "C" except with UTF-8 encoding, so that (for example)
there are only 26 alphabetic characters in "C.utf8". This should allow a
compact implementation, which uses the same (small) character tables for both
the C and the C.utf8 locales.
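
To make the compact reading concrete, here is a minimal sketch in C. It is not glibc code: decode_utf8 and minimal_isalpha are hypothetical names invented for this illustration. The point it tries to show is that only the decoding step knows about UTF-8, while classification still uses the same small ASCII-only rule that "C" uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a glibc function): decode one UTF-8 sequence
   from the n bytes starting at s.  On success, store the code point in
   *cp and return the number of bytes consumed.  Return 0 on malformed,
   overlong, surrogate, out-of-range, or truncated input. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    size_t len;
    uint32_t c, min;
    if (s[0] < 0x80) {                    /* single byte: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }
    if (n < len)
        return 0;

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* continuation bytes are 10xxxxxx */
            return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}

/* Hypothetical classifier for the "minimal" reading: the only alphabetic
   characters are the ASCII letters A-Z and a-z, exactly as in "C". */
static int minimal_isalpha(uint32_t cp)
{
    return (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A);
}
```

Under this sketch, the two-byte sequence for U+00E9 decodes successfully but is not classified as alphabetic, which is the behavior the compact interpretation implies.
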
Others, however, seem to be thinking that the new locale would use bigger tables
that encompass all Unicode characters, so that there would be thousands of
alphabetic characters. This also sounds useful, for applications that need to
know whether a character is a Unicode letter regardless of language. Many
applications, though, don't need this extra information, and would work well
with the more-compact approach.
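
An application could tell the two interpretations apart with a small probe like the one sketched below. The function name is hypothetical, the probe assumes wchar_t values are Unicode code points (as they are on glibc), and the locale name passed in (for example "C.UTF-8") may not exist on a given system, in which case the probe reports that rather than guessing.

```c
#include <locale.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical probe (illustration only): ask whether the named locale
   classifies U+00E9 (LATIN SMALL LETTER E WITH ACUTE) as alphabetic.
   Returns 1 if it does, 0 if it does not, and -1 if the locale could not
   be selected.  Assumes wchar_t holds Unicode code points.  For
   simplicity, LC_CTYPE is reset to "C" afterwards instead of being
   restored to its previous value. */
static int locale_treats_e_acute_as_alpha(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    int result = iswalpha((wint_t)0x00E9) != 0;
    setlocale(LC_CTYPE, "C");
    return result;
}
```

A result of 0 would correspond to the minimal reading and a result of 1 to the bigger-tables reading. The answer depends entirely on the C library and the locale data installed.
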
This suggests that we add two locales: "C.utf8" could be a minimal locale that
is as close as possible to the "C" locale while adding UTF-8, and "i18n.utf8"
could be a bigger locale, basically the i18n locale of ISO/IEC TR 30112. The
"C.utf8" locale could easily be built into glibc for performance, just as "C"
is; the "i18n.utf8" locale could use tables compiled with localedef like all the