This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Should glibc provide a builtin C.UTF-8 locale?


Carlos O'Donell wrote:
> Is anyone opposed to having glibc contain a builtin C.UTF-8 locale?
> This locale would have the same rules as the C locale when set for
> LC_ALL.

Reading the follow-ups, it seems this point wasn't entirely clear. I took it to mean that "C.utf8" is like "C" except with UTF-8 encoding, so that (for example) only the ASCII letters A-Z and a-z are alphabetic in "C.utf8". This should allow a compact implementation that uses the same (small) character tables for both the C and the C.utf8 locales.

Others, however, seem to think that the new locale would use bigger tables that encompass all Unicode characters, so that there would be thousands of alphabetic characters. This also sounds useful for applications that need to know whether a character is a Unicode letter regardless of language. Many applications, though, don't need this extra information, and would work well with the more-compact approach.

This suggests that we add two locales: "C.utf8" could be a minimal locale that is as close as possible to the "C" locale while adding UTF-8, and "i18n.utf8" could be a bigger locale, basically the i18n locale of ISO/IEC TR 30112. The "C.utf8" locale could easily be built into glibc for performance, just as "C" is; the "i18n.utf8" locale could use tables compiled with localedef like all the other locales.

