This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Should glibc provide a builtin C.UTF-8 locale?

From: Paul Eggert <eggert at cs dot ucla dot edu>
To: Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Wed, 11 Feb 2015 16:57:49 -0800
Subject: Re: Should glibc provide a builtin C.UTF-8 locale?
Authentication-results: sourceware.org; auth=none
References: <54DB8243 dot 3050903 at redhat dot com>

Carlos O'Donell wrote:

> Is anyone opposed to having glibc contain a builtin C.UTF-8 locale?
> This locale would have the same rules as the C locale when set for
> LC_ALL.

In reading followups it seems this point wasn't entirely clear. I took it to mean that "C.utf8" is like "C" except with UTF-8 encoding, so that (for example) there are only 26 alphabetic characters in "C.utf8". This should allow a compact implementation, which uses the same (small) character tables for both the C and the C.utf8 locales.
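
As a rough illustration of the compact reading, the sketch below decodes UTF-8 but classifies only the ASCII letters (26 in each case) as alphabetic. The function names minimal_isalpha, utf8_decode and count_alpha are hypothetical and are not glibc interfaces; this is one way such a locale could behave, not a description of any existing implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treat a code point as alphabetic only if it is
   an ASCII letter, mirroring what the "C" locale does for bytes. */
int minimal_isalpha(uint32_t cp)
{
    return (cp >= 0x41u && cp <= 0x5Au)      /* 'A'..'Z' */
        || (cp >= 0x61u && cp <= 0x7Au);     /* 'a'..'z' */
}

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 if the input is invalid,
   truncated, overlong, a surrogate, or above U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t v, min;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        len = 2; min = 0x80; v = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; min = 0x800; v = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; min = 0x10000; v = s[0] & 0x07;
    } else {
        return 0;
    }
    if (n < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFFu || (v >= 0xD800u && v <= 0xDFFFu))
        return 0;
    *cp = v;
    return len;
}

/* Count the characters in a UTF-8 buffer that are alphabetic under
   the minimal rules above. Invalid bytes are skipped one at a time. */
size_t count_alpha(const char *str, size_t n)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0, i = 0;

    while (i < n) {
        uint32_t cp;
        size_t used = utf8_decode(s + i, n - i, &cp);
        if (used == 0) {
            i++;
            continue;
        }
        if (minimal_isalpha(cp))
            count++;
        i += used;
    }
    return count;
}
```

Under these assumptions a non-ASCII letter such as U+00E9 decodes correctly but is not counted as alphabetic, which is the behavior the compact interpretation implies.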

Others, however, seem to be thinking that the new locale would use bigger tables that encompass all Unicode characters, so that there would be thousands of alphabetic characters. This also sounds useful, for applications that need to know whether a character is a Unicode letter regardless of language. Many applications, though, don't need this extra information, and would work well with the more-compact approach.

This suggests that we add two locales: "C.utf8" could be a minimal locale that is as close as possible to the "C" locale while adding UTF-8, and "i18n.utf8" could be a bigger locale, basically the i18n locale of ISO/IEC TR 30112. The "C.utf8" locale could easily be built into glibc for performance, just as "C" is; the "i18n.utf8" locale could use tables compiled with localedef like all the other locales.
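
For illustration only, here is how an application might ask for such a locale and fall back to plain "C" when it is not installed. The helper name select_ctype_locale is hypothetical; setlocale and LC_CTYPE are the standard C interfaces. Whether "C.UTF-8" or "C.utf8" is accepted depends on the system, so the sketch makes no claim about which name wins.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate name for LC_CTYPE and return
   the first one the C library accepts. "C" is always available, so it
   serves as the final fallback. */
const char *select_ctype_locale(void)
{
    static const char *const candidates[] = { "C.UTF-8", "C.utf8", "C" };
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL; /* not expected on a conforming implementation */
}
```

A caller would invoke select_ctype_locale() once at startup and could then check the returned name to learn whether UTF-8 handling is active.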

Follow-Ups:
- Re: Should glibc provide a builtin C.UTF-8 locale?
  - From: Carlos O'Donell
- Re: Should glibc provide a builtin C.UTF-8 locale?
  - From: Mike Frysinger

References:
- Should glibc provide a builtin C.UTF-8 locale?
  - From: Carlos O'Donell
