This is the mail archive of the mailing list for the glibc project.


Re: Should glibc provide a builtin C.UTF-8 locale?

On Wed, Feb 11, 2015 at 11:24:35AM -0500, Carlos O'Donell wrote:
> Is anyone opposed to having glibc contain a builtin C.UTF-8 locale?
> This locale would have the same rules as the C locale when set for
> The locale would provide a sensible fallback for developers who need
> UTF-8 but, until C.UTF-8 was provided, could not rely upon it.
> My best guess is that it will take ~1.5MB of data to include the
> UTF-8 locale in the runtime. If you do it right, this is shared
> across all processes and gives you, in this, the 21st century, a fallback
> that is sensible for all developers of all languages.
> We have had on-and-off requests for this for years as UTF-8 has become
> the de facto standard.
> The most recent request is from the Python 3 folks who want to be able
> to assume there is some kind of UTF-8 support in the system regardless
> of the installed locales.
> Is this the right way forward? Or should we tell the distributions
> that it is their responsibility to ship and always provide a C.UTF-8?
> Comments?
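As a hedged illustration of the fallback problem described in the quoted
proposal: without a guaranteed C.UTF-8, a program that wants UTF-8
character handling has to probe for a usable locale at run time. The
sketch below relies only on the standard setlocale() and
nl_langinfo(CODESET) interfaces; the helper pick_utf8_ctype() and its
list of candidate names are hypothetical, and which names are accepted
depends on the locales installed on a given system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try candidate locale names in order for LC_CTYPE
   and return the first one the C library accepts. The candidate list is
   an assumption; availability depends on the installed locales. */
const char *pick_utf8_ctype(void)
{
    static const char *const candidates[] = { "C.UTF-8", "C.utf8", "en_US.UTF-8" };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        /* setlocale() returns NULL when the named locale is unavailable */
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }

    /* No UTF-8 locale was found: fall back to the plain C locale,
       which is always available */
    setlocale(LC_CTYPE, "C");
    return "C";
}

/* Returns nonzero if the current LC_CTYPE locale uses UTF-8 */
int ctype_is_utf8(void)
{
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

A caller would run pick_utf8_ctype() once at startup and then check
ctype_is_utf8(); with a builtin C.UTF-8, presumably the first candidate
would always succeed and the final fallback branch would become dead code.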

I'm highly in favor of this, but I wonder why it requires so much
data. Am I correct in assuming that it's for case mappings and
character classes? How would static linking be affected? It's possible
to represent this data in a much smaller size (it's about 8k in musl),
but doing so requires significantly different data structures from
what glibc uses, and the case mapping is considerably slower than
some users would like or expect. But perhaps there's some middle
ground, a way glibc could represent its C.UTF-8 locale without the
full weight you're looking at.
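To make the size/speed trade-off above concrete, here is a minimal
sketch of one common compact technique, assumed here for illustration
and not taken from musl's or glibc's actual data layout: case mappings
stored as sorted ranges of code points that share a constant offset,
searched with a binary search. The table upper_ranges and the function
toupper_cp() are hypothetical, and the handful of ranges stands in for a
table that would really be generated from the Unicode Character Database.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact entry: every code point in [first, last] maps to
   its uppercase form by adding the same delta. */
struct case_range {
    uint32_t first;  /* first lowercase code point in the run */
    uint32_t last;   /* last lowercase code point in the run (inclusive) */
    int32_t  delta;  /* offset to add to get the uppercase code point */
};

/* Illustrative subset only; must be sorted by 'first' and non-overlapping
   so that the binary search below is valid. */
static const struct case_range upper_ranges[] = {
    { 0x0061, 0x007A, -32 },  /* a..z -> A..Z */
    { 0x00E0, 0x00F6, -32 },  /* à..ö -> À..Ö */
    { 0x00F8, 0x00FE, -32 },  /* ø..þ -> Ø..Þ (skips U+00F7, the division sign) */
    { 0x00FF, 0x00FF, 121 },  /* ÿ -> Ÿ (U+0178) */
    { 0x03B1, 0x03C1, -32 },  /* α..ρ -> Α..Ρ */
    { 0x03C2, 0x03C2, -31 },  /* final ς -> Σ (U+03A3) */
    { 0x03C3, 0x03C9, -32 },  /* σ..ω -> Σ..Ω */
    { 0x0430, 0x044F, -32 },  /* а..я -> А..Я */
};

/* Map a code point to uppercase with an O(log n) search over the ranges;
   code points not covered by any range are returned unchanged. */
uint32_t toupper_cp(uint32_t c)
{
    size_t lo = 0;
    size_t hi = sizeof upper_ranges / sizeof upper_ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < upper_ranges[mid].first)
            hi = mid;
        else if (c > upper_ranges[mid].last)
            lo = mid + 1;
        else
            return (uint32_t)((int32_t)c + upper_ranges[mid].delta);
    }
    return c;
}
```

A flat per-code-point array would answer each lookup with a single index
operation but needs an entry for every code point it covers; the range
table trades that constant-time lookup for a logarithmic search over far
fewer entries, which is the kind of slowdown the reply alludes to.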

