This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Making an embedded UTF-8 C locale
- From: Roger Leigh <rleigh at codelibre dot net>
- To: libc-alpha at sourceware dot org
- Date: Sun, 20 Sep 2009 23:05:35 +0100
- Subject: Making an embedded UTF-8 C locale
Hi folks,
I've just subscribed to the list. Just to introduce myself, I'm
a Debian developer with several interests, one of which is
distribution-wide support of UCS and UTF-8. This mail is about
UTF-8 support in the glibc C locale.
What I'd like to do/propose is twofold:
1) Alongside the "C" locale hardcoded into glibc, I'd like to
provide a "C.UTF-8" locale. This would be identical to the
standard C locale and would remain POSIX compliant, except
that the locale codeset would be UTF-8 instead of ASCII.
2) At some future point, I'd like to make the "C.UTF-8" locale
the default "C" locale, but that's not the immediate goal.
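
To make the difference in (1) concrete, here is a minimal sketch of
how a program could inspect the codeset of a locale. The helper name
codeset_of is hypothetical, and "C.UTF-8" is only the name proposed
in this mail, so the call may simply fail on systems that do not
provide it. It uses only setlocale() and nl_langinfo(CODESET).

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: switch LC_CTYPE to locale_name and copy the
   codeset name it reports into buf.
   Returns 0 on success, -1 if the locale is not available.
   The previous LC_CTYPE setting is restored after a successful switch. */
int codeset_of(const char *locale_name, char *buf, size_t len)
{
    assert(locale_name != NULL && buf != NULL && len > 0);

    /* Copy the current locale name: the string returned by setlocale()
       may be overwritten by later setlocale() calls. */
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");

    /* On failure setlocale() returns NULL and leaves the locale unchanged. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    snprintf(buf, len, "%s", nl_langinfo(CODESET));
    setlocale(LC_CTYPE, saved);
    return 0;
}
```

Calling codeset_of("C", buf, sizeof buf) reports the codeset of the
built-in locale, while codeset_of("C.UTF-8", buf, sizeof buf) would
be expected to yield "UTF-8" wherever such a locale exists, and -1
elsewhere.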
This came out of the discussion in this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776
and this thread on debian-devel:
http://lists.debian.org/debian-devel/2009/08/msg00311.html
http://lists.debian.org/debian-devel/2009/08/msg00413.html
To summarise, there is a need for a standard default UTF-8 locale
that can be relied upon to be present at all times. In the cases
above, one was needed at package build time, where a particular
package needed a UTF-8 locale for its build, and by a system
service at system startup (before /usr gets mounted). Having the
locale embedded directly into glibc would allow it to be used
right from the start of init, and it could be relied upon to
always be present. Right now, you need to know the name of a
UTF-8 locale in advance, but no such locale can be relied upon to
be present on all systems, and none is available before /usr is
mounted.
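
As an illustration of the current situation, a program that needs
UTF-8 today has to probe a list of guessed locale names, roughly as
in the sketch below. The function name pick_utf8_locale and the
candidate names are assumptions made for the example; none of them
is guaranteed to exist on a given system.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try each candidate name in order and return the
   first one that can be selected for LC_CTYPE and reports a UTF-8
   codeset. On success LC_CTYPE is left set to that locale.
   If none works, LC_CTYPE is reset to "C" and NULL is returned. */
const char *pick_utf8_locale(const char *const *candidates, size_t n)
{
    assert(candidates != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL &&
            strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
            return candidates[i];
    }

    /* No candidate was usable: fall back to the built-in locale. */
    setlocale(LC_CTYPE, "C");
    return NULL;
}
```

A caller might pass { "C.UTF-8", "en_US.UTF-8", "en_GB.UTF-8" } and
must still handle a NULL result, which is exactly the uncertainty an
embedded UTF-8 locale would remove.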
I've spent some time looking through the glibc sources with a view
to making a patch for this, but I'm afraid I'm insufficiently
familiar with the sources and internal locale data structures to
take a good stab at it. Could anyone point me at any documentation
of this, if available, or provide any pointers on where to get
started?
One thing I was unsure of is whether the C locale source files were
created by hand or were generated by a tool at some point in the
past from the locale data files and, if so, whether the same
process could be used to generate UTF-8 equivalents.
Many thanks,
Roger
--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.