This is the mail archive of the mailing list for the Cygwin project.
Re: "C" UTF-8 trouble
According to Corinna Vinschen on 10/7/2009 3:03 AM:
>> Unfortunately that's not the case for emacs.
> As for Emacs, I'm wondering if it shouldn't be changed to set its locale
> according to setlocale(LC_CTYPE,NULL) instead, given what POSIX says.
Yes, we should raise this as an upstream bug in Emacs.
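For illustration, here is a minimal C sketch of the approach POSIX suggests. Let setlocale(LC_ALL, "") pick the locale from the environment, then ask the library what it actually chose, using setlocale(LC_CTYPE, NULL) and nl_langinfo(CODESET), instead of parsing LANG and friends by hand. The helper name query_ctype_locale is hypothetical, and this is not a description of how Emacs currently works.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not from Emacs or newlib): initialize the locale
   from the environment, then report what the C library actually selected.
   Returns 0 on success, -1 if the environment names an unsupported locale. */
static int query_ctype_locale(const char **locale_name, const char **codeset)
{
    /* setlocale(LC_ALL, "") applies the POSIX precedence rules:
       LC_ALL, then the per-category variable, then LANG. */
    if (setlocale(LC_ALL, "") == NULL)
        return -1;

    /* A NULL second argument only queries the current LC_CTYPE locale;
       it does not change it. */
    *locale_name = setlocale(LC_CTYPE, NULL);

    /* CODESET names the character encoding of the current LC_CTYPE locale. */
    *codeset = nl_langinfo(CODESET);
    return 0;
}
```

A program would then base its own encoding decisions on the returned codeset string rather than on the raw environment variables.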
> Urgh. So we have to change nl_langinfo in newlib as well. Do we have
> to return "US-ASCII" if charset is "ASCII", or is it sufficient to
> return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?
Gettext ships (well, used to ship, until it was recently disabled for Cygwin
1.5 because of missing locale support) a charset.alias file, which maps
arbitrary nl_langinfo(CODESET) values to canonical forms. I think you
are free to return whatever string is easiest, as long as it is documented
as one of the accepted aliases in that file. As for the canonical name,
though, gettext prefers "ASCII", not "US-ASCII".
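A rough C sketch of what such an alias lookup amounts to. The table entries and the canonical_charset name are illustrative assumptions; they are not the actual contents of gettext's charset.alias, nor the API of gnulib's localcharset code.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative alias table in the spirit of charset.alias: each entry maps
   a codeset string that nl_langinfo(CODESET) might return to a canonical
   name. These entries are examples only, not a copy of the real file. */
struct charset_alias {
    const char *alias;
    const char *canonical;
};

static const struct charset_alias alias_table[] = {
    { "US-ASCII",       "ASCII" },
    { "ANSI_X3.4-1968", "ASCII" },
    { "utf8",           "UTF-8" },
};

/* Hypothetical helper: return the canonical name for codeset, or codeset
   itself when it is not listed (i.e. assume it is already canonical). */
static const char *canonical_charset(const char *codeset)
{
    for (size_t i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++)
        if (strcmp(codeset, alias_table[i].alias) == 0)
            return alias_table[i].canonical;
    return codeset;
}
```

With a table like this, the C library can return whichever spelling is easiest, and callers still end up with one canonical name.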
gnulib also has a function, locale_charset, which uses nl_langinfo(CODESET)
and is called by a number of packages (coreutils, tar, findutils, ...), so
all of those packages depend on learning "UTF-8" if we are in the
> For a start, here's a first untested cut at newlib's locale.c, which
> allows us to add any desired mechanism to switch the default locale.
> If you agree to this, I'll propose it on the newlib list.
POSIX does say that the default is implementation-defined, so we have at
least a chance of convincing newlib that we need a hook that lets us supply
our own implementation-defined default (whether by file or otherwise).
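A sketch of the kind of hook meant here, assuming a plain function pointer. The names default_locale_hook_t, default_locale_hook, builtin_default and resolve_locale_name are hypothetical and not part of newlib. The lookup order follows the POSIX rules for setlocale(category, ""), and the hook is consulted only when none of the variables is set.

```c
#include <stdlib.h>

/* Hypothetical hook type (not newlib API): returns the implementation-defined
   default locale name used when no locale environment variable is set. */
typedef const char *(*default_locale_hook_t)(void);

static const char *builtin_default(void)
{
    return "C"; /* assumed fallback when no other hook is installed */
}

static default_locale_hook_t default_locale_hook = builtin_default;

/* Resolve the locale name for one category the way POSIX describes for
   setlocale(category, ""): LC_ALL first, then the category's own variable
   (e.g. "LC_CTYPE"), then LANG, then the implementation-defined default
   supplied by the hook. Variables that are set but empty are ignored. */
static const char *resolve_locale_name(const char *category_var)
{
    const char *names[] = { "LC_ALL", category_var, "LANG" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && *value != '\0')
            return value;
    }
    return default_locale_hook();
}
```

A port could then point default_locale_hook at a function that returns, say, a UTF-8 locale name, or one that reads the name from a file. The sketch only resolves the name; a real setlocale would still have to validate it and load the matching locale data.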
Don't work too hard; make some time for fun as well!
Eric Blake email@example.com