This is the mail archive of the
mailing list for the Cygwin project.
Re: "C" UTF-8 trouble
On Oct 7 11:08, Andy Koppe wrote:
> 2009/10/7 Corinna Vinschen:
> > Urgh. So we have to change nl_langinfo in newlib as well. Do we have
> > to return "US-ASCII" if charset is "ASCII", or is it sufficient to
> > return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?
> I'd assume so, but WWLD?
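#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
int main ()
{
  setlocale (LC_ALL, "");                  /* take the locale from the environment */
  printf ("%s\n", nl_langinfo (CODESET));  /* print the codeset of that locale */
}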
$ LANG=C.UTF-8 ./nll
$ LANG=ja_JP ./nll
$ LANG=ru_RU ./nll
$ LANG=ru_UA ./nll
$ LANG=zh_CN ./nll
$ LANG=zh_TW ./nll
Sigh. Do we really need a translation table?
> > And what about stuff like "eucJP" vs. "EUCJP"? The charset in newlib
> > is always uppercase right now.
> Hmm. There's also the KOI8s, which turn into CP2866.
At least, from the above it looks like all uppercase. The KOI8s would
be covered by a translation table.
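
A translation table of the kind mentioned here could look roughly like the
sketch below. The struct, the function name translate_codeset and the three
entries are illustrative assumptions, not what newlib or Cygwin actually
contains; 20866 and 21866 are the Windows code page numbers for KOI8-R and
KOI8-U. Names without an entry are passed through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from an internal, all-uppercase charset name to the
   name nl_langinfo (CODESET) would report.  The entries are illustrative
   assumptions, not the table newlib or Cygwin actually uses. */
struct codeset_alias
{
  const char *internal;   /* name as stored by the C library */
  const char *reported;   /* name handed back to the application */
};

static const struct codeset_alias codeset_aliases[] =
{
  { "EUCJP",   "EUC-JP" },
  { "CP20866", "KOI8-R" },
  { "CP21866", "KOI8-U" },
};

/* Return the reported name for CHARSET, or CHARSET itself if the table has
   no entry for it. */
static const char *
translate_codeset (const char *charset)
{
  size_t i;

  for (i = 0; i < sizeof codeset_aliases / sizeof codeset_aliases[0]; i++)
    if (strcmp (charset, codeset_aliases[i].internal) == 0)
      return codeset_aliases[i].reported;
  return charset;
}
```

With a fallback like this, only the names that differ need an entry, which
keeps the table short.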
The problem is, we *must* draw a line somewhere. Otherwise it will turn
out that we're not finished with this stuff, unless we have all
implemented exactly as on Linux. That puts the 1.7.1 release off to
> > As for Emacs, I'm wondering if it shouldn't be changed to set its locale
> > according to setlocale(LC_CTYPE,NULL) instead, given what POSIX says.
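
For reference, querying the locale through the C library instead of reading
the environment variables directly can be sketched as follows. The function
name effective_ctype_locale is made up for this illustration; only the two
setlocale () calls are standard.

```c
#include <locale.h>
#include <stddef.h>

/* Let the C library pick the LC_CTYPE locale from the environment, then ask
   it which locale it actually selected.  Passing NULL as the second argument
   turns setlocale () into a pure query. */
static const char *
effective_ctype_locale (void)
{
  setlocale (LC_CTYPE, "");            /* adopt the environment's choice */
  return setlocale (LC_CTYPE, NULL);   /* report the locale now in effect */
}
```

An application doing this sees whatever default the library applies when
none of the variables is set, rather than guessing it on its own.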
> Well, yes, but good luck with that. When Ken Brown raised the ^? vs ^H
> issue, they told him that sending ^H for backspace should be
> considered a bug.
That's a SEP, IMHO.
> > I, too, think this is a good idea. __get_locale_env() should be changed
> > to return "C.UTF-8".
> > It would be nice to check /etc/defaults/locale in __get_locale_env() as
> > well, but I'm a bit reluctant to do that. It means, every invocation of
> > a Cygwin process has to open that file if the environment isn't set.
> > Talking about performance...
> > Alternatively, the first invocation of Cygwin in a process tree could
> > try to read this file only.
> Agreed with the last point, but I think setenv("LANG",...) at the
> first invocation of Cygwin is a better and simpler solution than
> changing __get_locale_env(), because:
Not exactly simpler. At the places where the first invocation of a
Cygwin process tree is handled, there's no such thing as a POSIX
> - it solves the emacs issue
> - applications will get the same result from setlocale(,"") and
> reading the environment variables themselves, so apps that do the
> latter don't have to be changed
> - it's more like Linux
> - it doesn't require a newlib change
But it requires a Cygwin change, so the difference is not that big.
And the actual implementation where to get the default locale from
is still open.
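
To make the setenv proposal concrete, here is a minimal sketch in plain
POSIX C. It is not Cygwin startup code; the function names are invented for
the example, and checking exactly LC_ALL, LC_CTYPE and LANG is an assumption
about what "the environment isn't set" should mean. The default is passed in
by the caller because where it comes from is the open question.

```c
#include <stdlib.h>

/* Return nonzero if NAME is set in the environment to a non-empty value. */
static int
env_is_set (const char *name)
{
  const char *value = getenv (name);
  return value != NULL && *value != '\0';
}

/* If none of LC_ALL, LC_CTYPE and LANG selects a locale, export
   LANG=DEFAULT_LOCALE so that child processes inherit it.
   Returns 0 on success or if nothing had to be done, -1 on error. */
static int
export_default_lang (const char *default_locale)
{
  if (env_is_set ("LC_ALL") || env_is_set ("LC_CTYPE") || env_is_set ("LANG"))
    return 0;
  return setenv ("LANG", default_locale, 1);
}
```

Called once as export_default_lang ("C.UTF-8") in the first process of a
tree, this gives setlocale (LC_ALL, "") and applications that read the
variables themselves the same answer.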
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com