"C" UTF-8 trouble

Corinna Vinschen corinna-cygwin@cygwin.com
Tue Oct 6 14:12:00 GMT 2009

On Oct  6 12:34, Andy Koppe wrote:
> 2009/10/5 Corinna Vinschen:
> >> Vim and emacs both appear to have a hardcoded assumption that the
> >> default "C" locale is 8-bit only. Since the "C" locale now defaults to
> >> UTF-8, this means that non-ASCII characters don't work out-of-the-box
> >> after all. :(
> >>
> >> Strictly speaking, vim and emacs are wrong to do this, because they
> >> should be leaving the charset up to setlocale and the multibyte
> >> conversion functions. But if these two treat "C" specially, we
> >> probably have to assume that others do the same and consider this a
> >> de-facto standard.
> >>
> >> They're both fine, however, if the locale is set to "C.UTF-8" or any
> >> other explicit UTF-8 locale. Therefore, here's one way to address this
> >> issue that avoids patching such apps:
> >>
> >> When the Windows environment is translated at DLL startup, and if LANG
> >> is not already set, set it to "C.UTF-8". This has the same semantics
> >> as plain "C", and LC_ALL as well as the specific LC_* variables would
> >> still override it if set. Yet apps such as emacs and vi wouldn't make
> >> any undue assumptions.
> >
> > Before changing Cygwin, doesn't `set encoding=utf-8' in vim help?
> Yes, as does invoking it with 'LANG=C.UTF-8 vim'. But that wasn't the
> point: users shouldn't have to hunt down a solution for this (and
> complain along the way).

Sure.  The problem is, what do we do?  POSIX requires that the default
locale is "C".  Do you actually propose to change the environment at
process startup, along these lines:

  if (!getenv ("LC_ALL") && !getenv ("LC_CTYPE") && !getenv ("LANG"))
    setenv ("LC_CTYPE", "C.UTF-8", 0);
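
Spelled out a bit further, the same check could look like the sketch below.
This is only an illustration, not existing Cygwin code: the function names
locale_var_is_set and default_ctype_to_utf8 are made up here.  It assumes,
as POSIX does for the locale variables, that a variable set to the empty
string counts the same as an unset one, which is why it overwrites an empty
LC_CTYPE rather than calling setenv with overwrite set to 0.

```c
#include <stdlib.h>

/* Hypothetical helper: a locale variable only counts as set if it is
   present in the environment and not the empty string. */
int
locale_var_is_set (const char *name)
{
  const char *val = getenv (name);
  return val != NULL && *val != '\0';
}

/* Hypothetical helper, not actual Cygwin code.  If the user has not
   chosen a locale through LC_ALL, LC_CTYPE or LANG, default LC_CTYPE
   to "C.UTF-8".  Returns 1 if LC_CTYPE was set here, 0 if the user's
   settings were left alone, -1 if setenv failed. */
int
default_ctype_to_utf8 (void)
{
  if (locale_var_is_set ("LC_ALL")
      || locale_var_is_set ("LC_CTYPE")
      || locale_var_is_set ("LANG"))
    return 0;

  /* Overwrite is 1 so that an empty LC_CTYPE gets replaced as well. */
  return setenv ("LC_CTYPE", "C.UTF-8", 1) == 0 ? 1 : -1;
}
```

Called once before the first setlocale (LC_ALL, ""), this would leave any
explicit user setting untouched and only fill in the default.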



Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
