"C" UTF-8 trouble

Andy Koppe andy.koppe@gmail.com
Tue Oct 6 11:34:00 GMT 2009

2009/10/5 Corinna Vinschen:
>> Vim and emacs both appear to have a hardcoded assumption that the
>> default "C" locale is 8-bit only. Since the "C" locale now defaults to
>> UTF-8, this means that non-ASCII characters don't work out-of-the-box
>> after all. :(
>> Strictly speaking, vim and emacs are wrong to do this, because they
>> should be leaving the charset up to setlocale and the multibyte
>> conversion functions. But if these two treat "C" specially, we
>> probably have to assume that others do the same and consider this a
>> de-facto standard.
>> They're both fine, however, if the locale is set to "C.UTF-8" or any
>> other explicit UTF-8 locale. Therefore, here's one way to address this
>> issue that avoids patching such apps:
>> When the Windows environment is translated at DLL startup, and if LANG
>> is not already set, set it to "C.UTF-8". This has the same semantics
>> as plain "C", and LC_ALL as well as the specific LC_* variables would
>> still override it if set. Yet apps such as emacs and vi wouldn't make
>> any undue assumptions.
> Before changing Cygwin, doesn't `set encoding=utf-8' in vim help?

Yes, as does invoking it with 'LANG=C.UTF-8 vim'. But that wasn't the
point: users shouldn't have to hunt down a solution for this (and
complain along the way).
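
To make the proposal quoted above concrete, here is a rough Python sketch of
the rule together with the usual POSIX lookup order (LC_ALL, then the specific
LC_* variable, then LANG). The function names are made up for illustration,
and this is not the actual Cygwin startup code, which would live in the DLL.
Treating an empty variable as unset during lookup is my reading of POSIX.

```python
# Illustrative sketch only: hypothetical helper names, not Cygwin code.

def apply_default_lang(env):
    """Return a copy of env with LANG set to "C.UTF-8" if LANG is not set."""
    result = dict(env)
    # Only fill in LANG when it is absent. An explicitly set LANG, even an
    # empty one, is left alone here; that choice is an assumption.
    result.setdefault("LANG", "C.UTF-8")
    return result


def effective_locale(env, category="LC_CTYPE"):
    """Resolve a locale category: LC_ALL, then the category variable, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset or empty falls through to the next variable
            return value
    return "C"  # nothing set at all


if __name__ == "__main__":
    # No locale variables at all: the default kicks in.
    print(effective_locale(apply_default_lang({})))
    # LC_ALL still overrides the default LANG.
    print(effective_locale(apply_default_lang({"LC_ALL": "C"})))
```

With an empty environment the effective locale becomes "C.UTF-8", while a
user who sets LC_ALL or LC_CTYPE still gets exactly what they asked for.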
