[1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
Wed May 13 15:18:00 GMT 2009
On May 13 15:54, Andy Koppe wrote:
> > - why do you need to touch the filename at all? I haven't read all of it. Is
> > the UTF-16 on disk and we need to work around UTF-16 being intractable as C
> > string?
> Yes. If you simply treated each UTF-16 symbol as two chars, you'd get
> unintended NULs and slashes. For starters, the upper halves of all
> ISO-8859-1 characters are NUL in UTF-16. And even without that, the
> resulting filenames would be completely unusable.
Right. That's the crux of combining UTF-16 filenames with many different
multibyte codepages. In contrast to a system in which the filename is
just a byte stream, we have to perform widechar to multibyte conversion,
and every conversion other than the one to UTF-8 is lossy.
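To illustrate both points, here is a minimal Python sketch (not Cygwin
code). The function names and the sample filename are made up for the
example, and cp1252 stands in for "some non-UTF-8 codepage". It shows
that raw UTF-16 bytes contain NUL and slash bytes, and that only the
UTF-8 conversion round-trips the sample name.

```python
# Illustrative sketch only; helper names and the sample filename are hypothetical.

def raw_utf16_bytes(name):
    """Return the UTF-16LE bytes, i.e. each 16-bit unit split into two chars."""
    return name.encode("utf-16-le")

def round_trips(name, codec):
    """True if name survives widechar -> multibyte -> widechar via codec."""
    multibyte = name.encode(codec, errors="replace")
    return multibyte.decode(codec, errors="replace") == name

sample = "日本語.txt"

# ASCII characters such as '.' carry a zero high byte in UTF-16.
has_nul = b"\x00" in raw_utf16_bytes(sample)

# U+012F has 0x2F ('/') as its low byte, so a path separator appears
# in the byte stream even though the name contains no slash.
has_slash = b"/" in raw_utf16_bytes("\u012f")

utf8_ok = round_trips(sample, "utf-8")      # lossless
cp1252_ok = round_trips(sample, "cp1252")   # characters get replaced
```

With this sample, the UTF-8 bytes contain no NUL or slash, while the
cp1252 conversion cannot represent the Japanese characters at all.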
For the time being, I applied a patch to Cygwin which should ease the
situation.
I followed the suggestion to use UTF-8 for internal conversions when the
locale is set to "C". This will also be used as default conversion when
converting the Windows environment from UTF-16 to multibyte, unless the
environment contains a valid LC_ALL/LC_CTYPE/LANG setting. The current
working directory could also become unusable if an application
switched the locale. Now the CWD is re-evaluated after a setlocale call.
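The selection rule described above can be sketched roughly as follows.
This is a Python illustration, not the actual Cygwin DLL code:
pick_charset is a hypothetical name, the LC_ALL > LC_CTYPE > LANG
precedence is the usual POSIX order, and falling back to UTF-8 for a
locale name without a codeset is an assumption of this sketch.

```python
# Illustrative sketch only; pick_charset is a hypothetical helper.

def pick_charset(env):
    """Choose the charset for UTF-16 <-> multibyte conversion from env."""
    value = None
    # POSIX precedence: the first non-empty variable wins.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        if env.get(var):
            value = env[var]
            break

    # No locale setting, or the "C" locale: use UTF-8 as described above.
    if value is None or value in ("C", "POSIX"):
        return "UTF-8"

    # Locale names look like language_TERRITORY.codeset@modifier.
    _, dot, rest = value.partition(".")
    if not dot:
        return "UTF-8"  # assumption: no codeset given, fall back to UTF-8
    return rest.partition("@")[0]
```

For example, pick_charset({"LANG": "ja_JP.SJIS"}) yields "SJIS", while
an empty environment or LANG=C yields "UTF-8".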
I'm sure this change doesn't fix all problems, but it worked much better
in my environment when using Japanese and Chinese characters in filenames.
There are a few other changes to the Cygwin DLL in the works, but I will
update Cygwin 1.7 at the end of the week.
Corinna Vinschen
Cygwin Project Co-Leader
Please send mails regarding Cygwin to cygwin AT cygwin DOT com