This is the mail archive of the
mailing list for the Cygwin project.
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
My idea is as follows:
1) Split the mbtowc/wctomb function entries into library usage and
system usage (__mbtowc/__wctomb and __sys_mbtowc/__sys_wctomb).
2) If setlocale(LC_CTYPE) is called with locale != "C", then lib == sys.
3) If setlocale(LC_CTYPE) is called with locale == "C", then sys is set from
LC_ALL/LC_CTYPE/LANG. If none of LC_ALL/LC_CTYPE/LANG is set, sys uses UTF-8.
Cygwin startup calls setlocale(LC_CTYPE, "C") in winsup/cygwin/dcrt0.cc.
I think that the result is as follows:
1) LC_ALL/LC_CTYPE/LANG not set & setlocale not called.
lib = ascii converter, sys = UTF-8 converter.
2) LANG=xx_XX.ENCODING & setlocale not called.
lib = ascii converter, sys = ENCODING converter.
3) LANG=xx_XX.ENCODING & setlocale(LC_ALL, "") called.
lib = ENCODING converter, sys = ENCODING converter.
I think that [cat `read_dir_entry_and_print_app`] works correctly in all of the above cases.
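To make the selection rule above easier to check, here is a minimal sketch in Python. It is only a model of the proposal, not the patch itself: the names `charset_from_env` and `select_converters` are hypothetical, and locale parsing is reduced to taking the part after the dot.

```python
def charset_from_env(env):
    """Return the charset named by LC_ALL, LC_CTYPE or LANG, or None.

    Simplified: the first non-empty variable wins, and only the
    "xx_XX.ENCODING" form is understood.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            if "." not in value:
                return None
            # "xx_XX.ENCODING@modifier" -> "ENCODING"
            return value.split(".", 1)[1].split("@", 1)[0]
    return None


def select_converters(env, setlocale_arg=None):
    """Return (lib, sys) charset names under the proposed rules.

    setlocale_arg is None when the application never calls setlocale,
    so only the startup setlocale(LC_CTYPE, "C") has happened.
    """
    env_charset = charset_from_env(env)
    if setlocale_arg is None or setlocale_arg == "C":
        # Rule 3: lib stays ASCII, sys follows the environment, else UTF-8.
        return ("ASCII", env_charset or "UTF-8")
    if setlocale_arg == "":
        # setlocale(LC_ALL, "") takes the locale from the environment.
        locale_charset = env_charset
    else:
        locale_charset = charset_from_env({"LANG": setlocale_arg})
    if locale_charset is None:
        # Assumption: a locale without a charset behaves like "C".
        return ("ASCII", "UTF-8")
    # Rule 2: lib == sys.
    return (locale_charset, locale_charset)


case1 = select_converters({})
case2 = select_converters({"LANG": "ja_JP.SJIS"})
case3 = select_converters({"LANG": "ja_JP.SJIS"}, "")
```

The three calls at the end correspond to the three cases listed above.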
I am writing this patch and test code now.
> One problem can't be solved this way:  If an application fetches
> and stores a filename, then switches the locale, and then tries
> to use the filename in another system call, the filename is
> potentially broken.
If the application switches the encoding while it is processing filenames,
I think that the problem is the application's responsibility.
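The failure mode is easy to reproduce outside Cygwin. The following Python sketch only models it: the helper name `survives_locale_switch` is hypothetical, and Python codecs stand in for the system converter.

```python
def survives_locale_switch(name, fetch_charset, use_charset):
    """Model an application that fetches a filename under one charset
    and hands the stored bytes back under another.

    Returns True only if the bytes still name the same file.
    """
    stored = name.encode(fetch_charset)  # what readdir would have returned
    try:
        return stored.decode(use_charset) == name
    except UnicodeDecodeError:
        # The stored bytes are not even valid in the new charset.
        return False


name = "\u684c\u9762"  # the filename from the quoted example
broken = survives_locale_switch(name, "gbk", "utf-8")
unchanged = survives_locale_switch(name, "utf-8", "utf-8")
ascii_only = survives_locale_switch("readme.txt", "gbk", "utf-8")
```

Only names outside ASCII are affected, and only when the application changes the charset between fetching the name and using it.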
2009/5/13 Corinna Vinschen <email@example.com>:
> On May 12 19:37, Corinna Vinschen wrote:
>> On May 13 02:29, IWAMURO Motonori wrote:
>> > I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8.
>> > There are three reasons:
>> That's an interesting thought.  Do you have a patch and, if so, did you
>> try it?  Does it, for instance, help for the issue reported in the
>> thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?
> After examining the issue Lenik reported in the above thread, I'm at
> a loss how to solve this problem in a generic way.
> The problem is that the filename changes dependent on the character
> set used in $LANG.  The reason is that every time a multibyte filename
> has to be generated, it has to be converted from UTF-16 to multibyte.
> For instance, taking one of the filename from Lenik's example.  It's
> stored on the filesystem as the UTF-16 sequence \u684c \u9762.  If I set
> LANG to en_US.UTF-8, a readdir(2) call returns the multibyte sequence
>   0xe6 0xa1 0x8c 0xe9 0x9d 0xa2
> If I set LANG to en_US.GBK, `ls' returns the filename
>   0xd7 0xc0 0xc3 0xe6
> And in case LANG=C, `ls' returns
>   0x0e 0xe6 0xa1 0x8c 0x0e 0xe9 0x9d 0xa2
> So, dependent on the character set setting in the application, the idea
> of the filename differs.  That's not exactly helpful for interoperability
> between different applications.
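The three byte sequences quoted above can be reproduced with a short Python sketch. The UTF-8 and GBK results come from standard codecs. The `so_utf8` helper is hypothetical: it is a simplified model inferred only from the bytes shown in this message (a 0x0e byte in front of each non-ASCII character's UTF-8 sequence), not Cygwin's actual conversion code.

```python
def so_utf8(name):
    """Simplified model of the LANG=C output shown above:
    ASCII passes through, every other character is emitted as
    0x0e followed by its UTF-8 bytes."""
    out = bytearray()
    for ch in name:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:
            out.append(0x0E)
        out += encoded
    return bytes(out)


name = "\u684c\u9762"          # UTF-16 sequence stored on the filesystem
as_utf8 = name.encode("utf-8")  # LANG=en_US.UTF-8
as_gbk = name.encode("gbk")     # LANG=en_US.GBK
as_c = so_utf8(name)            # LANG=C

for label, data in (("UTF-8", as_utf8), ("GBK", as_gbk), ("C", as_c)):
    print(label, " ".join("0x%02x" % b for b in data))
```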
> I can think of two potential solutions to fix this problem:
> (1) Always return filenames in UTF-8 encoding and pretend that UTF-8
>     is the way files are stored on disk.  That results in unchangeable
>     filenames which are always valid.
>     But what if an application sets LANG="xxxx.SJIS" and tries to create
>     a file using SJIS character encoding?  Should the file be created
>     using the SJIS->UTF-16 conversion or should open fail with EILSEQ?
>     That's not good.
> (2) If none of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, then
>     Cygwin uses the LC_CTYPE setting which corresponds to the current
>     codepage.  If one of $LC_ALL/$LC_CTYPE/$LANG is set in the environment,
>     Cygwin uses that to convert pathnames.  If the application uses
>     setlocale, Cygwin uses that setting to convert pathnames.
>     One problem can't be solved this way:  If an application fetches
>     and stores a filename, then switches the locale, and then tries
>     to use the filename in another system call, the filename is
>     potentially broken.
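The dilemma in option (1) can be illustrated with a small Python model. The function `utf8_only_open` is hypothetical and stands for a path conversion that always assumes UTF-8; it is not any existing Cygwin function, and the Japanese sample name is an assumption chosen for illustration.

```python
import errno


def utf8_only_open(path_bytes):
    """Model of option (1): treat every pathname as UTF-8.

    Returns the UTF-16 name that would be passed to the filesystem,
    or raises OSError with EILSEQ if the bytes are not valid UTF-8.
    """
    try:
        name = path_bytes.decode("utf-8")
    except UnicodeDecodeError:
        raise OSError(errno.EILSEQ, "pathname is not valid UTF-8")
    return name.encode("utf-16-le")


# An application running with LANG="xxxx.SJIS" passes SJIS bytes.
sjis_path = "\u65e5\u672c\u8a9e".encode("shift_jis")
try:
    utf8_only_open(sjis_path)
    result = None
except OSError as exc:
    result = exc.errno

# A UTF-8 pathname converts without trouble.
utf16_name = utf8_only_open("\u684c\u9762".encode("utf-8"))
```

In this model the SJIS application simply gets EILSEQ, which is the outcome the quoted text calls "not good".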
> Any better ideas?
> Corinna Vinschen                  Please, send mails regarding Cygwin to
> Cygwin Project Co-Leader          cygwin AT cygwin DOT com
> Red Hat
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
> Problem reports:       http://cygwin.com/problems.html
> Documentation:         http://cygwin.com/docs.html
> FAQ:                   http://cygwin.com/faq/
IWAMURO Motonori <http://vmi.jp/>