This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Cygwin programs doesn't support non-ASCII filenames


(This mail is encoded in utf-8)

On 2009-5-9 18:02, Corinna Vinschen wrote:
[Repeated and additional question.  I accidentally sent this as PM.
  Sorry about that.  Let's keep this on the list, please]

On May 9 11:43, Lenik wrote:
(My system locale is zh_CN)

What ANSI codepage is that?


And what OEM codepage does the console window use by default?
`chcp' shows the codepage is 937.
I don't know the difference between the ANSI codepage and the OEM codepage.


1, test path
     >>>  set LANG=&  cygpath -am .
     C:/Profiles/Shecti/??????

     >>>  set LANG=zh_CN.GBK&  cygpath -am .
     C:/Profiles/Shecti/??????

     >>>  set LANG=C&  cygpath -am .
     C:/Profiles/Shecti/ÃÃÃÃ

Can you please give us the exact name of the directory in either UTF-8 or UTF-16 notation?
The two Chinese characters are encoded as follows:
GB2312: d7 c0 c3 e6
UTF-8: e6 a1 8c e9 9d a2
Unicode: \u684c \u9762
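
The byte values listed above can be cross-checked with a few lines of Python. This is a minimal sketch and not part of the original exchange: it assumes Python's built-in `gb2312`, `gbk` and `utf-8` codecs, and the names `name` and `hex_bytes` are illustrative only.

```python
# The directory name from the message, written as Unicode escapes
# so the script itself stays pure ASCII.
name = "\u684c\u9762"

def hex_bytes(data):
    """Format a bytes object as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

# Encode the same two characters in each charset discussed in the thread.
# Note: Python's "gb2312" codec produces the EUC-CN byte form.
gb2312_bytes = name.encode("gb2312")
gbk_bytes = name.encode("gbk")
utf8_bytes = name.encode("utf-8")

print("GB2312:", hex_bytes(gb2312_bytes))
print("GBK:   ", hex_bytes(gbk_bytes))
print("UTF-8: ", hex_bytes(utf8_bytes))
```

For these two characters the GB2312 and GBK byte sequences are identical, which is why either name can describe the same bytes on disk.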


2, the `test' utility
     >>>  set LANG=&  bash -c "D=$(cygpath -am .); if [ -d $D ]; then echo ok $D; else echo fail $D; fi"
     fail C:/Profiles/Shecti/??????

What you're actually testing here all the time is cygpath in the first place. If you stop using cygpath, start a bash shell and use the Cygwin commands with the paths in POSIX notation, you would have much less trouble. Cygwin is a POSIX emulation layer, after all.

Well, I test the pathnames with cygpath because I want the absolute path, so that the Chinese characters are included in the test; I can't type these characters in the console window. The second reason is that I have associated the .sh file type with bash, like this:
.sh=C:\lam\sys\cygwin-1.7\bin\bash -c "$(cygpath -u '%0') %*"


Here is a new test that doesn't use cygpath:
    C:\Profiles\Shecti> set LANG=& bash -c "cat äå"
    cat: äå: No such file or directory

    C:\Profiles\Shecti> set LANG=zh_CN.GB2312& bash -c "cat äå"
    cat: äå: No such file or directory

    C:\Profiles\Shecti> set LANG=zh_CN.GBK& bash -c "cat äå"
    123

    C:\Profiles\Shecti> set LANG=zh_CN.UTF-8& bash -c "cat äå"
    123

    C:\Profiles\Shecti> set LANG=& bash -c "d äå"
    /mnt/c/Profiles/Shecti/äå doesn't exist!

    C:\Profiles\Shecti> set LANG=zh_CN.GBK& bash -c "d äå"
    /mnt/c/Profiles/Shecti/äå doesn't exist!

    C:\Profiles\Shecti> set LANG=zh_CN.UTF-8& bash -c "d äå"
    /mnt/c/Profiles/Shecti/äå doesn't exist!

`d' gives the same result every time. This shows that `cat' (from coreutils) handles the locale well, while `d' doesn't.
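
A general point about charsets may help when reading these results: a file name can only pass through a charset that is able to represent its characters, and bytes decoded with the wrong charset show up as unrelated characters. The Python sketch below illustrates only that general mechanism. It does not reproduce Cygwin's own conversion code, and `can_represent` is a hypothetical helper written for this illustration.

```python
# The directory name discussed in the thread (U+684C U+9762).
name = "\u684c\u9762"

def can_represent(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# An ASCII-only charset cannot carry the name; GBK and UTF-8 can.
for charset in ("ascii", "gbk", "utf-8"):
    print(charset, can_represent(name, charset))

# Decoding the GBK bytes as Latin-1 yields four unrelated characters,
# the typical "mojibake" seen when the wrong charset is assumed.
mojibake = name.encode("gbk").decode("latin-1")
print(mojibake)
```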

If you give me the above information I'll look into fixing cygpath.

     The GB2312 charset is a subset of the GBK charset, and the characters `??????' are included in the GB2312 charset. So in this example, GB2312 SHOULD WORK.

Sorry, no. It's documented that GBK is supported, GB2312 isn't. From what I read about GB2312 it's not actually a subset of GBK in terms of character definitions, it's just a subset in terms of supported characters. AFAICS, GB2312 uses chars < 0x7f in multibyte sequences, which is not feasible for Cygwin. We could support EUC-CN, which seems to be another way to encode GB2312 chars, but I'm not exactly willing to add that now. I'd rather stabilize what we have now and add further charset support in a later, official 1.7 release.

So you can use LANG=zh_CN.GBK, but not LANG=zh_CN.GB2312.  It's just
treated as invalid input.  Better: Use LANG=zh_CN.UTF-8.
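
One reading of the remark about chars below 0x7f is that it refers to the original 7-bit form of GB2312, where both bytes of a character fall in the printable ASCII range, while EUC-CN stores the same code points with the high bit set. The Python sketch below shows the relationship under that assumption. Python's `gb2312` codec emits the EUC-CN form, so the 7-bit form is derived here by clearing the high bit; the variable names are illustrative.

```python
# The directory name discussed in the thread (U+684C U+9762).
name = "\u684c\u9762"

# EUC-CN form: every byte of a two-byte character has the high bit set.
euc_cn = name.encode("gb2312")

# 7-bit form of the same code points: clear the high bit of each byte.
# These bytes are indistinguishable from ordinary ASCII characters.
seven_bit = bytes(b & 0x7F for b in euc_cn)

print("EUC-CN:", " ".join(f"{b:02x}" for b in euc_cn))
print("7-bit: ", " ".join(f"{b:02x}" for b in seven_bit), repr(seven_bit))
```

In the 7-bit form the bytes collide with ASCII letters and punctuation, which is one way to see why such sequences are awkward in file names and paths.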

Yes, GB2312 is a subset in terms of supported characters. Is there any way to find out the default locale of the current Cygwin installation? From the tests I found that `unset LANG' and `set LANG=zh_CN.GB2312' give the same results, so I thought that GB2312 was the default locale.
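
On the question of which locale is in effect: under the POSIX rules, LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG, and empty values count as unset. The Python sketch below applies that precedence to an environment mapping. `effective_ctype_locale` is a hypothetical helper, and the final fallback to "C" is an assumption made for the sketch; what Cygwin 1.7 actually falls back to when all three are unset is not established in this thread.

```python
import os

def effective_ctype_locale(env):
    """Return the locale name that POSIX precedence selects for LC_CTYPE."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    # Assumption: POSIX leaves this default implementation-defined.
    return "C"

# Report the locale name selected by the current environment.
print(effective_ctype_locale(os.environ))
```

Running it with LANG unset and then with LANG=zh_CN.GB2312 would show that the two cases name different locales, even if a program then treats both the same way.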

I'd like to use UTF-8 too, but I won't chcp to 65001: that would introduce a lot of new problems when deploying to customers' machines, since most programs and files in the real world are encoded in GB2312.

Lenik


-- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/

