This is the mail archive of the
mailing list for the Cygwin project.
Windows NTFS UCS2 characters
- From: John Love-Jensen <eljay at adobe dot com>
- To: <cygwin at cygwin dot com>
- Date: Thu, 30 Nov 2006 10:42:56 -0600
- Subject: Windows NTFS UCS2 characters
Hi Cygwin folks,
I have a Windows file on NTFS named (using \uXXXX representation):
xxx_\u212B_A\u030A_\u00C5_xxx.txt
# ls -alb xxx_*_xxx.txt
ls: xxx_\305_A\260_\305_xxx.txt: No such file or directory
Windows sees it just fine. The bash *-expansion is expanding it to
/something/... just not a good something it appears.
I can select the file in Explorer, and I can double-click on it to edit it. Use
MS-Notepad (shudder -- Cygwin's Vim can't see the file either, neither
passed on the command line nor through Vim's explorer; I don't have a
Windows native Vim/gVim to test) to put some text in it. Save it.
But Cygwin / bash / ls finds that filename unpalatable. Hmmm.
# echo -n xxx_*_xxx.txt | xxd -g 1
78 78 78 5F C5 5F 41 B0 5F C5 5F 78 78 78 2E 74 78 74
x  x  x  _  Ao _  A  ^o _  Ao _  x  x  x  .  t  x  t
(The character representation line was typed in by me, not xxd. Using Ao to
represent the A-with-overcircle, ^o combining overcircle.)
I presume Cygwin's bash operates using UTF8 encoded POSIX filenames. I
expect the name should have been expanded as:
78 78 78 5F E2 84 AB 5F 41 CC 8A 5F C3 85 5F 78 78 78 2E 74 78 74
            ^^^^^^^^       ^^^^^    ^^^^^
E2 84 AB is UTF8 for \u212B
CC 8A is UTF8 for \u030A
C3 85 is UTF8 for \u00C5
(Assuming I didn't mess up)
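
The three encodings above can be double-checked with plain shell arithmetic. This is only a minimal sketch: utf8_hex is a hypothetical helper name, it covers only code points up to U+FFFF (which is all the UCS2 case needs), and it does not reject surrogates.

```shell
# utf8_hex HEX: print the UTF-8 bytes of one code point (U+0000..U+FFFF) as hex.
# Hypothetical helper for illustration; not part of bash or Cygwin.
utf8_hex() {
    cp=$((0x$1))
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02X\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02X %02X\n' $((0xC0 | (cp >> 6))) $((0x80 | (cp & 0x3F)))
    else
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X\n' $((0xE0 | (cp >> 12))) \
            $((0x80 | ((cp >> 6) & 0x3F))) $((0x80 | (cp & 0x3F)))
    fi
}

utf8_hex 212B   # ANGSTROM SIGN                          -> E2 84 AB
utf8_hex 030A   # COMBINING RING ABOVE                   -> CC 8A
utf8_hex 00C5   # LATIN CAPITAL LETTER A WITH RING ABOVE -> C3 85
```

The results agree with the bytes listed above, so the expected expansion itself looks right.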
Hmmm. Yep, it appears that xxx_*_xxx.txt is expanding funny.
# ls -alb -n xxx_$'\xE2\x84\xAB'_A$'\xCC\x8A'_$'\xC3\x85'_xxx.txt
ls: xxx_\342\204\253_A\314\212_\303\205_xxx.txt: No such file or directory
Drat. Still no love. So even if hand fed the UTF8 representation, ls is
not able to digest the name. (Assuming I didn't mess up.)
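
To take a typo in the escapes off the table, the hand-built name can be dumped byte by byte before it is handed to ls. A rough sketch, assuming od, tr and sed are available; hex_bytes is a hypothetical helper name, and the octal escapes are the same ones ls -b printed above.

```shell
# hex_bytes: read stdin and print its bytes as space-separated upper-case hex.
# Hypothetical helper for illustration.
hex_bytes() {
    od -An -v -tx1 | tr -s ' \n' '  ' | tr 'a-f' 'A-F' | sed 's/^ *//; s/ *$//'
}

# Build the name from octal escapes (printf is used here instead of $'...').
name=$(printf 'xxx_\342\204\253_A\314\212_\303\205_xxx.txt')

# Print the bytes that would be passed to ls as the file name.
printf '%s' "$name" | hex_bytes
# -> 78 78 78 5F E2 84 AB 5F 41 CC 8A 5F C3 85 5F 78 78 78 2E 74 78 74
```

If the dump matches the expected UTF8 sequence, the name being passed is correct and the mismatch lies somewhere between ls and the file system.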
Is there some sort of UCS2 or UTF8 or Unicode compatibility setting I need
to set for Cygwin to be able to work in a Windows NTFS environment, when some
filenames have some arbitrary UCS2 (Unicode 1.x, of course) characters?
I presume that somewhere something is set to CP1252 and causing grief.
Hmmm, I have neither LANG nor LC_ALL (nor any other LC_xxx) set. Maybe that's
my problem. [Tries it.] Nope -- or I didn't do it correctly.
I can always fall back to CMD.EXE scripts to manipulate these files,
but I'd rather be able to do it in my Bash shell scripts.
Please don't suggest Interix, SFU or MKS alternatives. Those are fine
products, I'm sure, but I'm not interested.
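/* MSVS8: cl test.c */
/* Create file name that Cygwin does not like. */
#include <stdio.h>
#include <windows.h>
int main(void) {
    /* Assumed: the name and all arguments except the access mask. */
    HANDLE h = CreateFileW(L"xxx_\x212B_A\x030A_\x00C5_xxx.txt",
        GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        fprintf(stderr, "Invalid handle\n");
    else fprintf(stderr, "Successfully opened\n");
    return h == INVALID_HANDLE_VALUE;
}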