1.7.0-48: [BUG] Passing characters above 128 from bash command line

Edward Lam edward@sidefx.com
Fri May 29 00:15:00 GMT 2009

Alexey Borzenkov wrote:
> On Thu, May 28, 2009 at 7:28 PM, Edward Lam <edward@sidefx.com> wrote:
>> PS. In case you haven't noticed, copyright.txt is not a long file. It
>> consists of a single byte, 0xA9.
> Did you try utf-8 encoding copyright.txt? Perhaps your locale is utf-8
> and the encoder fails.

How is one supposed to determine one's locale in Cygwin? I do NOT have 
LANG or any of the LC_* environment variables set. I even tried 
explicitly setting LANG=C, and it still fails.

The problem does seem to stem from the new UTF-8 support in Cygwin 1.7. 
However, I think something unexpected is going on here, because trying 
the same thing on Linux works without problems. To confirm that it is a 
UTF-8-related problem, let me repeat the steps slightly differently. 
Here I assume that bug.exe, a native Windows program that simply prints 
out its arguments, has already been compiled.

$ export LANG=C

$ ./bug arg1 "before `cat copyright.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before

*Notice that argc is 3 when it should be 4!*

$ piconv -f iso-8859-1 -t utf8 < copyright.txt > fubar.txt

$ ./bug arg1 "before `cat fubar.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before © after
3: arg3

*So now everything works because I converted the character into UTF-8.*

I think this points to the command-line arguments being encoded 
incorrectly when Cygwin spawns NATIVE (non-Cygwin) applications.

Here's what happens when I try to compile bug.c using cygwin's gcc:

$ gcc bug.c -o bug-gcc.exe

$ ./bug-gcc arg1 "before `cat copyright.txt` after" arg3
0: ./bug-gcc
1: arg1
2: before © after
3: arg3

So there seems to be some special marshaling of the command-line 
arguments that works when spawning Cygwin apps but breaks when spawning 
native apps.


