1.7.0-48: [BUG] Passing characters above 128 from bash command line

Edward Lam edward@sidefx.com
Wed Jun 3 13:18:00 GMT 2009


Corinna Vinschen wrote:
> On May 29 17:21, Edward Lam wrote:
>> 
>> I think the problem I'm running into is: - I give cygwin 1.7's bash
>> a string that is in my system default code page. - cygwin 1.7
>> thinks the string is actually UTF-8 and tries to convert it as
>> UTF-8 into UTF-16, resulting in a truncated command line that is 
>> passed to child process.
> 
> The question is, what do you expect?  I know, you expect that it
> "just works", but that's not as easy as you might assume,
> unfortunately.

Yes, Alexey and I had a lengthy argument on this thread already.
Disagreements on the default LANG behaviour notwithstanding, I still
think that it should NOT truncate. It should substitute the invalid
character with something else instead.

Here's a quote from Alexey previously on this thread:

"In my opinion: truncation is a bug (should use replacement character,
or fail exec altogether), expecting utf-8 is not"

Wikipedia has several suggestions on how to handle invalid UTF-8 byte 
sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the 
rule that uses the replacement character.
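
As an illustration only (this is not Cygwin's conversion code), the
difference between truncating and substituting can be sketched in Python.
The helper names decode_truncating and decode_replacing are hypothetical;
the first merely imitates the truncation described in this thread, and the
second relies on Python's standard errors="replace" handler, which emits
U+FFFD for an undecodable byte.

```python
def decode_truncating(raw: bytes) -> str:
    """Hypothetical helper: give up at the first invalid UTF-8 byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode;
        # everything from there on is silently lost.
        return raw[:exc.start].decode("utf-8")


def decode_replacing(raw: bytes) -> str:
    """Substitute U+FFFD (REPLACEMENT CHARACTER) for each invalid byte."""
    return raw.decode("utf-8", errors="replace")


# A lone 0xA9 is a continuation byte, so it is not valid UTF-8 on its own.
arg = b"abc\xa9def"

truncated = decode_truncating(arg)
replaced = decode_replacing(arg)

print(repr(truncated))
print(repr(replaced))
```

With replacement, the text after the bad byte still reaches the child
process, and the U+FFFD marks where information was lost.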

> You get the idea.  The character 0xa9 has no meaning in itself.  It
> only has a meaning when you consider the character set or codepage in
> which you use this character.
...
> How is anybody supposed to know that the file which consists
> of the single byte 0xa9 has *any* meaning at all?  Why should it be
> the copyright sign, of all things?

What I was attempting to do was to have NO conversion. In the real
case where I ran into this, "bug.exe" was the one that properly
interpreted what the byte 0xA9 meant from the command line. Yes, I know
there are several workarounds.
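
A small Python sketch of the two points above, under stated assumptions:
the same byte 0xA9 maps to different characters depending on the code page
assumed, and if the byte is passed through unchanged, the receiving program
can still apply whatever code page it knows to be right. The function
receiver_interprets is a hypothetical stand-in for a program such as
"bug.exe"; the three code pages are just examples.

```python
raw = b"\xa9"  # the single byte under discussion

# The byte has no meaning until a code page is chosen.
meanings = {cp: raw.decode(cp) for cp in ("cp1252", "cp850", "cp437")}
for cp, ch in meanings.items():
    print(f"{cp}: U+{ord(ch):04X}")


def receiver_interprets(arg: bytes, codepage: str) -> str:
    """Hypothetical receiver: decodes the argument with its own code page."""
    return arg.decode(codepage)


# "No conversion": the bytes arrive exactly as typed, so the receiver's
# own interpretation is still possible.
passed_through = bytes(raw)
seen_by_receiver = receiver_interprets(passed_through, "cp1252")
```

Here cp1252 yields the copyright sign (U+00A9), cp850 the registered sign
(U+00AE), and cp437 the reversed not sign (U+2310).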

> If we default to the ANSI codepage, you will have the same problem,
> just upside down.  In both cases you will have even more problems if
> you start using characters not available in your default codepage.

This is where I disagreed with Alexey. What we're really arguing about
here is which default will run into the fewest problems for the most
common usage. This is subjective, of course.

-Edward
