1.7.0-48: [BUG] Passing characters above 128 from bash command line
Edward Lam
edward@sidefx.com
Wed Jun 3 13:18:00 GMT 2009
Corinna Vinschen wrote:
> On May 29 17:21, Edward Lam wrote:
>>
>> I think the problem I'm running into is: - I give cygwin 1.7's bash
>> a string that is in my system default code page. - cygwin 1.7
>> thinks the string is actually UTF-8 and tries to convert it as
>> UTF-8 into UTF-16, resulting in a truncated command line that is
>> passed to child process.
>
> The question is, what do you expect? I know, you expect that it
> "just works", but that's not as easy as you might assume,
> unfortunately.
Yes, Alexey and I had a lengthy argument on this thread already.
Disagreements on the default LANG behaviour notwithstanding, I still
think that it should NOT truncate. It should substitute something else
for the invalid character instead.
Here's a quote from Alexey previously on this thread:
"In my opinion: truncation is a bug (should use replacement character,
or fail exec altogether), expecting utf-8 is not"
Wikipedia has several suggestions on how to handle invalid UTF-8 byte
sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the
rule that uses the replacement character.
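
To illustrate the difference between the two behaviours, here is a
minimal Python sketch. It is not Cygwin's conversion code; the function
names are made up for this example, and the input assumes a command-line
argument "a©b" typed in a Windows-1252 code page.

```python
# Bytes for "a(c)b" in Windows-1252; the lone 0xA9 byte is not valid UTF-8.
raw = b"a\xa9b"


def decode_truncating(data: bytes) -> str:
    """Hypothetical: stop at the first invalid byte (the reported behaviour)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Keep only the valid prefix; everything after it is lost.
        return data[:err.start].decode("utf-8")


def decode_replacing(data: bytes) -> str:
    """Hypothetical: substitute U+FFFD for each invalid sequence."""
    return data.decode("utf-8", errors="replace")


truncated = decode_truncating(raw)
replaced = decode_replacing(raw)
print(repr(truncated), repr(replaced))
```

With replacement, the child process still sees the text after the bad
byte, which at least makes the problem visible instead of silently
dropping the rest of the argument.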
> You get the idea. The character 0xa9 has no meaning in itself. It
> only has a meaning when you consider the character set or codepage in
> which you use this character.
...
> How is anybody supposed to know that the file which consists
> of the single byte 0xa9 has *any* meaning at all? Why should it be
> the copyright sign, of all things?
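
The point about 0xa9 can be made concrete with a short Python sketch.
The set of code pages below is only an arbitrary sample chosen for
illustration; it uses Python's built-in codecs and nothing from Cygwin.

```python
# The same single byte, interpreted under several code pages.
byte = b"\xa9"

# Arbitrary sample of single-byte code pages shipped with Python.
meanings = {codec: byte.decode(codec) for codec in ("cp1252", "cp437", "cp850")}

# As UTF-8, a lone 0xA9 is a continuation byte with no lead byte: invalid.
try:
    byte.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

for codec, char in meanings.items():
    print(f"{codec}: U+{ord(char):04X}")
print("valid UTF-8:", valid_utf8)
```

Only the Windows-1252 reading gives the copyright sign; the two OEM code
pages map the same byte to other characters, and UTF-8 rejects it.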
What I was attempting to do was to have NO conversion. In the
real case where I ran into this, "bug.exe" was the one that properly
interpreted what the byte 0xA9 meant on the command line. Yes, I know
there are several workarounds.
> If we default to the ANSI codepage, you will have the same problem,
> just upside down. In both cases you will have even more problems if
> you start using characters not available in your default codepage.
This is where I disagreed with Alexey. What we're really arguing here is
which default will run into the fewest problems for the most
common usage. This is subjective, of course.
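
For reference, the "same problem, just upside down" case quoted above
can be sketched in Python as well. The file name and the choice of
cp1252 as the ANSI code page are assumptions for illustration only.

```python
# Hypothetical file name containing characters outside Windows-1252.
name = "\u65e5\u672c\u8a9e.txt"


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


fits_ansi = encodable(name, "cp1252")  # assumed ANSI default
fits_utf8 = encodable(name, "utf-8")
print(fits_ansi, fits_utf8)
```

An ANSI default handles legacy bytes such as 0xA9 without surprises but
cannot represent names like this one, while a UTF-8 default has the
opposite trade-off.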
-Edward