1.7.0-48: [BUG] Passing characters above 128 from bash command line
Christopher Faylor
cgf-use-the-mailinglist-please@cygwin.com
Wed Jun 3 17:01:00 GMT 2009
On Wed, Jun 03, 2009 at 12:55:57PM -0400, Edward Lam wrote:
>Corinna Vinschen wrote:
>> On Jun 3 12:02, Christopher Faylor wrote:
>>> On Wed, Jun 03, 2009 at 04:27:55PM +0200, Corinna Vinschen wrote:
>>>> On Jun 3 09:18, Edward Lam wrote:
>>>>> Corinna Vinschen wrote:
>>>>>> The question is, what do you expect? [...]
>>>>> [...]
>>>>> Wikipedia has several suggestions on how to handle invalid UTF-8 byte
>>>>> sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the
>>>>> rule that uses the replacement character.
>>>> Chris implemented using the invalid code point solution. The discussion
>>>> in http://www.mail-archive.com/linux-utf8@nl.linux.org/msg00080.html
>>>> supports this solution. What's missing so far is the way back, from
>>>> an invalid single second half of a surrogate pair in the 0xDCxx range
>>>> back to the correct byte value. I'm just looking into that.
>>> The way back was not, AFAIK, needed for Cygwin programs. I don't think
>>> there is a valid way back for Windows programs.
>>
>> The way back is not needed for the argv handling in Cygwin, but it
>> gets necessary if you converted to UTF-16 in other circumstances.
>> It's not much of a problem since the way back is a no-brainer, in
>> contrast to the conversion to UTF-16.
>
>What is the current state of affairs in cygwin 1.7.0-48? Is the invalid
>code point solution currently being used when converting the command
>line to UTF-16 when spawning non-cygwin processes? What I'm trying to
>understand is where the command line truncation is taking place, in the
>parent or child process.
>
>If the truncation is happening in the child process because of the
>invalid code point, then perhaps we should consider using the
>replacement character solution when spawning non-cygwin child processes.
>IMHO, having a bad character is better than having a truncated command
>line. At least, the problem (invalid UTF-8) then becomes more obvious.
As Corinna said above: "Chris implemented using the invalid code point
solution."
That is what is in Cygwin's CVS and in the latest snapshot.
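
For anyone following along, here is a rough illustration of the "invalid
code point" idea and of the way back, next to the replacement-character
rule. It is not Cygwin's code. It uses Python's "surrogateescape" error
handler (PEP 383), which works on a similar principle: each undecodable
byte 0xNN becomes the lone low surrogate U+DCNN. The sample bytes are
made up for the example.

```python
# Illustration only: Python's surrogateescape handler, not Cygwin's C code.
raw = b"caf\xe9 \xff"  # hypothetical Latin-1 input, not valid UTF-8

# Forward: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# The way back: each U+DCNN is turned into the original byte 0xNN again.
round_trip = text.encode("utf-8", errors="surrogateescape")

# For contrast, the replacement-character rule cannot be reversed:
# every bad byte collapses to U+FFFD and the original value is lost.
replaced = raw.decode("utf-8", errors="replace")

print(escaped)
print(round_trip == raw)
print(replaced.count("\ufffd"))
```

The surrogate form keeps enough information to restore the original
bytes, while the replacement form makes the bad input visible but throws
the byte values away.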
cgf