1.7.0-48: [BUG] Passing characters above 128 from bash command line

Edward Lam edward@sidefx.com
Sat May 30 04:55:00 GMT 2009


Alexey Borzenkov wrote:
 > No, the bug is not that it gets wrong number of arguments. In fact,
 > Windows has no concept of arguments, only the C runtime does, which parses
 > the command line. If the command line is truncated, then the C runtime will
 > have missing arguments when it tries to parse it.

Sorry, I had meant to comment on this previously but hit send too soon.

I think the problem I'm running into is:
- I give cygwin 1.7's bash a string that is encoded in my system default
code page.
- cygwin 1.7 assumes the string is UTF-8 and tries to convert it as UTF-8
into UTF-16, which results in a truncated command line being passed to
the child process.
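
To illustrate why bytes from a single-byte code page can derail a UTF-8
conversion, here is a minimal sketch in portable C. It assumes the
offending character is the copyright sign, which is byte 0xA9 in
Windows-1252 and is not a valid UTF-8 sequence on its own. The helper name
utf8_valid_prefix is hypothetical and the check is simplified (it ignores
overlong and surrogate encodings). It is not Cygwin's actual conversion
code, only a model of a strict decoder that stops at the first malformed
byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of leading bytes of str that look like
   well-formed UTF-8. Simplified check, for illustration only. */
size_t utf8_valid_prefix(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t len, k;

        /* Work out the sequence length from the lead byte. */
        if (c < 0x80)                      len = 1;
        else if (c >= 0xC2 && c <= 0xDF)   len = 2;
        else if (c >= 0xE0 && c <= 0xEF)   len = 3;
        else if (c >= 0xF0 && c <= 0xF4)   len = 4;
        else break;   /* stray continuation byte or invalid lead byte */

        if (i + len > n)
            break;    /* sequence runs past the end of the string */

        /* Every following byte must be a continuation byte (10xxxxxx). */
        for (k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return i;

        i += len;
    }
    return i;
}
```

With the code-page input "before \xA9 after" the helper returns 7, the
length of "before ", whereas the UTF-8 encoding "before \xC2\xA9 after" is
accepted in full.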

Here's some more investigation:

$ cat bug.c
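#include <stdio.h>
#include <wchar.h>

/* Print each argument exactly as the C runtime parsed it. */
int wmain(int argc, wchar_t *argv[], wchar_t *envp[])
{
     int i;
     for (i = 0; i < argc; i++)
         wprintf(L"%d: %ls\n", i, argv[i]);   /* %ls: wide-string argument */
     return 0;
}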

... and compiled using MSVC ....

$ ./bug arg1 "before `cat copyright.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before

So note that even when the child is a Unicode-aware process, I still get
a truncated command line. In fact, calling GetCommandLineW() directly
also seems to return a truncated command line.
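
The three-argument output above is what one would expect if the C runtime
simply parsed a command line that had been cut off inside the quoted
argument. A rough sketch of that effect in portable C follows.
split_command_line and MAX_ARG_LEN are hypothetical names, and the parser
only understands spaces, tabs and double quotes, so it is far simpler than
the real MSVC runtime rules (no backslash handling).

```c
#include <stddef.h>

#define MAX_ARG_LEN 64   /* hypothetical per-argument buffer size */

/* Simplified splitter: whitespace separates arguments, double quotes
   group them. Returns the number of arguments stored in args. */
int split_command_line(const char *cmd, char args[][MAX_ARG_LEN], int max_args)
{
    int argc = 0;
    const char *p = cmd;

    while (*p != '\0' && argc < max_args) {
        size_t len = 0;
        int in_quotes = 0;

        while (*p == ' ' || *p == '\t')   /* skip separators */
            p++;
        if (*p == '\0')
            break;

        /* Consume one argument; whitespace inside quotes is kept. */
        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            if (*p == '"')
                in_quotes = !in_quotes;
            else if (len + 1 < MAX_ARG_LEN)
                args[argc][len++] = *p;
            p++;
        }
        args[argc][len] = '\0';
        argc++;
    }
    return argc;
}
```

Fed the line `bug arg1 "before ` (cut off after "before "), this yields
three arguments. Fed the full line with the closing quote and arg3, it
yields four. Nothing in the parser itself is wrong; it just never sees the
rest of the line.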

Regards,
-Edward

Alexey Borzenkov wrote:
> On Sat, May 30, 2009 at 12:10 AM, Edward Lam <edward@sidefx.com> wrote:
>> Thanks for explaining the UTF8 changes in cygwin 1.7. However, the decision
>> to use UTF-8 for the C locale is questionable.
> 
> Not at all, because utf-8, as far as I understand, is used for
> communication with the system in this context, and does not force
> anything to the application. Most modern unixes use utf-8 nowadays, it
> means that even if you have a C locale your terminal outputs text in
> utf-8, your input is utf-8, your filenames are utf-8 (well, not
> really, but the rest of the system sees them that way). Same stuff
> here, except that launching non-cygwin processes is communication with
> the system as well, and it needs conversion. And where there is
> conversion there is always possible loss of data. One way or the other.
> 
>> It seems to me that it would be much safer to use the SYSTEM DEFAULT code
>> page (ie. the return value of the system GetACP() function) for CYGWIN
>> instead, ensuring compatibility for the large class of native Windows
>> applications that are non-Unicode, non-CodePage aware.
> 
> It might be safe for you, but not for other people. If you have a
> Russian default codepage and ever need to work with Chinese/Japanese
> filenames and cygwin uses default codepage for filesystem operations
> (as in 1.5 right now), then you are really screwed. In my opinion
> utf-8 is a silver bullet here, and I'm very glad it went that way.
> 
>> I think it's very bad that changing LANG can result in a truncated *command
>> line*, that has nothing to do with printf. The printf in the code was just
>> for testing. The HUGE bug is that the application gets the  WRONG NUMBER OF
>> ARGUMENTS.
> 
> No, the bug is not that it gets wrong number of arguments. In fact,
> Windows has no concept of arguments, only the C runtime does, which parses
> the command line. If the command line is truncated, then the C runtime will
> have missing arguments when it tries to parse it.
> 
> I mentioned wprintf because recently I was wondering why
> mkpasswd/mkgroup had a strange truncating behavior with Russian
> usernames and it turned out that wprintf, when it can't encode some
> characters, stops right there and returns an error code. But, honestly,
> who ever checks return codes from printf?
> 
> Something similar might be happening here. When constructing the command
> line, some function is called and can't encode some character, returns an
> error status, but it's never checked, and you get a truncated command line.
> 
> And btw, I'm not a cygwin developer here, I'm just a speculating user
> right now, because I haven't searched for this problem in the code.
> 
> --
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
> Problem reports:       http://cygwin.com/problems.html
> Documentation:         http://cygwin.com/docs.html
> FAQ:                   http://cygwin.com/faq/
> 

