Inconsistent command-line parsing in case of UTF-8 quoted arguments

Kaz Kylheku (Cygwin) 743-406-3965@kylheku.com
Mon Oct 19 02:32:11 GMT 2020


On 2020-10-14 14:47, Jérôme Froissart wrote:
>> The choice of GetCommandLineA was for illustration purposes;
>> had I used GetCommandLineW I would not be able to printf
>> using %ls under CMD.EXE, because of code page issues. However
>> here is a modified version of the test program that uses
>> GetCommandLineW.

[ ... ]

>> billziss@xps:~/Projects/t$ ./cyg.exe "foo bar" "Domain\Jérôme"
>> 0022 "   0043 C   003a :   005c \   0055 U   0073 s   0065 e   0072 r
>> 0073 s   005c \   0062 b   0069 i   006c l   006c l   007a z   0069 i
>> 0073 s   0073 s   005c \   0050 P   0072 r   006f o   006a j   0065 e
>> 0063 c   0074 t   0073 s   005c \   0074 t   005c \   0063 c   0079 y
>> 0067 g   002e .   0065 e   0078 x   0065 e   0022 "

[ ... ]

>> C:\Users\billziss\Projects\t>cyg.exe "foo bar" "Domain\Jérôme"
>> 0063 c   0079 y   0067 g   002e .   0065 e   0078 x   0065 e   0020
>> 0020     0022 "   0066 f   006f o   006f o   0020     0062 b   0061 a
>> 0072 r   0022 "   0020     0022 "   0044 D   006f o   006d m   0061 a
>> 0069 i   006e n   005c \   004a J   00e9 .   0072 r   00f4 .   006d m
>> 0065 e   0022 "

Aha! There is a hint of a problem here. Firstly, the command lines
are obviously different.

The Cygwin one starts with a quote that we did not see, wrapping
the full path to the executable:

   "C:\Users\billziss\Projects\t\cyg.exe"

It ends there. Why is that? I'm guessing that the command line was
tokenized destructively: a null character was written over the
separator after the program name, so the dump stops at that point.
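To illustrate what I mean by destructive tokenization, here is a minimal
sketch in portable C. It is not Cygwin's actual startup code;
split_first_token is a hypothetical helper, and the only assumption is
that the first token ends at its closing quote (or at the first space if
it is unquoted).

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper, not Cygwin's actual startup code: split off the
 * first token of a command line in place. A quoted token ends at its
 * closing quote, an unquoted one at the first space. The character
 * after the token is overwritten with a null, which is the
 * "destructive" part. Returns a pointer to the rest of the line, or
 * NULL if nothing follows the token.
 */
static wchar_t *split_first_token(wchar_t *cmdline)
{
    wchar_t *p = cmdline;

    if (*p == L'"') {
        p++;
        while (*p != L'\0' && *p != L'"')
            p++;
        if (*p == L'"')
            p++;            /* keep the closing quote in the token */
    } else {
        while (*p != L'\0' && *p != L' ')
            p++;
    }

    if (*p == L'\0')
        return NULL;        /* token ran to the end of the line */

    *p = L'\0';             /* destructive: the separator is gone */
    return p + 1;
}
```

After one call on a writable copy of the command line, anything that
reads the buffer as an ordinary null-terminated string sees only the
quoted program name, while the arguments are still sitting in memory
past the null.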

But under cmd.exe, we see the whole command line, without any null
character having been written in it. Moreover, the program name just
appears as the original relative path cyg.exe with no quotes.
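The test program itself is elided above, so the following is only a
guess at the kind of loop that produces dumps in this format:
format_unit and dump_units are hypothetical names, and the '.' for
anything outside printable ASCII is inferred from the "00e9 ." entries.
The point is that such a loop walks the string only up to the first
null, so a null written into the buffer cuts the dump short.

```c
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for the elided test program's dump loop.
 * Writes "XXXX c" for one code unit into buf; anything outside
 * printable ASCII is shown as '.'.
 */
static void format_unit(wchar_t unit, char *buf, size_t size)
{
    unsigned long u = (unsigned long)unit;
    char shown = (u >= 0x20 && u < 0x7f) ? (char)u : '.';

    snprintf(buf, size, "%04lx %c", u, shown);
}

/* Dump every code unit up to, but not including, the first null. */
static void dump_units(const wchar_t *s, FILE *out)
{
    char cell[32];
    int count = 0;

    for (; *s != L'\0'; s++) {
        format_unit(*s, cell, sizeof cell);
        fputs(cell, out);
        count++;
        /* eight cells per line, as in the quoted output */
        fputs(count % 8 == 0 ? "\n" : "   ", out);
    }
    if (count % 8 != 0)
        fputc('\n', out);
}
```

On Windows the string passed to dump_units would be whatever
GetCommandLineW returns; the sketch keeps that call out so that it
compiles anywhere.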

What a mess. :)




