Inconsistent command-line parsing of quoted UTF-8 arguments
Jérôme Froissart
software@froissart.eu
Fri Oct 2 21:40:12 GMT 2020
Hello,
While discussing a merge request on another project [1], I think
billziss-gh found an oddity in the way Cygwin parses command-line
arguments when non-ASCII characters come into play.
EXPECTED BEHAVIOUR:
Cygwin should parse the following command line
binary.exe --non-ascii "charaçtérs" --ascii "nothing-fancy-here"
as
argv = ["binary.exe",
"--non-ascii",
"chara\xXX\xXXt\xXX\xXXrs",
"--ascii",
"nothing-fancy-here"]
before calling main().
(\xXX\xXX stands for the UTF-8 encoding of each special character;
the exact bytes do not really matter here.)
ACTUAL BEHAVIOUR:
Cygwin instead parses it as
argv = ["binary.exe",
"--non-ascii",
"\"chara\xXX\xXXt\xXX\xXXrs\"", // mind the unstripped quotes here...
"--ascii",
"nothing-fancy-here" // ...but not here
]
It looks like the quotes surrounding words that contain UTF-8
characters are not stripped, unlike those around ASCII-only words.
More examples and a better description are available at [1] (thanks to
billziss-gh for his analysis, which is much more thorough than mine).
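To make the difference visible when reproducing this, it helps to
print each argument byte by byte. Here is a minimal sketch of such a
helper; the name escape_arg and the output format are my own choices
for illustration and are not part of Cygwin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical debugging helper (not a Cygwin API).
 * Copies arg into out, writing printable ASCII as-is, prefixing quotes and
 * backslashes with a backslash, and writing every other byte as \xNN.
 * Stops early if out is too small. Returns the number of characters written,
 * not counting the terminating NUL.
 */
static size_t escape_arg(const char *arg, char *out, size_t out_size)
{
    assert(arg != NULL && out != NULL && out_size > 0);

    size_t used = 0;
    for (const unsigned char *p = (const unsigned char *)arg; *p != '\0'; ++p) {
        int n;
        if (*p == '"' || *p == '\\')
            n = snprintf(out + used, out_size - used, "\\%c", *p);
        else if (*p >= 0x20 && *p < 0x7f)
            n = snprintf(out + used, out_size - used, "%c", *p);
        else
            n = snprintf(out + used, out_size - used, "\\x%02X", *p);

        /* Stop before a partially written escape sequence. */
        if (n < 0 || (size_t)n >= out_size - used)
            break;
        used += (size_t)n;
    }
    out[used] = '\0';
    return used;
}
```

Calling it on each argv[i] from main() and printing the result should
show whether the surrounding quotes are still part of the argument.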
For the record, we wrote a work-around in our specific program, but
handling this issue in Cygwin might be a better way to solve it.
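For illustration, a workaround of this kind can be sketched as
follows. This is only an assumption of what such a fix may look like,
not the code from [1], and strip_surrounding_quotes is a hypothetical
name. Note that it also strips quotes the user really meant to pass,
which is one reason a fix in Cygwin itself would be preferable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical workaround helper (not taken from any existing project).
 * If arg starts and ends with a double quote, removes that one pair
 * in place. Any other argument is left untouched. Returns arg.
 */
static char *strip_surrounding_quotes(char *arg)
{
    assert(arg != NULL);

    size_t len = strlen(arg);
    if (len >= 2 && arg[0] == '"' && arg[len - 1] == '"') {
        /* Shift the inner bytes left by one and terminate early. */
        memmove(arg, arg + 1, len - 2);
        arg[len - 2] = '\0';
    }
    return arg;
}
```

The helper works on raw bytes, so it does not need to know anything
about the UTF-8 encoding of the characters between the quotes.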
[1]: https://github.com/billziss-gh/sshfs-win/pull/208 (Checking for
quotes around non-ascii usernames passed by Windows)
Thanks for your help! If you do not have time, please tell me
where to look, and I might try to fix it myself and send a patch
proposal if that is easy enough (I have not read Cygwin's code yet).
Jérôme