B20.1 sh and bash command line parsing question

Rob Tulloh tulloh@dev.tivoli.com
Wed Aug 25 15:08:00 GMT 1999

Hello everyone!

Chris Faylor wrote:
> I don't know if this helps, but cygwin uses the doubled quote character
> '""' to indicate a single quote.  It doesn't recognize the backslash
> syntax.
> I actually had everything coded to understand the backslash when it was
> brought to my attention that a normal MS-DOS application relies on
> doubling of quotes to quote a quote.  I know that this is not consistent

Why are you following MS-DOS for a WIN32-based implementation? The rules
(according to the Microsoft docs) are clearly stated as this:

-- snip --
Microsoft C startup code uses the following rules when interpreting
arguments given on the operating system command line:  
·	Arguments are delimited by white space, which is either a space or a
tab.
·	A string surrounded by double quotation marks is interpreted as a
single argument, regardless of white space contained within. A quoted
string can be embedded in an argument. Note that the caret (^) is not
recognized as an escape character or delimiter. 
·	A double quotation mark preceded by a backslash, \", is interpreted as
a literal double quotation mark (").
·	Backslashes are interpreted literally, unless they immediately precede
a double quotation mark.
·	If an even number of backslashes is followed by a double quotation
mark, then one backslash (\) is placed in the argv array for every pair
of backslashes (\\), and the double quotation mark (") is interpreted as
a string delimiter.
·	If an odd number of backslashes is followed by a double quotation
mark, then one backslash (\) is placed in the argv array for every pair
of backslashes (\\) and the double quotation mark is interpreted as an
escape sequence by the remaining backslash, causing a literal double
quotation mark (") to be placed in argv.
This list illustrates the rules above by showing the interpreted result
passed to argv for several examples of command-line arguments. The
output listed in the second, third, and fourth columns is from the
ARGS.C program that follows the list.
Command-Line Input    argv[1]    argv[2]    argv[3]

"a b c" d e           a b c      d          e
"ab\"c" "\\" d        ab"c       \          d
a\\\b d"e f"g h       a\\\b      de fg      h
a\\\"b c d            a\"b       c          d
a\\\\"b c" d e        a\\b c     d          e

-- snip --
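
To make the quoted rules concrete, here is a minimal Python sketch of
them. It is my own illustration, not Microsoft's startup code: the
function name parse_msvc_args is hypothetical, and it handles only the
argument portion of the command line (no program name), following just
the rules listed above.

```python
def parse_msvc_args(cmdline):
    """Split an argument string using the Microsoft C startup rules quoted above.

    Hypothetical helper for illustration only; it is not the real runtime code.
    """
    args = []
    current = []        # pieces of the argument being built
    in_quotes = False   # inside a double-quoted region?
    have_arg = False    # has the current argument started (even if empty)?
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Measure the whole run of backslashes.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # One backslash is kept for every pair.
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    # Odd run: the leftover backslash escapes the quote.
                    current.append('"')
                    j += 1
                # Even run: the quote is left to act as a delimiter.
            else:
                # Backslashes not followed by a quote are literal.
                current.append("\\" * count)
            have_arg = True
            i = j
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            # Unquoted white space ends the current argument.
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


if __name__ == "__main__":
    # The five rows of the table in the quoted documentation.
    for line in [r'"a b c" d e', r'"ab\"c" "\\" d', r'a\\\b d"e f"g h',
                 r'a\\\"b c d', r'a\\\\"b c" d e']:
        print(line, "->", parse_msvc_args(line))
```

Run against the five table rows, this sketch yields the same argv[1],
argv[2] and argv[3] values that the documentation lists.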

As I discussed with Earnie by e-mail this afternoon, the burden
of quoting a command line should be on the caller and this allows
the runtime to do what is natural. So, when calling through
the Microsoft C-runtime, you must quote using its rules so that
the command line arrives at the shell as the shell expects it to.
Let me state an example:

In order to print the string \"hi there\", you must do this for the
Microsoft command shell under Windows NT or Windows 98:

echo "\"hi there\""

This is because of rule 3 above in the Microsoft docs. Now, if I wish
to have the same echo command run via sh.exe, I would normally write
this from within the shell runtime:

echo "\\\"hi there\\\""

However, if I want to do the same thing by way of the Microsoft C runtime,
I must follow the rules of the environment I am currently using. So in
order to run 'sh -c' and pass it the echo command, I must write the
command in such a way that it will survive the Microsoft C runtime parsing
and arrive as the shell expects it (\\\"hi there\\\"):

sh -c "echo \\\\\\\"hi there\\\\\\\""

How does this work? I interpret it this way. We are trying to pass
the string \\\"hi there\\\" to the shell, where it will be parsed using
the shell's normal rules for parsing. In order to accomplish this
we need to protect the string so it will pass through the Microsoft
C runtime and arrive at the shell as needed. To do this, we need
to follow rules 3 and 6 above. Rule 3 states that \" is an escape
and rule 6 states we need an odd number of backslashes in front
of a doublequote in order to get the escape sequence we desire.

The algorithm to compute this is trivial. Simply add a backslash
for each doublequote encountered and then also add a backslash to each
backslash that precedes a double quote. In this example, we need to
add 4 backslashes (1 for the " and 3 for \). When executed, this
does exactly what you would expect:

C:\Apps\tivoli\BIN>sh -c "echo \\\\\\\"hi there\\\\\\\""
\"hi there\"

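The padding algorithm described above can be sketched as follows. This
is a minimal Python illustration under my own assumptions: the function
name escape_for_msvc is hypothetical, the result is always wrapped in
double quotes, and trailing backslashes are doubled so that the closing
quote is not escaped by accident (a detail the description above does
not spell out).

```python
def escape_for_msvc(arg):
    """Pad one argument so the Microsoft C runtime hands it on unchanged.

    Hypothetical helper for illustration; assumes the argument is
    always wrapped in double quotes.
    """
    out = []
    backslashes = 0  # length of the backslash run seen so far
    for ch in arg:
        if ch == "\\":
            backslashes += 1
        elif ch == '"':
            # Double the preceding backslashes, then add one more
            # backslash to escape the quote itself.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote stay as they are.
            out.append("\\" * backslashes + ch)
            backslashes = 0
    # Assumption: double any trailing backslashes, because the closing
    # quote added below follows them directly.
    out.append("\\" * (2 * backslashes))
    return '"' + "".join(out) + '"'


if __name__ == "__main__":
    # The string the shell should receive, as in the example above.
    print("sh -c " + escape_for_msvc(r'echo \\\"hi there\\\"'))
    # sh -c "echo \\\\\\\"hi there\\\\\\\""
```

Each run of three backslashes before a double quote grows to seven,
which is the "add 4 backslashes" count worked out above.
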
Please note that the sh above is NOT cygwin sh.exe. Instead, this
is the Tivoli native port of bash to NT. The Tivoli port of sh.exe
assumes that the caller will do all the necessary work to pad
a command line to pass to it. Note that this same logic works
when you are 'inside' the shell context. Since shell doesn't care about
the caller, you then write a more natural shell expression when you
are running in the shell context already. Using the same shell
on the same system, I can now do this:

bash$ echo "\\\"hi there\\\""
\"hi there\"

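The shell's own layer of parsing can be approximated the same way. The
sketch below uses Python's shlex module in its default POSIX mode as a
stand-in for the shell's word splitting. It is not bash itself, and the
function name shell_words is hypothetical.

```python
import shlex


def shell_words(command):
    """Split a command roughly the way a POSIX shell would.

    shlex is only an approximation of real shell parsing.
    """
    return shlex.split(command)


if __name__ == "__main__":
    # The command as typed at the bash$ prompt above.
    words = shell_words(r'echo "\\\"hi there\\\""')
    print(words[1])
    # \"hi there\"
```

Inside double quotes, \\ collapses to one backslash and \" to one
double quote, so echo receives \"hi there\" as a single argument.
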
So any attempt to circumvent or reinterpret the runtime's handling of
argument parsing seems wrong to me. Callers should reverse-parse according
to the rules of the runtime they are invoking from, and then everything
works exactly as you would expect. If I am a caller using the cygwin
runtime, I would follow whatever rules are in place for cygwin.dll.
If I am a WIN32 caller, I must follow the rules of the Microsoft
C runtime.

My gut feeling (without having looked at the code) is that cygwin is
using GetCommandLine() to fetch the raw command string passed to the
shell. It is then probably trying to do its own parsing on this command
line, and it does not seem to be following any rules that make sense
from a Unix or WIN32 perspective. Therefore, cygwin has achieved zero
compatibility with WIN32 or Unix, which I think limits its usefulness.
OK, I guess if you are inside cygwin's runtime, perhaps you are
compatible with Unix, but that is only half the battle when you are
trying to create solutions that must interact seamlessly with WIN32
native code. Agree?

You say "" is the escape clause for an embedded "? Why follow DOS? Is
this a DOS subsystem or a WIN32 subsystem?

> with our handling of upper and lower case wildcard characters on the
> command line since it means that, in essence, we have different command
> line paradigms for quoting and wildcards but now we have backwards
> compatibility issues to worry about.

Definitely! I see the backwards compatibility issue as much larger
here. Do you want folks to be able to write code once or to have
to ifdef between cygwin and Unix?

> So, maybe someday we'll bite the bullet and make this consistent but,
> for now, that's how things work.

Please, let's fix it!

> Chris Faylor
> Win32 Manager
> Cygnus Solutions

Thanks for the feedback. I hope I have made my points strong enough for
you to think about changing the implementation to match what makes sense
for both Unix users and Microsoft users. 

Rob Tulloh
Tivoli Management Framework - Staff Engineer
Tivoli Systems (Rome Lab)

> On Wed, Aug 25, 1999 at 01:04:50PM +0200, Rob Tulloh wrote:
> >As a maintainer of GNU make on WIN32 platforms, I am constantly asked
> >why Cygwin sh.exe and/or bash.exe don't work correctly when called
> >from make. I have a hack that forces all shell commands to be written
> >to a temp file and then run via 'sh file'. However, I don't like
> >this as it represents an unnecessary performance hit to make on WIN32
> >platforms. I have a simple test case that shows the problems.
> >
> >I am trying to work out why sh.exe and bash.exe cannot be
> >successfully invoked via 'sh -c' from CreateProcess() for
> >all kinds of command line. I have a simple test example that can be
> >run from the NT command prompt which demonstrates the problem. There
> >are 6 lines below which use 3 shells to execute 2 different command
> >lines:
> >
> >> i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/sh.exe -c "if [ -d \"c:/temp\" ] ; then echo \"hi\" ; fi"
> >> i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/bash.exe -c "if [ -d \"c:/temp\" ] ; then echo \"hi\" ; fi"
> >> i:/tools/gk/bin/sh.exe -c "if [ -d \"c:/temp\" ] ; then echo \"hi\" ; fi"
> >> i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/sh.exe -c "if [ -d \"c:/temp\" ] ; then echo \"hi there\" ; fi"
> >> i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/bash.exe -c "if [ -d \"c:/temp\" ] ; then echo \"hi there\" ; fi"
> >> i:/tools/gk/bin/sh.exe -c "if [ -d \"c:/temp\" ] ; then echo \"hi there\" ; fi"
> >
> >The first 2 shells are the ones from the Cygwin B20.1 distribution. The
> >3rd shell is Tivoli's custom port of GNU bash to Windows NT.
> >
> >> I:\apps\work\cygnus\cygwin-b20>c:\temp\sp2.bat
> >>
> >> I:\apps\work\cygnus\cygwin-b20>i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/sh.exe
> >> -c "if [ -d \"c:/temp\" ] ; then echo \"hi\" ; fi"
> >> hi
> >>
> >> I:\apps\work\cygnus\cygwin-b20>i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/bash.ex
> >> e -c "if [ -d \"c:/temp\" ] ; then echo \"hi\" ; fi"
> >> hi
> >>
> >> I:\apps\work\cygnus\cygwin-b20>i:/tools/gk/bin/sh.exe -c "if [ -d \"c:/temp\" ]
> >> ; then echo \"hi\" ; fi"
> >> hi
> >>
> >> I:\apps\work\cygnus\cygwin-b20>i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/sh.exe
> >> -c "if [ -d \"c:/temp\" ] ; then echo \"hi there\" ; fi"
> >> Syntax error: Unterminated quoted string
> >>
> >> I:\apps\work\cygnus\cygwin-b20>i:/apps/work/cygnus/CYGWIN~1/H-I586~1/bin/bash.ex
> >> e -c "if [ -d \"c:/temp\" ] ; then echo \"hi there\" ; fi"
> >> there\ ; fi: -c: line 1: unexpected EOF while looking for matching `"'
> >> there\ ; fi: -c: line 2: syntax error: unexpected end of file
> >>
> >> I:\apps\work\cygnus\cygwin-b20>i:/tools/gk/bin/sh.exe -c "if [ -d \"c:/temp\" ]
> >> ; then echo \"hi there\" ; fi"
> >> hi there
> >
> >
> >Notice how, as soon as white space ("hi there") is introduced into a
> >string embedded in the command line, the parser breaks down and
> >fails to parse the string correctly. I would
> >have thought that the parsing rules would follow the Microsoft C runtime
> >rules for argument parsing since it should be possible to invoke Cygwin
> >commands from CreateProcess() (natively from Win32) rather than having
> >to rely on fork/exec/whatever in cygwin.dll.
> >
> >Note that I have made the parsing logic within make work for Tivoli's
> >custom port of bash and also for the MKS version of sh. I am not able
> >to figure out what magic is needed to make this work with Cygwin sh or
> >bash. I am looking for insight on what the parsing algorithm is and how
> >to invoke commands from WIN32 so that sh/bash can parse them as I would
> >expect.
> >
> >Comments?
> >
> >Thank you,
