Bug 27594

Summary: build processes broken by changed space handling
Product: binutils Reporter: Thomas Wolff <towo>
Component: binutilsAssignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal CC: eliz, johannes.schindelin, martin, nickc
Priority: P2    
Version: 2.36   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Proposed patch

Description Thomas Wolff 2021-03-17 12:35:58 UTC
Recent change in response to #4356 apparently breaks some build processes, 
see https://github.com/msys2/MSYS2-packages/issues/2379

Whatever the solution will be, it should be consistent with the manual page of windres;
option --preprocessor program
could arguably be interpreted to take just a "program" as an argument but its own description contradicts that:
> The default preprocessor argument is "gcc -E -xc-header -DRC_INVOKED".

option --preprocessor-arg option
is described "to specify additional text to be passed to preprocessor on its command line" so it should accept multiple arguments but is restricted to one argument only per option, resulting in unpleasantly lengthy command lines in makefiles that would use this option per argument

My suggestion is to apply some obvious magic to option --preprocessor; 
everything following a space-separated dash, like " -" in "path to gcc -x -y", should be used as a set of arguments as before, so confining the new space-in-pathname support to the actual command.
Comment 1 Johannes Schindelin 2021-03-17 14:11:01 UTC
As I stated in https://github.com/msys2/MSYS2-packages/issues/2379#issuecomment-801061261, the common way to address this is type of issue is to require the path with spaces to be quoted. Something like this:

windres --preprocessor '"c:\\Program Files\\gcc" -E "-Ic:\Path With Spaces\include" -Dfoo'

That way, existing users of the `--preprocessor` option won't be broken, even if they just wanted to recreate the default behavior (https://sourceware.org/binutils/docs/binutils/windres.html claims that the default value of `--preprocessor` is `gcc -E -xc-header -DRC_INVOKED`, but the bug in question prevents that very value from ever being accepted correctly by `windres`, it will misinterpret it as a path with spaces instead).
Comment 2 Martin Storsjö 2021-04-29 11:59:45 UTC
+1, there's predecent that windres takes this parameter as a supposedly-pre-quoted/escaped string that is passed as-is to popen().

The same also goes for other preprocessor arguments, like -D and --preprocessor-arg - they are passed almost (will elaborate below) as-is on to popen(). This means that any caller that wants to pass e.g. a define for a quoted string needs to escape quotes doubly - see e.g. https://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blob;f=windows/windres-options;h=779fddec305d1e78f1e5c3123683b3c380e4a82e;hb=4b1a76b8e7f718fb23eb1a48cd1be208cfff6c2a for an example of projects in the wild, doing extra escaping when passing defines to windres.

A minimal example of that double quoting:
$ cat test1.rc
STRINGTABLE {
  1 STRING1
}
$ x86_64-w64-mingw32-windres -DSTRING1=\"foo\" test1.rc
x86_64-w64-mingw32-windres: test1.rc:2: syntax error
$ x86_64-w64-mingw32-windres -DSTRING1=\\\"foo\\\" test1.rc
[..snip..]
STRINGTABLE MOVEABLE PURE DISCARDABLE
BEGIN
  1, "foo"
END


Windres does try to escape one kind of char in the provided argument strings; it tries to escape spaces with a backslash. Unfortunately, as the shells (invoked by popen) differ between unix and windows (one can't backslash escape spaces in windows command lines, the argument has to be quoted), this only actually works when run on unix:

$ x86_64-w64-mingw32-windres "-DSTRING1=\\\"foo bar\\\"" test1.rc
[..snip..]
STRINGTABLE MOVEABLE PURE DISCARDABLE
BEGIN
  1, "foo bar"
END

If you try to run the same on windows (built as mingw executables), it fails:

$ windres test1.rc "-DSTRING1=\\\"foo bar\\\""
gcc: error: bar": Invalid argument
C:\msys64\mingw64\bin\windres.exe: preprocessing failed.


Furthermore, other things that differ between how shells interpret things in popen():

)$ x86_64-w64-mingw32-windres -DSTRING1=\\\"foo\\\\\\\\bar\\\" test1.rc -v
Using `x86_64-w64-mingw32-gcc -E -xc -DRC_INVOKED -DSTRING1=\"foo\\\\bar\" test1.rc'
Using popen to read preprocessor output
[..snip..]
STRINGTABLE MOVEABLE PURE DISCARDABLE
BEGIN 
  1, "foo\\bar"
END

But on windows:

$ windres test1.rc -DSTRING1=\\\"foo\\\\\\\\bar\\\" -v
[..snip..]
STRINGTABLE MOVEABLE PURE DISCARDABLE
BEGIN
  1, "foo\\\\bar"
END
Using `C:\msys64\mingw64\bin\gcc -E -xc -DRC_INVOKED -DSTRING1=\"foo\\\\bar\" test1.rc'


The last few are hypothetical. However the habit of double-escaping quotes for such strings exists in the wild, and should be documented as such. The fact that it differs between platforms exactly how one should do the double escaping is problematic too, when it comes to more complicated strings.

(I recently implemented llvm-windres, and went to some lengths to mimic the GNU windres behaviour of these options, to make it work as a drop-in replacement for how existing projects in the wild call windres.)
Comment 3 Sourceware Commits 2021-04-29 12:12:11 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5edb8e3f5ad8d74a83fc0df7f6e4514eed0aa77f

commit 5edb8e3f5ad8d74a83fc0df7f6e4514eed0aa77f
Author: Nick Clifton <nickc@redhat.com>
Date:   Thu Apr 29 13:11:29 2021 +0100

    Correct the text describing windres's --processor option.
    
     PR 27594
     * doc/binutils.texi (windres): Correct the description of the
     default value of the --preprocessor argument.
Comment 4 Nick Clifton 2021-04-29 13:58:29 UTC
Created attachment 13403 [details]
Proposed patch

Is anyone able to try out this patch as a possible workaround for this issue ?

It implements Thomas' suggestion that a space followed by a dash not be considered as a reason for quoting the preprocessor command string.
Comment 5 Thomas Wolff 2021-04-29 15:24:34 UTC
Unfortunately the patch doesn't fix the issue, as it only decides how to handle the whole string but does not break at that position.
Also please note Johannes' suggestion to prefer a more state-of-the-art approach.
My suggestion is to first revert the regression and then implement quote detection as he suggested.
Comment 6 Thomas Wolff 2021-05-10 04:20:20 UTC
Please fix this by simply reverting to the 2.35 version.
Some testing reveals that the solution proposed by Johannes Schindelin actually worked already in 2.35, i.e. spaces can be embedded by quoting, e.g.
windres -c 65001 --preprocessor '"/my bin/gcc" "-E"'
so there was no need for the 2.36 patch at all in the first place.
Comment 7 Sourceware Commits 2021-05-10 10:28:59 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=749c700282097cf679ff019a9674d7c762f48619

commit 749c700282097cf679ff019a9674d7c762f48619
Author: Thomas Wolff <towo@towo.net>
Date:   Mon May 10 11:28:15 2021 +0100

    Restore old behaviour of windres so that options containing spaces are not enclosed in double quotes.
    
            PR 4356
            PR 26865
            PR 27594
            * windres.c (quot): Revert previous delta.  Do not use double
            quotes when spaces are detected in options.
            * doc/binutils.texi (windres): Remove suggestion that the
            --preprocessor option can take arguments.
Comment 8 Eli Zaretskii 2021-05-10 14:02:20 UTC
Reverting that change is IMO a mistake.

The breakage of the build process cited in the original report is due to incorrect usage: it gives --preprocess a full shell command, instead of just the executable name (the command-line arguments for the preprocessor should come via --preprocessor-arg, not via --preprocessor.  So saying that the change should be reverted because it breaks that build is invalid: the build script should be fixed to use `windres` correctly.

Let's start from the beginning: if the argument to the --preprocessor option includes space characters, how is this supposed to work with `windres`?  Quoting 'like this' will not work when `windres` is invoked from cmd.exe, it will only work when invoked from MSYS Bash, so it cannot be the solution.  Quoting with backslashes also doesn't work in cmd.exe.  The suggestion to pre-quote the preprocessor is also unreliable, because when `windres` is run from cmd.exe or from the native MinGW Make, the program's startup code unquotes the argument, so the result will depend on how many "unquoting" levels the command line is subject to.

The only quoting method that works in both Posix shells and cmd.exe is "like this".  Which is what the change to `quot` did, and now we are back to square one.

It is IMO incorrect to break native invocation from cmd.exe so that broken scripts and "pre-quoting" in Posix shell manner could work.

Please restore the original change in `quot`.  If that causes some trouble not covered by my comments above, please describe them, and please let's try to find a solution that doesn't break quoting from cmd.exe again.
Comment 9 Martin Storsjö 2021-05-10 14:58:38 UTC
(In reply to Eli Zaretskii from comment #8)
> Quoting with backslashes also doesn't work in cmd.exe.  The suggestion to
> pre-quote the preprocessor is also unreliable, because when `windres` is run
> from cmd.exe or from the native MinGW Make, the program's startup code
> unquotes the argument, so the result will depend on how many "unquoting"
> levels the command line is subject to.

I would beg to differ here; when any shell, be it either bash or cmd.exe, interprets a command line, it does split and unescape the command line depending on that shell's rules.

It's true that the actual splitting and unescaping happens at different levels (in the caller, in the case of unix shells, and in the callee in the case of a win32 executable), but the basic principle is still the same; if I want to call a tool and have it receive a specific argument, I have to quote that argument according to the current shell's rules (and those rules differ with respect to certain details).

So if I want to call windres so that it receives the argument "path to/gcc.exe" -E -DRC_INVOKED, I can call windres with "\"path to/gcc.exe\" -E -DRC_INVOKED" in cmd.exe and in bash - that particular quoting happens to work in both. In bash I could also call it with single quotes and not needing to escape the double quotes, as '"path to/gcc.exe" -E -DRC_INVOKED'.


Thomas - the patch that was applied today, in https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=749c700282097cf679ff019a9674d7c762f48619, doesn't that adjust an entirely different thing? The quot() function isn't called for --preprocessor arguments, it's only called for --preprocessor-arg and -D and similar. It still just wraps the whole --preprocessor argument in double quotes, here: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/windres.c;h=2b626a29fb68ada1cc012a267a60d72c1335bea9;hb=23182ac0d832477d316547ec2a758d22b43d0837#l889


I'm not arguing that what we have right now is pretty - it's not.

But the core issue - that the --preprocessor argument used to accept multiple arguments (and where spaces would have to be pre-quoted) - do you want to willingly break that, which multiple existing projects rely on? I agree that the new interpretation of the option would be more straightforward to deal with - but it is a breaking change that affects multiple projects. I'd rather try to more clearly document the existing original behaviour and how to deal with it.



Eli - separately, the new quoting behaviour (that was reverted now, needlessly?) is, in principle a good thing, but it's still missing to quote a few tricky cases - namely backslashes and double quotes. And fixing that would be breaking roughly a similar number of projects.

If I want to preprocess with the option e.g. -DSTRING="quotedstring", I want windres to call popen with -DSTRING=\"quotedstring\", so I need to call windres with -DSTRING=\\\"quotedstring\\\". So it would in theory be great if quot() would handle proper escaping of backslashes and double quotes, but if we'd do that change, we'd be breaking those existing users that have codified that case of double escaping for such strings.
Comment 10 Eli Zaretskii 2021-05-10 16:38:45 UTC
(In reply to Martin Storsjö from comment #9)
> (In reply to Eli Zaretskii from comment #8)
> > Quoting with backslashes also doesn't work in cmd.exe.  The suggestion to
> > pre-quote the preprocessor is also unreliable, because when `windres` is run
> > from cmd.exe or from the native MinGW Make, the program's startup code
> > unquotes the argument, so the result will depend on how many "unquoting"
> > levels the command line is subject to.
> 
> I would beg to differ here; when any shell, be it either bash or cmd.exe,
> interprets a command line, it does split and unescape the command line
> depending on that shell's rules.
> 
> It's true that the actual splitting and unescaping happens at different
> levels (in the caller, in the case of unix shells, and in the callee in the
> case of a win32 executable), but the basic principle is still the same; if I
> want to call a tool and have it receive a specific argument, I have to quote
> that argument according to the current shell's rules (and those rules differ
> with respect to certain details).

Up until here, full agreement.

> So if I want to call windres so that it receives the argument "path
> to/gcc.exe" -E -DRC_INVOKED, I can call windres with "\"path to/gcc.exe\" -E
> -DRC_INVOKED" in cmd.exe and in bash - that particular quoting happens to
> work in both. In bash I could also call it with single quotes and not
> needing to escape the double quotes, as '"path to/gcc.exe" -E -DRC_INVOKED'.

Here is where I disagree: the "\"path to/gcc.exe\" or '"path to/gcc.exe"' is NOT quoting, it's double quoting.  Quoting is just "path to/gcc.exe" or (for Posix shells) 'path to/gcc.exe'.

> The quot() function isn't called for --preprocessor
> arguments, it's only called for --preprocessor-arg and -D and similar. It
> still just wraps the whole --preprocessor argument in double quotes, here:
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/windres.c;
> h=2b626a29fb68ada1cc012a267a60d72c1335bea9;
> hb=23182ac0d832477d316547ec2a758d22b43d0837#l889

Yes, and thus I don't understand why the change to `quot` was reverted.  If that function's job is to quote something, then it should quote correctly on MS-Windows, and the only sensible quoting on MS-Windows is "like this".  If someone wants to argue that arguments with whitespace should not be quoted, then why do we still call `quot` on Unix?

> But the core issue - that the --preprocessor argument used to accept
> multiple arguments (and where spaces would have to be pre-quoted) - do you
> want to willingly break that, which multiple existing projects rely on? 

You just said that the change in `quot` doesn't affect --preprocessor, so how can the change in `quot` break that (incorrect) usage of --preprocessor?  Why was it reverted?  What am I missing here?

> Eli - separately, the new quoting behaviour (that was reverted now,
> needlessly?) is, in principle a good thing, but it's still missing to quote
> a few tricky cases - namely backslashes and double quotes. And fixing that
> would be breaking roughly a similar number of projects.

The code does attempt to quote double quotes, but if that code is buggy, please show the specific use case where it breaks.  If the code is incomplete, let's fix that, by all means.

As for backslashes: they don't need to be quoted on native Windows, only if you are using a Posix shell, in which case the user should have typed the original file name or string argument with backslashes already doubled, right?

> If I want to preprocess with the option e.g. -DSTRING="quotedstring", I want
> windres to call popen with -DSTRING=\"quotedstring\", so I need to call
> windres with -DSTRING=\\\"quotedstring\\\".

I don't think I agree.  The user should say -DSTRING="quotedstring" and the code which invokes `popen` should then take care of quoting any arguments that have embedded whitespace.  Otherwise, we will have to know exactly how many "unquoting" levels will the string undergo, and add quotes for each level.  That way lies madness, because it is not practical to do that -- you never know how many such levels will the string undergo.  The only sane way is for the code which is going to pass a string to a shell command to quote it if needed, and as appropriate to the command to be invoked.

> So it would in theory be great
> if quot() would handle proper escaping of backslashes and double quotes, but
> if we'd do that change, we'd be breaking those existing users that have
> codified that case of double escaping for such strings.

I still want to see those existing users which we will break.  I suspect that either they are already broken (and use loopholes due to bugs in the implementation), or will simply be unaffected.

So by all means, let's talk about those use cases one by one, and see what we want to do about them.
Comment 11 Martin Storsjö 2021-05-10 16:47:32 UTC
(In reply to Eli Zaretskii from comment #10)
> > So if I want to call windres so that it receives the argument "path
> > to/gcc.exe" -E -DRC_INVOKED, I can call windres with "\"path to/gcc.exe\" -E
> > -DRC_INVOKED" in cmd.exe and in bash - that particular quoting happens to
> > work in both. In bash I could also call it with single quotes and not
> > needing to escape the double quotes, as '"path to/gcc.exe" -E -DRC_INVOKED'.
> 
> Here is where I disagree: the "\"path to/gcc.exe\" or '"path to/gcc.exe"' is
> NOT quoting, it's double quoting.  Quoting is just "path to/gcc.exe" or (for
> Posix shells) 'path to/gcc.exe'.

Yes, that's true: We need to make sure windres receives a pre-quoted string, thus the caller has to double quote it.

> > The quot() function isn't called for --preprocessor
> > arguments, it's only called for --preprocessor-arg and -D and similar. It
> > still just wraps the whole --preprocessor argument in double quotes, here:
> > https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/windres.c;
> > h=2b626a29fb68ada1cc012a267a60d72c1335bea9;
> > hb=23182ac0d832477d316547ec2a758d22b43d0837#l889
> 
> Yes, and thus I don't understand why the change to `quot` was reverted.  If
> that function's job is to quote something, then it should quote correctly on
> MS-Windows, and the only sensible quoting on MS-Windows is "like this".  If
> someone wants to argue that arguments with whitespace should not be quoted,
> then why do we still call `quot` on Unix?

Indeed, I think that was a mistake in the revert. I'll try to follow up with a patch for reverting the right thing in a couple hours, and for clarifying the documentation regarding it - to codify the existing status quo regarding it.

> > But the core issue - that the --preprocessor argument used to accept
> > multiple arguments (and where spaces would have to be pre-quoted) - do you
> > want to willingly break that, which multiple existing projects rely on? 
> 
> You just said that the change in `quot` doesn't affect --preprocessor, so
> how can the change in `quot` break that (incorrect) usage of --preprocessor?
> Why was it reverted?  What am I missing here?

Yeah the change in quot doesn't affect --preprocessor indeed. The "you" in this section was more of a generic towards the project, not you (singular) in particular.

> > Eli - separately, the new quoting behaviour (that was reverted now,
> > needlessly?) is, in principle a good thing, but it's still missing to quote
> > a few tricky cases - namely backslashes and double quotes. And fixing that
> > would be breaking roughly a similar number of projects.
> 
> The code does attempt to quote double quotes, but if that code is buggy,
> please show the specific use case where it breaks.  If the code is
> incomplete, let's fix that, by all means.
> 
> As for backslashes: they don't need to be quoted on native Windows, only if
> you are using a Posix shell, in which case the user should have typed the
> original file name or string argument with backslashes already doubled,
> right?
> 
> > If I want to preprocess with the option e.g. -DSTRING="quotedstring", I want
> > windres to call popen with -DSTRING=\"quotedstring\", so I need to call
> > windres with -DSTRING=\\\"quotedstring\\\".
> 
> I don't think I agree.  The user should say -DSTRING="quotedstring" and the
> code which invokes `popen` should then take care of quoting any arguments
> that have embedded whitespace.  Otherwise, we will have to know exactly how
> many "unquoting" levels will the string undergo, and add quotes for each
> level.  That way lies madness, because it is not practical to do that -- you
> never know how many such levels will the string undergo.  The only sane way
> is for the code which is going to pass a string to a shell command to quote
> it if needed, and as appropriate to the command to be invoked.
>
> > So it would in theory be great
> > if quot() would handle proper escaping of backslashes and double quotes, but
> > if we'd do that change, we'd be breaking those existing users that have
> > codified that case of double escaping for such strings.
> 
> I still want to see those existing users which we will break.  I suspect
> that either they are already broken (and use loopholes due to bugs in the
> implementation), or will simply be unaffected.
> 
> So by all means, let's talk about those use cases one by one, and see what
> we want to do about them.

Yes I agree that the most sane thing to do would be to have the user specify -DSTRING="quotedstring", but there are projects that have ended up working around this, so fixing this case does break them.

I'll follow up with a separate bug for that topic in a couple hours, with more examples and details, to avoid derailing this one further.
Comment 12 Eli Zaretskii 2021-05-10 17:13:58 UTC
(In reply to Martin Storsjö from comment #11)
> > Here is where I disagree: the "\"path to/gcc.exe\" or '"path to/gcc.exe"' is
> > NOT quoting, it's double quoting.  Quoting is just "path to/gcc.exe" or (for
> > Posix shells) 'path to/gcc.exe'.
> 
> Yes, that's true: We need to make sure windres receives a pre-quoted string,
> thus the caller has to double quote it.

That means `windres` will be the only program which requires such strange invocation rules.  Every other program which does something similar doesn't require such "over-quoting".

And I don't really understand why you are willing to require such strange rules: is that only because of some build scripts that use such broken multiple quoting?  IOW, instead of fixing things (and the fixes are really simple), we are forever doomed to be bug-compatible with them?  And only on Windows on top of that?

> Indeed, I think that was a mistake in the revert. I'll try to follow up with
> a patch for reverting the right thing in a couple hours, and for clarifying
> the documentation regarding it - to codify the existing status quo regarding
> it.

Thanks.

> Yes I agree that the most sane thing to do would be to have the user specify
> -DSTRING="quotedstring", but there are projects that have ended up working
> around this, so fixing this case does break them.

Those projects should get their act together, if you ask me.  It isn't hard, really.  Not my call, obviously, but still...
Comment 13 Eli Zaretskii 2021-05-10 17:45:26 UTC
I think I see what caused the breakage of the MSYS2 build script.  It wasn't the change in `quot`, it was a previous commit 21c33bc, which attempted to fix PR 26865.  The change in that commit encloses the --preprocessor argument in double quotes if it includes whitespace.  Which is correct if the argument is a single file, as it should be.  But if it is used to pass a complete preprocessor shell command, and the scripts which used that loophole were working around the invalid usage by pre-quoting the argument, then indeed those scripts will break.

So bottom line: the revert reverted the wrong commit.
Comment 14 Eli Zaretskii 2021-05-10 18:39:23 UTC
And one more comment: using pre-quoted shell command (as opposed to just the preprocessor file name) in --preprocessor will not work with --use-temp-file, because the code in `run_cmd` breaks the command into argv[] array, and will treat the quoted string ("gcc -E ....") as a file name to invoke.  This trick only works in the default pipe mode, because then the command is not broken into argv[] elements.

So I really think this trick should not be supported, definitely not when it invalidates correct usage.
Comment 15 Martin Storsjö 2021-05-10 19:14:35 UTC
(In reply to Eli Zaretskii from comment #13)
> So bottom line: the revert reverted the wrong commit.

Yup, exactly.

(In reply to Eli Zaretskii from comment #12)
> That means `windres` will be the only program which requires such strange
> invocation rules.  Every other program which does something similar doesn't
> require such "over-quoting".

Yes, there's no denying that it's an uncommon parameter definition.

> And I don't really understand why you are willing to require such strange
> rules: is that only because of some build scripts that use such broken
> multiple quoting?  IOW, instead of fixing things (and the fixes are really
> simple), we are forever doomed to be bug-compatible with them?  And only on
> Windows on top of that?

None of this is Windows specific; you've been able to call it with --preprocessor "gcc -xc -E -DRC_INVOKED ..." in the same way on both unix and Windows for the last 10 years. And if one wanted to specify the tool with a path with spaces (prior to the 2.36 release), that path would have to be double quoted - both on unix and Windows. (Admittedly, paths with spaces are much rarer on unix.)

Regarding arguing for/against this, I think existing cases where projects are using --preprocessor to pass multiple arguments are much much more common than cases where one wants to specify the executable with a path that needs to be quoted. And by going with the historical behaviour of the option, those users are left working, and there's still one way (which is uncomfortable though) to pass paths with spaces.

In many cases, it's indeed simple to rewrite things in a way that works with both old and new windres. In one case, https://git.videolan.org/?p=ffmpeg.git;a=blob;f=ffbuild/common.mak;h=32f5b997b5979c9799aafdf88d84fb4b6677ec80;hb=f8d910e90f599f338438833dfc92e2f1915ce414#l93, a makefile variable containing a number of preprocessor options (in most cases something like "-MMD -MF <arg> -MT <otherarg>") is passed in that way; rewriting that to use --preprocessor-arg requires a bit more changes.


I'm not dead set against changing it, but if changed, I would prefer that it isn't swept under the rug as "all these users were misusing the option" but own up to that the old usage was a seemingly valid way of using the option (the documentation almost hinted that it was supposed to be used that way) and we're willingly breaking those users for a clearer definition of the option.

> > Yes I agree that the most sane thing to do would be to have the user specify
> > -DSTRING="quotedstring", but there are projects that have ended up working
> > around this, so fixing this case does break them.
> 
> Those projects should get their act together, if you ask me.  It isn't hard,
> really.  Not my call, obviously, but still...

Not at all actually, so far (in releases) there was no way of doing that that didn't involve double quoting. But that's for a separate bug report.

(In reply to Eli Zaretskii from comment #14)
> And one more comment: using pre-quoted shell command (as opposed to just the
> preprocessor file name) in --preprocessor will not work with
> --use-temp-file, because the code in `run_cmd` breaks the command into
> argv[] array, and will treat the quoted string ("gcc -E ....") as a file
> name to invoke.  This trick only works in the default pipe mode, because
> then the command is not broken into argv[] elements.

No, I don't think so. Before 2.36, a pre-quoted argument in --preproessor would have ended up as <<<"path to/gcc.exe" -xc -E ...>>> in the cmd string, and run_cmd would split it correctly. (In 2.36, such arguments to --preprocessor don't work though.)

The command string, assembled from --preprocessor arguments, -D/-I and other options, either gets split into an argv[] array by run_cmd, or passed as such to popen() (where the shell does the same splitting), and arguments with spaces in the string need to be quoted the same in both cases.

So the cmd string (which is concatenated from the output of the 'quot' function, and the 'quot' function quotes things in a platform specific way) is defined to be escaped/quoted for the current platform shell - and the 'run_cmd' function would need to mirror those platform specific quoting nuances (e.g. regarding backslash/quote escaping).
Comment 16 Eli Zaretskii 2021-05-10 19:35:22 UTC
(In reply to Martin Storsjö from comment #15)
> (In reply to Eli Zaretskii from comment #13)
> > And one more comment: using pre-quoted shell command (as opposed to just the
> > preprocessor file name) in --preprocessor will not work with
> > --use-temp-file, because the code in `run_cmd` breaks the command into
> > argv[] array, and will treat the quoted string ("gcc -E ....") as a file
> > name to invoke.  This trick only works in the default pipe mode, because
> > then the command is not broken into argv[] elements.
> 
> No, I don't think so. Before 2.36, a pre-quoted argument in --preproessor
> would have ended up as <<<"path to/gcc.exe" -xc -E ...>>> in the cmd string,
> and run_cmd would split it correctly. (In 2.36, such arguments to
> --preprocessor don't work though.)

Terrific.  So we'd need to tell users say "\"path to gcc\" arguments...".  How marvellous.  Who can ever remember such syntax.
Comment 17 Thomas Wolff 2021-05-10 21:23:04 UTC
This is getting picky. I wouldn't say there's double quoting involved.
The string given to the -preprocessor arguments seems to be simply fed into a call of popen(), and that should be fine. It may contain quotes as needed. Quoting the whole argument again in the shell is not really double-quoting in this sense.
So "\"path to gcc\" arguments..." is neither a trick nor anything special to remember, it's normal syntax in such use cases.
Whether the wrong commit was reverted or not I cannot judge right now, I was only asking to restore the 2.35 behaviour which is consistent with general POSIX string handling and which supports everything anyone needs. Breaking such support after years is what's generally called a regression and it's generally and rightfully frowned upon.
Comment 18 Eli Zaretskii 2021-05-11 12:13:14 UTC
(In reply to Thomas Wolff from comment #17)
> This is getting picky. I wouldn't say there's double quoting involved.
> The string given to the -preprocessor arguments seems to be simply fed into
> a call of popen(), and that should be fine. It may contain quotes as needed.

The user isn't supposed to know that the program calls `popen`, nor should that knowledge change how users pass arguments to programs that need quoting.

Moreover, `windres` calls `popen` only in its default mode; you could change that with the --use-temp-file option, in which case the preprocessor will be invoked via `pexecute`, and that has a different set of implementation-defined perculiarities wrt quoted parts of the command.

> Quoting the whole argument again in the shell is not really double-quoting
> in this sense.

Only if you pass an argument that needs to remain quoted inside the program, as in -DSTRING=\"foo\".  Here, you want the preprocessor to #define STRING to have the value "foo", with the quotes.  The case which triggered this, where the quotes were used for file names or complete command lines, is not such a case: there the quotes were used to protect whitespace.

> So "\"path to gcc\" arguments..." is neither a trick nor anything special to
> remember, it's normal syntax in such use cases.

I disagree, see above.

> Whether the wrong commit was reverted or not I cannot judge right now, I was
> only asking to restore the 2.35 behaviour which is consistent with general
> POSIX string handling and which supports everything anyone needs. Breaking
> such support after years is what's generally called a regression and it's
> generally and rightfully frowned upon.

I disagree here as well.  If someone used the tool incorrectly, in direct contradiction to its docs, then backward compatibility doesn't apply, IMO.  But again, it isn't my call.
Comment 19 Martin Storsjö 2021-05-11 12:22:29 UTC
(In reply to Eli Zaretskii from comment #18)
> Moreover, `windres` calls `popen` only in its default mode; you could change
> that with the --use-temp-file option, in which case the preprocessor will be
> invoked via `pexecute`, and that has a different set of
> implementation-defined perculiarities wrt quoted parts of the command.

When windres calls pexecute, windres itself splits the string (which is in a quoted/escaped form suitable for passing to popen) in the run_cmd function, so run_cmd is supposed to split/unescape things in the same way as the shell would. So ideally, the distinction between those two souldn't be visible externally.

> > Whether the wrong commit was reverted or not I cannot judge right now, I was
> > only asking to restore the 2.35 behaviour which is consistent with general
> > POSIX string handling and which supports everything anyone needs. Breaking
> > such support after years is what's generally called a regression and it's
> > generally and rightfully frowned upon.
> 
> I disagree here as well.  If someone used the tool incorrectly, in direct
> contradiction to its docs, then backward compatibility doesn't apply, IMO. 

This is indeed the core of the issue.

Up until the 2.35 version, the option was documented as "This option may be used to specify the preprocessor to use, including any leading arguments. The default preprocessor argument is `gcc -E -xc-header -DRC_INVOKED`." I think that's rather clearly saying that this kind of behaviour was explicitly intended and allowed?
Comment 20 Eli Zaretskii 2021-05-11 12:46:08 UTC
(In reply to Martin Storsjö from comment #19)
> (In reply to Eli Zaretskii from comment #18)
> > Moreover, `windres` calls `popen` only in its default mode; you could change
> > that with the --use-temp-file option, in which case the preprocessor will be
> > invoked via `pexecute`, and that has a different set of
> > implementation-defined perculiarities wrt quoted parts of the command.
> 
> When windres calls pexecute, windres itself splits the string (which is in a
> quoted/escaped form suitable for passing to popen) in the run_cmd function,
> so run_cmd is supposed to split/unescape things in the same way as the shell
> would. So ideally, the distinction between those two souldn't be visible
> externally.

That's an incomplete description of what the code does.  It actually first quotes some parts of the command, then produces a single string with the full shell command, then (only if not using `popen`) splits the full string back into individual argv[] elements to pass to `pexecute`.  Would it surprise you that this process doesn't really support "double quoting" like the `popen` method does?

> Up until the 2.35 version, the option was documented as "This option may be
> used to specify the preprocessor to use, including any leading arguments.
> The default preprocessor argument is `gcc -E -xc-header -DRC_INVOKED`." I
> think that's rather clearly saying that this kind of behaviour was
> explicitly intended and allowed?

So the incompatible change is in the documentation, right?  The code just does what the documentation says it should, right?
Comment 21 Martin Storsjö 2021-05-11 13:29:42 UTC
(In reply to Eli Zaretskii from comment #20)
> (In reply to Martin Storsjö from comment #19)
> > Up until the 2.35 version, the option was documented as "This option may be
> > used to specify the preprocessor to use, including any leading arguments.
> > The default preprocessor argument is `gcc -E -xc-header -DRC_INVOKED`." I
> > think that's rather clearly saying that this kind of behaviour was
> > explicitly intended and allowed?
> 
> So the incompatible change is in the documentation, right?  The code just
> does what the documentation says it should, right?

No; in 21c33bcbe36377abf01614fb1b9be439a3b6de20 (Nov 2020) the implementation was changed to put full quotes around the string coming from --preprocessor; this broke passing extra leading arguments (as previously was documented as supported). In practice, popen() on windows might still have worked in some cases, but it no longer did on unix (and cygwin).

In 5edb8e3f5ad8d74a83fc0df7f6e4514eed0aa77f (Apr 2021) the documentation was changed to stop saying that "the default preprocessor argument is `gcc -E -xc-header -DRC_INVOKED`". In 749c700282097cf679ff019a9674d7c762f48619 (May 2021), which was supposed to restore the original --preprocessor behaviour, it further removed the statement that the --preprocessor option can be used to specify other leading arguments to the preprocessor.
Comment 22 Nick Clifton 2021-05-13 13:41:12 UTC
Hi Guys,

  I am hoping that we can all reach an agreement on how this issue should be handled.

  To be clear, my opinion is that previously the documentation was incorrect.
I do not believe that it was ever the intention that the --preprocessor option should also accept command line options.  It did, yes, and in fact it still does, but it should not be relied upon.  Arguments to the preprocessor should always be passed via the --preprocessor-arg option.

  So I think that the documentation is now correct.  But we still need to agree on how spaces should be handled.  I am not going to propose any more patches however.  I am going to leave that up to you guys.  Come to an agreement.  Propose a patch and I will review and apply it.

Cheers
  Nick
Comment 23 Eli Zaretskii 2021-05-13 14:15:47 UTC
(In reply to Nick Clifton from comment #22)
> Hi Guys,
> 
>   I am hoping that we can all reach an agreement on how this issue should be
> handled.
> 
>   To be clear, my opinion is that previously the documentation was incorrect.
> I do not believe that it was ever the intention that the --preprocessor
> option should also accept command line options.  It did, yes, and in fact it
> still does, but it should not be relied upon.  Arguments to the preprocessor
> should always be passed via the --preprocessor-arg option.
> 
>   So I think that the documentation is now correct.  But we still need to
> agree on how spaces should be handled.  I am not going to propose any more
> patches however.  I am going to leave that up to you guys.  Come to an
> agreement.  Propose a patch and I will review and apply it.

The problem here is that people disagree that --preprocessor should only accept the program's file name.  As long as this basic disagreement is unresolved, there could be no agreement about the patch.

Personally, I agree with Nick's opinion on this.  If this opinion is accepted, then the code as it was before cc3edc52747fd8b184ee48f1b0cc1ac0aca7832e was reverted, is 100% correct.  If Nick's opinion is NOT accepted, then at least commit 749c700282097cf679ff019a9674d7c762f48619 should be reverted, as it changed the function `quot` which is not related to handling of --preprocessor.  Can we at least agree on this latter point?
Comment 24 Martin Storsjö 2021-05-13 19:31:55 UTC
(In reply to Nick Clifton from comment #22)
>   To be clear, my opinion is that previously the documentation was incorrect.
> I do not believe that it was ever the intention that the --preprocessor
> option should also accept command line options.  It did, yes, and in fact it
> still does, but it should not be relied upon.  Arguments to the preprocessor
> should always be passed via the --preprocessor-arg option.

This is indeed the core of the issue.

I can't speculate on what the original intention of the option was, but the docs did say explicitly "including any leading arguments", and then also went on to mention the default option value including extra arguments. And as this documentation matched how it behaved in practice, I'd say it has been hard for users to guess that the intent of the option to be used that way.

I guess the alternative is to more clearly say that we're intentionally changing (or, already changed) the behaviour to clarify and simplify it, and are accepting some collateral breakage. It's primarily the faulting of the users for doing what the docs said, that I'm objecting to.

(In reply to Eli Zaretskii from comment #23)
> then at least commit
> 749c700282097cf679ff019a9674d7c762f48619 should be reverted, as it changed
> the function `quot` which is not related to handling of --preprocessor.  Can
> we at least agree on this latter point?

Yes, full agree on that part.
Comment 25 Thomas Wolff 2021-05-13 21:21:58 UTC
If you want an option with your preferred fancy behaviour, you should add a new option, like --preprocessor-prog, and not break an established interface. Whether the original design is appreciated or not, making incompatible changes is bad practice. Look at lots of aged Unix tools, most pertain old options for compatibility, whether nice or ugly. That's helpful for the community and good practice.
Comment 26 Johannes Schindelin 2021-05-14 14:47:35 UTC
(In reply to Nick Clifton from comment #22)
>   To be clear, my opinion is that previously the documentation was incorrect.
> I do not believe that it was ever the intention that the --preprocessor
> option should also accept command line options.  It did, yes, and in fact it
> still does,

So you agree that this behavior worked, for quite some times (years...). And that it worked as documented.

This means, of course, that the change in behavior was breaking backwards-compatibility.

Whether you liked the original behavior or not is therefore completely irrelevant: a backwards-incompatible change has been slipped in, without appropriate documentation or public announcing, let alone the appropriate deprecation period.

That is not okay.

> but it should not be relied upon.  Arguments to the preprocessor
> should always be passed via the --preprocessor-arg option.

Whatever you want to do, do it correctly. Do not just break backwards-compatibility nilly-willy. And certainly not (maliciously?) snuck into a "bug fix" release.
Comment 27 Eli Zaretskii 2021-05-14 15:09:27 UTC
(In reply to Johannes Schindelin from comment #26)
> 
> Whether you liked the original behavior or not is therefore completely
> irrelevant: a backwards-incompatible change has been slipped in, without
> appropriate documentation or public announcing, let alone the appropriate
> deprecation period.
> 
> That is not okay.
> 
> > but it should not be relied upon.  Arguments to the preprocessor
> > should always be passed via the --preprocessor-arg option.
> 
> Whatever you want to do, do it correctly. Do not just break
> backwards-compatibility nilly-willy. And certainly not (maliciously?) snuck
> into a "bug fix" release.

Could you please identify the commit which you think was "slipped in" or "(maliciously?) snuck" into a bugfix release?  There are a few separate changes referred to in this and related discussions, and it is not clear which one(s) you are alluding to.

If you are talking about the change which broke use of --preprocessor to specify a complete shell command, then AFAIU that change (commit 21c33bc) was in response to a bug report, PR 26865.
Comment 28 Johannes Schindelin 2021-05-14 15:30:23 UTC
https://github.com/bminor/binutils-gdb/commit/21c33bcbe36377abf01614fb1b9be439a3b6de20 caused the regression.

And yes, from its description it looks as if the intention was not malicious, but that the ramifications were just not considered. In other words, I have come to believe that the patch's author did not realize that this was introducing a breakage.