Bug 32003 - Specifying --package-metadata might not be possible and is too fragile
Summary: Specifying --package-metadata might not be possible and is too fragile
Status: NEW
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.44 (HEAD)
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-22 12:09 UTC by Benjamin Drung
Modified: 2024-07-25 11:49 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2024-07-23 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Drung 2024-07-22 12:09:00 UTC
In Ubuntu we enabled setting ELF package metadata for the Debian package that we build starting from Ubuntu 24.10 (oracular) on. Specifying the linker flag --package-metadata is not possible in a robust way. All tried approaches are either not working or too fragile:

1. The comma in the JSON value is used to split the -Wl parameter specified for gcc:

```
$ echo "void main() { }" > test.c
$ gcc '-Wl,--package-metadata={"type":"deb","os":"ubuntu"}' test.c 
/usr/bin/ld: cannot find "os":"ubuntu"}: No such file or directory
collect2: error: ld returned 1 exit status
```

2. The quotation marks in the JSON value are eaten by configure scripts and Makefiles. Example:

```
$ echo "void main() { }" > test.c
$ printf 'test:\n\tgcc $(CFLAGS) test.c\n' > Makefile
$ env CFLAGS='-Wl,--package-metadata={"type":"deb"}' make
gcc -Wl,--package-metadata={"type":"deb"} test.c
/usr/bin/ld: warning: --package-metadata={type:deb} does not contain valid JSON, ignoring: string or '}' expected near 'type'
```

3. Add `-specs=<spec-file>` to the gcc linker flags. Then this spec file could construct the package metadata parameter. Example spec file:

```
$ cat /usr/share/dpkg/elf-package-metadata.specs
*link:
+ --package-metadata={\"type\":\"deb\",\"os\":\"%:getenv(DEB_BUILD_OS_RELEASE_ID \",\"name\":\"%:getenv(DEB_SOURCE \",\"version\":\"%:getenv(DEB_VERSION \",\"architecture\":\"%:getenv(DEB_HOST_ARCH \"}))))
```

Issue with that approach: It requires the spec file to be around and the environment variables to be set. This will be the case during the package build, but not at a later stage. See https://launchpad.net/bugs/2071468 for examples where this breaks.

## Proposed solution

Add support for an `--escaped-package-metadata` parameter to the linkers that takes a percent encoded (RFC 3986) parameter. Example:
```
-Wl,--encoded-package-metadata,%7B%22type%22:%22deb%22%2C%22os%22:%22ubuntu%22%2C%22name%22:%22dpkg%22%2C%22version%22:%221.22.6ubuntu15%22%2C%22architecture%22:%22amd64%22%7D
```
Comment 1 Benjamin Drung 2024-07-22 12:14:46 UTC
Proposed --encoded-package-metadata patches:
* ld patch v4: https://sourceware.org/pipermail/binutils/2024-July/135769.html
* gold patch v3: https://sourceware.org/pipermail/binutils/2024-July/135796.html
Comment 2 H.J. Lu 2024-07-22 20:46:10 UTC
(In reply to Benjamin Drung from comment #0)
> In Ubuntu we enabled setting ELF package metadata for the Debian package
> that we build starting from Ubuntu 24.10 (oracular) on. Specifying the
> linker flag --package-metadata is not possible in a robust way. All tried
> approaches are either not working or too fragile:
> 
> 1. The comma in the JSON value is used to split the -Wl parameter specified
> for gcc:
> 
> ```
> $ echo "void main() { }" > test.c
> $ gcc '-Wl,--package-metadata={"type":"deb","os":"ubuntu"}' test.c 
> /usr/bin/ld: cannot find "os":"ubuntu"}: No such file or directory
> collect2: error: ld returned 1 exit status
> ```

This is a real issue.

> 2. The quotation marks in the JSON value are eaten by configure scripts and
> Makefiles. Example:
> 
> ```
> $ echo "void main() { }" > test.c
> $ printf 'test:\n\tgcc $(CFLAGS) test.c\n' > Makefile
> $ env CFLAGS='-Wl,--package-metadata={"type":"deb"}' make
> gcc -Wl,--package-metadata={"type":"deb"} test.c
> /usr/bin/ld: warning: --package-metadata={type:deb} does not contain valid
> JSON, ignoring: string or '}' expected near 'type'
> ```

The linker testcase has

"--package-metadata='{\"foo\":\"bar\"}'"

which is how this option should be used.

> 3. Add `-specs=<spec-file>` to the gcc linker flags. Then this spec file
> could construct the package metadata parameter. Example spec file:
> 
> ```
> $ cat /usr/share/dpkg/elf-package-metadata.specs
> *link:
> +
> --package-metadata={\"type\":\"deb\",\"os\":\"%:
> getenv(DEB_BUILD_OS_RELEASE_ID \",\"name\":\"%:getenv(DEB_SOURCE
> \",\"version\":\"%:getenv(DEB_VERSION
> \",\"architecture\":\"%:getenv(DEB_HOST_ARCH \"}))))
> ```
> 
> Issue with that approach: It requires the spec file to be around and the
> environment variables to be set. This will be the case during the package
> build, but not at a later stage. See https://launchpad.net/bugs/2071468 for
> examples where this breaks.
> 
> ## Proposed solution
> 
> Add support for an `--escaped-package-metadata` parameter to the linkers
> that takes a percent encoded (RFC 3986) parameter. Example:
> ```
> -Wl,--encoded-package-metadata,%7B%22type%22:%22deb%22%2C%22os%22:
> %22ubuntu%22%2C%22name%22:%22dpkg%22%2C%22version%22:%221.22.
> 6ubuntu15%22%2C%22architecture%22:%22amd64%22%7D
> ```

It is very much unreadable.  The main issue is that compiler drivers eat
comma.  Can we update linker to support an escape for comma which won't be
eaten by compiler drivers?
Comment 3 H.J. Lu 2024-07-22 23:51:21 UTC
(In reply to H.J. Lu from comment #2)
> It is very much unreadable.  The main issue is that compiler drivers eat
> comma.  Can we update linker to support an escape for comma which won't be
> eaten by compiler drivers?

One possibility is to treat "\comma" as ','.
Comment 4 Jan Beulich 2024-07-23 10:37:12 UTC
(In reply to H.J. Lu from comment #3)
> One possibility is to treat "\comma" as ','.

Yet introducing another escape character would come with further complexity. It also looks to me as if the %-encoding of double-quotes dominated that of commas in the example provided.

I'm inclined to suggest to extend the %-encoding scheme instead; that way what's going to be in 2.43 is not at risk of subtle breakage when support further escaping sequences is added. All that needs to be made sure is for existing escaping sequences to retain their original meaning.

What the best scheme here would be is likely better determined by people who make active use of the feature (knowing what set of characters it is that frequently would need escaping one way or the other). I'd be inclined to borrow naming e.g. from HTML's Named Character References, and then use, say, "%[comma]" or "%comma;" to have proper delimiting on both ends.
Comment 5 H.J. Lu 2024-07-23 12:57:29 UTC
I am against the proposed solution.
The solution should be as close to
the normal JSON syntax as possible
and linker tests should cover +90% of
use cases.
Comment 6 Benjamin Drung 2024-07-23 16:17:18 UTC
(In reply to H.J. Lu from comment #2)
> > 2. The quotation marks in the JSON value are eaten by configure scripts and
> > Makefiles. Example:
> > 
> > ```
> > $ echo "void main() { }" > test.c
> > $ printf 'test:\n\tgcc $(CFLAGS) test.c\n' > Makefile
> > $ env CFLAGS='-Wl,--package-metadata={"type":"deb"}' make
> > gcc -Wl,--package-metadata={"type":"deb"} test.c
> > /usr/bin/ld: warning: --package-metadata={type:deb} does not contain valid
> > JSON, ignoring: string or '}' expected near 'type'
> > ```
> 
> The linker testcase has
> 
> "--package-metadata='{\"foo\":\"bar\"}'"
> 
> which is how this option should be used.

That linker testcase snippet is evaluated to:

--package-metadata='{"foo":"bar"}'

Passing that to a shell or makefile works, because the singe quotes protect the JSON content inside. But this escaping protects this snippet from evaluating only once. The problem is that the linker flag can be passed around and evaluated multiple times. This will happen easily during Debian package build, where debian/rules is a Makefile that calls the project build system (that can use makefiles as well).

Simple example that shows the problem:

```
$ cat example.c 
#include <stdio.h>

void main() {
    printf("Hello world!");
}
$ cat Makefile2
all: example

%: %.c
	gcc $(LDFLAGS) -o $@ $<

.PHONY: all
$ cat Makefile
all:
	make -f Makefile2 all LDFLAGS=$(LDFLAGS)

.PHONY: all
$ LANG=C make LDFLAGS="-Wl,--package-metadata='{\"foo\":\"bar\"}'"
make -f Makefile2 all LDFLAGS=-Wl,--package-metadata='{"foo":"bar"}'
make[1]: Entering directory 'makefile2'
gcc -Wl,--package-metadata={"foo":"bar"} -o example example.c
/usr/bin/ld: warning: --package-metadata={foo:bar} does not contain valid JSON, ignoring: string or '}' expected near 'foo'
make[1]: Leaving directory 'makefile2'
```

I am very confident to find a real world Debian package that will resemble this example. There is no way to programmatically determine how many times the linker flag needs to be escaped for each individual Debian package build. Fixing those cases by quoting will be a lot of work and might not cover all cases (i.e. it will be a fragile approach).

> > ## Proposed solution
> > 
> > Add support for an `--escaped-package-metadata` parameter to the linkers
> > that takes a percent encoded (RFC 3986) parameter. Example:
> > ```
> > -Wl,--encoded-package-metadata,%7B%22type%22:%22deb%22%2C%22os%22:
> > %22ubuntu%22%2C%22name%22:%22dpkg%22%2C%22version%22:%221.22.
> > 6ubuntu15%22%2C%22architecture%22:%22amd64%22%7D
> > ```
> 
> It is very much unreadable.  The main issue is that compiler drivers eat
> comma.  Can we update linker to support an escape for comma which won't be
> eaten by compiler drivers?

The encoding is flexible and you could just encode the characters that are problematic in your case:

-Wl,--encoded-package-metadata,{"type":"deb"%2C"os":"ubuntu"%2C"name":"dpkg"%2C"version":"1.22.6ubuntu15"%2C"architecture":"amd64"}
Comment 7 H.J. Lu 2024-07-23 20:31:49 UTC
(In reply to Benjamin Drung from comment #6)

> The encoding is flexible and you could just encode the characters that are
> problematic in your case:
> 
> -Wl,--encoded-package-metadata,{"type":"deb"%2C"os":"ubuntu"%2C"name":
> "dpkg"%2C"version":"1.22.6ubuntu15"%2C"architecture":"amd64"}

Doesn't this have the same issue with '"'?
Comment 8 Benjamin Drung 2024-07-23 22:20:09 UTC
(In reply to H.J. Lu from comment #7)
> (In reply to Benjamin Drung from comment #6)
> 
> > The encoding is flexible and you could just encode the characters that are
> > problematic in your case:
> > 
> > -Wl,--encoded-package-metadata,{"type":"deb"%2C"os":"ubuntu"%2C"name":
> > "dpkg"%2C"version":"1.22.6ubuntu15"%2C"architecture":"amd64"}
> 
> Doesn't this have the same issue with '"'?

Yes, this example parameter needs to be quoted in case it is used in shell (or multiple times quoted in case it is parsed multiple times). I just wanted to make the point that the percent-encoding can be used sparsely to only encode the characters that are problematic in the specific use-case. So in case you control the whole build pipeline, you can quote just the comma. If you want to pass this flag into a random projects, you can encode the quotation marks as well.
Comment 9 H.J. Lu 2024-07-23 22:35:31 UTC
--package-metadata doesn't work with shell, compiler driver nor make.  The new
solution should support for JSON codes with shell, compiler driver and make.
It should be human readable.   For non-working --package-metadata, we should
either remove it or fix it.
Comment 10 Luca Boccassi 2024-07-23 22:40:03 UTC
(In reply to H.J. Lu from comment #9)
> For non-working --package-metadata, we should either remove it or fix it.

Sorry, but this is absolutely wrong, as the existing option works just fine. I do not mind if you add other options, but the existing one cannot be removed, as it is in active use in production, and it will remain in active use in production for the foreseeable future - I have no intention nor plan of stopping its use, even if there are alternatives. It works just fine either directly with some escaping, or indirectly via a spec file. Again, no problem with adding alternative options if you want to have them, absolutely fine to do so, but just not at the expense of the current one.
Comment 11 Benjamin Drung 2024-07-23 22:46:23 UTC
(In reply to H.J. Lu from comment #9)
> It should be human readable.

What do you recommend? IMO percent-escaping is readable enough and increases the size of the already long string not too much.

The encoding of the JSON {"type":"deb","os":"ubuntu","name":"dpkg","version":"1.22.6ubuntu15","architecture":"amd64"} would be:

-Wl,--encoded-package-metadata=%7B%22type%22:%22deb%22%2C%22os%22:%22ubuntu%22%2C%22name%22:%22dpkg%22%2C%22version%22:%221.22.6ubuntu15%22%2C%22architecture%22:%22amd64%22%7D

At first it might look confusing, but the relevant strings can be seen on a second look: "type", "deb", "os", "ubuntu", "name", "dpkg", "version", "1.22.6ubuntu15", "architecture", "amd64". Only the beginning of the version number is harder to see.

There are multiple tools that can encode/decode it. For example Python's urllib.parse.unquote and urllib.parse.quote.

I am open for better encodings. I am open for making --package-metadata percent-decode the value instead of adding a new parameter. Percents are relative safe encoding option. The Debian package name and the Debian version are not allowed to contain percents. The os, type, and architecture will not have percents in them.
Comment 12 H.J. Lu 2024-07-23 22:57:17 UTC
(In reply to Benjamin Drung from comment #11)
> (In reply to H.J. Lu from comment #9)
> > It should be human readable.
> 
> What do you recommend? IMO percent-escaping is readable enough and increases
> the size of the already long string not too much.
> 
> The encoding of the JSON
> {"type":"deb","os":"ubuntu","name":"dpkg","version":"1.22.6ubuntu15",
> "architecture":"amd64"} would be:
> 
> -Wl,--encoded-package-metadata=%7B%22type%22:%22deb%22%2C%22os%22:
> %22ubuntu%22%2C%22name%22:%22dpkg%22%2C%22version%22:%221.22.
> 6ubuntu15%22%2C%22architecture%22:%22amd64%22%7D
> 
> At first it might look confusing, but the relevant strings can be seen on a
> second look: "type", "deb", "os", "ubuntu", "name", "dpkg", "version",
> "1.22.6ubuntu15", "architecture", "amd64". Only the beginning of the version
> number is harder to see.
> 
> There are multiple tools that can encode/decode it. For example Python's
> urllib.parse.unquote and urllib.parse.quote.
> 
> I am open for better encodings. I am open for making --package-metadata
> percent-decode the value instead of adding a new parameter. Percents are
> relative safe encoding option. The Debian package name and the Debian
> version are not allowed to contain percents. The os, type, and architecture
> will not have percents in the

%22 isn't human readable.  Do we need to escape { and }? We need to escape "
and ,.  Should $ be supported in JSON code? Will "%[string]" escape work?
Comment 13 Benjamin Drung 2024-07-23 23:12:35 UTC
(In reply to H.J. Lu from comment #12)
> (In reply to Benjamin Drung from comment #11)
> > (In reply to H.J. Lu from comment #9)
> > > It should be human readable.
> > 
> > What do you recommend? IMO percent-escaping is readable enough and increases
> > the size of the already long string not too much.
> > 
> > The encoding of the JSON
> > {"type":"deb","os":"ubuntu","name":"dpkg","version":"1.22.6ubuntu15",
> > "architecture":"amd64"} would be:
> > 
> > -Wl,--encoded-package-metadata=%7B%22type%22:%22deb%22%2C%22os%22:
> > %22ubuntu%22%2C%22name%22:%22dpkg%22%2C%22version%22:%221.22.
> > 6ubuntu15%22%2C%22architecture%22:%22amd64%22%7D
> > 
> > At first it might look confusing, but the relevant strings can be seen on a
> > second look: "type", "deb", "os", "ubuntu", "name", "dpkg", "version",
> > "1.22.6ubuntu15", "architecture", "amd64". Only the beginning of the version
> > number is harder to see.
> > 
> > There are multiple tools that can encode/decode it. For example Python's
> > urllib.parse.unquote and urllib.parse.quote.
> > 
> > I am open for better encodings. I am open for making --package-metadata
> > percent-decode the value instead of adding a new parameter. Percents are
> > relative safe encoding option. The Debian package name and the Debian
> > version are not allowed to contain percents. The os, type, and architecture
> > will not have percents in the
> 
> %22 isn't human readable.  Do we need to escape { and }?

Maybe escaping { and } is not needed.

> We need to escape " and ,.

Those two are definitively needed to be escaped.

> Should $ be supported in JSON code?

$ needs to be escaped for shells. Theoretically $ might be part of a string.

> Will "%[string]" escape work?

Like this?

-Wl,--encoded-package-metadata={%[quot]type%[quot]:%[quot]deb%[quot]%[comma]%[quot]os%[quot]:%[quot]ubuntu%[quot]%[comma]%[quot]name%[quot]:%[quot]dpkg%[quot]%[comma]%[quot]version%[quot]:%[quot]1.22.6ubuntu15%[quot]%[comma]%[quot]architecture%[quot]:%[quot]amd64%[quot]}

This might work, but it is much longer. IMO on a scale from human readable to random character this is insignificantly more readable. A longer parameter makes it harder to find the relevant log output (in case of problems unrelated to the package metadata) and the log files will be bigger. Debian package tend to print all calls with all compiler parameters.
Comment 14 H.J. Lu 2024-07-23 23:32:01 UTC
(In reply to Benjamin Drung from comment #13)

> > Will "%[string]" escape work?
> 
> Like this?
> 
> -Wl,--encoded-package-metadata={%[quot]type%[quot]:
> %[quot]deb%[quot]%[comma]%[quot]os%[quot]:
> %[quot]ubuntu%[quot]%[comma]%[quot]name%[quot]:
> %[quot]dpkg%[quot]%[comma]%[quot]version%[quot]:%[quot]1.22.
> 6ubuntu15%[quot]%[comma]%[quot]architecture%[quot]:%[quot]amd64%[quot]}

It should be %[quote]".  Will adding support for "%[string]" to existing
--package-metadata option break anything?

> This might work, but it is much longer. IMO on a scale from human readable
> to random character this is insignificantly more readable. A longer
> parameter makes it harder to find the relevant log output (in case of
> problems unrelated to the package metadata) and the log files will be

Is this a real problem? "grep" should work.

> bigger. Debian package tend to print all calls with all compiler parameters.
Comment 15 Jan Beulich 2024-07-24 06:09:48 UTC
(In reply to Benjamin Drung from comment #6)
> That linker testcase snippet is evaluated to:
> 
> --package-metadata='{"foo":"bar"}'
> 
> Passing that to a shell or makefile works, because the singe quotes protect
> the JSON content inside. But this escaping protects this snippet from
> evaluating only once. The problem is that the linker flag can be passed
> around and evaluated multiple times. This will happen easily during Debian
> package build, where debian/rules is a Makefile that calls the project build
> system (that can use makefiles as well).

That, however, is a problem of the build system. Passing around potentially quoted strings needs special care, to retain quotation. The issue isn't unique to Debian; see e.g. the Linux kernel's "escsq" and its uses (and how it further protects e.g. # and $$).

Imo it's just the commas which are the main problem here, as -Wl,... in the compiler has - afaik - no way of escaping them.
Comment 16 Jan Beulich 2024-07-24 06:11:29 UTC
(In reply to H.J. Lu from comment #14)
> It should be %[quote]".  Will adding support for "%[string]" to existing
> --package-metadata option break anything?

You won't know until someone reports a problem. We simply don't know what people might be using.
Comment 17 H.J. Lu 2024-07-24 06:47:59 UTC
(In reply to Jan Beulich from comment #16)
> (In reply to H.J. Lu from comment #14)
> > It should be %[quote]".  Will adding support for "%[string]" to existing
> > --package-metadata option break anything?
> 
> You won't know until someone reports a problem. We simply don't know what
> people might be using.

It is a new option and only supports very limited JSON code.  Can Debian people
provide feedbacks on adding "%[string]" to existing --package-metadata option?
Comment 18 H.J. Lu 2024-07-24 06:50:13 UTC
(In reply to Jan Beulich from comment #15)
> (In reply to Benjamin Drung from comment #6)
> > That linker testcase snippet is evaluated to:
> > 
> > --package-metadata='{"foo":"bar"}'
> > 
> > Passing that to a shell or makefile works, because the singe quotes protect
> > the JSON content inside. But this escaping protects this snippet from
> > evaluating only once. The problem is that the linker flag can be passed
> > around and evaluated multiple times. This will happen easily during Debian
> > package build, where debian/rules is a Makefile that calls the project build
> > system (that can use makefiles as well).
> 
> That, however, is a problem of the build system. Passing around potentially
> quoted strings needs special care, to retain quotation. The issue isn't
> unique to Debian; see e.g. the Linux kernel's "escsq" and its uses (and how
> it further protects e.g. # and $$).
> 
> Imo it's just the commas which are the main problem here, as -Wl,... in the
> compiler has - afaik - no way of escaping them.

'"' may be an issue for make as in comment #6.
Comment 19 Sam James 2024-07-24 07:30:50 UTC
Just to note that I expect us to have a use for this on our side but I've not had any time for me or someone else to look into it or test it yet.
Comment 20 Andreas K. Huettel 2024-07-24 08:24:50 UTC
The general idea is good, but, man, seriously, this feels like you're reinventing the wheel multiple times....

> 1. The comma in the JSON value is used to split the -Wl parameter specified
> for gcc:

Yep. Won't work. So let's not pass raw json.

And as someone who regularly has to look at stuff that is passed around between bash, python, and perl I stronly advise to rely too much on escaping.

> 2. The quotation marks in the JSON value are eaten by configure scripts and
> Makefiles. Example:

Yep. Won't work. So let's not pass raw json.

Dito.

> Add support for an `--escaped-package-metadata` parameter to the linkers
> that takes a percent encoded (RFC 3986) parameter. Example:

Well, this works, but makes the metadata block pretty much unreadable.
Please not.

> 3. Add `-specs=<spec-file>` to the gcc linker flags. Then this spec file
> could construct the package metadata parameter. Example spec file:
> 
> ```
> Issue with that approach: It requires the spec file to be around and the
> environment variables to be set.

And exactly this is the *much* easier solution. 
Fill in the values of the variables at build time in, e.g., a json file, and then pass the filename on the command line.

You definitely don't want to hardcode environment lookups or any sort of similar evaluation into your metadata (imagine your field for DEB_HOST_ARCH being `cat /dev/zero /dev/root`)... Simple strings that are not interpreted in any way are the best option.
Comment 21 Luca Boccassi 2024-07-25 11:49:23 UTC
(In reply to Luca Boccassi from comment #10)
> (In reply to H.J. Lu from comment #9)
> > For non-working --package-metadata, we should either remove it or fix it.
> 
> Sorry, but this is absolutely wrong, as the existing option works just fine.
> I do not mind if you add other options, but the existing one cannot be
> removed, as it is in active use in production, and it will remain in active
> use in production for the foreseeable future - I have no intention nor plan
> of stopping its use, even if there are alternatives. It works just fine
> either directly with some escaping, or indirectly via a spec file. Again, no
> problem with adding alternative options if you want to have them, absolutely
> fine to do so, but just not at the expense of the current one.

Just to be clear, changing the existing option to also take in escaped input, as long as it is backward compatible and still accepts normal input, would also be absolutely fine by me.