[PATCH v2] libebl: recognize FDO Packaging Metadata ELF note

Mark Wielaard mark@klomp.org
Tue Nov 30 20:04:31 GMT 2021


Hi,

On Tue, Nov 30, 2021 at 05:49:58PM +0100, Florian Weimer wrote:
> * Frank Ch. Eigler via Elfutils-devel:
> > On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
> >> [...]
> >> The spec does explain the requirements for JSON numbers, but doesn't
> >> mention any for JSON strings or JSON objects. It would be good to also
> >> explain how to make the strings and objects unambiguous. [...]
> >> For Objects it should require that all names are unique. [...]
> >> For Strings it should require that \uXXXX escaping isn't used [...]
> >> 
> >> That should get rid of various corner cases that different parsers are
> >> known to get wrong. 
> >
> > Are such buggy parsers seen in the wild, now, after all this time with
> > JSON?  It seems to me it's not really elfutils' or systemd's place to
> > impose -additional- constraints on customary JSON.
> 
> JSON has been targeted at the Windows/Java UTF-16 world, there is always
> going to be a mismatch if you try to represent it in UTF-8 or anything
> that doesn't have surrogate pairs.

Right, the \uhex escaping is kind of odd when using UTF-8 because you
suddenly have to use surrogate pairs. It is correct that you may use
them according to the RFC. But the RFC also points out that if you do
there are various odd/bad corner cases to take care of and that it
makes comparing strings really awkward.

> >> Especially \uXXXX escaping is really confusing when using the UTF-8
> >> encoding (and it is totally necessary since UTF-8 can express any
> >> valid UTF character already).
> >
> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
> > codes could visually confuse a human reader whereas escaped \uXXXX
> > wouldn't.  If we forbid \uXXXX unilaterally, we literally become
> > incompatible with JSON (RFC8259 7. String. "Any character may be
> > escaped."), and for what?
> 
> RFC 8259 says this:
> 
>    However, the ABNF in this specification allows member names and
>    string values to contain bit sequences that cannot encode Unicode
>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>    surrogate).  Instances of this have been observed, for example, when
>    a library truncates a UTF-16 string without checking whether the
>    truncation split a surrogate pair.  The behavior of software that
>    receives JSON texts containing such values is unpredictable; for
>    example, implementations might return different values for the length
>    of a string value or even suffer fatal runtime exceptions.
> 
> A UTF-8 environment has to enforce *some* additional constraints
> compared to the official JSON syntax.

Right, all recommendations for encoding the JSON object, string and
numbers actually come from the RFC. I don't think it is odd to
explicitly specify the encoding of the package note as using a zero
terminated UTF-8 string and to specify the allowed JSON string and
number encodings. The contents of the package note is meant to be
human readable, so excluding control characters and excotic escape
sequence in the expected key/value pairs seems natural to me.

Cheers,

Mark



More information about the Elfutils-devel mailing list