libabigail - suppression parsing proposal

Tue May 5 09:30:32 GMT 2020

Hi.

I've dusted this off and tidied it up a bit. It reflects a general
plan of work to improve libabigail's parsing of suppression
specifications.

Comments are very welcome.

Regards,
Giuliano.

# Overview

This is a proposal to improve the parsing of suppression
specifications so that errors are detected and reported to users.

It is intended to resolve
https://sourceware.org/bugzilla/show_bug.cgi?id=19608.

It can be added there, or to a new feature request bug, for further discussion.

# Background

Suppression handling in libabigail can be broken down into:

* file format parsing
* suppression specification parsing
* suppression specification glue logic and internal suppression generation
* suppression evaluation (either at load time or diff time)

## INI Config

Suppression specifications are given in an ini config file format:
named sections containing key = value properties. While most
configuration values are plain strings, some richer value types (such
as tuples) used in some cases. The ini config representation can also
be reserialised to text.

If an error occurs, the parser will typically return a null value
rather than something garbled. There are no error messages.

There is very little direct test coverage. However, there is indirect
testing via suppression specification and KMI whitelist parsing.

## Suppression Specifications

Parsing from ini config sections to suppression specifications is mostly
done by code in abg-suppression.cc. There is also some code, relating
to function_call_expr, that lives (but probably should not) in
abg-ini.{h,cc}.

There is a large amount of duplicated parsing code. Every duplicated
code path is an extra place for bugs to hide and that needs tests to
cover. Changes are costlier and riskier the more places that need to
be touched.

There is no parsing error checking. Most bad configuration is skipped
silently. In the case of regexes, they are compiled on first use and,
if they fail to compile, recompiled on every use but silently
ignored. Similarly, there are no checks on internally-generated
specifications.

The section parsers look for known properties in the input rather than
looking at the input and seeing if a given property is known or not.
Unrecognised (misspelled) properties are skipped silently.

Presence / absence information is handled in a various different ways
for different types (strings etc.) but for enumerated values (and
booleans), there are multiple different approaches, including bundling
enums with presence booleans.

There is a small amount of other logic mixed in with the parsing
logic, generally extra field validation or manipulation of fields.

## Suppression Evaluation

Mostly out of scope for this proposal. However, some of the parsing
issues translate over to the how suppressions are evaluated: code
duplication and inconsistent handling of similar things. There is
diverse interpretation of presence / absence information. In general,
it is *hard* to reason how a suppression specification with a few
related properties will behave in practice.

# Proposal

## High level goals

- error checking and reporting
- improved maintainability
- greater (proportional) test coverage
- bug fixes
- fewer surprising behaviours

## Plan for ini parser

(Plan A) Audit the ini parser, add diagnostic messages (to std::cerr)
and add more tests. The bundled code for parsing "function call
expressions" should be moved elsewhere. Fix the known string escaping
bug in the ini printer so that parse(print(x) = x.

(Plan B) Consider whether a drop-in replacement using an existing
library would be worth it. The interface might be very different and
impose a significant integration cost. A new library may have its own
issues too.

## Internally-generated specifications

Add basic error checks (for regex compilation).

## Suppression specification parser

Every parsing function must return a parse success status and check
the status of parsing operations it itself performs.

All clear errors should generate a diagnostic message and result in a
parse error being returned. Useless or overridden configuration should
generate warning messages.

Aggressively refactor the code. Every piece of common parsing code
must be factored out into its own documented function which ideally
should be unit tested to the point that each success and failure case
is exercised.

Make presence / absence handling more uniform. For a given component
of a suppression specification it should be clear what the user intent
was after parsing is complete. In practice this would mean either
documenting that empty string values are equivalent to absent values
or changing the data representation from string to something with an
explicit absence value like unique_ptr<string> or optional<string>.

Separate miscellaneous logic from parsing logic and review it. There
is a limited amount of logic (relating to things like "drop"
properties) but it should be separate from the main parsing logic.
There will be opportunities to tighten or add further semantic checks
and changes once parsing is simplified.

Consider whether suppression specifications should be completely
devoid of logic and if so, investigate turning them into plain structs.

Switch to a scheme where the parser has a global view of known /
unknown property names so that errors caused by typos can be detected
and reported. Such a scheme must be data-driven (to avoid duplication
of code).