Dodji Seketeli [Wed, 20 Oct 2021 12:42:05 +0000 (14:42 +0200)]
PR28365 - Assert on empty typedef on webkit2gtk3-jsc-2.32.3-1.fc34.x86_64
When doing self-comparison check of
/usr/lib64/libjavascriptcoregtk-4.0.so.18.18.7 from
webkit2gtk3-jsc-2.32.3-1.fc34.x86_64, reading back the abixml file
fails because an empty typedef is used as the element type of an
array.
The empty typedef is there (in a transient manner) because the typedef
is being built. First an empty typedef is built and then its
underlying type is built. During the construction of the underlying
type however (an enum), the empty typedef itself is used (as the
naming typedef of the enum). But because its empty, an assert is
violated during the construction of an array which element type is the
(empty) typedef. A snake eating its own (half-baked) tail, so to
speak.
The patch fixes the issue by constructing the underlying (enum) type
first. Once its constructed, then it's used to construct the typedef
which is thus never empty, even in a transient manner.
The patch adjusts the building of enums so that the naming typedef is
built only once the enum itself is fully constructed. This breaks the
vicious cycle exposed above.
The offending RPM is too big to be added to the test suite. Which
argues (yet again) for the implementation of a separate test suite
that runs libabigail tests on a huge pile of RPMs without having to
embed them in the tarball. We really ought to start that project.
* src/abg-reader.cc (build_enum_type_decl): Set the naming typedef
only after the enum is created and keyed.
(build_typedef_decl): Build the underlying type of the typedef
first.
Giuliano Procida [Mon, 11 Oct 2021 13:17:09 +0000 (14:17 +0100)]
Tweak clang-format configuration
These are the updates:
AlignConsecutiveDeclarations: false
- the dominant style in libabigail is not to align
AllowShortBlocksOnASingleLine: Always
AllowShortEnumsOnASingleLine: true
AllowShortFunctionsOnASingleLine: All
AllowShortLambdasOnASingleLine: All
- the libabigail style favours short things on a single line
Cpp11BracedListStyle: true
- this seems to improve some initialiser syntax
BinPackArguments: false
- we already turn this off for parameters
SpaceAfterCStyleCast: true
- this is the libabigail style
* .clang-format: Various tweaks to Clang format configuration.
Dodji Seketeli [Mon, 18 Oct 2021 10:53:40 +0000 (12:53 +0200)]
Add debug info package for wireshark-cli-3.4.9-1.fc36.x86_64.rpm
I forgot to add the wireshark-cli-3.4.9-1.fc36.x86_64.rpm debug info
package for the test entry that uses it in tests/test-diff-pkg.cc.
It's not necessary on the x86-64 platform, but on many others, it
seems the alternate debug info contained in that package is needed.
So I am adding it in here.
* tests/data/test-diff-pkg/wireshark/wireshark-debuginfo-3.4.9-1.fc36.x86_64.rpm:
Add new debug info package.
* tests/data/Makefile.am: Add it to the source distribution.
* tests/test-diff-pkg.cc: Use the new debug info package in the
test harness.
Dodji Seketeli [Thu, 14 Oct 2021 17:02:59 +0000 (19:02 +0200)]
Bug 28364 - libwiretap fails self comparison
In this case, thanks to all the debugging infrastructure in place,
especially the canonical type debugging infrastructure, I was able to
notice that there was a canonicalization error on a function type when
reading the libwiretap binary as in:
$ build/tools/abidw --debug-tc /usr/lib64/libwiretap.so.11.0.8
structural & canonical equality different for type: function type void (wtap*)
in compare_types_during_canonicalization at: /home/dodji/git/libabigail/PR28364/src/abg-ir.cc:13575: execution should not have reached this point!
Abandon (core dumped)
$
When digging deeper, I noticed that, in the DWARF reader, when
building a function type, we are associating a "textual
representation" of the function type 'void (wtap*)' to its DIE (inside
the current translation) way too early.
By too early, I mean, the association was done before the function
type was fully 'built'. Its parameters were not 'collected', for
instance. So that means that a 'pointer to that function type' could
be formed, with a wrong representation, during the time where the
function type wasn't fully formed. Just moving the association to
after the type was fully constructed solved the issue.
This one was hard to spot!
Later, this uncovered the fact that we could now have (and thus
serialize) member functions of unions. And it turned out the abixml
reader didn't expect those. Oops. I fixed that one as well.
* src/abg-dwarf-reader.cc (build_function_type): Associate the DIE
representation to the constructed type once it's fully built.
* src/abg-reader.cc (build_function_type): Support member function of unions.
* tests/data/Makefile.am: Add the new test input files to the
source distribution.
* tests/data/test-diff-pkg/wireshark/wireshark-cli-3.4.9-1.fc36.x86_64-self-check-report.txt:
Add new test input file.
* tests/data/test-diff-pkg/wireshark/wireshark-cli-3.4.9-1.fc36.x86_64.rpm:
Likewise.
* tests/data/test-diff-pkg/wireshark/wireshark-cli-debuginfo-3.4.9-1.fc36.x86_64.rpm:
Likewise.
* tests/test-diff-pkg.cc (in_out_specs): Add these new test input
files to this test harness.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Adjust.
Dodji Seketeli [Thu, 14 Oct 2021 14:56:23 +0000 (16:56 +0200)]
writer: Don't forget that a naming typedef is referenced
When looking into something else, I noticed that when emitting the
'naming-typedef' property of class, the typedef wasn't categorized as
a referenced type. So sometimes the writer could forget to emit the
naming typedef itself later. Fixed thus.
* src/abg-writer.cc (write_naming_typedef): Notice that the naming
typedef is referenced.
Dodji Seketeli [Thu, 14 Oct 2021 14:38:18 +0000 (16:38 +0200)]
writer: Don't forget to emit types referenced by function types
While looking into something else, I noticed that sometimes the writer
would forget to emit types referenced by function types. Fixed thus.
* src/abg-writer.cc (write_referenced_types): Factorize out of ...
(write_translation_unit): ... here. Also, use it to write the
types referenced by emitted function types.
Dodji Seketeli [Thu, 14 Oct 2021 13:38:34 +0000 (15:38 +0200)]
ir: Avoid canonicalizing types that are not meant to
hash_as_canonical_type_or_constant asserts that a certain number of
types are not meant to be canonicalized. We ought to make sure that
type_base::get_canonical_type_for always agrees with
hash_as_canonical_type_or_constant. This patch enforces that for
good measure.
Dodji Seketeli [Thu, 14 Oct 2021 10:12:01 +0000 (12:12 +0200)]
Add --enable-debug-type-canonicalization to configure
This configure option adds the possibility to debug the type
canonicalization process specifically.
When this new configure option is turned on, in
ir::get_canonical_type_for, when the type T, candidate for
canonicalization is compared to a given canonical type C, the
comparison is done twice; once using structural equality and once
using canonical equality whenever it's possible. For all the
sub-types of T and C, structural equality and canonical equality must
yield the same result. Otherwise, an error message is emitted and the
process aborts.
This all happens when using the abidw program with the --enable-tc or
--enable-type-canonicalization option.
This has proven to be very helpful to detect type canonicalization issues.
For instance, here is a trace of canonicalization issue that was
detected thanks to this patch:
$ build/tools/abidw --debug-tc /usr/lib64/libwiretap.so.11.0.8
structural & canonical equality different for type: function type void (wtap*)
in compare_types_during_canonicalization at: /home/dodji/git/libabigail/PR28364/src/abg-ir.cc:13575: execution should not have reached this point!
Abandon (core dumped)
This means that right after canonicalizing the type "void (wtap*)",
structural and canonical equality yield different results. So it
means there is a problem with that type specifically that makes its
canonicalization "go wrong". This requires further debugging to
understand, but at least, we are super close to the root cause of the
canonicalization problem.
* configure.ac: Support the new
--enable-debug-type-canonicalization option. Define macro
WITH_DEBUG_TYPE_CANONICALIZATION accordingly.
* doc/manuals/abidw.rst: Update documentation.
* include/abg-ir.h
(environment::debug_type_canonicalization_is_on): Declare new
member function if WITH_DEBUG_TYPE_CANONICALIZATION is defined.
* src/abg-ir-priv.h
(environment::priv::{use_canonical_type_comparison_,
debug_type_canonicalization_}): Define new data members if
WITH_DEBUG_TYPE_CANONICALIZATION is defined.
(environment::priv::priv): Initialize them.
* src/abg-ir.cc (try_canonical_compare): When
WITH_DEBUG_TYPE_CANONICALIZATION is defined, perform comparison
using either structural or canonical equality depending on the
environment::priv::use_canonical_type_comparison_ flag.
(environment::debug_type_canonicalization_is_on): Define member
function when WITH_DEBUG_TYPE_CANONICALIZATION is defined.
(compare_types_during_canonicalization): Define new function.
(type_base::get_canonical_type_for): Use the new function
compare_types_during_canonicalization.
* tools/abidw.cc (options::debug_type_canonicalization): Define
new data member.
(option::option): Initialize it.
(display_usage): Add help string for --debug-tc.
(parse_command_line): Support new option --debug-tc or
--debug-type-canonicalization.
(load_corpus_and_write_abixml): Turn type canonicalization
debugging on if --enable-tc is provided.
Dodji Seketeli [Wed, 13 Oct 2021 11:52:22 +0000 (13:52 +0200)]
Improve type (de)serialization instability debugging
When debugging an issue uncovered by performing self comparison (abidw
--abidiff <binary>) I realized that I needed a stronger verification
of canonical types changing between type serialization and type
de-serialization. Namely, when a type T with canonical type C is
serialized, its de-serialized type should still have the same
canonical type C. Otherwise, it means some "type instability" took
place during serialization and de-serialization.
This patch implements that verification and also cleans up things
that came across while working on adding this debugging check.
* include/abg-fwd.h (is_non_canonicalized_type): Declare new
function.
* src/abg-ir-priv.h: Include abg-corpus.h
(environment::priv::pointer_type_id_map_): Fix comment.
(environment::priv::check_canonical_type_from_abixml_during_self_comp):
Define new member function.
* src/abg-ir.cc (unmark_types_as_being_compared): Factorize this
from ...
(return_comparison_result): ... here. Also, add a parameter to
control whether this function should perform the "canonical type
propagation optimization" or not. By default the optimization is
performed. This can be changed for debugging purposes later.
(type_base::get_canonical_type_for): Re-organise the self
comparison debugging process to invoke the new function
environment::priv::check_canonical_type_from_abixml_during_self_comp
each time a canonical type is computed, in addition to doing the
previous verification that was done when no canonical type was
found. Emit better error messages.
(is_non_canonicalized_type): Rename the static function
is_allowed_non_canonicalized_type into this and make it
non-static.
(hash_as_canonical_type_or_constant): Adjust.
* src/abg-reader.cc (maybe_map_type_with_type_id): Define new
static function.
(read_context::maybe_check_abixml_canonical_type_stability):
Ignore types that were not canonicalized.
(read_corpus_from_input): Set the origin of the corpus early
enough so that it's available to the canonicalizer even for types
being canonicalized early.
(MAYBE_MAP_TYPE_WITH_TYPE_ID): Factorize this macro out of ...
(build_type): ... this. That macro is defined only when debugging
self comparison.
(build_array_type_def): Map the read subrange type with its
type-id.
(handle_{type_decl, qualified_type, pointer_type_def,
reference_type_def, function_type, array_type_def,enum_type_decl,
typedef_decl, class_decl, union_decl}): Map the read type with its
type-id.
(load_canonical_type_ids): Ignore non-canonicalized types that
which ids were saved in the type-id file.
* src/abg-writer.cc (write_type_record): Factorize from ...
(write_canonical_type_ids): ... here. Don't forget to write the
type-ids of decl-only types. This can be useful for eye
inspection.
* tools/abidw.cc (load_corpus_and_write_abixml): Wait until the
end of the function before removing the type-id file. This can be
useful for eye inspection.
Bug 27086 - Consider all C++ virtual destructors when there are many
The complete and deleting C++ destructors have the same signature.
Because the dwarf-reader re-uses the IR of functions that have the
same signature, it can happen that one of the two destructors of a
class is missed and thus not represented in the IR. When these
destructors are virtual, that can have an impact on class comparison,
because virtual member functions are take part in class comparison,
just like data member and unlike non-virtual member functions.
This patch fixes the build_or_get_fn_decl_if_not_suppressed to avoid
"reusing" virtual destructors, based on their signature when several
are present. Instead an IR is built for all virtual destructors that
are seen.
* src/abg-dwarf-reader.c (build_or_get_fn_decl_if_not_suppressed):
Do not try to re-use a virtual destructor of a class, based on its
signature. Several different of these can have the same
signature, inside a given class.
* tests/data/test-types-stability/PR27086-libstdc++.so.6.0.26:
Add new binary test input.
* tests/data/Makefile.am: Add the new test input to source
distribution.
* tests/test-types-stability.cc (elf_paths): Add the test input
above to this harness.
Bug 27970 - Duplicated member functions cause spurious self comparison changes
Sometimes, in DWARF, a given class can even be defined piece-wise
across several DIEs. The first DIE would defined some properties and
subsequent DIEs would define others. dwarf-reader already supports
this for most properties. Some properties however can be duplicated
across two DIES.
For instance, a DIE describing a class 'C' can define a virtual member
function, and then a subsequent different DIE further describing other
properties of the same class C would define the same virtual member
function again. In that case, we should not define the virtual member
function twice in the IR of C that is being built.
Libabigail is failing to do exactly that. It's representing the
virtual member function of C twice in this case.
* src/abg-dwarf-reader.cc (fixup_functions_with_no_symbols): When
the function decl is finally associated to its (publicly defined)
ELF symbol, mark it as being exported.
(finish_member_function_reading): Don't risk marking a virtual
function as being non-virtual when updating its properties.
(build_or_get_fn_decl_if_not_suppressed): Update comment. If the
member function is already present in the class, do not create a
new one; rather, reuse the existing one. It's going to be later
updated by finish_member_function_reading.
* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Adjust.
* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt: Likewise.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt: Likewise.
* tests/data/test-diff-filter/test41-report-0.txt: Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
Likewise.
abipkgdiff: Do not erase working dirs before we are done using them
* tools/abipkgdiff.cc (compare_prepared_userspace_packages):
Removing working directories "early" prevents e.g,
dwarf_reader::get_soname_of_elf_file from accessing those files.
So do not remove them until the very end.
* tests/data/test-diff-pkg/libxcrypt-4.1.1-6.el8.x86_64--libxcrypt-4.1.1-6.el8.x86_64-output-1.txt:
Adjust.
typedef enum {E0 = 0; E1 = 1;} E; // 2/: anonymous enum named by a typedef.
E global0;
In the first context "1/", the type of the global variable is an
anonymous enum that is used as such.
In the second context "2/", the type of the global variable is an
anonymous type that is named by the typedef 'E'. So then, it's E that
is used to designate the enum. The anonymous type is thus never used
directly. In essence, it's the same thing as if it was declared as
enum E {E0 = 0; E1 = 1;};
Right now, libabigail canonicalizes the enum 1/ and 2/ together and
that results in 1/ being canonically equal to 2/.
So, when saving the corpus into abixml, because enum 1/ and enum 2/ can be
used interchangeably, either 1/ or 2/ is going to be saved. That
can result in spurious change reports in which the reporter refer to
the enum 1/ where it should refer to enum 2/ or vice versa.
Intrinsically, the enum 1/ and enum 2/ are different because one
essentially has a name (provided by a typedef) and the second does
not. One is anonymous whereas the second is not, essentially.
At the moment, libabigail supports typedef-named anonymous classes.
But it doesn't support this concept for enums.
This patch extends that concept to enums as well. It makes it so that
any anonymous type can now by typedef-named. In that case, the type
now looks like it has a name which is the typedef name. The
information about the typedef naming a given type is kept and
serialized into abixml.
Thus with this patch, the enum in 1/ is now considered (canonically)
different from enum 2/. So there is no possible confusion once the
type is serialized into abixml.
* include/abg-fwd.h (scope_anonymous_or_typedef_named)
(is_anonymous_or_typedef_named): Declare new functions.
* include/abg-ir.h (decl_base::set_has_anonymous_parent): Remove
declaration.
(decl_base::{get,set}_naming_typedef): Declare new member
functions.
* src/abg-ir.cc (update_qualified_name): Define static function.
(decl_base::priv::naming_typedef_): Define new data member.
(decl_base::priv::has_anonymous_parent_): Remove data member.
(decl_base::priv::priv): Adjust constructor.
(decl_base::get_has_anonymous_parent): Rather than storing a flag
for this, dynamically look at if the scope is anonymous.
(decl_base::set_has_anonymous_parent): Remove definition.
(decl_base::{get,set}_naming_typedef): Define new member
functions.
(scope_anonymous_or_typedef_named)
(is_anonymous_or_typedef_named): Define new functions.
(get_decl_name_for_comparison): Define new sub-routine for the
decl_base overload of equals.
(equals): In the overload for decl_base, use the new
get_decl_name_for_comparison. It helps to ensure that all
anonymous decls of the same kind have the same name for the
purpose of comparison. It also ensures that non anonymous decls
that are part of anonymous scopes should be compared only by
looking at their non-qualified names. In the overload for
class_or_union, adjust.
(scope_decl::add_member_decl): No more need to flag the fast that
the parent scope is anonymous here.
(get_debug_representation): Fix a thinko.
(class_or_union::get_naming_typedef): Remove member function as
it's now handled by decl_base::get_naming_typedef.
* src/abg-dwarf-reader.cc (build_typedef_type): When a typedef is
a naming typedef, then mark the named decl as being typedef-named.
(maybe_canonicalize_type): Delay canonicalization of anonymous
types because they can be typedef-named later.
* src/abg-reader.cc (read_naming_typedef_id_string)
(maybe_set_naming_typedef): Define new static function.
(build_class_decl): Use it here, rather than reading the
"naming-typedef-id" by hand.
(build_enum_type_decl, build_union_decl): Read the
"naming-typedef-id" property.
* src/abg-writer.cc (write_naming_typedef): Make this accept
decl_base_sptr, rather than just class_decl_sptr.
(write_enum_type_decl): Write the naming-typedef-id property if
needed.
* tests/data/test-abidiff-exit/test-PR28316-report.txt: New test
reference output.
* tests/data/test-abidiff-exit/test-PR28316-v{0,1}.cc: Source code
of new binary test input.
* tests/data/test-abidiff-exit/test-PR28316-v{0,1}.o: New binary
test input files.
* tests/data/Makefile.am: Add the new test files to the source
distribution.
* tests/test-abidiff-exit.cc: Add the new test files above to this
harness.
* tests/data/test-annotate/libtest23.so.abi: Adjust.
* tests/data/test-annotate/libtest24-drop-fns-2.so.abi: Likewise.
* tests/data/test-annotate/libtest24-drop-fns.so.abi: Likewise.
* tests/data/test-annotate/test13-pr18894.so.abi: Likewise.
* tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
* tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
* tests/data/test-diff-dwarf/test15-enum-report.txt: Likewise.
* tests/data/test-diff-filter/test19-enum-report-1.txt: Likewise.
* tests/data/test-diff-filter/test30-pr18904-rvalueref-report0.txt:
Likewise.
* tests/data/test-diff-filter/test30-pr18904-rvalueref-report1.txt:
Likewise.
* tests/data/test-diff-filter/test30-pr18904-rvalueref-report2.txt:
Likewise.
* tests/data/test-diff-filter/test35-pr18754-no-added-syms-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/PR24690/PR24690-report-0.txt: Likewise.
* tests/data/test-diff-pkg/nss-3.23.0-1.0.fc23.x86_64-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-1.txt:
Likewise.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
Likewise.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-3.txt:
Likewise.
* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi:
Likewise.
* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Likewise.
* tests/data/test-read-dwarf/PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi:
Likewise.
* tests/data/test-read-dwarf/libtest23.so.abi: Likewise.
* tests/data/test-read-dwarf/libtest24-drop-fns-2.so.abi:
Likewise.
* tests/data/test-read-dwarf/libtest24-drop-fns.so.abi: Likewise.
* tests/data/test-read-dwarf/test-libaaudio.so.abi: Likewise.
* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
* tests/data/test-read-dwarf/test11-pr18828.so.abi: Likewise.
* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
* tests/data/test-read-dwarf/test13-pr18894.so.abi: Likewise.
* tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise.
* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
* tests/data/test-read-dwarf/test21-pr19092.so.abi: Likewise.
* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
Likewise.
* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.
When two packages are different just because one adds or removes
binaries -- and no binary have any ABI change otherwise, abipkgdiff
quits early and doesn't report the added and removed binaries.
This patch fixes the issue by reporting added/removed binaries even
when no ABI comparison took place.
* tools/abipkgdiff.cc (compare_prepared_userspace_packages): Do
not return early if there are no binaries to compare. Also add
more verbose messages.
* tests/data/test-diff-pkg/libxcrypt-4.1.1-6.el8.x86_64--libxcrypt-4.1.1-6.el8.x86_64-output-1.txt:
New reference output file.
* tests/data/test-diff-pkg/libxcrypt-4.1.1-6.el8.x86_64--libxcrypt-compat-4.4.18-3.el9.x86_64-report-1.txt:
New reference output file.
* tests/data/test-diff-pkg/libxcrypt-4.1.1-6.el8.x86_64.rpm: New
binary input file.
* tests/data/test-diff-pkg/libxcrypt-4.4.18-3.el9.x86_64.rpm: Likewise.
* tests/data/test-diff-pkg/libxcrypt-compat-4.4.18-3.el9.x86_64.rpm: Likewise.
* tests/data/test-diff-pkg/libxcrypt-compat-debuginfo-4.4.18-3.el9.x86_64.rpm: Likewise.
* tests/data/test-diff-pkg/libxcrypt-debuginfo-4.1.1-6.el8.x86_64.rpm: Likewise.
* tests/data/test-diff-pkg/libxcrypt-debuginfo-4.4.18-3.el9.x86_64.rpm: Likewise.
* tests/data/Makefile.am: Add the new testing files to source
distribution.
* tests/test-diff-pkg.cc (in_out_specs): Add these binary packages
to this testing harness.
RHBZ1944102 - self comparing ABI of protobuf-3.14.0-2.el9 failed
Reading size and alignment from abixml can lead to loss of precision
that surfaced when self comparing the protobuf package as described in
bug https://bugzilla.redhat.com/show_bug.cgi?id=1944102.
Fixed thus.
* src/abg-reader.cc (read_size_and_alignment): Use atoll to read
long long values, not atoi.
RHBZ1951496 - ir: Acknowledge that "void type" is not canonicalized
In the libabigail type system, the void type is a synthetic type and
is thus not canonicalized.
We forgot to mention this "void type" to
hash_as_canonical_type_or_constant as being one of the types that are
allowed to be non canonicalized in the system. This omission violates
an assert in that function.
The patch introduces a new is_allowed_non_canonicalized_type
subroutine that defines the types that are allowed to be non
canonicalized in the system and make it recognize "void type" as such.
hash_as_canonical_type_or_constant uses the new
is_allowed_non_canonicalized_type.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1951496
* src/abg-ir.cc (is_allowed_non_canonicalized_type): Define new
static function.
(hash_as_canonical_type_or_constant): Use it.
xml-reader: Get back to original way of building qualified types
We build qualified types today by building a temporary qualified type
of "void" (one that has no underlying type), then the underlying type
is built, and once it's built, it's set to the qualified type.
We do this so that if during the construction of the underlying type,
the proper qualified type is needed (by recursion) then the temporary
type is found and used. We used this strategy recently as a temporary
measure until the canonical type propagation algorithm is fixed. And
now it seems fixed.
The problem with that temporary approach is that in some rare cases,
the temporary qualified void type can be used elsewhere and violate
assertions that expect a proper qualified type.
The patch thus creates the underlying type first. If the qualified
type is created (by recursion) in the process, we use it. Otherwise,
the qualified type is later created with the proper underlying type
that is already available.
This helps to fix https://bugzilla.redhat.com/show_bug.cgi?id=1951501.
* src/abg-reader.cc (build_qualified_type_decl): Create the
underlying type first, then create the qualified type.
This helps fix bug
During the canonical type propagation optimization, when the
comparison of two type sub-objects fails, we need to cancel the
(potential) propagation of the canonical type of the current type
sub-object being compared.
We were not doing that in return_comparison_result, but were expecting
it. Oops.
Fixed thus.
This helps to fix bug https://bugzilla.redhat.com/show_bug.cgi?id=1951501.
* src/abg-ir.cc (return_comparison_result): When the comparison of
the current type sub-object fails, clear the potentially
propagated canonical type and remove it from the set of types with
non confirmed propagated canonical types.
ir: Avoid infinite loop during type canonicalization
While looking at something else, I noticed an occurrence of infinite
loop during type canonicalization, especially when cancelling
canonical type propagation on some types.
Fixed thus.
This helps address https://bugzilla.redhat.com/show_bug.cgi?id=1951501
* src/abg-ir-priv.h
(environment::priv::collect_types_that_depends_on): Don't try to
collect a type that has already been collected.
While looking at something else, I stumbled across this bug where the
linkage name of enum are not escaped in abixml. So "forbidden"
characters like '<' can snick in.
Fixed thus.
This helps address https://bugzilla.redhat.com/show_bug.cgi?id=1951501
RHBZ-1944096 - assertion failure during self comparison of systemd
When reading the abixml representing an enumerator which value is
exactly either LLONG_MIN or LLONG_MAX, build_enum_type_decl fails
because we wrongly think that an underflow or overflow happened, while
using strtoll.
This patch fixes the condition used to detect {under,over}flow
whenusing strtoll.
* src/abg-reader.cc (build_enum_type_decl): When strtoll detects
an underflow or overflo, it sets errno to ERANGE. So take that
into account.
Reporting the change in array type exhibits a glitch in the type name.
As the bug report says:
The resulting abidiff output contains:
type of 'int numbers[2]' changed:
type name changed from 'void[2]' to 'void[3]'
array type size changed from 64 to 96
array type subrange 1 changed length from 2 to 3
instead of
type of 'int numbers[2]' changed:
type name changed from 'int[2]' to 'int[3]'
array type size changed from 64 to 96
array type subrange 1 changed length from 2 to 3
The problem comes from array_type_def::get_qualified_name() where we
fail to generate a "new" qualified name once the type of the array is
canonicalized.
Fixed thus.
* src/abg-ir.cc (array_type_def::get_qualified_name): Use the
cache for temporary qualified names when the type is not yet
canonicalized. That way, the cache for (non-temporary) qualified
names is used only for canonicalized types.
* tests/data/test-abidiff/test-PR27985-report.txt: Reference
output for the new test.
* tests/data/test-abidiff/test-PR27985-v{0,1}.c: Source code for
the new test binary inputs.
* tests/data/test-abidiff/test-PR27985-v{0,1}.o: New test binary inputs.
* tests/data/test-abidiff/test-PR27985-v{0,1}.o.abi: New test
abixml input.
* tests/data/Makefile.am: Add the new test materials above to
source distribution.
* tests/test-abidiff.cc (specs): Add the tests above to the harness.
* tests/data/test-diff-pkg/nss-3.23.0-1.0.fc23.x86_64-report-0.txt:
Adjust.
* tests/data/test-abidiff-exit/qualifier-typedef-array-report-1.txt:
Adjust.
Giuliano Procida [Fri, 27 Aug 2021 15:24:39 +0000 (16:24 +0100)]
abg-writer: faster referenced type emission tests
When determining whether a referenced type should be emitted, various
tests are done:
- has the type been emitted already? hash table lookup
- does the translation unit match? string comparison
- is this the last translation unit? read bool variable
The translation unit tests were added in recent commits and followed
the hash table lookups. This resulted in a performance regression
affecting Android continuous integration tests.
The lookups require a hash calculation and an equality check if the
hash is present. The equality checks are expensive deep equalities
rather than pointer comparisons.
This change reorders the tests so that the lookups happen last. This
speeds up abidw by more than a factor of 10 for one Android library.
* src/abg-writer.cc (write_translation_unit): Reorder
referenced type emission tests for efficiency. Consolidate
related comments.
Dodji Seketeli [Tue, 10 Aug 2021 17:24:12 +0000 (19:24 +0200)]
RHBZ 1925886 - Compare anonymous types without qualified names
An anonymous struct/union is, by definition an entity that is not
named (unless a naming typedef is provided for it).
It turns out that in C++ binaries, there are anonymous types that are
logically equivalent (as far as ABI is concerned) because they have
the same members and layout, but turn out to be evaluated as being
different because they are defined in different name spaces. And
because they are not named, showing them as being different just
because of their name space doesn't bring anything but spurious error
reporting.
Consider the DWARF representing this:
struct S
{
union
{
int a;
int b;
} member;
};
where the 'member' is of type S::<anonymous-union>.
Probably due to LTO, we see some DWARF that represents the type of
'member' as just <anonymous-union>, in some translation units.
I could not generate that DWARF from a small test case, myself. But
it comes from the binary 'usr/bin/lto-dump', from the
https://bugzilla.redhat.com/show_bug.cgi?id=1925886 problem report.
So in that case, we want the S::<anonymous-union> to compare equal to
the <anonymous-union>, otherwise, this produces spurious type changes,
especially when doing self comparison.
This is what this patch does.
* include/abg-fwd.h (is_anonymous_type): Constify this function.
* src/abg-ir.cc (equals): In the overload for decl_base, do not
take scope of anonymous types into account. In the overload for
array_type_def do not peel of typedefs. This is not directly
related to anonymous types, but it make comparison more robust
against naming typedefs used for anonymous types in array
elements.
(get_type_name): Do not take into account the scope of anonymous
types when building internal representation of types. Note that
the internal representation is what is used for canonicalization.
This means that all anonymous types are compared against each
others during type canonicalization.
* src/abg-reader.cc (build_class_decl): Do not try to re-use
anonymous types, just like we already do for DWARF.
* tests/data/test-annotate/test17-pr19027.so.abi: Adjust.
* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
Likewise.
* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
Likewise.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt:
Likewise.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt:
Likewise.
* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
* tests/data/test-read-dwarf/test-libaaudio.so.abi: Likewise.
* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
* tests/data/test-read-dwarf/test11-pr18828.so.abi: Likewise.
* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
* tests/data/test-read-dwarf/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
Likewise.
* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
Likewise.
* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
Likewise.
* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.
Bug 27236 - Don't forget to emit some referenced types
Since we arranged to only emit referenced types in translation units
where they belong, it appears that in some cases we forget to emit
some referenced types.
This is because some referenced types might belong to a translation
unit that is *already* emitted by the time we detect that a type is
referenced.
To fix this correctly, we should probably have a pass that walks the
corpus to detect referenced types, so that we have their set even
before we start emitting translation units.
But for now, the patch just detects when we are emitting the last
translation unit. In that case all the non-emitted referenced types
are emitted. It doesn't seem to be an issue if those don't belong to
that translation unit, compared to their original (from the DWARF)
type.
* include/abg-writer.h (write_translation_unit): Add a new
parameter that says if we are emitting the last TU.
* src/abg-writer.cc (write_translation_unit::{type_is_emitted,
decl_only_type_is_emitted}): Constify these methods.
(write_context::has_non_emitted_referenced_types): Define new
member function using the const methods above.
(write_translation_unit): When emitting the last TU, emit all the
referenced types.
(write_corpus): Set signal when emitting the last translation
unit.
Some classes can be defined piece-wise, in some rare cases in the
abixml. build_class_decl is currently preventing that to happen,
leading to some spurious self comparison errors.
Fixed thus.
* src/abg-reader.cc (build_class_decl): Keep going when the class
has already been built. The rest of the code knows how to add new
stuff.
* tests/data/test-abidiff/test-PR18791-report0.txt: Adjust.
Bug 27236 - Fix the canonical type propagation optimization
While working on another bug, it turned out the initial fix for the
bug https://sourceware.org/bugzilla/show_bug.cgi?id=27236 was just
papering over the real issue.
I think the real issue is that "canonical type propagation"
optimization was being done even in cases where it shouldn't have been
done. This patch recognizes the limits of that optimization and avoid
performing it when we are off limits.
So here is what that optimization is. The text below is also present
in the comments in the source code. I am putting it here to explain
the context.
During the canonicalization of a type T (which doesn't yet have a
canonical type), T is compared structurally (member-wise) against a
type C which already has a canonical type. The comparison expression
is C == T.
During that structural comparison, if a subtype of C (which also
already has a canonical type) is structurally compared to a subtype of
T (which doesn't yet have a canonical type) and if they are equal,
then we can deduce that the canonical type of the subtype of C is the
canonical type of the subtype of C.
Thus, we can canonicalize the sub-type of the T, during the
canonicalization of T itself. That canonicalization of the sub-type
of T is what we call "propagating the canonical type of the sub-type
of C onto the sub-type of T". It's also called "on-the-fly
canonicalization". It's on the fly because it happens during a
comparison -- which itself happens during the canonicalization of T.
So this is the general description of the "canonical type propagation
optimization".
Now we must recognize the limits of that optimization. Said
otherwise, there is a case when a type is *NOT* eligible to this
canonical type propagation optimization.
The reason why a type is deemed NON-eligible to the canonical type
propagation optimization is that it "depends" on a recursively present
type. Let me explain.
Suppose we have a type T that has sub-types named ST0 and ST1.
Suppose ST1 itself has a sub-type that is T itself. In this case, we
say that T is a recursive type, because it has T (itself) as one of
its sub-types:
T
+-- ST0
|
+-- ST1
| +
| |
| +-- T
|
+-- ST2
ST1 is said to "depend" on T because it has T as a sub-type. But
because T is recursive, then ST1 is said to depend on a recursive
type. Notice however that ST0 does not depend on any recursive type.
Now suppose we are comparing T to a type T' that has the same
structure with sub-types ST0', ST1' and ST2'. During the
comparison of ST1 against ST1', their sub-type T is compared
against T'. Because T (resp. T') is a recursive type that is
already being compared, the comparison of T against T' (as a
subtypes of ST1 and ST1') returns true, meaning they are
considered equal. This is done so that we don't enter an infinite
recursion.
That means ST1 is also deemed equal to ST1'. If we are in the
course of the canonicalization of T' and thus if T (as well as as
all of its sub-types) is already canonicalized, then the canonical
type propagation optimization will make us propagate the canonical
type of ST1 onto ST1'. So the canonical type of ST1' will be
equal to the canonical type of ST1 as a result of that
optmization.
But then, later down the road, when ST2 is compared against ST2',
let's suppose that we find out that they are different. Meaning
that ST2 != ST2'. This means that T != T', i.e, the
canonicalization of T' failed for now. But most importantly, it
means that the propagation of the canonical type of ST1 to ST1'
must now be invalidated. Meaning, ST1' must now be considered as
not having any canonical type.
In other words, during type canonicalization, if ST1' depends on a
recursive type T', its propagated canonical type must be
invalidated (set to nullptr) if T' appears to be different from T,
a.k.a, the canonicalization of T' temporarily failed.
This means that any sub-type that depends on recursive types and
that has been the target of the canonical type propagation
optimization must be tracked. If the dependant recursive type
fails its canonicalization, then the sub-type being compared must
have its propagated canonical type cleared. In other words, its
propagated canonical type must be cancelled.
This concept of cancelling the propagated canonical type when needed
is what this patch introduces.
New data members have been introduced to the environment::priv private
structure. Those are to keep track of the stack of sub-types being
compared so that we can detect if a candidate to the canonical type
propagation optimization depends on a recursive type.
There is also a data structure in there to track the targets of the
canonical type propagation optimization that "might" need to see their
propagated canonical types be cancelled.
Then new functions have been introduced to detect when a type depends
on a recursive type, to cancel or confirm propagated canonical types
etc.
In abg-ir.cc, The RETURN* macros used in the equals() overloads have
been factorized using the newly introduced function templates
return_comparison_result(). This now contains the book keeping that
was previously done (in the RETURN* macros) to detect recursive cycles
in the comparison, as well as triggering the canonical type
propagation. This i also where the logic of properly limiting the
optimization is implemented now.
* include/abg-ir.h (pointer_set): This typedef is now for an
unordered_set<uintptr_t> rather than an unordered_set<size_t>.
(environment::priv_): Make this public so that code in free form
function from abg-ir.cc can access it.
* src/abg-ir-priv.h (struct type_base::priv): Move this private
structure here, from abg-ir.cc.
(type_base::priv::{depends_on_recursive_type_,
canonical_type_propagated_}): Added these two new data members.
(type_base::priv::priv): Initialize the two new data members.
(type_base::priv::{depends_on_recursive_type,
set_depends_on_recursive_type,
set_does_not_depend_on_recursive_type, canonical_type_propagated,
set_canonical_type_propagated, clear_propagated_canonical_type}):
Define new member functions.
(struct environment::priv): Move this struct here, from abg-ir.cc.
(environment::priv::{types_with_non_confirmed_propagated_ct_,
left_type_comp_operands_, right_type_comp_operands_}): New data
members.
(environment::priv::{mark_dependant_types,
mark_dependant_types_compared_until, confirm_ct_propagation,
collect_types_that_depends_on, cancel_ct_propagation,
remove_from_types_with_non_confirmed_propagated_ct}): New member
functions.
* src/abg-ir.cc (struct environment::priv, struct)
(type_base::priv, struct class_or_union::priv): Move these struct
to include/abg-ir-priv.h.
(push_composite_type_comparison_operands)
(pop_composite_type_comparison_operands)
(mark_dependant_types_compared_until)
(maybe_cancel_propagated_canonical_type): Define new functions.
(notify_equality_failed, mark_types_as_being_compared): Re-indent.
(is_comparison_cycle_detected, return_comparison_result): Define
new function templates.
(RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED): Define new macro.
(equals(const function_type& l, const function_type& r)): Redefine
the RETURN macro using the new return_comparison_result function
template. Use the new RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED
and mark_types_as_being_compared functions.
(equals(const class_or_union& l, const class_or_union&, change_kind*)):
Likewise.
(equals(const class_decl& l, const class_decl&, change_kind*)):
Likewise. Because this uses another equal() function to compare
the class_or_union part the type, ensure that no canonical type
propagation occurs at that point.
(types_are_being_compared): Remove as it's not used anymore.
(maybe_propagate_canonical_type): Use the new
environment::priv::propagate_ct() function here.
(method_matches_at_least_one_in_vector): Ensure the
right-hand-side operand of the equality stays on the right. This
is important because the equals() functions expect that.
* src/abg-reader.cc (build_type): Ensure all types are
canonicalized.
* tests/data/test-diff-dwarf/PR25058-liblttng-ctl-report-1.txt:
Adjust.
* tests/data/test-diff-pkg/nss-3.23.0-1.0.fc23.x86_64-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
Likewise.
* tests/data/test-read-dwarf/test-libaaudio.so.abi: Likewise.
Dodji Seketeli [Wed, 30 Jun 2021 09:27:37 +0000 (11:27 +0200)]
Bug 27995 - Self comparison error from abixml file
There are several self comparison issues uncovered by comparing the
file test-PR27995.abi (provided in the bug report) against itself.
This patch address them all as well as the regressions induced on some
of the test suite and then and updates the other reference test suite
output that need it.
In the equals overload for decl_base, we compare the non-internal
versions of qualified decl names. For var_decls of anonymous class or
union types, the non-internal version is the flat-representation of
the type. Thus a benign change in a data member name of the anonymous
type might cause the equals function to consider the var_decls to be
wrongly different. The internal version of the qualified decl name
should return a name that is stable for types, irrespective of these
benign variations. The patch thus makes the equals overload for
decl_base to compare internal versions of qualified decl names instead.
The patch ensures that enum_type_decl::get_pretty_representation
return and internal pretty representation that is "stable" for
anonymous types. Basically, all anonymous enums will have the same of
name that looks like "__anonymous_enum__". This is to ensure two
things: first, that all anonymous enums are compared against each
other during type canonicalization, ensuring that when two anonymous
enums are canonically different, it really is because of changes in
their enumerators or basic type, not because of anything having to do
with their artificial names. Second, that in the equals overload for
decl_base, their internal qualified name always compare equal. This
nullifies the risk of having anonymous types compare different because
of their (non existent) name. This is because libabigail's dwarf
reader assigns artificial unique names to anonymous types, so we don't
want to use these during actual type comparison.
We do something similar for class_decl::get_pretty_representation and
union_decl::get_pretty_representation where the pretty internal
representation for class/union decl would now be
__anonymous_{struct,union}__.
The patch scouts the uses of get_pretty_representation() to make sure
to use avoid using the internal-form of the pretty representations
when it's not warranted. It also updates the doxygen comments of the
overloads of that function.
In the abixml reader, we were wrongly canonicalizing array types
early, even before they were fully created. The was leading to
spurious type chances down the road.
The patch also fixes the caching of the name of function types by
making it consistent with caching of the names of the other types of
the system. The idea is that we don't cache the name of a function
type until it's canonicalize. This is because the type might be
edited during its pre-canonicalization life time; and that editing
might change its name. However once the type is canonicalized, it
becomes immutable. At that point we can cache its name, for
performance purposes. Note that we need to do that both for the
"internal version" of the type name (used for canonilization purposes)
and the "non-internal version" one, which is used for other purposes.
This caching scheme wasn't respected for function types, so we were
caching a potentially wrong name for the type after its
canonicalization.
Last but not least, there is a problem that makes canonical type
comparison different from structural type comparison.
Let's consider these two declarations:
typedef int FirstInt;
typedef int SecondInt;
Now, consider these two pointer types: FirstInt* and SecondInt*;
These two pointer types are canonically different because they have
different type names. This is because during type canonicalization,
types with the same "pretty representation" are compared against each
other. So types with different type names will certainly have
different pretty representations and won't be compared; they are thus
going to have different canonical types.
However, FirstInt* and SecondInt* do compare equal, structurally,
because the equals overload for pointer_type_def compares the
pointed-to types of pointers by peeling off typedefs. So, here, as
both pointed-to types are 'int' when the typedefs are peeled off, the
two pointers structurally compare equal. This discrepancy between
structural and canonical equality introduces subtle and spurious type
changes depending on the order in which types are canonicalized. For
instance:
struct {FirstInt* m0;}; /* First type. */
struct {SecondInt* m0;}; /* Second type. */
If FirstInt* and SecondInt* are canonicalized before their containing
anonymous types, then the two anonymous types will compare different
(because FirstInt* and SecondInt* compare different) and have
different canonical types. If, however, the anonymous types are
canonicalized before FirstInt* and SecondInt*, then will compare equal
because FirstInt* and SecondInt* are structurally equivalent.
FirstInt* and SecondInt* will be canonicalized latter and have
different canonical types (because they have different type names)
despite being structurally equivalent.
The change in the order of canonicalization can happen when
canonicalizing types from a corpus coming from DWARF as opposed to
canonicalizing types from a corpus coming from abixml.
The patch fixes this discrepancy by not peeling off typedefs from the
pointed-to types when comparing pointers. Note that this makes us
regress on bug https://sourceware.org/bugzilla/show_bug.cgi?id=27236,
where the typedef peeling was introduced. In hindsight, introducing
that typedef peeling was a mistake. I'll try to address that bug
again in a subsequent patch.
* doc/manuals/abidiff.rst: Add documentation for the --debug
option.
* src/abg-ir.cc (equals): In the overload for decl_base consider
the internal version of qualified decl name. In the overload for
pointer_type_def do not peel typedefs off from the compared
pointed-to types. In the overload for typedef_decl compare the
typedef as a decl as well. In the overload for var_decl, compare
variables that have the same ELF symbols without taking into
account their qualified name, rather than their name. Stop
comparing data member without considering their names.
In the overload for class_or_union, when a decl-only class that is
ODR-relevant is compared against another type, assume that
equality if names are equal. This is useful in environments where
some TUs are ODR-relevant and others aren't.
(*::get_pretty_representation): Update doxygen comments.
(enum_type_decl::get_pretty_representation): Return an internal
pretty representation that is stable across all anonymous enums.
(var_decl::get_anon_dm_reliable_name): Use the non-internal pretty
representation for anonymous data members.
(function_type::priv::temp_internal_cached_name_): New data
member.
(function_type::get_cached_name): Cache the internal name after
the function type is canonicalized. Make sure internal name and
non-internal name are cached separately.
(class_or_union::find_anonymous_data_member): Look for the anonymous
data member by looking at its non-internal name.
({class, union}_decl::get_pretty_representation): Use something like "class
__anonymous_{union,struct}__" for all anonymous classes, so that they can
all be compared against each other during type canonicalization.
(type_has_sub_type_changes): Use non-internal pretty
representation.
(hash_type_or_decl, function_decl_is_less_than:): Use internal
pretty representation for comparison here.
* src/abg-reader.cc (read_context::maybe_canonicalize_type): Don't
early canonicalize array types.
* src/abg-writer.cc (annotate): Use non-internal pretty
representation.
* tests/data/test-diff-filter/test-PR27995-report-0.txt: New
reference report.
* tests/data/test-diff-filter/test-PR27995.abi: New test input
abixml file.
* tests/data/Makefile.am: Add test-PR27995.abi,
test-PR27995-report-0.txt to the source distribution.
* tests/data/test-annotate/libtest23.so.abi: Adjust.
* tests/data/test-diff-dwarf/test6-report.txt: Adjust.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt: Adjust.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt: Adjust.
* tests/data/test-diff-filter/test41-report-0.txt: Adjust.
* tests/data/test-diff-filter/test43-decl-only-def-change-leaf-report-0.txt: Adjust.
* tests/data/test-diff-filter/test8-report.txt: Adjust.
* tests/data/test-diff-pkg/libICE-1.0.6-1.el6.x86_64.rpm--libICE-1.0.9-2.el7.x86_64.rpm-report-0.txt:
Adjust.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-0.txt:
Adjust.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
Adjust.
* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-3.txt:
Adjust.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
Adjust.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
Adjust.
* tests/data/test-diff-suppr/test39-opaque-type-report-0.txt: Adjust.
* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi: Adjust.
* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Adjust.
* tests/data/test-read-dwarf/libtest23.so.abi: Adjust.
* tests/data/test-read-dwarf/test-libandroid.so.abi: Adjust.
* tests/data/test-read-dwarf/test11-pr18828.so.abi: Adjust.
* tests/data/test-read-dwarf/test12-pr18844.so.abi: Adjust.
* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Adjust.
* tests/test-diff-filter.cc (in_out_specs): Add the
test-PR27995.abi to the test harness.
* tools/abidiff.cc (options::do_debug): New data member.
(options::options): Initialize it.
(parse_command_line): Parse --debug.
(main): Activate self comparison debug if the user provided
--debug.
Dodji Seketeli [Fri, 25 Jun 2021 10:35:28 +0000 (12:35 +0200)]
ir: Tighten type comparison optimization for Linux kernel binaries
types_defined_same_linux_kernel_corpus_public() performs an
optimization while comparing two types in the context of the Linux
kernel. If two types of the same kind and name are defined in the
same corpus and in the same file, then they ought to be equal.
For two anonymous classes that have naming typedefs, the function
forgets to ensure that the naming typedefs have the same name.
I have no binary that exhibits the potential issue, but I stumbled
upon the problem while looking at something else that uncovered
the problem. This change doesn't impact any of the binaries of the
regression suite at the moment, though.
Fixed thus.
* src/abg-ir.cc (types_defined_same_linux_kernel_corpus_public):
Ensure that anonymous classes with naming typedefs have identical
typedef names.
Dodji Seketeli [Fri, 25 Jun 2021 10:00:26 +0000 (12:00 +0200)]
ir: Tighten the test for anonymous data member
In is_anonymous_data_member(), we only test that the name of the data
member is empty; we forget to test that decl_base::get_is_anonymous()
is true. This might make us wrongly think that a data member is
anonymous in cases like in the equals() function for var_decl, where
we temporarily set the name of the compared var_decl to "" before
invoking the decl_base::operator==. We do this to perform the
comparison by not taking into account the name of the variable.
This hasn't yet happened on the binaries of the regression test suite,
but it's definitely wrong so I am fixing it here.
* src/abg-ir.cc: (is_anonymous_data_member): Consider
decl_base::get_is_anonymous as well.
Dodji Seketeli [Thu, 24 Jun 2021 16:08:18 +0000 (18:08 +0200)]
ir: Improve the debugging facilities
While looking at something else, I stumbled across some minor issues
in the debugging facilities I use to track self comparison problems.
I added a missing ABG_RETURN macro in the stack of equals() function
to better detect when there is a change, under the debugger.
I also fixed get_debug_representation() to properly display the
class/enum name (as expected) rather their pretty representation.
* src/abg-ir.cc (maybe_compare_as_member_decls): Add a missing
ABG_RETURN
(get_debug_representation): Display the name of class and enums,
not their pretty representation.
Bitfield and other member offsets can be specified in DWARF using:
- DW_AT_data_bit_offset, or
- DW_AT_data_member_location and optionally DW_AT_bit_offset.
The code would only use the value DW_AT_data_member_location if there
was no DW_AT_bit_offset. This commit fixes this and adjusts
documentation and affected tests.
abg-ir.h: add declaration of operator<< for elf_symbol::visibility
There is a formatted output operator for elf_symbol::visibility in
abg-ir.cc. However, it had no visibile declaration and was not usable
by library users. This commit adds the declaration.
ir: remove "is Linux string constant" property from elf_symbol
This boolean property was obsoleted by the new symtab reader
implementation. It has no users.
Following this change, the find_ksymtab_strings_section function joins
find_ksymtab_section and find_ksymtab_gpl_section in having no users.
* include/abg-ir.h (elf_symbol::elf_symbol): Drop
is_linux_string_cst argument.
(elf_symbol::create): Likewise.
(elf_symbol::get_is_linux_string_cst): Drop method.
* src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab):
Remove code that gets the index of the __ksymtab_strings
section. Drop corresponding elf_symbol::create argument.
(lookup_symbol_from_gnu_hash_tab): Likewise.
(lookup_symbol_from_symtab): Likewise.
(create_default_fn_sym): Drop false is_linux_string_cst
argument to elf_symbol::create.
* src/abg-ir.cc (elf_symbol::priv::is_linux_string_cst_): Drop
member variable.
(elf_symbol::priv default ctor): Drop initialisation of
is_linux_string_cst_.
(elf_symbol::priv normal ctor): Drop is_linux_string_cst
argument and corresponding is_linux_string_cst_
initialisation.
(elf_symbol::elf_symbol ctor): Drop is_linux_string_cst
argument and corresponding forwarding to priv ctor.
(elf_symbol::create): Drop is_linux_string_cst argument and
corresponding forwarding to ctor.
(elf_symbol::get_is_linux_string_cst): Drop method.
* src/abg-reader.cc (build_elf_symbol): Drop false
is_linux_string_cst argument to elf_symbol::create.
* src/abg-symtab-reader.cc (symtab::load): Likewise.
Consistently use std::unique_ptr for private implementations (pimpl)
In the absence of non-refcounting smart pointers before C++11,
std::shared_ptr was commonly used instead. Having bumped the standard to
C++11, allows us to use std::unique_ptr consistently avoiding any costs
involved with shared_ptr ref counting. Hence do that and add default
virtual destructors where required.
symtab-reader: add support for binaries compiled with CFI
Control-Flow-Integrity (CFI) when enabled in clang built binaries
introduces an indirection when looking up ELF symbols. For DSO, the
symbol table (.dynsym) will still contain the symbols, but additional
symbols with suffix .cfi will be added to the full .symtab.
Unfortunately, the DWARF debug information refers to CFI symbols by
address to the .cfi suffixed variants as they point to the actual
implementation.
When the dwarf reader is determining whether to suppress variable or
function declarations, it does so by identifying if there is an
associated ELF symbol at the given address read from DWARF. Unless we
know about the alternative address, this will fail and the type
information will be suppressed.
Hence add the .cfi symbol values to the lookup map to associate their
address with the corresponding publicly exported symbol.
* src/abg-symtab-reader.cc (symtab::load_): use new
add_alternative_address_lookups method.
(add_alternative_address_lookups): New method.
* src/abg-symtab-reader.h (add_alternative_address_lookups): new
function declaration.
* tests/data/test-read-dwarf/test-libaaudio.so: New test data.
* tests/data/test-read-dwarf/test-libaaudio.so.abi: New test data.
* tests/data/Makefile.am: Add the two new tests input to source
distribution.
* tests/test-read-dwarf.cc: New test case.
Reported-by: Dan Albert <danalbert@google.com> Reviewed-by: Giuliano Procida <gprocida@google.com> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
Refactor the acquisition of symtabs to explicitly provide functionality
to get the .symtab and .dynsym sections. A later patch will make use of
that to acquire .symtab while find_symbol_table_section() still provides
.dynsym as default symbol table.
This also adds a new overload to find_section to acquire the first
section by type and adjusts find_symbol_table_section() to make use of
those functions.
* src/abg-elf-helpers.cc(find_section): New overload.
(find_symtab_section): New function.
(find_dynsym_section): New function.
(find_symbol_table_section): Use new find_*_section functions.
* src/abg-elf-helpers.h(find_section): New overload declaration.
(find_symtab_section): New function declaration.
(find_dynsym_section): New function declaration.
Dodji Seketeli [Fri, 18 Jun 2021 12:47:04 +0000 (14:47 +0200)]
Bug 27980 - Fix updating of type scope upon type canonicalization
Once a type T is canonicalized, its scope is updated so that the
vector returned by scope_decl::get_canonical_types() now contains the
new canonical type of T. This works, obviously, even when the scope
is itself a type.
This works well on binaries compiled using C only because, currently,
libabigail de-duplicates the DIEs of types. This means that if the
scope of T is a non-anonymous type, the class of equivalence of that
scope contains just one element. So updating the scope of T implies
updating just one scope.
On binaries where some files are compiled using C++ however, type DIEs
are not de-duplicated. This is just because that feature hasn't yet
been implemented in libabigail. Anyway, in that case, if the scope of
T is a non-anonymous type, the class of equivalence of that scope
contains more than one element. So updating the scope of T implies
updating the scope of all the elements of the class of equivalence T.
In practise, that means updating the canonical type (scope) of T.
Libabigail fails to update the canonical type (scope) of T. Later at
abixml emitting time, just emitting the canonical types of the scope
of T is not enough to emit the canonical type of T. And that's how
the abixml emitter forgets to emit some types as reported in the bug
https://sourceware.org/bugzilla/show_bug.cgi?id=27980.
This patch fixes that issue.
I also noticed that when emitting abixml for unions, the emitter
fails to emit the canonical member types of the union, unlike what is
done for class types. So that is fixed as well.
The binary provided in the bug report is added to the regression
testsuite.
* src/abg-ir.cc (canonicalize): Update the
scope_decl::get_canonical_types() of canonical type of the
containing type of the newly canonicalized type.
* src/abg-writer.cc (write_union_decl): Write the canonical types
contained in the current union scope, just like we do for classes.
* tests/data/test-read-dwarf/test16-pr18904.so.abi: Adjust.
* tests/data/test-types-stability/pr27980-libc.so: New binary
input file.
* tests/data/Makefile.am: Add the test input file above to source
distribution.
* tests/test-types-stability.cc (elf_paths): Add the new test
input file to this test harness.
Giuliano Procida [Thu, 27 May 2021 08:53:05 +0000 (09:53 +0100)]
abg-reader: Create a fresh corpus object per corpus
Currently the XML reader reuses the same corpus object for all
corpora in a corpus group. This has an unwanted side-effect: any
abi-instr with the same path in different corpora will collide and
parts of the ABI will be lost.
Creating a new corpus object for every abi-corpus element seems like
the right thing to do. Testing with large ABIs containing many corpora
also shows a modest (~10%) abidiff speed improvement.
* src/abg-reader.cc (read_corpus_from_input): Always create a
fresh corpus object for each abi-corpus XML element.
Giuliano Procida [Thu, 27 May 2021 08:53:04 +0000 (09:53 +0100)]
abg-reader: Ensure corpus always has a symtab reader
In the presence of an empty abi-corpus element and with the following
change to always allocate a fresh corpus object, such objects can
sometimes be left without a symtab reader, instead of inheritng one
from the previous corpus.
The reader is called to obtain sorted lists of symbols during ABI
comparisons. The simplest way to avoid a crash is to maintain the
invariant that a reader object is always present.
With this change, if there is bad XML preventing symbols from being
read, no error is raised as before, but the logic has been tweaked so
that abi-instr parsing will nevertheless be attempted.
* src/abg-reader.cc (read_symbol_db_from_input): Fix
documentation for this function. Allow "successful parsing" to
include the case where no symbols were present in the input.
(read_corpus_from_input): Unconditionally set a symtab reader
on the corpus object. Unconditionally parse the abi-instr of a
corpus.
Giuliano Procida [Thu, 27 May 2021 08:53:03 +0000 (09:53 +0100)]
dwarf-reader: Create new corpus unconditionally
The DWARF reader appears to create a new corpus object only if one is
not already present. However, the only case where there can be
multiple corpora is when build_corpus_group_from_kernel_dist_under is
called and this function clears down the reader context, including the
current corpus, between reading ELF objects.
So it's clearer to just create a fresh corpus object unconditionally
in the DWARF reader.
* src/abg-dwarf-reader.cc (read_debug_info_into_corpus):
Create new corpus object unconditionally.
abixml reader: Fix recursive type definition handling
This patch basically adjusts build_array_type_def to build the array
type early without trying to create the array element type first. The
array type is then registered, and then the array element type is
created. That way, if the element type indirectly needs the array
type being created, then it's going to be used. Then the element type
is set to the array once it's created.
The patch adjusts the code of the array type to allow creating the
array without element types and then setting the element type later.
* include/abg-ir.h (array_type_def::update_size): Declare new
private member function.
(array_type_def::array_type_def): Declare ...
* src/abg-ir.cc (array_type_def::array_type_def): ... a new
constructor that takes no element type.
(array_type_def::update_size): Define this helper private member
function.
(array_type_def::get_subrange_representation): Adjust for this to
work when there is no element type setup yet.
(array_type_def::{set_element_type, append_subranges}): Update the
size and name of the array.
* src/abg-reader.cc (build_array_type_def): Create the array type
before the element type so that the later can re-use the former.
Dodji Seketeli [Mon, 7 Jun 2021 14:07:50 +0000 (16:07 +0200)]
rhbz1951526 - SELF CHECK FAILED for 'gimp-2.10'
This is a fix for bug https://bugzilla.redhat.com/show_bug.cgi?id=1951526.
Although it's a patch for one bug, it addresses several different
issues that cause the observed self comparison failure. As is often
the case on this kind of problems, the failure is difficult to
reproduce on a synthetic test case so I'll explain the root causes in
this commit log.
There are 4 different root causes to this problem. As I couldn't come
up with a reduced test case for each one of them I am adding the fixes
for those 4 issues in this commit, along with a new regression test
extracted from the initial bugzilla problem report.
So, overall, the symptom we are seeing here is that when we build an
IR for the input binary gimp-2.0, save that IR into abixml, and read
back that abixml into another IR, comparing the two IR shows changes;
it should show no change whatsoever. This is what we call in
libabigail jargon a self comparison (or self check) failure.
As alluded to in my introduction above, there appear to be 4 different
root causes for that self comparison failure.
1/ The first cause has to do with a situation about two anymous enums
that are (wrongly) considered different from an ABI point of view.
Using the debugging capabilities recently gained by libabigail, I
could notice that the two enums are:
Note how the second enum has a new enumerator named
'GIMP_INTERPOLATION_LANCZOS', but its value is '3', which is the exact
same value of as the one of the existing enumerator
GIMP_INTERPOLATION_NOHALO.
During type canonicalization of the IR from the input binary,
libabigail (wrongly) considers these two enums as being different.
This leads to the type 'Gimp*' (or anything type indirectly using any
one of the anonymous enums above) coming from one translation unit
being considered different from a type 'Gimp*' coming from another
translation unit, just because their are not using either one version
of the anonymous enum above or the other.
This leads to a *LOT* of spurious type changes from the first IR, that
are saved into abixml.
To fix this first problem, this patch introduces "two modes" of
comparing enums. There is a binary-only mode which only looks
enumerator values, not enumerator names. And then there is the
source-level mode which looks at both enumerator names and values when
comparing enums. The former mode is used during type
canonicalization. However, when a change is detected between two
enums, then the diff-IR built to describe the change is constructed
using the later mode. Using the later mode allows to describe
precisely things like enumerator insertion/removal by referring to the
names of the added/removed enumerators.
2/ The second root cause is that a struct, say, 'struct _GimpImage'
from a translation unit is considered different from a 'struct
_GimpImage' because the DWARF reader wrongly assign them different sizes.
Notice how the second 'struct _GimpImage' has a size of zero.
This is because when reading the DWARF, we first encounter the DIE for
the first' struct _GimpImage' and we properly build a type for it,
along with its declaration. Then when we encounter another DIE
defining 'struct _GimpImage' again, from a different translation unit,
the DWARF reader recognizes that it's a DIE for a declaration of
'struct _GimpImage' and fails to re-use the previous definition for
'struct _GimpImage'. So it wrongly builds declaration-only 'struct
_GimpImage' for it, hence the second struct _GimpImage with a zero
size.
Here again that creates spurious changes (after type canonicalization)
in types using struct _GimpImage. And that is a lot of types,
including things like 'Gimp*' and the like.
The fix for this root cause issue is to change
add_or_update_class_type in the DWARF reader to recognize that we are
seeing a type declaration for which there was already a definition and
return that definition instead of creating a new declaration.
3/ The third root cause is better explained with a "screen shot".
Consider these two 'versions' of the same struct _GdkDevice from two
different translation units:
GObject parent_instance;'
gchar* _g_sealed__name; // uses canonical type '@0x6892980'
GdkInputSource _g_sealed__source;'
GdkInputMode _g_sealed__mode;'
gboolean _g_sealed__has_cursor; // uses canonical type '@0x688dd00'
gint _g_sealed__num_axes; // uses canonical type '@0x688dd00'
GdkDeviceAxis* _g_sealed__axes;'
gint _g_sealed__num_keys; // uses canonical type '@0x688dd00'
GdkDeviceKey* _g_sealed__keys;'
};
$10 = (abigail::ir::type_base *) 0x7cd71e0
(gdb)
Notice how the name of the second data member 'name' was changed to
'_g_sealed_name'. A similar scheme happens to several other data
member names. The offsets and types of the struct _GdkDevice haven't
changed however. So from an ABI standpoint, the two versions of that
struct are equal. Libabigail consider them different however.
Because that type is used by tons of other types of the binary being
analyzed, this leads to lots of spurious canonical type difference
that shouldn't be there.
These three issues are magnified by the fact that the gimp binary is
compiled using "link time optimization". That brings in a lot more
opportunities to see these underlying issues that have been there for
a long time.
4/ The fourth and last root cause issue. When the abixml writer emits
a translation unit (TU), it keeps track of the 'non-emitted referred
to type' of the currently emitted translation unit and emits them at
the end of each TU. For instance, if the type 'Gimp*' (pointer to
Gimp) was emitted, and yet the referred-to type 'Gimp' wasn't emitted,
the TU writer makes sure to emit the referred-to 'Gimp' type at the
end of the TU. This has been going on for quite some time now.
The problem however is that although the non-emitted referred-to type
was referred to in this current TU, it might no have been *DEFINED* in
this TU. In that case, it should not be emitted in this TU.
Otherwise, the TU where that type is defined in the abixml might
appear different from where it is defined in the initial binary,
leading to self comparison failures down the road.
This patch ensures that a non-emitted referred-to type is always
emitted in the TU it belongs to.
5/ After doing all this, it appears that we were forgetting to emit
some function types that were defined in TUs emitted earlier and yet
were being referred-to later. Looking closer, I realized that we
should just emit function types seen in a given TU, regardless of the
referred-to relation.
The problem with that is that function types are special in libabigail
because there are two situation in which they are created.
Basically, a function type is created by the DWARF DIE
DW_TAG_subroutine_type. This is for instance how pointer to functions
are represented in DWARF, namely, by a DW_TAG_pointer_type that points
to a DW_TAG_subroutine_type. That is represented in the libabigail ir
by an instance of the abigail::ir::function_type type. This is
represented in abixml as a 'function-type' XML element.
But then, libabigail considers that all decls have a type. This
applies obviously for variables or data member. Right. But then,
libabigail considers that a function is also a decl, which has a type.
And the type of a function is a function type, represented by the same
abigail::ir::function_type. A practical difference with the former
situation is that function decls are *NOT* represented in abixml using
a 'function-type' element. Instead a 'function-decl' XML element uses
return type and parameter elements to represent the types involved
with a function decl.
Said otherwise, the former 'function type' concept used to represent
the type of functions in the libabigail IR is artificial. This
artificial-ness was not explicitly expressed in libabigail. This
patch now expresses that artificial-ness for function types. So the
abixml writer now just decide to not emit artificial function types,
and instead, emit all the non-artificial function types instead.
This addresses this last issues by being able to emit all
non-artificial function types defined in a given TU, without having to
bother with the fact that they are referred-to or not.
Together, fixing these 5 problems fixes this reported problem.
The changes to the reference test outputs are adjustments needed
because of the abixml output indeed changes.
* include/abg-ir.h
(environment::use_enum_binary_only_equality): Declare
accessors. (type_or_decl_base::{s,g}et_is_artificial):
Likewise. (decl_base::{s,g}et_is_artificial): Remove
accessors. * src/abg-ir.cc
(environment::priv::use_enum_binary_only_equality): Define new
data member.
(environment::priv::use_enum_binary_only_equality): Define
accessors. (type_or_decl_base::priv::is_artificial_): Define
new data member. It has actually moved here from
decl_base::priv::is_artificial_.
(type_or_decl_base::priv::priv): Initialize it.
(type_or_decl_base::{g,s}et_is_artificial): Define accessors.
(decl_base::is_artificial_): Move this to
type_or_decl_base::is_artificial_.
(maybe_adjust_canonical_type): In a given class of equivalence
of function types, if there is one non-artificial function
type, then the entire class of equivalence is considered
non-artificial; so flag the canonical function type as being
non-artificial. (is_enumerator_present_in_enum): Define new
static function. (equals): Re-arrange the overload for enums
so the order of the enumerators doesn't count in the
comparison. Also, two enums with different numbers of
enumerators can still be equal, with the right redundancy. In
the overload for var_decl, avoid taking into account the names
of data members in the comparison.
(enum_type_decl::enumerator::operator==): In the binary-level
comparison mode, only compare the value of enumerators, not
their name. * src/abg-comparison.cc (compute_diff): In the
overload for enum_type_decl, if the enums compare different
using binary-level comparison, then use source-level
comparison to build the diff-IR. * src/abg-dwarf-reader.cc
(read_context::compare_before_canonicalisation): Compare enums
using binary-level comparison. (add_or_update_class_type): If
we are looking at the definition of an existing declaration
that has been already defined then use the previous
definition, in case we are going to need to update the
definition. Also, update the size only if it's needed.
(build_function_type): By default, consider the newly built
function type as artificial. (build_ir_node_from_die): When
looking at a DW_TAG_subroutine_type DIE, consider the built
function type as non-artificial. * src/abg-reader.cc
(read_context::maybe_check_abixml_canonical_type_stability):
Don't consider declaration-only classes in an ODR context
because they don't have canonical types.
(build_function_decl): Flag the function type of the function
as artificial. (build_class_decl): Make sure to reuse class
types that were already created. * src/abg-writer.cc
(write_translation_unit): Allow emitting empty classes. Make
sure referenced types are emitting in the translation unit
where they belong. Avoid emitting artificial function types.
*
tests/data/test-alt-dwarf-file/rhbz1951526/rhbz1951526-report-0.txt:
New test reference output. *
tests/data/test-alt-dwarf-file/rhbz1951526/usr/bin/gimp-2.10:
New reference test binary input. *
tests/data/test-alt-dwarf-file/rhbz1951526/usr/lib/debug/.dwz/gimp-2.10.22-2.el9.1.aarch64:
Likewise. *
tests/data/test-alt-dwarf-file/rhbz1951526/usr/lib/debug/usr/bin/gimp-2.10-2.10.22-2.el9.1.aarch64.debug:
Likewise. * tests/data/Makefile.am: Add the new test files to
source directory. * tests/test-alt-dwarf-file.cc: Add the new
test inputs to this test harness. *
tests/data/test-abidiff/test-PR18791-report0.txt: Adjust. *
tests/data/test-abidiff/test-enum0-report.txt: Likewise. *
tests/data/test-annotate/libtest23.so.abi: Likewise. *
tests/data/test-annotate/libtest24-drop-fns-2.so.abi:
Likewise. *
tests/data/test-annotate/libtest24-drop-fns.so.abi: Likewise.
* tests/data/test-annotate/test-anonymous-members-0.o.abi:
Likewise. * tests/data/test-annotate/test13-pr18894.so.abi:
Likewise. * tests/data/test-annotate/test14-pr18893.so.abi:
Likewise. * tests/data/test-annotate/test15-pr18892.so.abi:
Likewise. * tests/data/test-annotate/test17-pr19027.so.abi:
Likewise. *
tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
Likewise. *
tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
Likewise. *
tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi:
Likewise. * tests/data/test-annotate/test21-pr19092.so.abi:
Likewise. *
tests/data/test-diff-dwarf-abixml/test0-pr19026-libvtkIOSQL-6.1.so.1.abi:
Likewise. *
tests/data/test-diff-dwarf/PR25058-liblttng-ctl-report-1.txt:
Likewise. * tests/data/test-diff-dwarf/test6-report.txt:
Likewise. *
tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt:
Likewise. *
tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt:
Likewise. * tests/data/test-diff-filter/test8-report.txt:
Likewise. *
tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
Likewise. *
tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-3.txt:
Likewise. *
tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
Likewise. *
tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
Likewise. *
tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi:
Likewise. *
tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Likewise.
*
tests/data/test-read-dwarf/PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi:
Likewise. *
tests/data/test-read-dwarf/PR26261/PR26261-exe.abi: Likewise.
* tests/data/test-read-dwarf/libtest23.so.abi: Likewise. *
tests/data/test-read-dwarf/libtest24-drop-fns-2.so.abi:
Likewise. *
tests/data/test-read-dwarf/libtest24-drop-fns.so.abi:
Likewise. *
tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi:
Likewise. * tests/data/test-read-dwarf/test12-pr18844.so.abi:
Likewise. * tests/data/test-read-dwarf/test13-pr18894.so.abi:
Likewise. * tests/data/test-read-dwarf/test14-pr18893.so.abi:
Likewise. * tests/data/test-read-dwarf/test15-pr18892.so.abi:
Likewise. * tests/data/test-read-dwarf/test16-pr18904.so.abi:
Likewise. * tests/data/test-read-dwarf/test17-pr19027.so.abi:
Likewise. *
tests/data/test-read-dwarf/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
Likewise. *
tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
Likewise. *
tests/data/test-read-dwarf/test20-pr19025-libvtkParallelCore-6.1.so.abi:
Likewise. * tests/data/test-read-dwarf/test21-pr19092.so.abi:
Likewise. *
tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
Likewise. *
tests/data/test-read-dwarf/test9-pr18818-clang.so.abi:
Likewise. *
tests/data/test-read-write/test28-without-std-fns-ref.xml:
Likewise. *
tests/data/test-read-write/test28-without-std-vars-ref.xml:
Likewise.
Dodji Seketeli [Mon, 7 Jun 2021 12:33:53 +0000 (14:33 +0200)]
reader: Canonicalizing a type once is enough
While looking at something else, I noticed that the abixml reader was
trying to canonicalize each type twice. Once should be enough.
* src/abg-reader.cc (build_type): Don't try to canonicalize the
type here because all the sub-routines of this function (which
actually build the type) already try to canonicalize it.
Dodji Seketeli [Mon, 7 Jun 2021 12:29:10 +0000 (14:29 +0200)]
ir: make 'debug(artefact)' support showing enums
While debugging something else, I realized that 'debug(artifact)'
couldn't show the enumerators of an enum. I also realized that we were
not showing the 'declaration-only-ness' of the artefact either. This
patch fixes that.
* src/abg-ir.cc (get_debug_representation): Add support for
showing details for enums. Also show declaration-only-ness for
class or unions.
When debugging self comparison issues, once the abixml file is read
back into memory, I often want to get the type-id of an artifact that
was read from abixml or get the canonical type of an artifact which
type-id is known.
Part of that information is indirectly present in the data member
abigail::reader::reader_context::m_pointer_type_id_map after the
.typeid file is loaded from file into memory. The problem is that the
instance of abigail::reader::reader_context is transient as it's
destroyed quickly after the abixml file is read. We want it to stay
alive longer. So this patch moves that data member into
abigail::environment instead, along with its accessors. The patch
then adds the new member functions
environment::{get_type_id_from_pointer,get_canonical_type_from_type_id}
to get the type-id of an artifact de-serialized from abixml and the
canonical type of an artifact for which we now the type-id string.
* include/abg-ir.h (environment::{get_pointer_type_id_map,
get_type_id_from_pointer, get_canonical_type_from_type_id}):
Declare new member functions.
* src/abg-ir.cc (environment::{get_pointer_type_id_map,
get_type_id_from_pointer, get_canonical_type_from_type_id}):
Define member functions.
(environment::priv::pointer_type_id_map_): Move
this data member here from ...
* src/abg-reader.cc (read_context::m_pointer_type_id_map):
... here.
(read_context::get_pointer_type_id_map): Remove this as it's now
defined in environment::get_pointer_type_id_map.
(read_context::maybe_check_abixml_canonical_type_stability):
Adjust.
(build_type): Likewise.
Dodji Seketeli [Mon, 7 Jun 2021 07:05:05 +0000 (09:05 +0200)]
ir: Enable setting breakpoint on first type inequality
When debugging type canonicalization in
type_base::get_canonical_type_for, I more often than not want to know
why a type compares different to another. Until now, I've been doing
that by stepping in the debugger. I figure a much efficient way of
doing that is to be able to set a breakpoint on the first occurrence
of type inequality.
To do that, I am adding a few macros to use in the 'equals' functions
to return their value: ABG_RETURN(value), ABG_RETURN_REQUAL(l,r) and
ABG_RETURN_FALSE. Those invoke a new function called
'notify_equality_failed' when the result of the comparison is false.
This allows to just set a debugger breakpoint on
'notify_equality_failed' to know when and why the type comparison
fails.
These macros invoke notify_equality_failed only if the
WITH_DEBUG_SELF_COMPARISON macro is defined. Otherwise, they do what
the code was doing previously. Said otherwise, this whole shebang is
enabled only when the code is configured with
--enable-debug-self-comparison.
This patch incurs no functional change.
* src/abg-ir.cc (notify_equality_failed): Define new static
function if WITH_DEBUG_SELF_COMPARISON is defined.
(ABG_RETURN_EQUAL, ABG_RETURN_FALSE, ABG_RETURN): Define new macros.
(try_canonical_compare): Use ABG_RETURN_EQUAL rather than just
returning the result of a comparison.
(equals): In all the overloads, use the new ABG_RETURN* macros,
rather than just returning boolean values.
abixml reader: Fix recursive type definition handling
After that patch, I noticed that qualified and reference types also
need to be able to handle the case where their underlying/pointed-to
type recursively refers to the type being created. Just like typedef
and pointer types in that patch.
This patch thus adjusts build_qualified_type_decl and
build_reference_type_def to support that. It also adjusts the
qualified_type_def and reference_type_def classes to support being
created without underlying/pointed-to type initially.
* include/abg-ir.h (qualified_type_def::qualified_type_def):
Declare a constructor with no underlying type.
(reference_type_def::reference_type_def): Declare a constructor
with no pointed-to type.
(reference_type_def::set_pointed_to_type): Declare new method.
* src/abg-ir.cc (qualified_type_def::priv::priv): Define a
constructor that takes no underlying type.
(qualified_type_def::build_name): Make this work even on
incomplete types with no underlying type. In that case, this
behaves like if the underlying type is "void".
(qualified_type_def::qualified_type_def): Define a constructor
that takes no underlying type.
(qualified_type_def::get_size_in_bits): Make this work on
incomplete types with no underlying type.
(qualified_type_def::set_underlying_type): Adjust to properly
update this type when a new underlying type is set. Particularly,
its name and the lookup maps from the type scope.
(reference_type_def::reference_type_def): Define a constructor
that takes no pointed-to type.
(reference_type_def::set_pointed_to_type): Define new function.
* src/abg-reader.cc (build_qualified_type_decl): Construct the
qualified type early before we try to construct its underlying
type. Associate this incomplete type with the type-id. Then try
to construct the underlying type. During its construction, if
this incomplete qualified type is needed due to recursion then it
can be used, leading to just one qualified type being used as it
should be.
(build_reference_type_def): Likewise for building incomplete
reference type first before its pointed-to type.
Dodji Seketeli [Fri, 21 May 2021 21:55:44 +0000 (23:55 +0200)]
abixml reader: Fix recursive type definition handling
This should fix self comparison bug https://bugzilla.redhat.com/show_bug.cgi?id=1944088
This arose from a self comparison check failing on the library
libgvpr.so.2 from the graphviz-2.44.0-17.el9.aarch64.rpm package.
Now that we have facilities to see what type (instantiated from the
abixml representation of the libgvpr.so library) exactly the
canonicalization process is failing for, I decided to use it ;-)
I extracted the package and its associating debug info into a
directory named 'extract' and ran abidw --debug-abidiff on it:
error: problem detected with type 'typedef Vmalloc_t' from second corpus
error: canonical type for type 'typedef Vmalloc_t' of type-id 'type-id-170' changed from 'd72618' to '14a7448'
error: problem detected with type 'Vmalloc_t*' from second corpus
error: canonical type for type 'Vmalloc_t*' of type-id 'type-id-188' changed from 'd72ba8' to '14a7968'
[...]
This tells me that "typedef Vmalloc_t", created from the abixml
compares different from its originating peer that was created from the
binary directly. The same goes for the pointer type "Vmalloc_t*", etc.
Using the new debugging/logging functionalities from the command line
of the debugger, I could see that in the abixml reader,
build_typedef_decl can fail subtly when the underlying type of the
typedef refers to the typedef itself. In that case, we need to ensure
that the typedef created by build_typedef_decl is the same one that is
used by the underlying type. which is not the case at the moment. At
the moment, the underlying type would create a new typedef beside the
one currently being created by build_typedef_decl. That leads to more
than one typedef in the system to designate "typedef Vmalloc_t". And
that wreaks havoc later down the road.
This patch arranges so that build_typedef_decl creates the typedef
"early" before the underlying type is created. That typedef
temporarily has no underlying type. It's registered as being the
typedef for the type-id string that identifies it in the abixml. And
then the function goes to create the underlying type. This
arrangement ensures that if the underlying type refers to the typedef
being created (via its type-id string), then the typedef that was
created early is effectively re-used. This ensures that a typedef
which recursively refer to itself is properly represented. It's only
when the underlying type is fully created that it's added to the
typedef.
Something similar is done for pointer types, in
build_pointer_type_def.
Note that to do this, the patch adjusts the typedef_decl and
pointer_type_def classes so that they can be created with no
underlying/pointed-to types. The underlying/pointed-to type can thus
be added later.
I believe this patch is the minimal patch necessary to fix this issue.
The graphviz RPM is added to the regression test suite for good
measure.
After visual inspection, I realized that there are other types besides
typedef and pointer types that exhibit the same class of problem even
if they are not involved in this issue on this particular binary. A
subsequent patch is going to address the problem for those types,
namely, qualified and reference types.
* include/abg-ir.h (pointer_type_def::pointer_type_def): Declare a
constructor with no pointed-to type.
(pointer_type_def::set_pointed_to_type): Declare new method.
(typedef_decl::typedef_decl): Declare a constructor with no
underlying type.
* src/abg-ir.cc (pointer_type_def::pointer_type_def): Define a
constructor with no pointed-to type. The pointed-to type can thus
later be set when it becomes available.
(pointer_type_def::set_pointed_to_type): Define new method.
(pointer_type_def::get_qualified_name): Make this work on a
pointer type that (momentarily) has no pointed-to type.
(typedef_decl::typedef_decl): Define a constructor with no
underlying type.
(typedef_decl::get_size_in_bits): Make this work on a typedef that
has (momentarily) no underlying type.
(typedef_decl::set_underlying_type): Update the size and alignment
of the typedef from its new underlying type.
* src/abg-reader.cc (build_pointer_type_def): Construct the
pointer type early /BEFORE/ we even try to construct its
pointed-to type. Associate this incomplete type with the type-id.
Then try to construct the pointed-to type. During the
construction of the pointed-to type, if this pointer is needed
(due to recursion) then the incomplete pointer type can be used,
leading to just one pointer type used (recursively) as it should
be.
(build_typedef_decl): Likewise for building typedef type early
without its underlying type so that it can used by the underlying
type if needed.
* tests/data/test-diff-pkg/graphviz-2.44.0-18.el9.aarch64-self-check-report-0.txt:
New test reference output.
* tests/data/test-diff-pkg/graphviz-2.44.0-18.el9.aarch64.rpm: New
binary test input.
* tests/data/test-diff-pkg/graphviz-debuginfo-2.44.0-18.el9.aarch64.rpm: Likewise.
* tests/data/Makefile.am: Add the new test material above to
source distribution.
* tests/test-diff-pkg.cc (in_out_specs): Add the test inputs above
to this test harness.
Dodji Seketeli [Thu, 20 May 2021 10:56:27 +0000 (12:56 +0200)]
Introduce artificial locations
When an abixml file is "read in" and the resulting in-memory internal
representation is saved back into abixml, the saved result can often
differ from the initial input in a non deterministic manner. That
read-write instability is non-desirable because it generates
unnecessary changes that cloud our ability to build reliable
regression tests, among other things. Also, that unnecessarily
increases the changes to the existing regression test reference
outputs leading to a lot more churn than necessary.
This patch tries to minimize that abixml read-write instability in
preparation of patches that would otherwise cause too much churn in
reference output files of the regression test suite.
The main reason why this read-write instability occurs is that a lot
of type definitions don't have source location.
For instance, all the types that are not user defined fall into that
category. Those types can't be topologically sorted by using their
location as a sorting criteria. Instead, we are currently using the
order in which those location-less types are processed by the reader
as the output (i.e, write time) order. The problem with that approach
is that the processing order can be dependant on the other of which
OTHER TYPES likes class types are processed. And that order can be
changed by patches in the future. That in and of itself shouldn't
change the write order of these types.
For instance, if a class Foo has data members and member functions
whose types are non-user-defined types, then, the order in which those
data members are processed can possibly determine the order in which
those non-user-defined are processed.
This patch thus introduces the concept of artificial location.
A *NON-ARTIFICIAL* location is a source location that was emitted by
the original emitter of the type meta-data. In the case of DWARF type
meta-data, the compiler originally emitted source location. That
location when read is considered non-artificial, or natural, if you
prefer.
In the case of abixml however, an artificial location would be the
source location at which an XML element is encountered.
For instance, consider the abixml file below "path/to/exmaple.abi" below:
At line 3 of that file, the non-user defined type name "bool" is
defined using the XML element "type-decl". Note how that element
lacks the "filepath", "line" and "column" attributes that would collectively
define the source location of that type. So this type "bool" don't
carry any natural location.
The abixml reader can however generate an artificial location for it.
That the filepath of that artificial location would thus be the path
to that ABI corpus, i.e, "path/to/example.abi". The line number would
be 3. The column would be left to zero.
That artificial location will never be explicitly be written down as
an XML attribute as it can always be implicitly retrieved by
construction.
The patch changes the internal representation so that each ABI
artifact of the internal representation can now carry both an
artificial and a natural location.
When two artifacts have an artificial location, then its used to
topologically sort them. The one that is defined topologically
"earlier" obviously comes first.
When two artifacts have a natural location then its used to
topologically sort them.
Otherwise, they are sorted lexicographically.
This makes the output of abilint a lot more read-write stable.
* include/abg-fwd.h (get_artificial_or_natural_location): Declare
new function.
* include/abg-ir.h (location::location): Initialize & copy ...
(location::is_artificial_): ... a new data member.
(location::{g,s}et_is_artificial): New accessors.
(location::{operator=}): Adjust.
(type_or_decl_base::{set,get,has}_artificial_location): Declare
new member functions.
* src/abg-ir.cc (decl_topo_comp::operator()): In the overload for
decl_base*, use artificial location for topological sort in
priority. Otherwise, use natural location. Otherwise, sort
lexicographically.
(type_topo_comp::operator()): In the overload for type_base*, use
lexicographical sort only for types that don't have location at
all.
(type_or_decl_base::priv::artificial_location_): Define new data
member.
(type_or_decl_base::{set,get,has}_artificial_location): Define new
member functions.
(decl_base::priv): Allow a constructor without location. That one
sets no natural location to the artifact.
(decl_base::decl_base): Use decl_base::set_location in the
constructor now.
(decl_base::set_location): Adjust this to support setting a
natural or an artificial location.
(get_debug_representation): Emit debugging log showing the
location of an artifact, using its artificial location in
priority.
(get_natural_or_artificial_location): Define new function.
* src/abg-reader.cc (read_artificial_location)
(maybe_set_artificial_location): Define new static functions.
(read_location): Read artificial location when no natural location
was found.
(build_namespace_decl, build_function_decl, build_type_decl)
(build_qualified_type_decl, build_pointer_type_def)
(build_reference_type_def, build_subrange_type)
(build_array_type_def, build_enum_type_decl, build_typedef_decl)
(build_class_decl, build_union_decl, build_function_tdecl)
(build_class_tdecl, build_type_tparameter)
(build_non_type_tparameter, build_template_tparameter): Read and
set artificial location.
* src/abg-writer.cc (write_location): Don't serialize artificial
locations.
(write_namespace_decl): Topologically sort member declarations
before serializing them.
* tests/data/test-read-write/test28-without-std-fns-ref.xml:
Adjust.
* tests/data/test-read-write/test28-without-std-vars-ref.xml:
Likewise.
* tests/data/test-annotate/libtest23.so.abi: Likewise.
* tests/data/test-annotate/libtest24-drop-fns-2.so.abi: Likewise.
* tests/data/test-annotate/libtest24-drop-fns.so.abi: Likewise.
* tests/data/test-annotate/test0.abi: Likewise.
* tests/data/test-annotate/test13-pr18894.so.abi: Likewise.
* tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
Likewise.
* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
Likewise.
* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi:
Likewise.
* tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi:
Likewise.
* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Likewise.
* tests/data/test-read-dwarf/PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi:
Likewise.
* tests/data/test-read-dwarf/PR26261/PR26261-exe.abi: Likewise.
* tests/data/test-read-dwarf/libtest23.so.abi: Likewise.
* tests/data/test-read-dwarf/libtest24-drop-fns-2.so.abi: Likewise.
* tests/data/test-read-dwarf/libtest24-drop-fns.so.abi: Likewise.
* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
* tests/data/test-read-dwarf/test-suppressed-alias.o.abi: Likewise.
* tests/data/test-read-dwarf/test0.abi: Likewise.
* tests/data/test-read-dwarf/test0.hash.abi: Likewise.
* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
* tests/data/test-read-dwarf/test11-pr18828.so.abi: Likewise.
* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
* tests/data/test-read-dwarf/test13-pr18894.so.abi: Likewise.
* tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise.
* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
* tests/data/test-read-dwarf/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
Likewise.
* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
Likewise.
* tests/data/test-read-dwarf/test20-pr19025-libvtkParallelCore-6.1.so.abi:
Likewise.
* tests/data/test-read-dwarf/test21-pr19092.so.abi: Likewise.
* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
Likewise.
* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.
* tests/data/test-read-write/test28-without-std-fns-ref.xml:
Likewise.
* tests/data/test-read-write/test28-without-std-vars-ref.xml:
Likewise.
Dodji Seketeli [Wed, 19 May 2021 16:22:18 +0000 (18:22 +0200)]
Detect abixml canonical type instability during abidw --debug-abidiff
In the debugging mode of self comparison induced by the invocation of
"abidw --debug-abidiff <binary>", it's useful to be able to ensure the
following invariant:
The pointer value of the canonical type of a type T that is
serialized into abixml with the id string "type-id-12" (for
instance) must keep the same canonical type pointer value when
that abixml file is de-serialized back into memory. This is
possible mainly because libabigail stays loaded in memory all the
time during both serialization and de-serialization.
This patch adds support for detecting when that invariant is not
respected.
In other words it detects when the type build from de-serializing the
type which id is "type-id-12" (for instance) has a canonical type
which pointer value is different from the pointer value of the
canonical type (of the type) that was serialized as having the type id
"type-id-12".
This is done in three phases.
The first phase happens in the code of abidw itself; after the abixml
is written on disk, another file called the "typeid file" is written
on disk as well. That later file contains a set of records; each
record associates a "type id string" (like the type IDs that appear in
the abixml file) to the pointer value of the canonical type that
matches that type id string. That file is thus now available for
manual inspection during a later debugger session. This is done by
invoking the new function write_canonical_type_ids.
The second phase appears right before abixml loading time. The typeid
file is read back and the association "type-id string" <-> is stored
in a hash map that is returned by
environment::get_type_id_canonical_type_map(). This is done by
invoking the new function load_canonical_type_ids.
The third phase happens right after the canonicalization (triggered in
the abixml reader) of a type coming from abixml, corresponding to a
given type id. It checks if the pointer value of the canonicalization
type just computed is the same as the one associated to that type id
in the map returned by environment::get_type_id_canonical_type_map.
This is a way of verifying the "stability" of a canonical type during
its serialization and de-serialization to and from abixml and it's
done as part of "abidw --debug-abidiff <binary>".
Just as an example, here is the kind of error output that I am getting
on a real life debugging session on a binary that exhibits self
comparison error:
$ abidw --debug-abidiff -d <some-binary>
error: problem detected with type 'typedef Vmalloc_t' from second corpus
error: canonical type for type 'typedef Vmalloc_t' of type-id 'type-id-179' changed from '1a083e8' to '21369b8'
[...]
$
From this output, I see that the first type for which libabigail
exhibits an instability on the pointer value of the canonical type is
the type 'typedef Vmalloc_t'. In other words, when that type is saved
to abixml, the type we read back is different. This needs further
debugging but at least it pinpoints exactly what type we are seeing
the core issue on first. This is of a tremendous help in the root
cause analysis needed to understand why the self comparison is
failing.
* include/abg-ir.h (environment::get_type_id_canonical_type_map):
Declare new data member.
* src/abg-ir.cc (environment::priv::type_id_canonical_type_map_):
Define new data member.
(environment::get_type_id_canonical_type_map): Define new method.
* include/abg-reader.h (load_canonical_type_ids): Declare new
function.
* src/abg-reader.cc (read_context::m_pointer_type_id_map):
Define new data member.
(read_context::{get_pointer_type_id_map,
maybe_check_abixml_canonical_type_stability}): Define new methods.
(read_context::{maybe_canonicalize_type,
perform_late_type_canonicalizing}): Invoke
maybe_perform_self_comparison_canonical_type_check after
canonicalization to perform canonicalization type stability
checking.
(build_type): Associate the pointer value for the newly built type
with the type id string identifying it in the abixml. Once the
abixml representation is dropped from memory and we are about to
perform type canonicalization, we can still know what the type id
of a given type coming from abixml was; it's thus possible to
verify that the canonical type associated to that type id is the
same as the one stored in the typeid file.
(read_type_id_string): Define new static function.
(load_canonical_type_ids): Define new function.
* include/abg-writer.h (write_canonical_type_ids): Likewise.
* src/abg-writer.cc (write_canonical_type_ids): Define new
function overloads.
* tools/abidw.cc (options::type_id_file_path): New data member.
(load_corpus_and_write_abixml): Write and read back the typeid
file.
Dodji Seketeli [Tue, 18 May 2021 08:29:38 +0000 (10:29 +0200)]
Detect failed self comparison in type canonicalization of abixml
During the self comparison triggered by "abidw --abidiff <binary>",
some comparison errors can happen when canonicalizing types that are
"de-serialized" from the abixml that was serialized from the input
binary.
This patch adds some debugging checks and messaging to emit a message
when a type from the abixml appears to not "match" the original type
from the initial corpus it originated from.
This is the more detailed description:
Let's consider a type T coming from the corpus of the input binary.
That input corpus is serialized into abixml and de-serialized again
into a second corpus that we shall name the abixml corpus. From that
second corpus, let's consider the type T' that is the result of
serializing T into abixml and de-serializing it again. T is said to
be the original type of T'. If T is a canonical type, then T' should
equal T. Otherwise, if T is not a canonical type, its canonical type
should equal the canonical type of T'.
For the sake of simplicity, let's consider that T is a canonical
type. During the canonicalization of T', T' should equal T. Each and
every canonical type coming from the abixml corpus should be equal to its
original type from the binary corpus.
If a T' is different from its original type T, then there is an
"equality problem" between T and T'. In other words, there is a
mismatch between T and T'. We want to be notified of that problem so
that we can debug it further and fix it.
So this patch introduces the option "abidw --debug-abidiff <binary>"
to trigger the "debug self comparison mode". At canonicalization
time, we detect that we are in that debug self comparison mode and
during canonicalization of types from the abixml corpus, it detects
when they compare different from their counterpart from the original
corpus.
This debugging capability can be enabled at configure time with a new
--enable-debug-self-comparison configure option. That option defines
a new WITH_DEBUG_SELF_COMPARISON compile time macro that is used to
conditionally compile the implementation of this debugging feature.
So, one example of this might look like this:
abidw --debug-abidiff bin:
error: problem detected with type 'typedef Vmalloc_t' from second corpus
error: problem detected with type 'Vmalloc_t*' from second corpus
[...]
So that means the "typedef Vmalloc_t" read from the abixml compares
different from its original type where it should not.
So armed with this new insight, I know I need to debug that comparison
in particular to see why it wrongly results in two different types.
* doc/manuals/abidw.rst: Add documentation for the --debug-abidiff
option.
* include/abg-ir.h (environment::{set_self_comparison_debug_input,
get_self_comparison_debug_inputs, self_comparison_debug_is_on}):
Declare new methods.
* configure.ac: Define a new --enable-debug-self-comparison option
that is disabled by default. That option defines a new
WITH_DEBUG_SELF_COMPARISON preprocessor macro.
* src/abg-ir.cc
(environment::priv::{first_self_comparison_corpus_,
second_self_comparison_corpus_, self_comparison_debug_on_}): New
data members. Also, re-indent the data members.
(environment::{set_self_comparison_debug_input,
get_self_comparison_debug_inputs, self_comparison_debug_is_on}):
Define new method.
(type_base::get_canonical_type_for): In the "debug self comparison
mode", if a type coming from the second corpus compares different
from its counterpart coming from the first corpus then log a debug
message.
* src/abg-dwarf-reader.cc (read_debug_info_into_corpus): When
loading the first corpus, if the debug self comparison mode is on,
then save that corpus on the side in the environment.
* src/abg-reader.cc (read_corpus_from_input): When loading the
second corpus, if the debug self comparison mode is on, then save
that corpus on the side in the environment.
* tools/abidw.cc: Include the config.h file for preprocessor
macros defined at configure
(options::debug_abidiff): New data member.
(parse_command_line): Parse the --debug-abidiff option.
(load_corpus_and_write_abixml): Switch the self debug mode on when
the --debug-abidiff option is provided. Use a read_context for
the abixml loading. That is going to be useful for subsequent
patches.
Dodji Seketeli [Mon, 17 May 2021 13:23:02 +0000 (15:23 +0200)]
Add primitives callable from the command line of the debugger
During debugging it can be extremely useful to be able to visualize
the data members of a class type, instance of
abigail::ir::class_decl*. It's actually useful to visualize the
pretty representation (type name and kind) of all types and decls that
inherit abigail::ir::type_or_decl_base, basically.
Today, in the debugger, if we have a variable defined as
"abigail::ir::type_or_decl_base* t", we can type:
$ p t->get_pretty_representation(true, true);
This would display something like:
$ typedef foo_t
However, if 't' is declared as:
"abigail::ir::class_decl* t", then if we type:
(gdb) p t->get_pretty_representation(true, true);
We'll get something like:
class foo_klass
(gdb)
So we get the kind and the name of the ABI artifact; but in case of a
class, we don't get the details of its data members.
This patch introduces a function named "debug" which, would be invoked
on the 't' above like this:
int tm_sec; // uses canonical type '@0x538270'
int tm_min; // uses canonical type '@0x538270'
int tm_hour; // uses canonical type '@0x538270'
int tm_mday; // uses canonical type '@0x538270'
int tm_mon; // uses canonical type '@0x538270'
int tm_year; // uses canonical type '@0x538270'
int tm_wday; // uses canonical type '@0x538270'
int tm_yday; // uses canonical type '@0x538270'
int tm_isdst; // uses canonical type '@0x538270'
long int tm_gmtoff; // uses canonical type '@0x461200'
const char* tm_zone; // uses canonical type '@0x544528'
};
(gdb)
This gives much more information to understand what 't' designates.
The patch also provides functions to retrieve one data member from a
given type that happens to designate a class type. For instance:
(gdb) p debug(get_data_member(t, "tm_sec")._M_ptr)
int tm::tm_sec
(gdb)
The patch also provides a new 'debug_equals' function that allow us to
easily perform an artifact comparison from the command line of the
debugger, as well as methods to the environment type to poke at the
canonical types available in the environment.
These new debugging primitives already proved priceless while
debugging issues that are fixed by subsequent patches to come.
* include/abg-fwd.h (get_debug_representation, get_data_member)
(debug, debug_equals): Declare new functions.
* include/abg-ir.h (environment{get_canonical_types,
get_canonical_type}): Declare new member functions.
* src/abg-ir.cc (environment::{get_canonical_types,
get_canonical_type}): Define new member functions.
(get_debug_representation, get_data_member)
(debug, debug_equals): Define new functions.
Dodji Seketeli [Mon, 24 May 2021 14:29:36 +0000 (16:29 +0200)]
Peel array types when peeling pointers from a type
In peel_typedef_pointer_or_reference_type, we want to peel typedefs
and pointer types (in general) from a given type. We need to peel
array types as well, as those are conceptually a pointer-like type as
well.
This patch does that.
* src/abg-ir.cc (peel_typedef_pointer_or_reference_type): In the
overloads for type_base_sptr and type_base*, peel array type off
as well.
Dodji Seketeli [Mon, 24 May 2021 14:23:18 +0000 (16:23 +0200)]
Fix DWARF type DIE canonicalization
While looking at something else, I noticed that the DWARF type DIE
canonicalization code wasn't taking the type of array elements into
account when comparing arrays.
This patch fixes that.
* src/abg-dwarf-reader.cc (compare_dies): When comparing array
type DIEs, take into account the type of the elements of the
arrays.
reader: Use xmlFirstElementChild/xmlNextElementSibling to iterate over children elements
Use xmlFirstElementChild/xmlNextElementSibling to iterate over element
children nodes rather than doing it by hand in the various for loops.
* src/abg-reader.cc (walk_xml_node_to_map_type_ids)
(read_translation_unit, read_translation_unit_from_input)
(read_symbol_db_from_input, build_needed)
(read_elf_needed_from_input, read_corpus_group_from_input)
(build_namespace_decl, build_elf_symbol_db, build_function_decl)
(build_function_type, build_array_type_def, build_enum_type_decl)
(build_class_decl, build_union_decl, build_function_tdecl)
(build_class_tdecl, build_type_composition)
(build_template_tparameter): Use
xmlFirstElementChild/xmlNextElementSibling rather than poking at
xmlNode::children and looping over xmlNode::next by hand.
reader: Use xmlFirstElementChild and xmlNextElementSibling rather than xml::advance_to_next_sibling_element
The xml::advance_to_next_sibling_element is redundant with the
xmlNextElementSibling API of libxml. Similarly, xmlFirstElementChild
is redundant with using xml::advance_to_next_sibling_element on the
xmlNode::children data member. Let's use the libxml API instead.
* include/abg-libxml-utils.h (advance_to_next_sibling_element):
Remove the declaration of this function.
* src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay)
(advance_to_next_sibling_element): Remove definitions of these functions.
* src/abg-reader.cc (read_translation_unit_from_input)
(read_elf_needed_from_input, read_corpus_group_from_input): Use xmlNextElementSibling instead
of xml::advance_to_next_sibling_element.
(read_corpus_from_input): Likewise. Also, use
xmlFirstElementChild instead of
xml::advance_to_next_sibling_element on the xmlNode::children data
member.
(read_corpus_group_from_input): use xmlFirstElementChild instead
of xml::advance_to_next_sibling_element on the xmlNode::children
data member.
reader: Handle 'abi-corpus' element being possibly empty
This problem was reported at https://sourceware.org/bugzilla/show_bug.cgi?id=27616.
The abixml reader wrongly assumes that the 'abi-corpus' element is
always non-empty. Note that until now, the only emitter of abixml
consumed in practice was abg-writer.cc and it only emits non-empty
'abi-corpus' elements. So the issue wasn't exposed.
So, the reader assumes that an 'abi-corpus' element has at least a
text node.
For instance, consider this minimal input file named test-v0.abi:
if (!node->children) // <---- we get out early here and we
return nil; // forget about the properties of
// the current empty corpus element node
So, at its core, fixing the issue at hand involves avoiding the early
return there.
But then, it turns out that's not enough.
In the current setting, the different abixml processing entry points
are designed to be used in a semi "streaming" mode.
So for instance, read_translation_unit_from_input can be invoked
repeatedly to "stream in" the next translation unit at each
invocation.
Alternatively, the lower level xmlTextReaderNext can be used to
iterate over XML node until we reach the translation unit XML element
we are interested in. At that point xmlTextReaderExpand can be used
to expand the XML node, then we let the context know that this is
the current node of the corpus that needs to be processed, using
read_context::get_corpus_node. Once we've done that,
read_translation_unit_from_input can be called to process that
particular corpus node. Note that the corpus node at hand, that needs
to be processed will be retrieved by read_context::get_corpus_node.
These two modes of operation are also available for
read_corpus_from_input, read_symbol_db_from_input,
read_elf_needed_from_input etc.
Today, these functions all assume that the current node returned by
read_context::get_corpus_node is the node /before/ the node of the
corpus to be processed. So they all start looking at the /next sibling/
of the node returned by read_context::get_corpus_node. So the code
was implicitly assuming that read_context::get_corpus_node was
pointing to a text node that was before the node of the corpus that we
want to process.
This is wrong. read_context::get_corpus_node should just return the
current node of the corpus that needs to be processed and voila.
And so read_context::set_corpus_node should be used to set the current
node of the corpus to the current element node that needs to be processed.
That's the spirit of the change done by this patch.
As its name suggests, the existing
xml::advance_to_next_sibling_element is used to skip non element xml
nodes (including text nodes) and move to the next element node to
process, which is set to the context using
read_context::set_corpus_node.
Then the actual processing functions like read_corpus_from_input get
the node to process, using read_context::get_corpus_node and process
it rather than processing the sibling node that comes after it.
The other changes are either to prevent related crashes that I noticed
while doing various tests, update the abilint tool used to read and
debug abixml input files and add better documentation.
* src/abg-reader.cc (read_context::get_corpus_node): Add comment
to this member function.
(read_translation_unit_from_input, read_symbol_db_from_input)
(read_elf_needed_from_input): Start processing the current node of
the corpus that needs to be processed rather than its next
sibling. Once the processing is done, set the new "current node
of the corpus to be processed" properly by skipping to the next
element node to be processed.
(read_corpus_from_input): Don't get out early when the
'abi-corpus' element is empty. If, however, it has children node,
skip to the first child element and flag it -- using
read_context::set_corpus_node -- as being the element node to be
processed by the processing facilities of the reader. If we are
in a mode where we called xmlTextReaderExpand ourselves to get the
node to process, then it means we need to free that node
indirectly by calling xmlTextReaderNext. In that case, that node
should not be flagged by read_context::set_corpus_node. Add more
comments.
* src/abg-corpus.cc (corpus::is_empty): Do not crash when no
symtab is around.
* src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay):
Fix typo in comment.
(advance_to_next_sibling_element): Don't crash when given a nil
node.
* tests/data/test-abidiff/test-PR27616-squished-v0.abi: Add new
test input.
* tests/data/test-abidiff/test-PR27616-squished-v1.abi: Likewise.
* tests/data/test-abidiff/test-PR27616-v0.xml: Likewise.
* tests/data/test-abidiff/test-PR27616-v1.xml: Likewise.
* tests/data/Makefile.am: Add the new test inputs above to source
distribution.
* tests/test-abidiff.cc (specs): Add the new tests inputs above to
this harness.
* tools/abilint.cc (main): Support writing corpus groups.
dwarf-reader: properly set artificial-ness in opaque types
get_opaque_version_of_type forgets to set the "is-artificial" property
according to the initial type the opaque type is derived from. This
can lead to some instability in the abixml output.
Fixed thus.
* src/abg-dwarf-reader.cc (get_opaque_version_of_type): Propagate
the artificial-ness of the original type here.
* tests/data/test-read-dwarf/PR27700/test-PR27700.abi: Adjust.
dwarf-reader: Canonicalize opaque enums and classes
This issue was reported in bug https://sourceware.org/bugzilla/show_bug.cgi?id=27700.
When we construct an opaque type (triggered by the use of
--drop-private-types along with the --headers-dir option on abidw, for
instance) with get_opaque_version_of_type we forget to canonicalize
the resulting type.
Later, at abixml emitting time (for instance)
hash_as_canonical_type_or_constant would rightfully abort because the
type wasn't canonicalized. We want all types (okay, modulo one
exception) in the system to be canonicalized.
This patch fixes the problem by canonicalizing opaque types.
* src/abg-dwarf-reader.cc (build_ir_node_from_die): Canonicalize
opaque enums and classes.
* tests/data/test-read-dwarf/PR27700/include-dir/priv.h: New test
header file.
* tests/data/test-read-dwarf/PR27700/include-dir/pub.h: Likewise
* tests/data/test-read-dwarf/PR27700/pub-incdir/inc.h: Likewise.
* tests/data/test-read-dwarf/PR27700/test-PR27700.o: New binary
input file.
* tests/data/test-read-dwarf/PR27700/test-PR27700.abi: Reference
abi file of the binary above.
* tests/data/test-read-dwarf/PR27700/test-PR27700.c: Source file
of the binary above.
* tests/data/Makefile.am: Add the test material above to source
distribution.
* tests/test-read-dwarf.cc (InOutSpec::in_public_headers_path):
Add new data member.
(in_out_specs): Adjust to reflect the new data member in the
InOutSpec type. Add a new test input.
(set_suppressions_from_headers): Define new static function.
(test_task::perform): Use the content of the new
InOutSpec::in_public_headers_path to construct and add
"--headers-dir <headers-dir> --drop-private-types" to the options
of the abidw program run.
symtab: Add support for MODVERSIONS (CRC checksums)
The Linux Kernel has a mechanism (MODVERSIONS) to checksum symbols based
on their type. In a way similar to what libabigail does, but different.
The CRC values for symbols can be extracted from the symtab either by
following the __kcrctab_<symbol> entry or by using the __crc_<symbol>
value directly.
This patch adds support for extracting those CRC values and storing them
as a property of elf_symbol. Subsequently, 'crc' gets emitted as an
attribute of 'elf-symbol' in the XML representation.
CRC comparisons are also added to the abidiff machinery such that if
both representations of a comparison contain a CRC value, they will be
compared and if any of the values is unset (i.e. == 0), equality is
assumed. Differences will be reported in the format that the Kernel
presents them e.g. via Module.symvers. It is likely, but not necessary,
that a CRC difference comes along with an ABI difference reported by
libabigail. Not everything that leads to a change of the CRC value an
ABI breakage in the libabigail sense.
In case a function or variable symbol changes in a harmless way, we
would previously also consider a CRC change harmless (or more precise:
not harmful). Explicitly testing for CRC changes when analyzing whether
diff nodes need to be considered harmful, allows to still classify them
harmful.
Also add some test cases to ensure reading CRC values from kernel
binaries works as expected. The empty-report files have been
consolidated to one file: empty-report.txt. That also clarifies the
expected outcome for the affected tests.
* include/abg-ir.h (elf_symbol::elf_symbol): Add CRC parameter.
(elf_symbol::create): Likewise.
(elf_symbol::get_crc): New member method.
(elf_symbol::set_crc): New member method.
* src/abg-comp-filter.cc (crc_changed): New function.
(categorize_harmful_diff_node): Also test for CRC changes.
* src/abg-ir.cc (elf_symbol::priv::crc_): New data member.
* src/abg-ir.cc (elf_symbol::priv::priv): Add CRC parameter.
(elf_symbol::elf_symbol): Likewise.
(elf_symbol::create): Likewise.
(elf_symbol::textually_equals): Add CRC support.
(elf_symbol::get_crc): New member method.
(elf_symbol::set_crc): New member method.
* src/abg-reader.cc (build_elf_symbol): Add CRC support.
* src/abg-reporter-priv.cc (maybe_report_diff_for_symbol): Likewise.
* src/abg-symtab-reader.cc (symtab::load): Likewise.
* src/abg-writer.cc (write_elf_symbol): Likewise.
* tests/data/Makefile.am: Add new test data files.
* tests/data/test-abidiff-exit/test-crc-report.txt: New test file.
* tests/data/test-abidiff-exit/test-crc-v0.abi: Likewise.
* tests/data/test-abidiff-exit/test-crc-v1.abi: Likewise.
* tests/data/test-abidiff/empty-report.txt: New file.
* tests/data/test-abidiff/test-PR18166-libtirpc.so.report.txt: Deleted.
* tests/data/test-abidiff/test-PR24552-report0.txt: Deleted.
* tests/data/test-abidiff/test-crc-0.xml: New test file.
* tests/data/test-abidiff/test-crc-1.xml: Likewise.
* tests/data/test-abidiff/test-crc-2.xml: Likewise.
* tests/data/test-abidiff/test-crc-report.txt: Likewise.
* tests/data/test-abidiff/test-empty-corpus-report.txt: Deleted.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Add CRC values.
* tests/data/test-read-write/test-crc.xml: New test data file.
* tests/data/test-symtab/kernel-modversions/Makefile: New test source.
* tests/data/test-symtab/kernel-modversions/one_of_each.c: Likewise.
* tests/data/test-symtab/kernel-modversions/one_of_each.ko: Likewise.
* tests/test-abidiff-exit.cc: Add new test case.
* tests/test-abidiff.cc: Add new test case.
* tests/test-read-write.cc: Likewise.
* tests/test-symtab.cc (Symtab::KernelSymtabsWithCRC): New test case.
The function write_elf_symbol_reference iterates through aliases,
looking for an unsuppressed alias to use. The existing code went wrong
in the case when aliases are present. In the case of all symbols
suppressed, it would also have selected the last alias, rather than
the first, if the data structure invariants had matched the code's
expectations.
The main symbol is always distinguished. When aliases are absent, the
sole symbol's next pointer is null, but when aliases are present, they
form a circular list. This makes traversal of aliases a bit tricky.
The code now picks the first symbol from the following:
- the given symbol, if unsuppressed
- the main symbol, if unsuppressed
- the unsuppressed aliases in the remainder of the alias chain
- the main symbol (suppressed)
The given symbol, which need not be the same as the main symbol, will
be tested twice, if suppressed, but addressing this would make the
code even more elaborate and fragile.
The last case may be unreachable if symbol suppression triggers when
all aliases are suppressed.
I left this change stand-alone for easier review and to credit Giuliano for his
work on it, though it fixes a previous commit.
* src/abg-writer.cc (write_elf_symbol_reference): Check main
symbol and aliases with more care.
dwarf-reader/writer: consider aliases when dealing with suppressions
When the symbol of a decl is suppressed and it happens to be the main
symbol of a group of aliased symbols where another symbol is not
suppressed, the dwarf reader discards the decl from the internal
representation altogether upon reading and thus the writer will not be
able to connect that decl to the non-suppressed aliased elf symbol.
In order to address this, ensure we are not suppressing decls for which
an alias is not suppressed. We therefore keep the decl in the IR when at
least one its underlying aliased symbols is non-suppressed.
Likewise, when the abg-writer is having to attach an elf-symbol-id to
the decl, instead of omitting the symbol altogether, rather make use of
the property of aliases and connect the dwarf information to an alias
instead. This way the function dwarf information stays connected to the
elf symbol that we want to track.
When reading from XML with a symbol whitelist that leads to suppression
of aliased symbols, abidiff would hit an assertion and crash when
looking up the aliased symbol due to it being suppressed. In the new
symtab reader we can still suppress a symbol without removing it from
the lookup. Make use of that property to fix this bug.
A test has been added for this as well.
* src/abg-dwarf-reader.cc(function_is_suppressed): Do not suppress
a function for which there is an alias that is not suppressed.
(variable_is_suppressed): Likewise for variables.
* src/abg-reader.cc (build_elf_symbol): Improve handling of
suppressed aliased symbols when reading from XML.
* src/abg-symtab-reader.cc (load): Likewise.
* src/abg-writer.cc(write_elf_symbol_reference): Fall back to
any aliased symbol if the main symbol is suppressed.
* tests/data/Makefile.am: Add new test files.
* tests/data/test-abidiff-exit/test-missing-alias-report.txt: New test file.
* tests/data/test-abidiff-exit/test-missing-alias.abi: Likewise.
* tests/data/test-abidiff-exit/test-missing-alias.suppr: Likewise.
* tests/test-abidiff-exit.cc: Add support for whitelists and add
new testcase.
* tests/data/test-read-dwarf/test-suppressed-alias.c: New test file.
* tests/data/test-read-dwarf/test-suppressed-alias.o: Likewise.
* tests/data/test-read-dwarf/test-suppressed-alias.o.abi: Likewise.
* tests/data/test-read-dwarf/test-suppressed-alias.suppr: Likewise.
* tests/data/test-read-dwarf/test3-alias-1.so.hash.abi: Likewise.
* tests/data/test-read-dwarf/test3-alias-1.suppr: Likewise.
* tests/data/test-read-dwarf/test3-alias-2.so.hash.abi: Likewise.
* tests/data/test-read-dwarf/test3-alias-2.suppr: Likewise.
* tests/data/test-read-dwarf/test3-alias-3.so.hash.abi: Likewise.
* tests/data/test-read-dwarf/test3-alias-3.suppr: Likewise.
* tests/data/test-read-dwarf/test3-alias-4.so.hash.abi: Likewise.
* tests/data/test-read-dwarf/test3-alias-4.suppr: Likewise.
* tests/test-read-dwarf.cc: Add new test cases.
symtab/dwarf-reader: allow hinting of main symbols for aliases
In case of aliased symbols, the "main symbol" cannot be deduced from
the symtab as this solely contains a name->addr mapping and aliases
are represented by multiple names resolving to the same address.
Therefore the main symbol can only be picked rather randomly and
unpredictable.
Unlike DWARF, which contains a single symbol entry for only the main
symbol. Hence we can (late) detect the main symbol. Exploiting that
property allows to correct the addr->symbol lookup in the symtab to
return the correct main symbol and it also allows to correct the aliased
symbols to maintain the correct main symbol.
This patch adds the `update_main_symbol` functionality to `elf_symbol`
to update the main symbol by name (ELF symbols need unique names) and
adds `update_main_symbol` to `symtab` that makes use of said new method.
When we discover a main symbol during DWARF reading, we instruct the
symtab to update the mapping.
This creates consistent representations across different builds of the
same binary with the same ABI by pinning down the main symbol to the
defined one. Knowing the main symbol also helps to keep the correct
dwarf information in the representation in the presence of symbol
suppressions. A later patch will address that.
Some test cases in tests/data need adjustment and they have all been
verified to be valid changes.
- main symbol changed for various elf symbols
- test-annotate/test15-pr18892.so.abi
- test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi
- test-annotate/test3.so.abi
- test-read-dwarf/test15-pr18892.so.abi
- test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi
- test-read-dwarf/test3.so.abi
- test-read-dwarf/test3.so.hash.abi
- due to main symbol changes, the symbol diff needs to be corrected
- test-diff-dwarf/test12-report.txt
- test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt
- test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt
- the test scenario needed adjustments as the main symbol changed
- test-diff-suppr/test23-alias-filter-4.suppr
- test-diff-suppr/test23-alias-filter-report-0.txt
- test-diff-suppr/test23-alias-filter-report-2.txt
As usual, the complete changelog follows.
* include/abg-ir.h (elf_symbol::update_main_symbol): New method.
* include/abg-symtab-reader.h (symtab::update_main_symbol): New method.
* src/abg-dwarf-reader.cc
(build_var_decl): Hint symtab about main symbol discovered in DWARF.
(build_function_decl): Likewise.
* src/abg-ir.cc (elf_symbol::get_main_symbol): Lock the weak_ptr
on access in both overloads.
(update_main_symbol): New method to allow updating the main symbol.
* src/abg-symtab-reader.cc (symtab::update_main_symbol): New method.
* data/Makefile.am: Add new test data files.
* tests/data/test-annotate/test15-pr18892.so.abi: Updated test file.
* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
* tests/data/test-annotate/test2.so.abi: Likewise.
* tests/data/test-annotate/test3.so.abi: Likewise.
* tests/data/test-diff-dwarf/test12-report.txt: Likewise.
* tests/data/test-diff-dwarf/test42-PR21296-clanggcc-report0.txt: Likewise.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt: Likewise.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt: Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt: Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt: Likewise.
* tests/data/test-diff-suppr/test23-alias-filter-4.suppr: Likewise.
* tests/data/test-diff-suppr/test23-alias-filter-report-0.txt: Likewise.
* tests/data/test-diff-suppr/test23-alias-filter-report-2.txt: Likewise.
* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi: Likewise.
* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
* tests/data/test-read-dwarf/test11-pr18828.so.abi: Likewise.
* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
* tests/data/test-read-dwarf/test2.so.abi: Likewise.
* tests/data/test-read-dwarf/test2.so.hash.abi: Likewise.
* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi: Likewise.
* tests/data/test-read-dwarf/test3.so.abi: Likewise.
* tests/data/test-read-dwarf/test3.so.hash.abi: Likewise.
* tests/data/test-symtab/basic/aliases.c: New test source file.
* tests/data/test-symtab/basic/aliases.so: Likewise.
* tests/test-symtab.cc (Symtab::AliasedFunctionSymbols): New test case.
(Symtab::AliasedVariableSymbols): Likewise.
Extend the test functionality in test-symtab to allow processing of KMI
whitelists and add additional test cases for whitelist handling.
* tests/data/Makefile.am: add new test files
* tests/data/test-symtab/basic/one_function_one_variable_all.whitelist: New test file,
* tests/data/test-symtab/basic/one_function_one_variable_function.whitelist: Likewise.
* tests/data/test-symtab/basic/one_function_one_variable_irrelevant.whitelist: Likewise.
* tests/data/test-symtab/basic/one_function_one_variable_variable.whitelist: Likewise.
* tests/test-symtab.cc (read_corpus): Add support for whitelists.
(assert_symbol_count): Likewise.
(Symtab::SymtabWithWhitelist): New testcase.
dwarf reader: drop (now) unused code related to symbol table reading
The introduction of the new symtab reader incorporated much of the
existing functionality. Now that the most code parts are migrated to the
new symtab reader, we can safely remove the old code paths.
Ignoring the symbol table is not a thing anymore. The new symtab reader
does read the symtab unconditionally for consistency reasons. Hence also
remove all functionality around conditional symtab reading.
With the prework in previous commits, we are now able to drop the
public symbols maps in corpus::priv and replace them by private members
with access through getters. The getters use the new symtab
implementation to generate the maps on the fly. Setters are not required
anymore and are removed. Also remove redundant getters.
We could also remove the getters for the symbol maps and the local
caching variable and leave it all to lookup_symbol, but this is left for
a later change.
* include/abg-corpus.h (corpus::set_fun_symbol_map): Remove
method declaration.
(corpus::set_undefined_fun_symbol_map): Likewise.
(corpus::set_var_symbol_map): Likewise.
(corpus::set_undefined_var_symbol_map): Likewise.
(corpus::get_fun_symbol_map_sptr): Likewise.
(corpus::get_undefined_fun_symbol_map_sptr): Likewise.
(corpus::get_var_symbol_map_sptr): Likewise.
(corpus::get_undefined_var_symbol_map_sptr): Likewise.
* src/abg-corpus-priv.h (corpus::priv::var_symbol_map): make
private and mutable
(corpus::priv::undefined_var_symbol_map): Likewise.
(corpus::priv::fun_symbol_map): Likewise.
(corpus::priv::undefined_fun_symbol_map): Likewise.
(corpus::priv::get_fun_symbol_map): New method declaration.
(corpus::priv::get_undefined_fun_symbol_map): Likewise.
(corpus::priv::get_var_symbol_map): Likewise.
(corpus::priv::get_undefined_var_symbol_map): Likewise.
* src/abg-corpus.cc (corpus::priv::get_fun_symbol_map): New
method implementation.
(corpus::priv::get_undefined_fun_symbol_map): Likewise.
(corpus::priv::get_var_symbol_map): Likewise.
(corpus::priv::get_undefined_var_symbol_map): Likewise.
(corpus::is_empty): depend on symtab only.
(corpus::set_fun_symbol_map): Remove method.
(corpus::set_undefined_fun_symbol_map): Likewise.
(corpus::set_var_symbol_map): Likewise.
(corpus::set_undefined_var_symbol_map): Likewise.
(corpus::get_fun_symbol_map_sptr): Likewise.
(corpus::get_undefined_fun_symbol_map_sptr): Likewise.
(corpus::get_var_symbol_map_sptr): Likewise.
(corpus::get_undefined_var_symbol_map_sptr): Likewise.
(corpus::get_fun_symbol_map): Use corpus::priv proxy method.
(corpus::get_undefined_fun_symbol_map): Likewise.
(corpus::get_var_symbol_map): Likewise.
(corpus::get_undefined_var_symbol_map): Likewise.
* src/abg-dwarf-reader.cc (read_debug_info_into_corpus): Do not
set corpus symbol maps anymore.
* src/abg-reader.cc (read_corpus_from_input): Likewise.
* tests/test-symtab.cc (assert_symbol_count): Do not access the
corpus symbol maps through sptr anymore.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Adjust
expected test output.
symtab_reader: add support for ppc64 ELFv1 binaries
When loading the symtab from an ppc64 binary, also keep track of the
function entry addresses as a key for the symbol lookup. That
accommodates the differences in DWARF pointing to the function entry
address while the symbol table points to the function pointer.
The implementation is mostly copied and adopted from abg-dwarf-reader's
read_context to add this functionality also to the new symtab reader.
* src/abg-symtab-reader.cc (symtab::lookup_symbol): fall back to
lookup the address in entry_addr_symbol_map_.
(symtab::load): update the function entry address map for
ppc64 targets.
(symtab::update_function_entry_address_symbol_map): New
function implementation.
* src/abg-symtab-reader.h
(symtab::entry_addr_symbol_map_): New data member.
(symtab::update_function_entry_address_symbol_map): New
function declaration.
This migrates more helpers to abg-elf-helpers:
lookup_ppc64_elf_fn_entry_point_address with dependencies
read_uint64_from_array_of_bytes
read_int_from_array_of_bytes
address_is_in_opd_section with dependency
address_is_in_section
read_context::find_opd_section and read_context::opd_section_ are obsolete.
Switch kernel stuff over to new symtab and drop unused code
Now that the new symtab implementation is capable of reading the
ksymtab, we can switch over the implementation to gather information
from there and delete all now-obsolete code.
dwarf-reader: read_context: use new symtab in *_symbols_is_exported
Testing whether a symbol is exported can be simplified using the new
symtab implementation. The same holds true for whether a symbol is
exported via ksymtab in case of linux kernel binaries. So, do that.
* src/abg-dwarf-reader.cc (function_symbol_is_exported): Use new
symtab implementation.
(variable_symbol_is_exported): Likewise.
abg-reader: avoid using the (var|function)_symbol_map
Instead of using the corpus var|function_symbol_maps for symbol lookups,
let build_elf_symbol_from_reference use the symtab::lookup_symbol
method. That leads to a shorter implementation and we can drop the
indicative parameter.
* src/abg-reader.cc (build_elf_symbol_from_reference): drop
last parameter indicating the lookup type and use corpus
symtab for the lookup
(build_function_decl): Adjust accordingly.
(build_var_decl): Likewise.
corpus: make get_unreferenced_(function|variable)_symbols use the new symtab
Make the corresponding members an implementation detail of corpus::priv.
They get computed based on the new symtab whenever they are accessed
first with an atomic instantiation. That simplifies the implementation
and homogenizes the access to functions and variables. Sorting does not
need to be done as the symtab already gives a guarantee for that.
* src/abg-corpus-priv.h (corpus::priv::unrefed_var_symbols): make
private, mutable and optional.
(corpus::unrefed_fun_symbols): Likewise.
(corpus::priv::get_unreferenced_function_symbols): New method declaration.
(corpus::priv::get_unreferenced_variable_symbols): Likewise.
* src/abg-corpus.cc
(corpus::priv::build_unreferenced_symbols_tables): Delete method.
(corpus::priv::get_unreferenced_function_symbols): New method implementation.
(corpus::priv::get_unreferenced_variable_symbols): Likewise.
(corpus::get_unreferenced_function_symbols): Proxy call to corpus::priv.
(corpus::get_unreferenced_variable_symbols): Likewise.
corpus: make get_(undefined_)?_(var|fun)_symbols use the new symtab
Make the corresponding members an implementation detail of corpus::priv.
They get computed based on the new symtab whenever they are accessed
first with an atomic instantiation. That simplifies the implementation
and homogenizes the access to functions and variables. Sorting does not
need to be done as the symtab already gives a guarantee for that.
Due to improved alias detection in the new symtab reader, ensure we only
write symbol aliases to ksymtab symbols if having a ksymtab main symbol.
Test data needed to be adjusted as the new symtab reader is stricter in
regards to symbols listed in ksymtab. I.e. init_module is not an
exported symbol in the ksymtab of a kernel module.
* src/abg-corpus-priv.h (corpus::priv::sorted_var_symbols): make
private, mutable and optional.
(corpus::sorted_undefined_var_symbols): Likewise.
(corpus::sorted_fun_symbols): Likewise.
(corpus::sorted_undefined_fun_symbols): Likewise.
(corpus::priv::get_sorted_fun_symbols): New method declaration.
(corpus::priv::get_sorted_undefined_fun_symbols): Likewise.
(corpus::priv::get_sorted_var_symbols): Likewise.
(corpus::priv::get_sorted_undefined_var_symbols): Likewise.
* src/abg-corpus.cc
(corpus::elf_symbol_comp_functor): Delete struct.
(corpus::priv::get_sorted_fun_symbols): New method implementation.
(corpus::priv::get_sorted_undefined_fun_symbols): Likewise.
(corpus::priv::get_sorted_var_symbols): Likewise.
(corpus::priv::get_sorted_undefined_var_symbols): Likewise.
(corpus::get_sorted_fun_symbols): Proxy call to corpus::priv.
(corpus::get_sorted_undefined_fun_symbols): Likewise.
(corpus::get_sorted_var_symbols): Likewise.
(corpus::get_sorted_undefined_var_symbols): Likewise.
* src/abg-writer.cc (write_elf_symbol_aliases): When emitting
aliases for a kernel symbol, ensure to only emit exported,
public aliases.
* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: update test
data.
Integrate new symtab reader into corpus and read_context
While reading the corpus in the read_context, also load the new type
symtab object side-by-side and set it accordingly in the resulting
corpus. This is still side by side and passive code that gets active in
the following changes. This is applicable for the dwarf reader as well
as for the reader that consumes XML.
* include/abg-corpus.h (corpus::set_symtab): New method declaration.
(corpus::get_symtab): New method declaration.
* include/abg-fwd.h (symtab_reader::symtab_sptr): New forward
declaration.
* src/abg-corpus-priv.h (corpus::priv::symtab_): New data member.
* src/abg-corpus.cc
(corpus::set_symtab): Likewise.
(corpus::get_symtab): Likewise.
* src/abg-dwarf-reader.cc (read_context::symtab_): New data member.
(read_context::initialize): reset symtab_ as well
(read_context::symtab): new method that loads a symtab on
first access and returns it.
(read_debug_info_into_corpus): also set the new symtab object
on the current corpus.
(read_corpus_from_elf): Also determine (i.e. load) the new
symtab object and contribute to the load status.
* src/abg-reader.cc (read_corpus_from_input): also set the new
type symtab when reading from xml.
* tests/test-symtab.cc: Add test assertions.
Refactor ELF symbol table reading by adding a new symtab reader
Based on existing functionality, implement the reading of ELF symbol
tables as a separate component. This reduces the complexity of
abg-dwarf-reader's read_context by separating and delegating the
functionality. This also allows dedicated testing.
The new namespace symtab_reader contains a couple of new components that
work loosely coupled together. Together they allow for a consistent view
on a symbol table. With filter criteria those views can be restricted,
iterated and consistent lookup maps can be built on top of them. While
this implementation tries to address some shortcomings of the previous
model, it still provides the high level interfaces to the symbol table
contents through sorted iterating and name/address mapped access.
symtab_reader::symtab
While the other classes in the same namespace are merely helpers, this
is the main implementation of symtab reading and storage.
Symtab objects are factory created to ensure a consistent construction
and valid invariants. Thus a symtab will be loaded by either passing
an ELF handle (when reading from binary) or by passing a set of
function/variable symbol maps (when reading from XML).
When constructed they are considered const and are not writable
anymore. As such, all public methods are const.
The load reuses the existing implementation for loading symtab
sections, but since the new implementation does not distinguish
between functions and variables, the code could be simplified. The
support for ppc64 function entry addresses has been deferred to a
later commit.
Linux Kernel symbol tables are now directly loaded by name when
encountering symbols prefixed with the __ksymtab_ as per convention.
This has been tricky in the past due to various different binary
layouts (relocations, position relative relocations, symbol
namespaces, CFI indirections, differences between vmlinux and kernel
modules). Thus the new implementation is much simpler and is less
vulnerable to future ksymtab changes. As we are also not looking up
the Kernel symbols by addresses, we could resolve shortcomings with
symbol aliasing: Previously a symbol and its alias were
indistinguishable as they are having the same symbol address. We could
not identify the one that is actually exported via ksymtab.
One major architectural difference of this implementation is that we
do not early discard suppressed symbols. While we keep them out of the
vector of exported symbols, we still make them available for lookup.
That helps addressing issues when looking up a symbol by address (e.g.
from the ksymtab read implementation) that is suppressed. That would
fail in the existing implementation.
Still, we intend to only instantiate each symbol once and pass around
shared_ptr instances to refer to it from the vector as well as from
the lookup maps.
For reading, there are two access paths that serve the existing
patterns:
1) lookup_symbol: either via a name or an address
2) filtered iteration with begin(), end()
The former is used for direct access with a clue in hand (like a name
or an address), the latter is used for iteration (e.g. when emitting
the XML).
symtab_reader::symtab_iterator
The symtab_iterator is an STL compatible iterator that is returned
from begin() and end() of the symtab. It allows usual forward iterator
operations and can optionally take a filter predicate to skip non
matching elements.
symtab_reader::symtab_filter
The symtab_filter serves as a predicate for the symtab_iterator by
providing a matches(const elf_symbol_sptr&) function. The predicate
is built by ANDing together several conditions on attributes a symbol
can have. The filter conditions are implemented in terms of
std::optional<bool> members to allow a tristate: "needs to have the
condition set", "must not have it set" and "don't care".
symtab_reader::filtered_symtab
The filtered_symtab is a convenience zero cost abstraction that allows
prepopulating the symtab_filter (call it a capture) such that begin()
and end() are now accessible without the need to pass the filter
again. Argumentless begin() and end() are a requirement for range-for
loops and other STL based algorithms.
Dodji Seketeli [Wed, 31 Mar 2021 12:56:52 +0000 (14:56 +0200)]
Bug 27598 - abidiff mishandles union member functions
abidiff segfaults when a union member function is involved in the
comparison. This patch fixes that.
* src/abg-default-reporter.cc (default_reporter::report): Assume
the parent type of the method can be either a class or a union.
* tests/data/test-diff-filter/test-PR27598-report-0.txt: New
reference output for the test.
* tests/data/test-diff-filter/test-PR27598-v{0,1}.cc: New source
code for the input binaries below.
* tests/data/test-diff-filter/test-PR27598-v{0,1}.o: New input
test binaries.
* tests/data/Makefile.am: Add the new test material to source
distribution.
* tests/test-diff-filter.cc (in_out_specs): Add the test inputs
above to this test harness.
Dodji Seketeli [Thu, 25 Mar 2021 10:45:57 +0000 (11:45 +0100)]
Bug 27569 - abidiff misses a function parameter addition
Adding or deleting function parameters are changes that have not yet
been categorized in the comparison engine. As a result, those changes
can be considered harmless and thus be filtered out. Oops.
This patch categorizes function addition and removal as
FN_PARM_ADD_REMOVE_CHANGE_CATEGORY, which is a new category being
introduced. Changes in this category are considered harmful and are
thus always reported by default.
* include/abg-comparison.h (enum diff_category): Add a new
FN_PARM_ADD_REMOVE_CHANGE_CATEGORY enumerator and adjust the
following enumerator values. Update the EVERYTHING_CATEGORY
accordingly.
(function_type_diff::{sorted_deleted_parms, sorted_added_parms}):
Add new member functions.
* src/abg-comparison.cc
(function_type_diff::{sorted_deleted_parms, sorted_added_parms}):
Define new accessors.
(get_default_harmful_categories_bitmap):
Consider changes of the category
FN_PARM_ADD_REMOVE_CHANGE_CATEGORY as harmful.
(operator<<(ostream& o, diff_category c)): Support the new
FN_PARM_ADD_REMOVE_CHANGE_CATEGORY while serializing a category
bitmap.
* src/abg-comp-filter.cc
(has_added_or_removed_function_parameters): Define new static
function.
(categorize_harmful_diff_node): Categorize diff nodes representing
function parameter addition or removal as
FN_PARM_ADD_REMOVE_CHANGE_CATEGORY.
* tests/data/test-diff-filter/test-PR27569-report-0.txt: New test
reference output.
* tests/data/test-diff-filter/test-PR27569-v{0,1}.abi: New test inputs.
* tests/data/Makefile.am: Add the new test inputs to the source
distribution.
* tests/test-diff-filter.cc (in_out_specs): Add the new test
inputs to this test harness.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt:
Adjust.
* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt:
Likewise.
* tests/data/test-diff-pkg/libsigc++-2.0-0c2a_2.4.0-1_amd64--libsigc++-2.0-0v5_2.4.1-1ubuntu2_amd64-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
Likewise.
* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
Likewise.