Bug 29464 - abidw performance regression on vmlinux
Summary: abidw performance regression on vmlinux
Alias: None
Product: libabigail
Classification: Unclassified
Component: default (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: dodji
Depends on:
Reported: 2022-08-09 17:57 UTC by gprocida
Modified: 2022-09-19 10:19 UTC (History)
1 user (show)

See Also:
Last reconfirmed: 2022-08-22 00:00:00


Note You need to log in before you can comment on or make changes to this bug.
Description gprocida 2022-08-09 17:57:50 UTC
Hi Dodji.

I've been testing changes since our last upstream merge and have found a performance regression.

A typical standalone "abidw vmlinux" normally takes around 20s on an AOSP kernel. Following the merge, run times are around 1.5h. So that's around 270 times slower.

Using almost vanilla upstream libabigail (I had to disable one ELF symbol assertion for unrelated reasons), I was able to bisect the regression to:

7ecef6361799326b99129a479b43b138f0b237ae "Canonicalize DIEs w/o assuming ODR & handle typedefs transparently"

This commit does a few things, but most significantly removes an optimisation that assumed the ODR.

The change in behaviour was consistent across 6 different test kernels and a couple of other kernels I had lying around from different test cases. However, in one case, abidw ran quickly as normal.

It's very likely that one of the previous kernels you've had from me will trigger the problem. I will also see if I can find "off the shelf" binary that you can debug with.

Comment 1 gprocida 2022-08-10 15:05:02 UTC
The performance regression is reproducible with this test case:

Comment 2 dodji 2022-09-01 09:03:53 UTC
(In reply to gprocida from comment #1)
> The performance regression is reproducible with this test case:
> https://github.com/myxoid/libabigail/raw/
> 51fd7ce8d9b896bf4736bb638266bda5c761ee8a/vmlinux.lzma

Thank you Giuliano, this was very useful.  I could reproduce the issue, indeed.

The issue comes from the fixing of the earlier canonical propagation during the computing of canonical type DIEs, when canonicalizing type DIEs.

Before 7ecef6361799326b99129a479b43b138f0b237ae, we were propagating canonical DIEs "too much".  Things were fast.  But wrong.  That is why I tried to fix the issue.  But then, it appears that with that fix, we are not propagating canonical DIEs "enough", so things are super slow.

I have a tentative approach to this in the branch users/dodji/libabigail-perf-regr, accessible at https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/users/dodji/libabigail-perf-regr.

It should bring things back to an acceptable performance, albeit slower to what it was previously (probably in the 35s range for you).

I still need to comment and split the content into proper patches but you can already test it to see if things work for you with that branch.

In the mean time, I am moving forward with the commenting & splitting.
Comment 3 gprocida 2022-09-06 16:00:25 UTC
Hi Dodji.

I have done some tests by merging dodji-perf (e6f222c8) into AOSP and running our tests. In more detail:

I noticed that there was a merge conflict of "dwarf-reader: Revamp the canonical type DIE propagation algorithm" with my hack that limited comparisons per DIE pair to 10000. Given it seemed that the changes were working in the same area, I reverted my hack and remerged.

I ran into trouble as your development branch is not known to AOSP, so I actually ended up with: revert, merge, cherry-pick x 2.

I need to retest without any AOSP things. However, here is some initial feedback:

1. Testing (unintentionally) at the second last cherry-pick ("dwarf-reader: Revamp the canonical type DIE propagation algorithm") showed some test timeouts on our standard kernel tests.

This means I cannot push that series to AOSP. I will consider reinstating my hack and trying again, after other tests.

The test suite run also shows other issues:
* name changes like "long long unsigned" to "unsigned long long int"
* linearly probed - and so unstable - hash-based type ids for types like "unsigned int" and "unsigned short int" (the strings that get hashed are the same for the different types)

The last of these issues will also need to be resolved - we may decide to do it in post-processing, but I imagine it's only a small change in libabigail.

2. Testing at the last cherry-pick ("Allow restricting analyzed decls to exported symbols") also had test timeouts. However, more seriously, the run time (for a 4.19 kernel) is much worse than in 1.

I haven't yet done any ABI comparisons using this abidw's output as it still running at over 30 minutes.
Comment 4 gprocida 2022-09-06 16:10:50 UTC
Good news (for you, at least). A vanilla build of libabigail at e6f222c8 has abidw completing in about 1 minute on the same test case.

Clearly more for us to do at our end!

I really hope it's not another Clang vs GCC issue.
Comment 5 gprocida 2022-09-06 16:37:56 UTC
A little more on the vanilla build testing.


* does make things about 3 times slower (but perhaps still acceptable; I'm only testing with one case at the moment)
* does make 19 more types fully defined
* loses the types of 997 symbols

The last of these is a clear regression. I hope you can reproduce it easily.

It's possible that Alexey's bug report with a small test case for undefined types may help point to a way forward.
Comment 6 gprocida 2022-09-08 09:28:29 UTC
It turns out that there is now a new infinite loop in the XML writer when compiled with Clang. This is our old "friend" where containers have inconsistent equality and hash functions.

In particular:

Commit 7ecef636 "Canonicalize DIEs w/o assuming ODR & handle typedefs transparently" introduced new sets and a map in the XML writer to track referencing, emission and type ids for "non canonicalized" types as a separate category.

It added non_canonicalized_type_equal and non_canonicalized_type_hash function objects. The former is a deep equality and the latter hashes the non-internal name.

The net effect is that I see a typedef being repeatedly emitted in a tight loop. It's recorded as emitted but not found when checked to see if it was emitted.

The last time these issues occurred they were eventually resolved by using much safer and more efficient bare pointers as the keys to the containers.

I don't understand why new sets and maps were added or why they need to have bespoke equality and hash functions.
Comment 7 dodji 2022-09-19 10:19:17 UTC

The branch contains new content that re-instated typedef canonicalization and many fixes that comes along with it.

The infinite loops along with many other things should be better now.

Thanks for looking into this.

I am planning to merge this soon, unless you find something that is deleterious.  But all in all, I feel like this is an improvement over what we have in master.

Thanks again!