Bug 27568 - Add initial .debug_info reading phase to dwz
Status: NEW
Alias: None
Product: dwz
Classification: Unclassified
Component: default
Version: unspecified
Importance: P2 enhancement
Target Milestone: ---
Assignee: Nobody
Depends on:
Blocks: 25229 25459 27544 27557
Reported: 2021-03-12 10:54 UTC by Tom de Vries
Modified: 2021-03-12 10:56 UTC (History)

See Also:
Last reconfirmed:


Description Tom de Vries 2021-03-12 10:54:12 UTC
Currently, dwz is set up such that the .debug_info section is read only once (unless low-mem mode is in effect).

There are a number of problems with this approach.

I. Forward (pseudo) references. 

When we encounter a reference that is intra-CU or backward inter-CU, we can determine whether it points to an actual DIE, because we've already processed those DIEs and they're in the off_htab. But when we encounter a forward inter-CU reference, we don't know, because those DIEs haven't been processed yet. See PR25459 for a pseudo-reference example.

If we could determine in all cases whether a DIE reference is valid, we could handle the invalid ones more gracefully: assume a value of 0 and continue processing (PR27544).
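As an illustration of the idea, here is a minimal sketch of how a second pass could validate references once an initial pass has recorded every DIE offset. The names struct die_offsets and resolve_ref are hypothetical, and the sorted array is only a stand-in for off_htab, not dwz's actual data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for off_htab: a sorted array holding the
   offset of every DIE found by the initial reading phase.  */
struct die_offsets
{
  const uint64_t *offs;
  size_t count;
};

/* Three-way comparison of two offsets, for bsearch.  */
static int
cmp_off (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a;
  uint64_t y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Second-phase helper: return REF if it is the offset of a known DIE.
   Otherwise return 0, so that the caller can continue processing
   instead of erroring out (the approach suggested for PR27544).  */
static uint64_t
resolve_ref (const struct die_offsets *dies, uint64_t ref)
{
  assert (dies != NULL);
  if (dies->count == 0)
    return 0;
  return bsearch (&ref, dies->offs, dies->count,
		  sizeof (uint64_t), cmp_off) != NULL ? ref : 0;
}
```

Because all offsets are known before any reference is looked at, forward and backward inter-CU references are handled identically.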

II. Input error messages are generated on-the-fly (PR 25229)

Input errors are generated when and where they are encountered. This has several drawbacks:
- it may take a long while to find out that we cannot optimize the file.
- the error messages are not deterministic.  An unrelated change in the input
  may influence whether the error is triggered or not.
- the errors may have to be handled in more than one location in the source code.
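One way an initial phase could address these points is to record every input error in a single place and only decide afterwards whether the file can be optimized. The sketch below assumes hypothetical names (struct error_log, record_error, finish_scan) and a fixed-size log; it does not reflect dwz's current error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRORS 64

/* One input error, tagged with the .debug_info offset where it was found.  */
struct input_error
{
  uint64_t offset;
  const char *msg;
};

/* Hypothetical fixed-size log filled by the initial reading phase.  */
struct error_log
{
  struct input_error errs[MAX_ERRORS];
  size_t count;
};

/* Record an error instead of reporting it on the spot.  Errors beyond
   MAX_ERRORS are silently dropped in this sketch.  */
static void
record_error (struct error_log *log, uint64_t offset, const char *msg)
{
  assert (log != NULL && msg != NULL);
  if (log->count < MAX_ERRORS)
    {
      log->errs[log->count].offset = offset;
      log->errs[log->count].msg = msg;
      log->count++;
    }
}

/* Order errors by offset, then by message text.  */
static int
cmp_error (const void *a, const void *b)
{
  const struct input_error *x = a;
  const struct input_error *y = b;
  if (x->offset != y->offset)
    return x->offset < y->offset ? -1 : 1;
  return strcmp (x->msg, y->msg);
}

/* Called once at the end of the initial phase: sort the log so that
   the report does not depend on discovery order, and return nonzero
   if the file can be optimized.  */
static int
finish_scan (struct error_log *log)
{
  assert (log != NULL);
  qsort (log->errs, log->count, sizeof (log->errs[0]), cmp_error);
  return log->count == 0;
}
```

With this shape, the decision to give up on a file is made in one location, before any optimization work has started.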

III. Sub-file level parallelization (PR27557)

Things that can and cannot be parallelized are interwoven in the implementation.  If we had two phases of reading .debug_info instead of one, we could try to move the non-parallelizable parts to the first phase and the parallelizable parts to the second phase, as well as detect parallelization-breaking conditions in the first phase.
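To make the split concrete, here is a sketch of one step that is inherently sequential and so would belong in the first phase: finding the CU boundaries by walking the unit_length fields. The function split_cus and struct cu_range are hypothetical names. The sketch assumes little-endian 32-bit DWARF and treats anything else as a condition that stops it, which is an assumption of this example rather than a statement about what dwz supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte range of one CU inside .debug_info, header included.  */
struct cu_range
{
  size_t start;
  size_t size;
};

/* Sequential first phase: each CU's start is only known once the
   previous CU's unit_length has been read.  Fill OUT with at most MAX
   ranges and return the number of CUs, or -1 if the section is
   malformed, uses the 64-bit DWARF escape, or has more than MAX CUs.  */
static int
split_cus (const unsigned char *sec, size_t len,
	   struct cu_range *out, size_t max)
{
  size_t pos = 0;
  size_t n = 0;

  assert (sec != NULL && out != NULL);
  while (pos < len)
    {
      uint32_t unit_length;

      if (len - pos < 4)
	return -1;
      /* Assumption: little-endian input.  */
      unit_length = (uint32_t) sec[pos]
		    | (uint32_t) sec[pos + 1] << 8
		    | (uint32_t) sec[pos + 2] << 16
		    | (uint32_t) sec[pos + 3] << 24;
      /* 0xfffffff0 and up are reserved; 0xffffffff means 64-bit DWARF.  */
      if (unit_length >= 0xfffffff0u)
	return -1;
      if (unit_length > len - pos - 4)
	return -1;
      if (n == max)
	return -1;
      out[n].start = pos;
      out[n].size = 4 + (size_t) unit_length;
      pos += out[n].size;
      n++;
    }
  return (int) n;
}
```

Once the ranges are known, a second phase could hand each struct cu_range to a separate worker, provided the first phase found nothing that rules that out.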

IV. Ad-hoc determination of optimization scope

Computation of the optimization scope (i.e., propagation of CK_BAD and die_no_multifile) is done in parallel with the optimization.  Some of this propagation is backwards and needs to be fixed up after optimization to complete the propagation (see propagate_multifile, PR25109), which is awkward.
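A separate phase could instead run the propagation to a fixed point before any optimization starts. The sketch below uses a deliberately simplified rule, assumed for illustration only: a DIE that references a bad DIE becomes bad itself. The struct die layout and propagate_bad are hypothetical and much simpler than dwz's real DIE representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical, heavily simplified DIE: a flag standing in for CK_BAD
   and the indices of the DIEs it references.  */
struct die
{
  bool bad;
  size_t nrefs;
  size_t refs[MAX_REFS];
};

/* Repeat sweeps over all DIEs until nothing changes.  Assumed rule: a
   DIE referencing a bad DIE is itself bad.  Return the number of DIEs
   newly marked bad.  */
static size_t
propagate_bad (struct die *dies, size_t count)
{
  size_t marked = 0;
  bool changed;

  assert (dies != NULL || count == 0);
  do
    {
      changed = false;
      for (size_t i = 0; i < count; i++)
	{
	  if (dies[i].bad)
	    continue;
	  for (size_t r = 0; r < dies[i].nrefs; r++)
	    if (dies[dies[i].refs[r]].bad)
	      {
		dies[i].bad = true;
		changed = true;
		marked++;
		break;
	      }
	}
    }
  while (changed);
  return marked;
}
```

In a chain where DIE 0 references DIE 1 and DIE 1 references a bad DIE 2, a single forward sweep marks only DIE 1; the repeat sweep is what catches DIE 0, which is the kind of backward propagation that currently needs a fixup after optimization.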

Also, for odr we have the inelegant and expensive solution of calling checksum_ref_die twice (PR26252).  The first call does the CK_BAD propagation, which gives us wrong checksums for the odr DIEs.  We then fix those up, and call checksum_ref_die again to recalculate the checksums for the non-odr DIEs.  If the CK_BAD propagation were separated from the checksum calculation, we wouldn't have to redo the checksum calculation for the non-odr DIEs.