Consider odr-struct:
...
$ make odr-struct
...
It has two structs aaa (from different CUs), each with members of type bbb and ccc, but in one case bbb is a decl, in the other case ccc is a decl:
...
$ readelf -wi odr-struct | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><f4>: Abbrev Number: 2 (DW_TAG_structure_type)
    <f5> DW_AT_name : ccc
 <1><114>: Abbrev Number: 2 (DW_TAG_structure_type)
    <115> DW_AT_name : aaa
 <1><139>: Abbrev Number: 5 (DW_TAG_structure_type)
    <13a> DW_AT_name : bbb
    <13e> DW_AT_declaration : 1
 <1><1af>: Abbrev Number: 2 (DW_TAG_structure_type)
    <1b0> DW_AT_name : bbb
 <1><1cf>: Abbrev Number: 2 (DW_TAG_structure_type)
    <1d0> DW_AT_name : aaa
 <1><1fa>: Abbrev Number: 6 (DW_TAG_structure_type)
    <1fb> DW_AT_name : ccc
    <1ff> DW_AT_declaration : 1
...
And this stays the same when running dwz:
...
$ cp odr-struct 1; ./dwz 1
$ readelf -wi 1 | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><e7>: Abbrev Number: 23 (DW_TAG_structure_type)
    <e8> DW_AT_name : ccc
 <1><101>: Abbrev Number: 23 (DW_TAG_structure_type)
    <102> DW_AT_name : aaa
 <1><11d>: Abbrev Number: 27 (DW_TAG_structure_type)
    <11e> DW_AT_name : bbb
    <122> DW_AT_declaration : 1
 <1><17d>: Abbrev Number: 23 (DW_TAG_structure_type)
    <17e> DW_AT_name : bbb
 <1><197>: Abbrev Number: 23 (DW_TAG_structure_type)
    <198> DW_AT_name : aaa
 <1><1b6>: Abbrev Number: 27 (DW_TAG_structure_type)
    <1b7> DW_AT_name : ccc
    <1bb> DW_AT_declaration : 1
...
When doing odr, we end up with one struct aaa, and no decls:
...
$ cp odr-struct 1; ./dwz 1 --odr
$ readelf -wi 1 | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><19>: Abbrev Number: 25 (DW_TAG_structure_type)
    <1a> DW_AT_name : ccc
 <1><2f>: Abbrev Number: 25 (DW_TAG_structure_type)
    <30> DW_AT_name : aaa
 <1><4b>: Abbrev Number: 25 (DW_TAG_structure_type)
    <4c> DW_AT_name : bbb
...
Now consider:
...
$ cp odr-struct 1; cp 1 2; ./dwz -m 3 1 2 --odr
...
The desired outcome is that the structs aaa are unified in both 1 and 2, and then moved to multifile 3. So far, so good:
...
$ readelf -wi 3 | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><14>: Abbrev Number: 15 (DW_TAG_structure_type)
    <15> DW_AT_name : ccc
 <1><2a>: Abbrev Number: 15 (DW_TAG_structure_type)
    <2b> DW_AT_name : aaa
 <1><46>: Abbrev Number: 15 (DW_TAG_structure_type)
    <47> DW_AT_name : bbb
...
But file 1 still contains structs aaa and bbb:
...
$ readelf -wi 1 | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><1e>: Abbrev Number: 12 (DW_TAG_structure_type)
    <1f> DW_AT_name : aaa
 <1><3d>: Abbrev Number: 12 (DW_TAG_structure_type)
    <3e> DW_AT_name : bbb
...
And it's not just that these are dead DIEs left behind. The 0x1e DIE is used here:
...
 <1><114>: Abbrev Number: 20 (DW_TAG_variable)
    <115> DW_AT_name : (alt indirect string, offset: 0xe)
    <11b> DW_AT_type : <0x1e>
...
This seems to be related to DW_AT_decl_file. The first sign of trouble is that this DIE in the multifile:
...
 4f O 83a3029f 0 member_four member (type: 16f int base_type)
...
has a different checksum than this DIE from file 1 (in the finalize_multifile phase):
...
 54 O a793678e 0 member_four member (type: 7c int base_type)
...
Looking at the checksum calculation of the 4f DIE:
...
DIE 4f, hash: 3a09c42b, tag
DIE 4f, hash: b77c105, attr (0)
DIE 4f, hash: 2d5a104c, attr (1)
DIE 4f, hash: 7a4061d4, attr (2)
DIE 4f, hash: 1d9a1621, attr (3)
DIE 4f, hash: 83a3029f, attr (4)
DIE 4f, hash: 83a3029f, final
...
and of the 54 DIE:
...
DIE 54, hash: 3a09c42b, tag
DIE 54, hash: b77c105, attr (0)
DIE 54, hash: b8ab158e, attr (1)
DIE 54, hash: d527a92, attr (2)
DIE 54, hash: 402d45a9, attr (3)
DIE 54, hash: a793678e, attr (4)
DIE 54, hash: a793678e, final
...
it's clear that the difference starts at attribute 1, which is the DW_AT_decl_file one. Looking at the 4f DIE:
...
 <2><4f>: Abbrev Number: 16 (DW_TAG_member)
    <50> DW_AT_name : (indirect string, offset: 0x3fd): member_four
    <54> DW_AT_decl_file : 3
    <55> DW_AT_decl_line : 6
    <56> DW_AT_type : <0x16f>
    <5a> DW_AT_data_member_location: 0
...
 The File Name Table (offset 0x67):
  Entry Dir Time Size Name
  1     1   0    0    odr.cc
  2     1   0    0    odr.h
  3     1   0    0    odr-2.cc
  4     2   0    0    stddef.h
  5     0   0    0    elf-init.c
...
the decl_file is odr-2.cc. Looking at the 54 DIE:
...
 <2><54>: Abbrev Number: 26 (DW_TAG_member)
    <55> DW_AT_name : (indirect string, offset: 0x280): member_four
    <59> DW_AT_decl_file : 1
    <5a> DW_AT_decl_line : 6
    <5b> DW_AT_type : <0x7c>
    <5f> DW_AT_data_member_location: 0
 The File Name Table (offset 0x131):
  Entry Dir Time Size Name
  1     1   0    0    odr.cc
  2     1   0    0    odr.h
...
the decl_file is odr.cc.
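The comparison of the two hash traces above can also be done mechanically. The following is a minimal sketch in Python, not part of dwz: the helper names `parse_trace` and `first_divergence` are made up for illustration, and the only input is the debug output quoted above.

```python
import re

# Hash traces copied verbatim from the debug output quoted above.
TRACE_4F = """\
DIE 4f, hash: 3a09c42b, tag
DIE 4f, hash: b77c105, attr (0)
DIE 4f, hash: 2d5a104c, attr (1)
DIE 4f, hash: 7a4061d4, attr (2)
DIE 4f, hash: 1d9a1621, attr (3)
DIE 4f, hash: 83a3029f, attr (4)
DIE 4f, hash: 83a3029f, final
"""

TRACE_54 = """\
DIE 54, hash: 3a09c42b, tag
DIE 54, hash: b77c105, attr (0)
DIE 54, hash: b8ab158e, attr (1)
DIE 54, hash: d527a92, attr (2)
DIE 54, hash: 402d45a9, attr (3)
DIE 54, hash: a793678e, attr (4)
DIE 54, hash: a793678e, final
"""

LINE_RE = re.compile(r"DIE (\w+), hash: ([0-9a-f]+), (.+)")


def parse_trace(text):
    """Return a list of (step, hash) pairs, one per trace line."""
    steps = []
    for line in text.splitlines():
        match = LINE_RE.fullmatch(line.strip())
        if match:
            steps.append((match.group(3), match.group(2)))
    return steps


def first_divergence(trace_a, trace_b):
    """Return the name of the first step whose hashes differ, or None."""
    for (step_a, hash_a), (step_b, hash_b) in zip(
        parse_trace(trace_a), parse_trace(trace_b)
    ):
        if step_a != step_b:
            raise ValueError(f"traces are not aligned: {step_a!r} vs {step_b!r}")
        if hash_a != hash_b:
            return step_a
    return None


print(first_divergence(TRACE_4F, TRACE_54))  # attr (1)
```

Because the hash is accumulated step by step, every step after the first mismatch differs as well, so only the first divergent step is informative.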
The same problem exists for struct bbb, the struct containing member_four. Looking at the original file:
...
$ llvm-dwarfdump ../odr-struct | grep -A3 struct | egrep -v "^--|DW_AT_byte_size" | sed 's%/home/vries/dwz/dwz.git/testsuite/dwz.tests/%%'
../odr-struct: file format ELF64-x86-64
.debug_info contents:
0x00000000: Compile Unit: length = 0x0000002a version = 0x0002 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x0000002e)
0x000000f4: DW_TAG_structure_type
              DW_AT_name ("ccc")
              DW_AT_decl_file ("odr.cc")
0x00000114: DW_TAG_structure_type
              DW_AT_name ("aaa")
              DW_AT_decl_file ("odr.h")
0x00000139: DW_TAG_structure_type
              DW_AT_name ("bbb")
              DW_AT_declaration (true)
0x000001af: DW_TAG_structure_type
              DW_AT_name ("bbb")
              DW_AT_decl_file ("odr-2.cc")
0x000001cf: DW_TAG_structure_type
              DW_AT_name ("aaa")
              DW_AT_decl_file ("odr.h")
0x000001fa: DW_TAG_structure_type
              DW_AT_name ("ccc")
              DW_AT_declaration (true)
...
it seems that odr-2.cc is the correct file for struct bbb, so the multifile is correct, and the single-file-optimized file 1 is wrong. In other words, we can reproduce the problem without multifile mode like this:
...
$ cp ../odr-struct 1; ../dwz 1 --odr
$ llvm-dwarfdump 1 | grep -A3 struct | egrep -v "^--|DW_AT_byte_size" | sed 's%/home/vries/dwz/dwz.git/testsuite/dwz.tests/%%'
0x00000019: DW_TAG_structure_type
              DW_AT_name ("ccc")
              DW_AT_decl_file ("odr.cc")
0x0000002f: DW_TAG_structure_type
              DW_AT_name ("aaa")
              DW_AT_decl_file ("odr.h")
0x0000004b: DW_TAG_structure_type
              DW_AT_name ("bbb")
              DW_AT_decl_file ("odr.cc")
...
Alternatively, by ignoring locs, we get the desired optimization:
...
$ cp ../odr-struct 1; cp 1 2; ../dwz -m 3 1 2 --odr --devel-ignore-loc
$ readelf -wi 3 | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><14>: Abbrev Number: 15 (DW_TAG_structure_type)
    <15> DW_AT_name : ccc
 <1><2a>: Abbrev Number: 15 (DW_TAG_structure_type)
    <2b> DW_AT_name : aaa
 <1><46>: Abbrev Number: 15 (DW_TAG_structure_type)
    <47> DW_AT_name : bbb
$ readelf -wi 1 | egrep -A2 "DW_TAG_structure" | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
$
...
The problem is caused by the fact that the duplicate chain:
...
duplicate chain:
 139 O 26b2adba(fdca21c2) 26b2adba bbb structure_type
 1af O 26b2adba(d80f5f71) 26b2adba bbb structure_type
...
starts with a decl in CU1:
...
 <1><139>: Abbrev Number: 5 (DW_TAG_structure_type)
    <13a> DW_AT_name : bbb
    <13e> DW_AT_declaration : 1
...
and then has a def in CU2:
...
 <1><1af>: Abbrev Number: 2 (DW_TAG_structure_type)
    <1b0> DW_AT_name : bbb
    <1b4> DW_AT_byte_size : 4
    <1b5> DW_AT_decl_file : 1
    <1b6> DW_AT_decl_line : 4
    <1b7> DW_AT_sibling : <0x1c8>
...
This is a problem for writing out the DIEs, since there we count on accessing the attributes and children of the first DIE in the duplicate chain. This problem is fixed by reorder_dups, which switches the order of def and decl. However, as a consequence, the DW_AT_decl_file with value 1, which refers to odr-2.cc in CU2, is interpreted using the file table of CU1 and ends up referring to odr.cc instead. This is the root cause.
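The root cause can be illustrated with a toy model. This is a sketch in Python and not dwz code: the `Die` class, the `FILE_TABLES` dictionary and the `output_cu` variable are assumptions made for illustration, and the `reorder_dups` function below only mimics the def-before-decl reordering described above. The file tables contain just the entry 1 of each CU, as named in the analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Entry 1 of each CU's file table, as described in the analysis above.
# Other entries are omitted; the "CU1"/"CU2" keys are labels for this sketch.
FILE_TABLES = {
    "CU1": {1: "odr.cc"},
    "CU2": {1: "odr-2.cc"},
}


@dataclass
class Die:
    offset: int
    cu: str
    declaration: bool
    decl_file: Optional[int] = None  # index into the file table of `cu`


def reorder_dups(chain):
    """Toy stand-in: move definitions ahead of declarations (stable sort)."""
    return sorted(chain, key=lambda die: die.declaration)


# The duplicate chain for struct bbb: a decl in CU1, then a def in CU2.
chain = [
    Die(0x139, "CU1", declaration=True),
    Die(0x1AF, "CU2", declaration=False, decl_file=1),
]

# Assumption of this sketch: the file table used on write-out is that of
# the CU of the original chain head, i.e. CU1.
output_cu = chain[0].cu

head = reorder_dups(chain)[0]

# Buggy lookup: index taken from the CU2 def, table taken from CU1.
naive = FILE_TABLES[output_cu][head.decl_file]
# Intended meaning: index resolved against the file table of the DIE's own CU.
intended = FILE_TABLES[head.cu][head.decl_file]

print(f"head after reorder: {head.offset:#x} ({head.cu})")
print(f"naive lookup:       {naive}")
print(f"intended file:      {intended}")
```

In this model the naive lookup yields odr.cc while the intended file is odr-2.cc, matching the wrong DW_AT_decl_file seen for struct bbb in the optimized file 1.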
For-the-record posting: https://sourceware.org/pipermail/dwz/2021q1/000951.html
Proposed fix: https://sourceware.org/pipermail/dwz/2021q1/000953.html
Fixed in commit https://sourceware.org/git/?p=dwz.git;a=commit;h=61c8d81134becf41be124a540a4e0288b6798761