Bug 31237 - [gdb, clang] FAIL: gdb.gdb/index-file.exp: gdb-index files are identical
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2024-01-12 08:13 UTC by Tom de Vries
Modified: 2024-01-16 12:36 UTC
CC List: 2 users

Attachments
Demonstrator patch adding "maint find-addr" (2.49 KB, patch)
2024-01-16 12:36 UTC, Tom de Vries

Description Tom de Vries 2024-01-12 08:13:00 UTC
I built gdb with clang, and ran into:
...
(gdb) PASS: gdb.gdb/index-file.exp: create second dwarf-index files
Executing on host: cmp -s "/home/vries/gdb/build/gdb/testsuite/outputs/gdb.gdb/index-file/index_1/xgdb.gdb-index" "/home/vries/gdb/build/gdb/testsuite/outputs/gdb.gdb/index-file/index_2/xgdb.gdb-index"    (timeout = 300)
builtin_spawn -ignore SIGHUP cmp -s /home/vries/gdb/build/gdb/testsuite/outputs/gdb.gdb/index-file/index_1/xgdb.gdb-index /home/vries/gdb/build/gdb/testsuite/outputs/gdb.gdb/index-file/index_2/xgdb.gdb-index^M
FAIL: gdb.gdb/index-file.exp: gdb-index files are identical
...

And indeed:
...
$ ls -al index_2/xgdb.gdb-index
-rw-------. 1 vries vries 13582433 12 jan 09:08 index_2/xgdb.gdb-index
$ ls -al  index_1/xgdb.gdb-index
-rw-------. 1 vries vries 13582873 12 jan 09:08 index_1/xgdb.gdb-index
...
Comment 1 Tom de Vries 2024-01-12 10:18:13 UTC
(In reply to Tom de Vries from comment #0)
> I built gdb with clang, and ran into:

This was on aarch64-linux.

Now also reproduced on x86_64-linux, with clang 15.0.7, at both -O0 and -O2.
Comment 2 Tom Tromey 2024-01-13 01:56:37 UTC
I wonder if this could be a race in the DWARF reader.
Comment 3 Tom de Vries 2024-01-13 14:25:14 UTC
(In reply to Tom Tromey from comment #2)
> I wonder if this could be a race in the DWARF reader.

At first glance, that doesn't seem to be the case.

I get consistent results with worker-threads 8, and consistent results with worker-threads 4; the only difference is between those two settings.

It might be an issue similar to the one fixed by commit aff250145af ("gdb: generate gdb-index identically regardless of work thread count") for the symbol table, but this time for the address table.
Comment 4 Tom de Vries 2024-01-13 14:54:54 UTC
Hmm, there's an actual difference in the address table:
...
$ grep " 122$" 1 | sort -d
000000000043ac80 000000000043ad48 122
00000000006bbeb0 00000000006bd024 122
00000000028d4aa8 00000000028d4aa9 122
00000000028d4b50 00000000028d4b51 122
0000000002b74880 0000000002b74881 122
0000000002b74888 0000000002b74889 122
$ grep " 122$" 2 | sort -d
00000000006bbeb0 00000000006bd024 122
00000000028d4aa8 00000000028d4aa9 122
00000000028d4b50 00000000028d4b51 122
0000000002b74880 0000000002b74881 122
0000000002b74888 0000000002b74889 122
...
The first entry in 1 is extra; the remaining entries are identical.

Let's grep for the starting address of the extra element: 
...
$ grep 43ac80 1
000000000043ac80 000000000043ad48 122
000000000043ac80 000000000043ace8 244
000000000043ac80 000000000043b1c4 490
000000000043ac80 000000000043ad48 600
$ grep 43ac80 2
000000000043ac80 000000000043ace8 244
000000000043ac80 000000000043b1c4 490
...

So, for the given address 0x43ac80, several CUs match, and the number of matching CUs differs between the two address tables.
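
The comparison above can also be scripted instead of grepping per CU. The following is a minimal sketch, assuming the dumps 1 and 2 consist of lines of the form "start end cu-index" as in the excerpts; the helper names parse_entries and only_in_first are made up for illustration and are not part of gdb.

```python
def parse_entries(lines):
    """Parse 'start end cu_index' lines into a set of (start, end, cu) tuples."""
    entries = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unrelated lines
        start, end, cu = fields
        # Addresses are hexadecimal, the CU index is decimal (assumed format).
        entries.add((int(start, 16), int(end, 16), int(cu)))
    return entries


def only_in_first(lines_a, lines_b):
    """Return the entries present in the first dump but not in the second."""
    return sorted(parse_entries(lines_a) - parse_entries(lines_b))


# Entries for start address 0x43ac80, copied from the two greps above.
dump_1 = """\
000000000043ac80 000000000043ad48 122
000000000043ac80 000000000043ace8 244
000000000043ac80 000000000043b1c4 490
000000000043ac80 000000000043ad48 600
"""
dump_2 = """\
000000000043ac80 000000000043ace8 244
000000000043ac80 000000000043b1c4 490
"""

for start, end, cu in only_in_first(dump_1.splitlines(), dump_2.splitlines()):
    print(f"{start:016x} {end:016x} {cu}")
```

For the excerpt above, this reports the entries for CUs 122 and 600 as present only in 1.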
Comment 5 Tom de Vries 2024-01-16 07:42:38 UTC
(In reply to Tom de Vries from comment #4)
> So, for the given address 0x43ac80, several CUs match, and the number of
> matching CUs differs between the two address tables.

And, the overlap is caused by clang generating:
...
Disassembly of section .text._ZN14frame_info_ptrC2ERKS_:

0000000000000000 <_ZN14frame_info_ptrC2ERKS_>:
...
after which the code is merged by the linker.
Comment 6 Tom de Vries 2024-01-16 10:55:08 UTC
The cooked index consists of shards, and each shard has an address map, of type addrmap_mutable.

The addrmap_mutable object supports overlapping insertions, but the result depends on the insertion order.

So, by changing the number of worker threads, we change the number of shards, and consequently the CUs indexed by each shard, and consequently the sequence of insertions into each per-shard address map.

This is fine for the non-overlapping case: we'd expect the same collection of <key,value> pairs, just distributed differently over the shards.

But for the overlapping case, that doesn't work.
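
To illustrate the effect, here is a toy model in Python, not gdb code. It assumes a first-insertion-wins policy for the per-shard map (the analysis above only states that the result is insertion-order specific) and assumes CUs are assigned to shards in contiguous chunks. The names insert_if_unmapped, build_shards, cus_covering and all_pairs, as well as the CU names and ranges, are hypothetical.

```python
def insert_if_unmapped(addr_map, start, end_inclusive, cu):
    """Map [start, end_inclusive] to cu, leaving already-mapped addresses alone."""
    for addr in range(start, end_inclusive + 1):
        addr_map.setdefault(addr, cu)


def build_shards(cus, shard_count):
    """Split cus into contiguous chunks (assumed policy), one address map each."""
    per_shard = -(-len(cus) // shard_count)  # ceiling division
    shards = []
    for i in range(0, len(cus), per_shard):
        addr_map = {}
        for cu, start, end_inclusive in cus[i:i + per_shard]:
            insert_if_unmapped(addr_map, start, end_inclusive, cu)
        shards.append(addr_map)
    return shards


def cus_covering(shards, addr):
    """Return the CUs that some shard's map reports for addr."""
    return sorted({m[addr] for m in shards if addr in m})


def all_pairs(shards):
    """Return every (address, cu) pair over all shards."""
    return {(addr, cu) for m in shards for addr, cu in m.items()}


# Four CUs that all start at the same address, as with linker-merged code.
overlapping = [("cu0", 0x10, 0x1f), ("cu1", 0x10, 0x17),
               ("cu2", 0x10, 0x2f), ("cu3", 0x10, 0x1f)]
for count in (1, 2, 4):
    print(count, cus_covering(build_shards(overlapping, count), 0x10))

# Disjoint ranges give the same pairs regardless of the shard count.
disjoint = [("a", 0, 3), ("b", 4, 7), ("c", 8, 11), ("d", 12, 15)]
print(all_pairs(build_shards(disjoint, 1)) == all_pairs(build_shards(disjoint, 4)))
```

In this model, the set of CUs reported for the overlapping address grows with the shard count, because each shard only sees the first CU inserted into its own map, while the disjoint ranges are unaffected.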
Comment 7 Tom de Vries 2024-01-16 12:36:11 UTC
Created attachment 15306
Demonstrator patch adding "maint find-addr"

To easily demonstrate the problem outside of the scope of index writing:
...
$ gdb -q -batch -iex "maint set worker-thread 4" xgdb -ex "maint find-addr 0x43ac80"
((dwarf2_per_cu_data *) 0xffff30005da0), offset: 551f
((dwarf2_per_cu_data *) 0xffff3000f4c0), offset: feffeb
((dwarf2_per_cu_data *) 0xffff30018750), offset: 1f6b978
((dwarf2_per_cu_data *) 0)
((dwarf2_per_cu_data *) 0)
$ gdb -q -batch -iex "maint set worker-thread 8" xgdb -ex "maint find-addr 0x43ac80"
((dwarf2_per_cu_data *) 0xffff10005da0), offset: 551f
((dwarf2_per_cu_data *) 0xffff1000a810), offset: 7e8be8
((dwarf2_per_cu_data *) 0xffff1000f4c0), offset: feffeb
((dwarf2_per_cu_data *) 0)
((dwarf2_per_cu_data *) 0xffff10018750), offset: 1f6b978
((dwarf2_per_cu_data *) 0xffff1001d580), offset: 251ad16
((dwarf2_per_cu_data *) 0)
((dwarf2_per_cu_data *) 0)
((dwarf2_per_cu_data *) 0)
...