Pedro pointed out that using the index cache is still faster than the new DWARF reader. I think I found why: What happens here is that gdb does the finalization step in the background. Interactively, this helps make it seem faster. However, it's not really faster, it is just deferring some work. You can see it with an invocation like: gdb -q -batch -iex 'set debug timestamp 1' -iex 'set debug dwarf-read 1' -ex start ./gdb Here for gdb-with-an-index I see: 0.820423 [dwarf-read] dwarf2_initialize_objfile: found gdb index from file 1.699175 [dwarf-read] process_queue: Expanding one or more symtabs of objfile /tmp/gdb.idx ... But for ordinary gdb: 1.310033 [dwarf-read] dwarf2_build_psymtabs_hard: Done building psymtabs of /tmp/gdb 4.666649 [dwarf-read] process_queue: Expanding one or more symtabs of objfile /tmp/gdb ... This is disappointing of course. It might be possible to reduce this time at the cost of a bit more complexity in the lookup code and perhaps a bit more memory use. For example, right now finalization merges the results from all the readers into a single entry table -- but maybe the table could remain sharded instead. Another possibility might be to do the work in worker threads and, instead of sharding the result, pre-sort the vectors and use a sorted merge operation to combine them. See the last messages in this thread: https://sourceware.org/pipermail/gdb-patches/2022-April/188194.html
I have a patch to "shard" the index. The idea is to do each finalization step locally in a given index, and never combine them into the cooked vector. This seems better for performance, at least from a time-to-first-prompt perspective. And, the memory overhead doesn't seem so bad, either. Before: (gdb) file /tmp/gdb 2022-05-13 14:54:41.981 - command started Reading symbols from /tmp/gdb... 2022-05-13 14:54:42.391 - command finished Command execution time: 2.202096 (cpu), 0.410121 (wall) Space used: 10010624 (+6594560 for this command) After: (gdb) file /tmp/gdb 2022-05-13 14:54:19.907 - command started Reading symbols from /tmp/gdb... 2022-05-13 14:54:20.125 - command finished Command execution time: 1.025581 (cpu), 0.218469 (wall) Space used: 10014720 (+6598656 for this command)
... however there is a new crash somewhere so take the numbers with a grain of salt.
https://sourceware.org/pipermail/gdb-patches/2022-May/189116.html
Hi Tom, I'm not sure why I am starting to see this only today, but I see a lot of tests timing out. gdb.base/sizeof.exp is one of them. It looks like it's due to a slowdown that started with the new DWARF reader patches. My build uses -O0 and all the sanitizer and glibcxx debug stuff, so it doesn't help, but I see the same relative slowdown without that. I copied the test program of gdb.base/sizeof.exp to /tmp/sizeof so it could be reused easily across builds. The command I use is: time ./gdb -q --data-directory=data-directory -nx /tmp/sizeof -ex "start" -iex "set debug dwarf-read 0" -ex "p/d sizeof(short)" -batch This replicates the first steps of the test. Using 3d20b8d99a5: ./gdb -q --data-directory=data-directory -nx /tmp/sizeof -ex "start" -ex -e 1.18s user 0.17s system 116% cpu 1.157 total Using 3d20b8d99a5^: ./gdb -q --data-directory=data-directory -nx /tmp/sizeof -ex "start" -ex -e 0.40s user 0.07s system 109% cpu 0.427 total By toggling "set debug dwarf-read" to 1, I noticed that we are now expanding a lot of CUs from libc's debug info when doing the "print sizeof": [dwarf-read] process_queue: Expanding one or more symtabs of objfile /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so ... [dwarf-read] process_queue: Expanding symtab of CU at offset 0xd3cbe [dwarf-read] process_queue: Done expanding CU at offset 0xd3cbe [dwarf-read] process_queue: Done expanding symtabs of /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so. [dwarf-read] process_queue: Expanding one or more symtabs of objfile /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so ... [dwarf-read] process_queue: Expanding symtab of CU at offset 0x22bf6f [dwarf-read] process_queue: Done expanding CU at offset 0x22bf6f [dwarf-read] process_queue: Done expanding symtabs of /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so. [dwarf-read] process_queue: Expanding one or more symtabs of objfile /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so ... [dwarf-read] process_queue: Expanding symtab of CU at offset 0x7ef56e [dwarf-read] process_queue: Done expanding CU at offset 0x7ef56e [dwarf-read] process_queue: Done expanding symtabs of /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so. ... We expand ~1500 CUs. Before, we didn't do that. So having debug info for libc is important for reproducing this, and a GDB configured with --prefix/usr or some other switch that makes it find separate debug info in /usr/lib/debug (by default a local build would look in /usr/local/lib/debug). I am reproducing this on Ubuntu 20.04.
I investigated the problem reported in my last message a little bit. What I saw was that the cooked index contains entries for base types, like "int". When doing "print sizeof(short)", it eventually looks up the "int" type, (since short means "short int"), and all CUs match and get expanded, because they all have an int type. Previously with partial symbols, I don't think we had entries for those. What is the problem here? - Do we want entries for base types in the index? - Should the type search algorithm expand all CUs that contain a matching type? Couldn't it stop at the first CU that has a matching type?
> So having debug info for libc is important for reproducing this I suppose it has to be a libc that has debug info but does not use .gdb_index. > When doing "print sizeof(short)", it eventually looks up the "int" type, > (since short means "short int") That seems peculiar. > Previously with partial symbols, I don't think we had entries for those. Probably they got unified by the bcache. > What is the problem here? > > - Do we want entries for base types in the index? > - Should the type search algorithm expand all CUs that contain a matching > type? Couldn't it stop at the first CU that has a matching type? I think you can't really avoid base types in the index. That seems like it would open up a number of problems. Maybe just stopping on the first type would be ok. We'll have to investigate. Otherwise some kind of de-duplication can be done; though I'd prefer to push this kind of thing from read-time to lookup-time.
I'm currently debugging objfile::lookup_symbol, and it seems the intention to stop expanding when the symbol is found. What I see is that we are looking for the "signed int" type: (top-gdb) frame #6 0x0000555555c5dd69 in objfile::lookup_symbol (this=0x555556807040, kind=STATIC_BLOCK, name=0x7fffffffaca0 "signed int", domain=VAR_DOMAIN) at symfile-debug.c:276 276 if (!iter->expand_symtabs_matching (this, But in cooked_index_functions::expand_symtabs_matching, we decided to match index entries with "int" instead of "signed int" (see the name_vec vector): (top-gdb) frame #5 0x00005555558e1d07 in cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum_tag, search_domain) (this=0x555556809360, objfile=0x555556807040, file_matcher=..., lookup_name=0x7fffffffa760, symbol_matcher=..., expansion_notify=..., search_flags=..., domain=VAR_DOMAIN, kind=ALL_DOMAIN) at dwarf2/read.c:20288 20288 if (!dw2_expand_symtabs_matching_one (entry->per_cu, per_objfile, (top-gdb) p name_vec $13 = std::vector of length 1, capacity 1 = {"int"} Most CUs have a type called "int", so that matches a lot of them. Back in the expansion notifier in objfile::lookup_symbol, we search the just expanded CUs for the "signed int" type, and we don't find it, so we return false, meaning "not found, please expand the next CU". I'm not too familiar with this code, but the discrepancy between the criteria on which we choose the CUs to expand (those that have the "int" type) and then the type we actually look for ("signed int") is odd. In the previous psymtabs code, I think we looked for partial symbols named "signed int", and didn't find any, so it didn't cause expansion of these CUs.
"signed int" becomes "int" when lookup_name_info::split_name is called with lang == language_cplus: #0 cp_canonicalize_string (string=0x7fffffffaca0 "signed int") at cp-support.c:654 #1 0x0000555555c7dab4 in demangle_for_lookup (name=0x7fffffffaca0 "signed int", lang=language_cplus, storage=...) at symtab.c:1913 #2 0x0000555555c7d913 in demangle_for_lookup_info::demangle_for_lookup_info (this=0x7fffffffa4b8, lookup_name=..., lang=language_cplus) at symtab.c:1864 #3 0x00005555558255d8 in gdb::optional<demangle_for_lookup_info>::emplace<lookup_name_info const&, language> (this=0x7fffffffa4b8) at ./../gdbsupport/gdb_optional.h:155 #4 0x0000555555824fc3 in lookup_name_info::maybe_init<gdb::optional<demangle_for_lookup_info>, language> (this=0x7fffffffa470, field=...) at /tmp/apres/gdb/symtab.h:350 #5 0x0000555555824d8d in lookup_name_info::cplus (this=0x7fffffffa470) at /tmp/apres/gdb/symtab.h:322 #6 0x00005555558468f2 in lookup_name_info::language_lookup_name (this=0x7fffffffa470, lang=language_cplus) at /tmp/apres/gdb/symtab.h:281 #7 0x00005555558f918e in lookup_name_info::split_name (this=0x7fffffffa470, lang=language_cplus) at ./symtab.h:309 #8 0x00005555558e198f in cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_sy mtab*)>, enum_flags<block_search_flag_values>, domain_enum_tag, search_domain) (this=0x5555567721a0, objfile=0x555556593400, file_matcher=..., lookup_name=0x7fffffffa6a0, symbol_matcher=..., expansion_notify=..., search_flags=..., domain=VAR_DOMAIN, kind=ALL_DOMAIN) at dwarf2/read.c:20223 #9 0x0000555555c5dd69 in objfile::lookup_symbol (this=0x555556593400, kind=GLOBAL_BLOCK, name=0x7fffffffaca0 "signed int", domain=VAR_DOMAIN) at symfile-debug.c:276 #10 0x0000555555c7f0fc in lookup_symbol_via_quick_fns (objfile=0x555556593400, block_index=GLOBAL_BLOCK, name=0x7fffffffaca0 "signed int", domain=VAR_DOMAIN) at symtab.c:2451 #11 0x0000555555c7f5b9 in lookup_symbol_in_objfile (During symbol reading: cannot get low and high bounds for subprogram DIE at 0x3956955 objfile=0x555556593400, block_index=GLOBAL_BLOCK, name=0x7fffffffaca0 "signed int", domain=VAR_DOMAIN) at symtab.c:2599 #12 0x0000555555c7f7b6 in lookup_symbol_global_or_static_iterator_cb (objfile=0x555556593400, cb_data=0x7fffffffaa30) at symtab.c:2663 #13 0x0000555555c24b70 in svr4_iterate_over_objfiles_in_search_order (gdbarch=0x5555567b34a0, cb=0x555555c7f722 <lookup_symbol_global_or_static_iterator_cb(objfile*, void*)>, cb_data=0x7fffffffaa30, current_objfile=0x0) at solib-sv r4.c:3168 #14 0x00005555556d319e in gdbarch_iterate_over_objfiles_in_search_order (gdbarch=0x5555567b34a0, cb=0x555555c7f722 <lookup_symbol_global_or_static_iterator_cb(objfile*, void*)>, cb_data=0x7fffffffaa30, current_objfile=0x0) at /tmp/ apres/gdb/gdbarch.c:4937 #15 0x0000555555c7f965 in lookup_global_or_static_symbol (name=0x7fffffffaca0 "signed int", block_index=GLOBAL_BLOCK, objfile=0x0, domain=VAR_DOMAIN) at symtab.c:2709 #16 0x0000555555c7faf7 in lookup_global_symbol (name=0x7fffffffaca0 "signed int", block=0x0, domain=VAR_DOMAIN) at symtab.c:2760 #17 0x0000555555c7f2ff in language_defn::lookup_symbol_nonlocal (this=0x5555564d2fc0 <c_language_defn>, name=0x7fffffffaca0 "signed int", block=0x0, domain=VAR_DOMAIN) at symtab.c:2520 #18 0x0000555555c7e5bd in lookup_symbol_aux (name=0x7fffffffaca0 "signed int", match_type=symbol_name_match_type::FULL, block=0x0, domain=VAR_DOMAIN, language=language_c, is_a_field_of_this=0x0) at symtab.c:2170 #19 0x0000555555c7dd18 in lookup_symbol_in_language (name=0x7fffffffaca0 "signed int", block=0x0, domain=VAR_DOMAIN, lang=language_c, is_a_field_of_this=0x0) at symtab.c:1962 #20 0x00005555559884d3 in lookup_typename (language=0x5555564d2fc0 <c_language_defn>, name=0x7fffffffaca0 "signed int", block=0x0, noerr=1) at gdbtypes.c:1694 #21 0x00005555559886ae in lookup_signed_typename (language=0x5555564d2fc0 <c_language_defn>, name=0x555555fdb81d "int") at gdbtypes.c:1723
The master branch has been updated by Tom Tromey <tromey@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=20a26f4e018cd0cb40a3b81568be0d4f164eed0f commit 20a26f4e018cd0cb40a3b81568be0d4f164eed0f Author: Tom Tromey <tromey@adacore.com> Date: Fri May 13 13:46:52 2022 -0600 Finalize each cooked index separately After DWARF has been scanned, the cooked index code does a "finalization" step in a worker thread. This step combines all the index entries into a single master list, canonicalizes C++ names, and splits Ada names to synthesize package names. While this step is run in the background, gdb will wait for the results in some situations, and it turns out that this step can be slow. This is PR symtab/29105. This can be sped up by parallelizing, at a small memory cost. Now each index is finalized on its own, in a worker thread. The cost comes from name canonicalization: if a given non-canonical name is referred to by multiple indices, there will be N canonical copies (one per index) rather than just one. This requires changing the users of the index to iterate over multiple results. However, this is easily done by introducing a new "chained range" class. When run on gdb itself, the memory cost seems rather low -- on my current machine, "maint space 1" reports no change due to the patch. For performance testing, using "maint time 1" and "file" will not show correct results. That approach measures "time to next prompt", but because the patch only affects background work, this shouldn't (and doesn't) change. Instead, a simple way to make gdb wait for the results is to set a breakpoint. Before: $ /bin/time -f%e ~/gdb/install/bin/gdb -nx -q -batch \ -ex 'break main' /tmp/gdb Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28. 2.00 After: $ /bin/time -f%e ./gdb/gdb -nx -q -batch \ -ex 'break main' /tmp/gdb Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28. 0.65 Regression tested on x86-64 Fedora 34. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
That patch solved one problem but not all the problems pointed out in this bug, so I'm leaving this open.
I managed to reproduce this with the 'sizeof' test program (after hacking gdb to ignore .gdb_index, since the Fedora libc debuginfo uses that). I couldn't reproduce it when running on gdb itself, and I discovered that I can avoid the problem by using C++. That is, when debugging sizeof: (gdb) set debug dwarf-read 1 (gdb) set lang c++ Warning: the current language does not match this frame. (gdb) print sizeof(short) $1 = 2 So I tend to think this probably a bug elsewhere in gdb. In C++, gdb canonializes inputs so that lookups will succeed. This isn't done for C, I think, probably because it seemed like there was no need. But given that the DWARF says "signed short" as a symbol name, perhaps we need some special cases for this sort of thing.
Hi Tom, I'm wondering whether this PR is still about the performance of the new DWARF reader, or is it now about some bug(s?). As this PR is marked as targeting GDB 13, would you mind telling us more about what's left to do for it? Thank you!
(In reply to Joel Brobecker from comment #12) > I'm wondering whether this PR is still about the performance of the new > DWARF reader, or is it now about some bug(s?). As this PR is marked as > targeting GDB 13, would you mind telling us more about what's left to do for > it? With the new reader, gdb seems to expand too many CUs in some cases, and we should probably fix this before 13. Comment #5 describes the scenario but I imagine it can be reproduced in other ways as well.
I was looking at another timeout failure that happens randomly on my system, it points to this. It's from test gdb.mi/mi-var-cmd.exp: The test tries to create a varobj with expression "int". It expects GDB to error out with "Attempt to use a type name as an expression." ^error,msg="-var-create: unable to create variable object" I can replicate this with: $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.mi/mi-var-cmd/mi-var-cmd -ex start (gdb) set debug dwarf-read 1 (gdb) print int ... [dwarf-read] process_queue: Expanding one or more symtabs of objfile /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug ... [dwarf-read] process_queue: Expanding symtab of CU at offset 0x480bd4 [dwarf-read] process_queue: Done expanding CU at offset 0x480bd4 [dwarf-read] process_queue: Done expanding symtabs of /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug. [dwarf-read] process_queue: Expanding one or more symtabs of objfile /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug ... [dwarf-read] process_queue: Expanding symtab of CU at offset 0x48cd5e [dwarf-read] process_queue: Done expanding CU at offset 0x48cd5e [dwarf-read] process_queue: Done expanding symtabs of /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug. [dwarf-read] process_queue: Expanding one or more symtabs of objfile /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug ... [dwarf-read] process_queue: Expanding symtab of CU at offset 0x498c49 [dwarf-read] process_queue: Done expanding CU at offset 0x498c49 [dwarf-read] process_queue: Done expanding symtabs of /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug. ... Attempt to use a type name as an expression The post-indexer-changes version takes a long time, expanding pretty much all CUs, while the pre-indexer-changes version finishes almost instantly.
I've been debugging this a bit. For the sizeof test, gdb makes an index entry for "short int". This comes directly from the DWARF: <1><bc>: Abbrev Number: 2 (DW_TAG_base_type) <bd> DW_AT_byte_size : 2 <be> DW_AT_encoding : 5 (signed) <bf> DW_AT_name : (indirect string, offset: 0x120): short int lookup_signed_typename first looks for "signed short" though. (top-gdb) up #15 0x00000000007bb408 in lookup_typename (language=0x24c0da0 <c_language_defn>, name=0x7fffffffab50 "signed short", block=0x0, noerr=1) at ../../binutils-gdb/gdb/gdbtypes.c:1700 (top-gdb) up #16 0x00000000007bb5ac in lookup_signed_typename (language=0x24c0da0 <c_language_defn>, name=0x12528f8 "short") at ../../binutils-gdb/gdb/gdbtypes.c:1729 Now, trying this with an older gdb shows that psyms also make these same symbol names: (gdb) maint print psymbols [...] `short int', type, 0x0 I'm not sure why it works in this case and not in the new code. I added a 'maint canonicalize' command and it shows: (gdb) maint canonicalize short No change. (gdb) maint canonicalize short int canonical = short (gdb) maint canonicalize signed short int canonical = short (gdb) maint canonicalize signed short canonical = short That is, one oddity here is that the C++ canonical form for these names differs from what's in the DWARF and what things like lookup_signed_typename expect. Next I'm going to look at canonicalization for C, at least for those types that have multiple spellings.
Name canonicalization for C seems to work fine. I applied a patch to disable the gdb-index (Fedora debuginfo ships with this) and then I tried the "sizeof" test with DWARF debugging enabled, and "print sizeof(short)" doesn't expand anything. Only the symbol readers have to be updated to canonicalize for C, not lookup, because the C parser arranges to only look up the canonical names anyway. (Well, lookup_signed_typename did need a small tweak.) I am not sure how to write a test for this.
Not sure how to help you, I don't know anything about all of this. In the mean time, I stumbled on another testsuite timeout probably caused by this: $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/info-os/info-os -ex start -ex 'call ((int (*) (void)) getpid) ()' -batch The call takes a long time, many CUs are expanded, which probably shouldn't. Probably because of the "int" in the function pointer type. I'm noting it here so I can test it again once we have a patch to test.
dw2-unusual-field-names.exp points out that the canonicalization should probably depend on the DWARF tag, which is pretty unfortunate. According to the history, a type like this is only defined by gdb itself, but of course the test invalidates this by writing explicit DWARF to this end, necessitating a workaround in the reader :(
Also: ptype enum EU type = enum EU : unsigned int integer {TWO = 2} (gdb) FAIL: gdb.dwarf2/enum-type.exp: ptype EU in C++ though maybe this can be fixed in the test.
(In reply to Simon Marchi from comment #17) > The call takes a long time, many CUs are expanded, which probably shouldn't. > Probably because of the "int" in the function pointer type. I'm noting it > here so I can test it again once we have a patch to test. Try my branch "pr-29105-signed-index"
(In reply to Tom Tromey from comment #20) > (In reply to Simon Marchi from comment #17) > > The call takes a long time, many CUs are expanded, which probably shouldn't. > > Probably because of the "int" in the function pointer type. I'm noting it > > here so I can test it again once we have a patch to test. > > Try my branch "pr-29105-signed-index" All the cases I have reported appear fixed (in that they don't expand a ton of CUs) with that branch.
https://sourceware.org/pipermail/gdb-patches/2022-November/193530.html
I can confirm this solves at least one of the problems I have seen with slow debugging. Those problems are sluggishness stepping into functions and program startup. I believe the latter is due to having the glibc debug packages installed on my CentOS 7 system. The former - slow stepping into functions - appears to be related to the LLVM linker. If use the BFD or gold linker to generate my executables and shared libraries, stepping into function is pretty snappy.
(In reply to Tom Kacvinsky from comment #23) > I can confirm this solves at least one of the problems I have seen with slow > debugging. Those problems are sluggishness stepping into functions and > program startup. I believe the latter is due to having the glibc debug > packages installed on my CentOS 7 system. The former - slow stepping into > functions - appears to be related to the LLVM linker. If use the BFD or > gold linker to generate my executables and shared libraries, stepping into > function is pretty snappy. You last problem sounds a lot like: https://sourceware.org/bugzilla/show_bug.cgi?id=27754
(In reply to Simon Marchi from comment #24) > (In reply to Tom Kacvinsky from comment #23) > > I can confirm this solves at least one of the problems I have seen with slow > > debugging. Those problems are sluggishness stepping into functions and > > program startup. I believe the latter is due to having the glibc debug > > packages installed on my CentOS 7 system. The former - slow stepping into > > functions - appears to be related to the LLVM linker. If use the BFD or > > gold linker to generate my executables and shared libraries, stepping into > > function is pretty snappy. > > You last problem sounds a lot like: > > https://sourceware.org/bugzilla/show_bug.cgi?id=27754 Indeed it is. I see a lot of DW_MACRO_import - offset : 0x0 in the ELF files when compiling with -ggdb3 and using lld as the linker. If I switch to the gold or bfd linker, the myriad "offset : 0x0" go away. So, my suspicion is that this is not gdb's fault, but something undesirable that lld is doing.
The master branch has been updated by Simon Marchi <simark@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=00a5867df72983c3f8a11c9955c5032d6f601b70 commit 00a5867df72983c3f8a11c9955c5032d6f601b70 Author: Simon Marchi <simon.marchi@efficios.com> Date: Thu Dec 1 11:44:41 2022 -0500 gdb/dwarf: add some QUIT macros While testing the fix for PR 29105, I noticed I couldn't ctrl-C my way out of GDB expanding many symtabs. GDB was busy in a loop in cooked_index_functions::expand_symtabs_matching. Add a QUIT there. I also happened to see a spot in cooked_index_functions::expand_matching_symbols where a QUIT would be useful too, since we iterate over a potentially big number of index entries and expand CUs in the loop. Add one there too. Change-Id: Ie1d650381df7f944c16d841b3e592d2dce7306c3 Approved-By: Kevin Buettner <kevinb@redhat.com>
The master branch has been updated by Tom Tromey <tromey@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=55fc1623f942fba10362cb199f9356d75ca5835b commit 55fc1623f942fba10362cb199f9356d75ca5835b Author: Tom Tromey <tromey@adacore.com> Date: Thu Nov 3 13:49:17 2022 -0600 Add name canonicalization for C PR symtab/29105 shows a number of situations where symbol lookup can result in the expansion of too many CUs. What happens is that lookup_signed_typename will try to look up a type like "signed int". In cooked_index_functions::expand_symtabs_matching, when looping over languages, the C++ case will canonicalize this type name to be "int" instead. Then this method will proceed to expand every CU that has an entry for "int" -- i.e., nearly all of them. A crucial component of this is that the caller, objfile::lookup_symbol, does not do this canonicalization, so when it tries to find the symbol for "signed int", it fails -- causing the loop to continue. This patch fixes the problem by introducing name canonicalization for C. The idea here is that, by making C and C++ agree on the canonical name when a symbol name can have multiple spellings, we avoid the bad behavior in objfile::lookup_symbol (and any other such code -- I don't know if there is any). Unlike C++, C only has a few situations where canonicalization is needed. And, in particular, due to the lack of overloading (thus avoiding any issues in linespec) and due to the way c-exp.y works, I think that no canonicalization is needed during symbol lookup -- only during symtab construction. This explains why lookup_name_info is not touched. The stabs reader is modified on a "best effort" basis. The DWARF reader needed one small tweak in dwarf2_name to avoid a regression in dw2-unusual-field-names.exp. I think this is adequately explained by the comment, but basically this is a scenario that should not occur in real code, only the gdb test suite. lookup_signed_typename is simplified. It used to search for two different type names, but now gdb can search just for the canonical form. gdb.dwarf2/enum-type.exp needed a small tweak, because the canonicalizer turns "unsigned integer" into "unsigned int integer". It seems better here to use the correct C type name. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105 Tested-by: Simon Marchi <simark@simark.ca> Reviewed-by: Andrew Burgess <aburgess@redhat.com>
Fixed. (Note the macro thing is a different bug.)
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=9f50fe0835850645bd8ea9bb1efe1fe6c48dfb12 commit 9f50fe0835850645bd8ea9bb1efe1fe6c48dfb12 Author: Andrew Burgess <aburgess@redhat.com> Date: Wed Dec 7 15:55:25 2022 +0000 gdb/testsuite: new test for recent dwarf reader issue This commit provides a test for this commit: commit 55fc1623f942fba10362cb199f9356d75ca5835b Date: Thu Nov 3 13:49:17 2022 -0600 Add name canonicalization for C Which resolves PR gdb/29105. My reason for writing this test was a desire to better understand the above commit, my process was to study the commit until I thought I understood it, then write a test to expose the issue. As the original commit didn't have a test, I thought it wouldn't hurt to commit this upstream. The problem tested for here is already described in the above commit, but I'll give a brief description here. This description describes GDB prior to the above commit: - Builtin types are added to GDB using their canonical name, e.g. "short", not "signed short", - When the user does something like 'p sizeof(short)', then this is handled in c-exp.y, and results in a call to lookup_signed_type for the name "int". The "int" here is actually being looked up as the type for the result of the 'sizeof' expression, - In lookup_signed_type GDB first adds a 'signed' and looks for that type, so in this case 'signed int', and, if that lookup fails, GDB then looks up 'int', - The problem is that 'signed int' is not the canonical name for a signed int, so no builtin type with that name will be found, GDB will then go to each object file in turn looking for a matching type, - When checking each object file, GDB will first check the partial symtab to see if the full symtab should be expanded or not. Remember, at this point GDB is looking for 'signed int', there will be no partial symbols with that name, so GDB will not expand anything, - However, GDB checks each partial symbol using multiple languages, not just the current language (C in this case), so, when GDB checks using the C++ language, the symbol name is first canonicalized (the code that does this can be found lookup_name_info::language_lookup_name). As the canonical form of 'signed int' is just 'int', GDB then looks for any symbols with the name 'int', most partial symtabs will contain such a symbol, so GDB ends up expanding pretty much every symtab. The above commit fixes this by avoiding the use of non-canonical names with C, now the initial builtin type lookup will succeed, and GDB never even considers whether to expand any additional symtabs. The test case creates a library that includes char, short, int, and long types, and a test program that links against the library. In the test script we start the inferior, but don't allow it to progress far enough that the debug information for the library has been fully expanded yet. Then we evaluate some 'sizeof(TYPE)' expressions. In the buggy version of GDB this would cause the debug information for the library to be fully expanded, while in the fixed version of GDB this will not be the case. We use 'info sources' to determine if the debug information has been fully expanded or not. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105