Bug 13498 - save-index makes gdb slow with dwarf4
Summary: save-index makes gdb slow with dwarf4
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: HEAD
: P2 minor
Target Milestone: ---
Assignee: dje
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-14 17:21 UTC by Luboš Luňák
Modified: 2012-07-10 20:28 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
WIP patch (746 bytes, patch)
2012-02-17 21:51 UTC, Tom Tromey
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Luboš Luňák 2011-12-14 17:21:01 UTC
When using the gdb-add-index script in order to modify binaries for faster load in gdb, some operations such as tab completion of function names are noticeably slower if the binaries have been built with -gdwarf-4. I experience the problem on openSUSE 11.4 with a git LibreOffice build, using gdb-weekly-7.3.50.20111213.tar.bz2 snapshot built from source.

Steps to reproduce:
1) Build LibreOffice with -gdwarf-4 (http://wiki.documentfoundation.org/Development/Native_Build , --enable-debug passed to autogen/configure, current git uses -gdwarf-4 by default if supported, otherwise can be passed manually by invoking make as 'CFLAGS="-gdwarf-4" CXXFLAGS="-gdwarf-4" make')
2) Run LibreOffice, open an empty Writer document, type in something and save as a .docx document.
3) Attach gdb, type 'break DocxA<tab>', watch the time it took to complete the name to DocxAttributeOutput. Try repeatedly (second time it's faster).
4) Modify LibreOffice binaries using the gdb-add-index script (for dev-install build they are in solver/*/lib/, for normal install they are in basis3.4/program/).
5) Repeat steps 2) and 3), watch the increased time needed to complete the function name (second completion takes at most 1 second without the saved index and more than 5 seconds with it, and this is a fast machine).

The problem presumably exists in other scenarios and other codebases, assuming they are large enough to make the slowness noticeable.

Further information:
- using dwarf4 instead of dwarf2 saves about 30% size of LibreOffice's binaries when built with debug info
- the problem does not seem to exist when building with -gdwarf-2
- the debug flag passed to gcc/g++ is -ggdb2 (besides the -gdwarf-2/4)
- the .gdb_index sections in the binaries appear to be loaded both with dwarf2 and dwarf4 by gdb, the load time is faster and memory usage is lower
- when examining /proc/<pid>/status of gdb, in the case of .gdb_index and dwarf4, VmHWM and VmData are almost the same, while in the dwarf2 case the VmHWM is about 20% larger than VmData (perhaps in the dwarf2 case gdb generates additional internal data that is not generated in the dwarf4 case and that causes this?)
Comment 1 Luboš Luňák 2011-12-14 23:27:48 UTC
Ok, I think I've figured out where the problem is.

When completing, location_completer() calls also make_source_files_completion_list(), which calls map_partial_symbol_filenames(). That one is where most of the time during the completion is spent - it calls the proper map_symbol_filenames function depending on the debugging format - for .so's without an index, it's map_symbol_filenames_psymtab(), for indexed files, it is dw2_map_symbol_filenames().

The problem is, with dwarf4-based index, dw2_map_symbol_filenames() iterates over a huge number of filenames, magnitudes more than with dwarf2-based index or psymtabs (in my case, it's 35M filenames with dwarf4 index, 380k with dwarf4 without index, 720k with dwarf2 index, 220k with dwarf2 without index). All the time is spent just iterating over them. And the number is so high because there are many duplicates, as high as 175k duplicates for the most common header files.

I don't know DWARF or gdb, so I'm not sure what the proper fix should be. Possibly the save-index functionality should try to eliminate these duplicates.
Comment 2 Luboš Luňák 2011-12-15 15:54:43 UTC
It takes so long because dw2_map_symbol_filenames() loops over both dwarf2_per_objfile::all_comp_units and dwarf2_per_objfile::type_comp_units, where e.g. LibreOffice's libswlo.so has ~700 of the first and ~16000 of the second. It appears unnecessary to iterate over type_comp_units, as all_comp_units should already include all the files encountered during compilation.

But it turns out gdb performs poorly with DWARF4 in general. In this case dw2_map_symbol_filenames() could be fixed, assuming what I wrote above is correct, but there appear to be other inefficiencies (related to .debug_types too?). E.g. 'break shared<tab>' (trying to complete a filename) is very fast with DWARF2, but takes very long with DWARF4, with or without the gdb index, and it also noticeably increases memory usage.
Comment 3 Tom Tromey 2012-02-17 21:48:26 UTC
Thanks for your analysis.  It was very helpful.

The LibreOffice build crushed my machine, I'm sure you've heard this
before :), but I was able to reproduce the problem to my satisfaction
just with libooxlo.so.  So far I've only built with DWARF-4 and .debug_types
(if you are using a recent Fedora you must also specify -fdebug-types-section).
I can definitely see a slowdown when moving from the non-indexed to the indexed
library (times from "maint time 1", which irritatingly does not work with
-ex options):

No index
(gdb) complete break int_CRYPTO_set_do_dynlock_call
break int_CRYPTO_set_do_dynlock_callback
Command execution time: 0.118982 (cpu), 0.120245 (wall)
(gdb) complete break int_CRYPTO_set_do_dynlock_call
break int_CRYPTO_set_do_dynlock_callback
Command execution time: 0.123981 (cpu), 0.125739 (wall)

Index
(gdb) complete break int_CRYPTO_set_do_dynlock_call
break int_CRYPTO_set_do_dynlock_callback
Command execution time: 4.362337 (cpu), 4.382212 (wall)
(gdb) complete break int_CRYPTO_set_do_dynlock_call
break int_CRYPTO_set_do_dynlock_callback
Command execution time: 3.187515 (cpu), 3.201839 (wall)


With a reasonably simple patch I can get this down to:

(gdb) complete break int_CRYPTO_set_do_dynlock_call
break int_CRYPTO_set_do_dynlock_callback
Command execution time: 1.255809 (cpu), 1.260110 (wall)
(gdb) complete break int_CRYPTO_set_do_dynlock_call
break int_CRYPTO_set_do_dynlock_callback
Command execution time: 0.111983 (cpu), 0.112815 (wall)

... the first completion is still quite slow, but later ones
are fast.

Further investigation and fixes will have to wait until next week.
I'll attach my WIP patch momentarily.
Comment 4 Tom Tromey 2012-02-17 21:51:01 UTC
Created attachment 6226 [details]
WIP patch
Comment 5 Luboš Luňák 2012-02-20 13:48:42 UTC
I'd like to point out again the second paragraph in comment #2. It's been a while since I did this analysis, so I don't remember all the details anymore and getting into it again would take some time, therefore this is a bit of guesswork and trying to recall, but as far as I remember:

I think I concluded that this is not a localized problem in dw2_map_symbol_filenames(), but a design problem with the gdb internal structures. I had a patch for dw2_map_symbol_filenames() that would have worked sufficiently for me for that one specific problem, but I went back to dwarf-2 anyway, and IIRC I decided that larger changes were needed that I wasn't feeling up to.

Specifically, I think I found the bottleneck to be dwarf2_per_objfile->n_type_units being a large number in general that slows down more operations than just the one specific example given. As I understand it dwarf-2 had this number quite low, so it wasn't a problem, but dwarf-4 by design causes the number to be much higher.

Again, this is just trying to recall things from 2 months ago with my limited gdb knowledge, so you should verify this yourself, but I suggest first checking gdb's performance in general when dealing with large dwarf-4-built .so files. Maybe it'll be necessary to change all accesses to the CU-list (or whatever it's called exactly) to be more efficient (compact it into less items, hashtable for access, whatever).

Also, for the save-index case, since gdb creates the index on its own anyway, it could possibly make sense to prepare it better for the completion case. As I understand it, gdb always first collects all items for completion and only afterwards it selects ones eligible for completion, so maybe the index could also contain a sorted list of (references to) all completion items in the matching .so, which presumably would take only relatively little space and would make completion almost instant.
Comment 6 Tom Tromey 2012-02-20 19:51:01 UTC
(In reply to comment #5)

> I think I concluded that this is not a localized problem in
> dw2_map_symbol_filenames(), but a design problem with the gdb internal
> structures. I had a patch for dw2_map_symbol_filenames() that would have worked
> sufficiently for me for that one specific problem, but I went back to dwarf-2
> anyway, and IIRC I decided that larger changes were needed that I wasn't
> feeling up to.

What I did today is look at all the places where we iterate over all
CUs and TUs when using the index.  Then I went backward from here to
find the callers and thus construct test cases that may exhibit bad
behavior.

Aside from map_symbol_filenames, which I've already fixed locally,
there were only 2 such methods that could plausibly cause problems:
map_symtabs_matching_filenames and expand_symtabs_matching.

The former had a very bad performance bug in CVS HEAD:

http://sourceware.org/ml/gdb-patches/2012-02/msg00409.html

However, this is pretty new and I don't think it would have shown up
in your tests :-(

The test case for this one, with libooxlo.so is just:

break numberformatsbuffer.cxx:100


In my testing, the latter is slower when type units are present, but
not drastically so.  The index speeds things up in both cases:

     Normal  Index
    +-------+-------+
DW3 |0:07.61|0:01.76|
    +-------+-------+
DW4 |0:08.80|0:03.45|
    +-------+-------+

I plan to look into this one a bit more.

The test case here is:

info function CRYPTO

> Specifically, I think I found the bottleneck to be
> dwarf2_per_objfile->n_type_units being a large number in general that slows
> down more operations than just the one specific example given. As I understand
> it dwarf-2 had this number quite low, so it wasn't a problem, but dwarf-4 by
> design causes the number to be much higher.

Yes, in DWARF < 4, there are no type units.

BTW you might wish to use -gdwarf-4 -fno-debug-types-section.
This gets you some DWARF 4 benefits without type units; it should be
a pure improvement over DWARF[23].


I'm definitely interested if you know of other specific things that are
too slow.
Comment 7 Tom Tromey 2012-02-20 19:54:36 UTC
Ok, I found one more slowdown.  The test case:

rbreak numberformatsbuffer.cxx:zardoz

Is notably slower for DWARF 4 + index.
I think this is due to the same problem as affected map_symbol_filenames.
Comment 8 cvs-commit@gcc.gnu.org 2012-02-20 20:56:16 UTC
CVSROOT:	/cvs/src
Module name:	src
Changes by:	tromey@sourceware.org	2012-02-20 20:56:13

Modified files:
	gdb            : ChangeLog dwarf2read.c 

Log message:
	PR gdb/13498:
	* dwarf2read.c (dw2_expand_symtabs_matching): Only visit a
	particular set of file names once.
	(dw2_map_symbol_filenames): Likewise.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/ChangeLog.diff?cvsroot=src&r1=1.13847&r2=1.13848
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/dwarf2read.c.diff?cvsroot=src&r1=1.613&r2=1.614
Comment 9 dje 2012-02-21 06:31:59 UTC
What's there now re. TUs could stand for a bit of reorg.
'tis on my long symtab wishlist.
Comment 10 Luboš Luňák 2012-02-21 18:14:25 UTC
I've tried current gdb CVS with the patch mentioned here, and it looks like the problems caused by the index have been fixed by it, so I guess this bugreport can be closed.

Attaching gdb to LibreOffice Writer and doing 'break sha<tab>' is still awfully slow with dwarf-4 though, roughly 30 seconds on a rather fast machine (it appears the time is spent in dwarf2_psymtab_to_symtab() and below). That is however with or without the index. With dwarf-2 it is almost instant, but the difference is that with dwarf-2 the completion list does not include template instances, which is a majority of the completions. I find this still unacceptable for dwarf-4 to be usable with something the size LibreOffice.
Comment 11 cvs-commit@gcc.gnu.org 2012-07-10 20:28:38 UTC
CVSROOT:	/cvs/src
Module name:	src
Changes by:	devans@sourceware.org	2012-07-10 20:28:34

Modified files:
	gdb            : ChangeLog dwarf2read.c 

Log message:
	PR gdb/13498
	* dwarf2read.c (dwarf2_per_objfile): New members n_type_unit_groups,
	all_type_unit_groups, type_unit_groups, tu_stats.
	(dwarf2_per_cu_data): Move "imported_symtabs" into new union "s".
	All uses updated.  Add type_unit_group to union "s".
	(type_unit_group): New struct.
	(IS_TYPE_UNIT_GROUP): New macro.
	(abbrev_table): Delete unused member "section".
	(dw2_do_instantiate_symtab): Early exit if type_unit_group.
	(dw2_get_cu): Assert not used with type_unit_group.
	(dw2_get_primary_cu): New function.
	(dw2_build_type_unit_groups_reader): New function.
	(dw2_build_type_unit_groups): New function.
	(dw2_get_file_names): Assert not called on type units.
	(dw2_map_symtabs_matching_filename): Call dw2_build_type_unit_groups.
	Redo loop to iterate over type unit groups instead of type units.
	(dw2_expand_symtabs_matching, dw2_map_symbol_filenames): Ditto.
	(read_abbrev_offset): New function.
	(init_cutu_and_read_dies): New arg "abbrev_table".  All callers
	updated.
	(create_partial_symtab): New function.
	(process_psymtab_comp_unit_reader): Assert not used with type units.
	Call create_partial_symtab.
	(process_psymtab_type_unit): Delete.
	(hash_type_unit_group, eq_type_unit_group): New functions.
	(allocate_type_unit_groups_table): New function.
	(NO_STMT_LIST_TYPE_UNIT_PSYMTAB): New macro.
	(NO_STMT_LIST_TYPE_UNIT_PSYMTAB_SIZE): New macro.
	(create_type_unit_group, get_type_unit_group): New functions.
	(tu_abbrev_offset): New struct.
	(sort_tu_by_abbrev_offset): New function.
	(add_type_unit_group_to_table): New function.
	(build_type_unit_groups): New function.
	(build_type_psymtabs_reader): New function.
	(build_type_psymtab_dependencies): New function.
	(build_type_psymtabs): Rewrite.
	(scan_partial_symbols): Flag an error if a DW_TAG_imported_unit
	is seen in a type unit.
	(process_queue): Move symtab expansion debugging printfs here.
	Call process_full_type_unit for type units.
	(compute_symtab_includes): Assert not called for type units.
	(process_cu_includes): Don't call compute_symtab_includes for
	type units.
	(process_full_type_unit): New function.
	(process_imported_unit_die): Flag an error if called for type units.
	(handle_DW_AT_stmt_list): Delete arg "want_line_info".  All callers
	updated.  Assert not called for type units.
	(read_file_scope): Call dwarf2_start_symtab.
	(setup_type_unit_groups): New function.
	(read_type_unit_scope): Rewrite.
	(abbrev_table_read_table): Initialize abbrev_table->offset.
	(abbrev_table_free_cleanup): New function.
	(dwarf2_start_symtab): New function.
	(load_full_type_unit): Assert not called for type unit groups.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/ChangeLog.diff?cvsroot=src&r1=1.14457&r2=1.14458
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/dwarf2read.c.diff?cvsroot=src&r1=1.683&r2=1.684