Bug 16405 - backtrace takes GBs and minutes with dwz -m
Summary: backtrace takes GBs and minutes with dwz -m
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: symtab (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: ---
Assignee: Jan Kratochvil
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-07 18:16 UTC by Jan Kratochvil
Modified: 2015-01-25 15:30 UTC (History)
3 users (show)

See Also:
Host:
Target: x86_64-unknown-linux-gnu
Build:
Last reconfirmed:


Attachments
draft patch (2.66 KB, patch)
2014-01-10 20:49 UTC, Jan Kratochvil
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Kratochvil 2014-01-07 18:16:56 UTC
Attempt to backtrace LibreOffice on Fedora 20 x86_64 locks up, each frame gets up to minutes to print, GDB RSS grows to 1GB and more.

The queue in process_queue() in the backtrace below (randomly stopped GDB) had 166 symtab entries.

It is probably a regression since call sites exist; unaware why it did not reproduce before.  But expansion of a single symtab should not cause 166 symtabs to be expanded.

FAIL: GNU gdb (GDB) 7.6.50.20140107-cvs
FAIL: GNU gdb (GDB) 7.6.1.20131021-cvs
FAIL: gdb-7.6.50.20130731-16.fc20.x86_64
N/A:  GNU gdb (GDB) 7.5.1.20130125-cvs - it cannot parse Fedora 20 DWARF


LibreOffice backtrace:
(gdb) bt
Python Exception <type 'exceptions.ImportError'> No module named gdb.frames: 
#0  0x0000003eaaceaa8d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000003ead4495b4 in g_main_context_poll (priority=2147483647, n_fds=3, fds=0x14da140, timeout=9915, 
    context=0xa4db60) at gmain.c:4007
#2  g_main_context_iterate (context=context@entry=0xa4db60, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at gmain.c:3708
<locked up>


GDB backtrace:
#0  process_queue
#1  in dw2_do_instantiate_symtab
#2  in dw2_instantiate_symtab
#3  in dw2_find_pc_sect_symtab
#4  in find_pc_sect_symtab
#5  in find_pc_symtab
#6  in call_site_for_pc
#7  in call_site_find_chain_1
#8  in call_site_find_chain
#9  in dwarf2_tailcall_sniffer_first
#10 in dwarf2_frame_prev_register
#11 in frame_unwind_register_value
#12 in frame_register_unwind
#13 in frame_register_unwind_location
#14 in get_prev_frame_1
#15 in get_prev_frame
#16 in value_of_dwarf_reg_entry
#17 in value_of_dwarf_block_entry
#18 in loclist_read_variable_at_entry
#19 in read_frame_arg
#20 in print_frame_args
#21 in print_frame
#22 in print_frame_info
#23 in backtrace_command_1
#24 in backtrace_command
#25 in do_cfunc
#26 in cmd_func
#27 in execute_command
#28 in command_handler
#29 in command_line_handler
#30 in rl_callback_read_char
#31 in rl_callback_read_char_wrapper
#32 in stdin_event_handler
#33 in handle_file_event
#34 in process_event
#35 in gdb_do_one_event
#36 in start_event_loop
#37 in cli_command_loop
#38 in current_interp_command_loop
#39 in captured_command_loop
#40 in catch_errors
#41 in captured_main
#42 in catch_errors
#43 in gdb_main
#44 in main
Comment 1 Jan Kratochvil 2014-01-07 20:30:46 UTC
call-sites are really innocent, it is DWARF expansion problem in general.

I have patched out call-sites in GDB code and the problem remains.


#0  in start_subfile
#1  in dwarf2_start_subfile
#2  in dwarf_decode_lines 
#3  in handle_DW_AT_stmt_list
#4  in read_file_scope
#5  in process_die
#6  in process_full_comp_unit
#7  in process_queue
#8  in dw2_do_instantiate_symtab
#9  in dw2_instantiate_symtab
#10 in dw2_find_pc_sect_symtab
#11 in find_pc_sect_symtab
#12 in blockvector_for_pc_sect
#13 in block_for_pc_sect
#14 in block_for_pc
#15 in inline_frame_sniffer
#16 in frame_unwind_find_by_frame
#17 in compute_frame_id
#18 in get_prev_frame_if_no_cycle
#19 in get_prev_frame_1
#20 in get_prev_frame
#21 in value_of_dwarf_reg_entry
#22 in value_of_dwarf_block_entry
#23 in loclist_read_variable_at_entry
#24 in read_frame_arg
#25 in print_frame_args
#26 in print_frame
#27 in print_frame_info
#28 in backtrace_command_1
#29 in backtrace_command
Comment 2 Jan Kratochvil 2014-01-10 20:49:18 UTC
Created attachment 7347 [details]
draft patch

This patch:
bt
Command execution time: 0.472000 (cpu), 0.539249 (wall)
Space used: 380514304 (+44015616 for this command)
#symtabs: 30964 (+2457), #primary symtabs: 18 (+7), #blocks: 7877 (+7852)
+
thread apply all bt
Command execution time: 0.496000 (cpu), 0.500745 (wall)
Space used: 408891392 (+28377088 for this command)
#symtabs: 32294 (+1330), #primary symtabs: 31 (+13), #blocks: 16467 (+8590)

FSF GDB HEAD:
bt
Command execution time: 153.667000 (cpu), 154.234038 (wall)
Space used: 1283444736 (+946941952 for this command)
#symtabs: 4373195 (+4344688), #primary symtabs: 2376 (+2365), #blocks: 12593 (+12568)
+
thread apply all bt
Command execution time: 104.489000 (cpu), 104.961148 (wall)
Space used: 1768660992 (+485216256 for this command)
#symtabs: 6662638 (+2289443), #primary symtabs: 3673 (+1297), #blocks: 23751 (+11158)

But
  thread apply all bt full
is still slow.  One may omit DW_TAG_imported_unit even in this case thanks to .gdb_index (if it points to the right CU, maybe it is not).  This still affects for example ABRT.

gdb -p `pidof soffice.bin` -ex 'set pagination off' -ex 'maintenance set per-command space' -ex 'maintenance set per-command symtab' -ex 'maintenance set per-command time'
Comment 3 Jan Kratochvil 2014-10-01 19:49:32 UTC
[patch] Fix 100x slowdown regression on DWZ files
https://sourceware.org/ml/gdb-patches/2014-10/msg00020.html
Comment 4 Jan Kratochvil 2015-01-24 15:03:10 UTC
527f3840e1af8bc2e3173922ddae15d0021ed9b1 master
0325b9bae8037aaa69cdfc8aa43a46678474f261 7.9

time gdb -batch -q -p `pidof soffice.bin` -ex 'thread apply all bt full'|wc -l
174
real	0m6.805s

time gdb -batch -q -p `pidof soffice.bin` -ex 'thread apply all bt'|wc -l
61
real	0m3.635s
Comment 5 Jan Kratochvil 2015-01-25 15:30:34 UTC
The Comment 4 benchmark is BTW bogus as it did not have expanded CUs.