Distribution: Fedora 33 gdb --version: GNU gdb (GDB) Fedora 10.2-4.fc33 The problem was also reported to happen on GNU gdb (Gentoo 11.1 vanilla) 11.1. # Description of the issue For a few years now, we have a debugging procedure in GIMP, when a warning or critical error occurs, GIMP forks and runs gdb in the separate process to grab a backtrace (popping up in a dialog, allowing people to report such bugs) calling "thread apply all backtrace full". It has worked well so far. Recently some people (me included) started to experience very weird hanging of gdb. The forked process would not return. Even worse, even using select() with a timeout (hoping not to block the main process waiting forever for gdb), the select() call would not return either, which is very weird and bad. The report: https://gitlab.gnome.org/GNOME/gimp/-/issues/7539 # Workaround Running just "backtrace full" works fine. gdb returns, GIMP gets the trace and can continue normally. So that's my workaround so far. I am not getting the traces of all threads, just the main one. But it's a bit annoying (we do have some processing happening in threads, and even though most of the time, the bugs are in the main thread, inspecting other threads is useful). # More hints Another of GIMP developers, the one who experience this on Gentoo said that the problem happens only in gdb with python support, in their case: > When I installed gdb with -python USE flag in Gentoo, there was no hang (window with backtrace was displayed). # Backtrace I attached another gdb on the blocked gdb and this is the backtrace I got each time: Using host libthread_db library "/lib64/libthread_db.so.1". b__GI___libc_write (nbytes=20480, buf=0x55f6489c8bcf, fd=1) at ../sysdeps/unix/sysv/linux/write.c:26 26 return SYSCALL_CANCEL (write, fd, buf, nbytes); (gdb) bt full #0 __GI___libc_write (nbytes=32768, buf=0x55d460e0e06e, fd=1) at ../sysdeps/unix/sysv/linux/write.c:26 sc_ret = 12288 sc_cancel_oldtype = 0 __arg3 = <optimized out> _a2 = <optimized out> sc_ret = <optimized out> __value = <optimized out> sc_ret = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> resultvar = <optimized out> __arg2 = <optimized out> _a1 = <optimized out> #1 __GI___libc_write (fd=1, buf=0x55d460e0e06e, nbytes=32768) at ../sysdeps/unix/sysv/linux/write.c:24 No locals. #2 0x00007f6a67300d0d in _IO_new_file_write (f=0x7f6a67444520 <_IO_2_1_stdout_>, data=0x55d460e0e06e, n=32768) at fileops.c:1181 count = <optimized out> to_do = 32768 #3 0x00007f6a67300066 in new_do_write (fp=fp@entry=0x7f6a67444520 <_IO_2_1_stdout_>, data=data@entry=0x55d460e0e06e "ssage_log_func>\n domain_fatal_mask = <optimized out>\n masquerade_fatal = 0\n test_level = G_LOG_LEVEL_WARNING\n was_fatal = <optimized out>\n was_recursion = <optimized"..., to_do=to_do@entry=32768) at libioP.h:948 count = <optimized out> #4 0x00007f6a6730142e in _IO_new_file_xsputn (n=36481, data=<optimized out>, f=<optimized out>) at fileops.c:1255 block_size = <optimized out> do_write = <optimized out> s = 0x55d460e0e06e "ssage_log_func>\n domain_fatal_mask = <optimized out>\n masquerade_fatal = 0\n test_level = G_LOG_LEVEL_WARNING\n was_fatal = <optimized out>\n was_recursion = <optimized"... to_do = 33235 must_flush = <optimized out> count = <optimized out> s = <optimized out> to_do = <optimized out> must_flush = <optimized out> count = <optimized out> p = <optimized out> block_size = <optimized out> --Type <RET> for more, q to quit, c to continue without paging-- do_write = <optimized out> #5 _IO_new_file_xsputn (f=0x7f6a67444520 <_IO_2_1_stdout_>, data=<optimized out>, n=36481) at fileops.c:1197 s = <optimized out> to_do = 36481 must_flush = 0 count = 0 p = <optimized out> block_size = <optimized out> do_write = <optimized out> #6 0x00007f6a672f56bb in __GI__IO_fputs ( str=0x55d460e0d3c0 "#0 __libc_read (nbytes=255, buf=0x7ffd92a1fe30, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n sc_ret = -512\n sc_cancel_oldtype = 0\n __arg3 = <optimized out>\n _a2 = <opti"..., fp=0x7f6a67444520 <_IO_2_1_stdout_>) at libioP.h:948 _IO_acquire_lock_file = 0x7f6a67444520 <_IO_2_1_stdout_> len = 36481 result = -1 #7 0x000055d453cdcc99 in vfprintf_styled_no_gdbfmt(ui_file*, ui_file_style const&, bool, char const*, __va_list_tag*) () No symbol table info available. #8 0x000055d453cc53a7 in ui_out::call_do_message(ui_file_style const&, char const*, ...) () No symbol table info available. #9 0x000055d453cc5aa9 in ui_out::vmessage(ui_file_style const&, char const*, __va_list_tag*) () No symbol table info available. #10 0x000055d453cdb98a in vfprintf_maybe_filtered(ui_file*, char const*, __va_list_tag*, bool, bool) () No symbol table info available. #11 0x000055d453cdbf37 in printf_filtered(char const*, ...) () No symbol table info available. #12 0x000055d453c9118b in thr_try_catch_cmd(thread_info*, char const*, int, qcs_flags const&) () No symbol table info available. #13 0x000055d453c930ab in thread_apply_all_command(char const*, int) () No symbol table info available. #14 0x000055d45395243a in cmd_func(cmd_list_element*, char const*, int) () No symbol table info available. #15 0x000055d453c97871 in execute_command(char const*, int) () No symbol table info available. #16 0x000055d453ae2aa0 in catch_command_errors(void (*)(char const*, int), char const*, int) () No symbol table info available. #17 0x000055d453ae4dd3 in captured_main_1(captured_main_args*) () No symbol table info available. #18 0x000055d453ae531f in gdb_main(captured_main_args*) () No symbol table info available. #19 0x000055d453886b60 in main () No symbol table info available. So it seems it hangs in a SYSCALL_CANCEL (write, fd, buf, nbytes); call in __libc_write(). Lastly when I killed GIMP when it hangs, most of the time, I don't have much more on stdout/stderr, but one time, this message appeared too: Exception ignored in: <module 'threading' from '/usr/lib64/python3.9/threading.py'> Traceback (most recent call last): File "/usr/lib64/python3.9/threading.py", line 1408, in _shutdown def _shutdown(): KeyboardInterrupt: Since Python and threading seem involved in this issue, I am guessing it is linked to the bug.
Actually this is a better backtrace of gdb after I added its debug data. Hopefully it's more useful: #0 __GI___libc_write (nbytes=32768, buf=0x55a40bc38f81, fd=1) at ../sysdeps/unix/sysv/linux/write.c:26 sc_ret = 12288 sc_cancel_oldtype = 0 __arg3 = <optimized out> _a2 = <optimized out> sc_ret = <optimized out> __value = <optimized out> sc_ret = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> resultvar = <optimized out> __arg2 = <optimized out> _a1 = <optimized out> #1 __GI___libc_write (fd=1, buf=0x55a40bc38f81, nbytes=32768) at ../sysdeps/unix/sysv/linux/write.c:24 No locals. #2 0x00007f1050caed0d in _IO_new_file_write (f=0x7f1050df2520 <_IO_2_1_stdout_>, data=0x55a40bc38f81, n=32768) at fileops.c:1181 count = <optimized out> to_do = 32768 #3 0x00007f1050cae066 in new_do_write (fp=fp@entry=0x7f1050df2520 <_IO_2_1_stdout_>, data=data@entry=0x55a40bc38f81 "d out>\n log_func = 0x4a8dd0 <gimp_message_log_func>\n domain_fatal_mask = <optimized out>\n masquerade_fatal = 0\n test_level = G_LOG_LEVEL_WARNING\n was_fatal = <optimi"..., to_do=to_do@entry=32768) at libioP.h:948 count = <optimized out> #4 0x00007f1050caf42e in _IO_new_file_xsputn (n=36516, data=<optimized out>, f=<optimized out>) at fileops.c:1255 block_size = <optimized out> do_write = <optimized out> s = 0x55a40bc38f81 "d out>\n log_func = 0x4a8dd0 <gimp_message_log_func>\n domain_fatal_mask = <optimized out>\n masquerade_fatal = 0\n test_level = G_LOG_LEVEL_WARNING\n was_fatal = <optimi"... to_do = 33283 must_flush = <optimized out> count = <optimized out> s = <optimized out> to_do = <optimized out> must_flush = <optimized out> count = <optimized out> p = <optimized out> block_size = <optimized out> do_write = <optimized out> #5 _IO_new_file_xsputn (f=0x7f1050df2520 <_IO_2_1_stdout_>, data=<optimized out>, n=36516) at fileops.c:1197 s = <optimized out> to_do = 36516 must_flush = 0 count = 0 p = <optimized out> block_size = <optimized out> do_write = <optimized out> #6 0x00007f1050ca36bb in __GI__IO_fputs ( str=0x55a40bc382e0 "#0 __libc_read (nbytes=255, buf=0x7fffec289870, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n sc_ret = -512\n sc_cancel_oldtype = 0\n __arg3 = <optimized out>\n _a2 = <opti"..., fp=0x7f1050df2520 <_IO_2_1_stdout_>) at libioP.h:948 _IO_acquire_lock_file = 0x7f1050df2520 <_IO_2_1_stdout_> len = 36516 result = -1 #7 0x000055a3fcbc6c99 in vfprintf_styled_no_gdbfmt (stream=0x55a3ff4e0190, style=..., filter=<optimized out>, format=<optimized out>, args=<optimized out>) at /usr/include/c++/10/bits/basic_string.h:186 str = "#0 __libc_read (nbytes=255, buf=0x7fffec289870, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n sc_ret = -512\n sc_cancel_oldtype = 0\n __arg3 = <optimized out>\n _a2 = <opti"... #8 0x000055a3fcbaf3a7 in ui_out::call_do_message (this=this@entry=0x7ffcf61a09c0, style=..., format=<optimized out>) at ../../gdb/ui-out.c:597 args = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7ffcf61a0900, reg_save_area = 0x7ffcf61a0840}} #9 0x000055a3fcbafaa9 in ui_out::vmessage (this=this@entry=0x7ffcf61a09c0, in_style=..., format=<optimized out>, format@entry=0x55a3fced6ccf "%s", args=args@entry=0x7ffcf61a0a70) at ../../gdb/ui-out.c:766 current_substring = <optimized out> intvals = {0, 0} str = <optimized out> val = <optimized out> piece = @0x55a40b8d4050: {string = 0x55a40ae48171 "%s", argclass = string_arg, n_int_args = 0} __for_range = @0x7ffcf61a0950: {m_pieces = std::vector of length 1, capacity 1 = {{string = 0x55a40ae48171 "%s", argclass = string_arg, n_int_args = 0}}, m_storage = std::unique_ptr<char> = {get() = 0x55a40ae48170 ""}} Python Exception <class 'gdb.error'> value has been optimized out: __for_begin = Python Exception <class 'gdb.error'> value has been optimized out: __for_end = fpieces = {m_pieces = std::vector of length 1, capacity 1 = {{string = 0x55a40ae48171 "%s", argclass = string_arg, n_int_args = 0}}, m_storage = std::unique_ptr<char> = {get() = 0x55a40ae48170 ""}} style = {m_foreground = {m_simple = true, {m_value = -1, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 255 '\377'}}}, m_background = { m_simple = true, {m_value = -1, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 255 '\377'}}}, m_intensity = ui_file_style::NORMAL, m_reverse = false} __PRETTY_FUNCTION__ = "void ui_out::vmessage(const ui_file_style&, const char*, __va_list_tag*)" #10 0x000055a3fcbc598a in vfprintf_maybe_filtered(ui_file *, const char *, typedef __va_list_tag __va_list_tag *, bool, bool) (stream=0x55a3ff4e0190, format=0x55a3fced6ccf "%s", args=0x7ffcf61a0a70, filter=<optimized out>, gdbfmt=<optimized out>) at ../../gdb/utils.c:2064 flags = {m_enum_value = <optimized out>} #11 0x000055a3fcbc5f37 in vfprintf_filtered (args=0x7ffcf61a0a70, format=0x55a3fced6ccf "%s", stream=<optimized out>) at ../../gdb/utils.c:2077 No locals. #12 printf_filtered (format=format@entry=0x55a3fced6ccf "%s") at ../../gdb/utils.c:2207 args = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7ffcf61a0b50, reg_save_area = 0x7ffcf61a0a90}} #13 0x000055a3fcb7b18b in thr_try_catch_cmd (thr=0x7ffcf61a0b60, cmd=0x7ffcf61a30a5 "backtrace full", from_tty=0, flags=...) at /usr/include/c++/10/bits/basic_string.h:186 cmd_result = "#0 __libc_read (nbytes=255, buf=0x7fffec289870, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n sc_ret = -512\n sc_cancel_oldtype = 0\n __arg3 = <optimized out>\n _a2 = <opti"... thr_header = "\nThread 1 (Thread 0x7f8cf6deef00 (LWP 152870) \"gimp-2.99\"):\n" #14 0x000055a3fcb7d0ab in thread_apply_all_command (cmd=<optimized out>, from_tty=0) at ../../gdb/thread.c:1649 thr = 0x55a3ff4e5640 __for_range = std::vector of length 21, capacity 21 = {0x55a3ff4ec0b0, 0x55a3ff4ebd70, 0x55a3ff4eba30, 0x55a3ff4eb6f0, 0x55a3ff4eb3b0, 0x55a3ff4eb070, 0x55a3ff4ead30, 0x55a3ff4e95c0, 0x55a3ff4e9280, 0x55a3ff4e8f40, 0x55a3ff4e8c00, 0x55a3ff4e88c0, 0x55a3ff4e8580, 0x55a3ff4e8240, 0x55a3ff4e7f00, 0x55a3ff4e7bc0, 0x55a3ff4e7350, 0x55a3ff4e7010, 0x55a3ff4e6cb0, 0x55a3ff4e5980, 0x55a3ff4e5640} Python Exception <class 'gdb.error'> value has been optimized out: __for_begin = Python Exception <class 'gdb.error'> value has been optimized out: __for_end = thr_list_cpy = std::vector of length 21, capacity 21 = {0x55a3ff4ec0b0, 0x55a3ff4ebd70, 0x55a3ff4eba30, 0x55a3ff4eb6f0, 0x55a3ff4eb3b0, 0x55a3ff4eb070, 0x55a3ff4ead30, 0x55a3ff4e95c0, 0x55a3ff4e9280, 0x55a3ff4e8f40, 0x55a3ff4e8c00, 0x55a3ff4e88c0, 0x55a3ff4e8580, 0x55a3ff4e8240, 0x55a3ff4e7f00, 0x55a3ff4e7bc0, 0x55a3ff4e7350, 0x55a3ff4e7010, 0x55a3ff4e6cb0, 0x55a3ff4e5980, 0x55a3ff4e5640} inc_dec_ref = {m_thrds = <optimized out>} sorter = <optimized out> restore_thread = {m_dont_restore = false, m_thread = {m_obj = 0x55a3ff4e5640}, m_inf = {m_obj = 0x55a3ff4a49d0}, m_selected_frame_id = { stack_addr = 0, code_addr = 0, special_addr = 0, stack_status = FID_STACK_INVALID, code_addr_p = 0, special_addr_p = 0, artificial_depth = 0}, m_selected_frame_level = -1, m_was_stopped = true, m_lang = language_c} ascending = false flags = {quiet = false, cont = false, silent = false} group = {_M_elems = {{options = {m_array = 0x55a3fd225700 <ascending_option_def>, m_size = 1}, ctx = 0x7ffcf61a0c24}, {options = { m_array = 0x55a3fd225600 <thr_qcs_flags_option_defs>, m_size = 3}, ctx = 0x7ffcf61a0c25}}} tc = <optimized out> __PRETTY_FUNCTION__ = "void thread_apply_all_command(const char*, int)" #15 0x000055a3fc83c43a in cmd_func (cmd=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at ../../gdb/cli/cli-decode.c:2181 restore_suppress = {{m_dummy = {<No data fields>}, m_item = {<scoped_restore_base> = {m_saved_var = <optimized out>}, m_saved_value = <optimized out>}, dont_use = <optimized out>}, m_instantiated = <optimized out>} #16 0x000055a3fcb81871 in execute_command (p=<optimized out>, p@entry=<error reading variable: value has been optimized out>, from_tty=0, from_tty@entry=<error reading variable: value has been optimized out>) at ../../gdb/top.c:668 default_args_and_arg = "" without_whitespace = "" cmd = 0x7ffcf61a30a5 "backtrace full" arg = 0x7ffcf61a30a5 "backtrace full" default_args = "" was_sync = 0 save_repeat_args = <optimized out> args_pointer = 0x7ffcf61a30a5 "backtrace full" c = 0x55a3ff485060 line = 0x7ffcf61a3094 "thread apply all backtrace full" cmd_start = 0x7ffcf61a3094 "thread apply all backtrace full" cleanup_if_error = {<scope_exit_base<scope_exit<void (*)()> >> = {m_released = false}, m_exit_function = <optimized out>} cleanup = {m_value = 0x55a40095b100} __PRETTY_FUNCTION__ = "void execute_command(const char*, int)" #17 0x000055a3fc9ccaa0 in catch_command_errors (command=<optimized out>, arg=<optimized out>, from_tty=<optimized out>) at ../../gdb/main.c:457 was_sync = 0 #18 0x000055a3fc9cedd3 in captured_main_1 (context=0x7ffcf61a0ff0) at ../../gdb/main.c:1218 cmdarg_p = <optimized out> argc = <optimized out> argv = <optimized out> quiet = 1 set_args = 0 inhibit_home_gdbinit = 0 symarg = 0x0 execarg = 0x0 pidarg = 0x7ffcf61a30b7 "152870" corearg = 0x0 pid_or_core_arg = 0x0 cdarg = 0x0 ttyarg = 0x0 print_help = 0 print_version = 0 print_configuration = 0 cmdarg_vec = std::vector of length 2, capacity 2 = {{type = CMDARG_COMMAND, string = 0x7ffcf61a3083 "info threads"}, {type = CMDARG_COMMAND, string = 0x7ffcf61a3094 "thread apply all backtrace full"}} dirarg = std::vector of length 0, capacity 0 i = 1 save_auto_load = <optimized out> ret = <optimized out> stat_reporter = {m_msg_type = false, m_time_enabled = -1, m_space_enabled = -1, m_symtab_enabled = -1, m_start_cpu_time = {__d = {__r = 33491}}, m_start_wall_time = {__d = {__r = 117800711570014}}, m_start_space = 0, m_start_nr_symtabs = 0, m_start_nr_compunit_symtabs = 0, m_start_nr_blocks = 0} tmp_warn_preprint = std::unique_ptr<char> = {get() = 0x0} signal_stack = {m_stack = std::unique_ptr<char> = {get() = 0x55a3ff1ba4d0 "\002"}, m_old_stack = {ss_sp = 0x0, ss_flags = 2, ss_size = 0}} system_gdbinit = std::vector of length 1, capacity 1 = {"/etc/gdbinit"} home_gdbinit = "" local_gdbinit = "" OPT_SE = OPT_SE OPT_CD = OPT_CD OPT_ANNOTATE = OPT_ANNOTATE OPT_STATISTICS = OPT_STATISTICS OPT_TUI = OPT_TUI OPT_NOWINDOWS = OPT_NOWINDOWS OPT_WINDOWS = OPT_WINDOWS OPT_IX = OPT_IX OPT_IEX = OPT_IEX OPT_READNOW = OPT_READNOW OPT_READNEVER = OPT_READNEVER #19 0x000055a3fc9cf31f in captured_main (data=<optimized out>) at ../../gdb/main.c:1243 context = <optimized out> context = <optimized out> ex = <optimized out> #20 gdb_main (args=<optimized out>) at ../../gdb/main.c:1268 No locals. #21 0x000055a3fc770b60 in main (argc=<optimized out>, argv=<optimized out>) at ../../gdb/gdb.c:40 args = {argc = 8, argv = 0x7ffcf61a1288, interpreter_p = 0x55a3fcd7f004 "console"}
Since GDB is blocking on a write, my first question would be: is the thing on the other side of GDB's stdout consuming the data? Because at first glance it looks like this would happen if the stdout pipe buffer was full.
> Since GDB is blocking on a write, my first question would be: is the thing on the other side of GDB's stdout consuming the data? Yes, unless there is a bug somewhere. See here: https://gitlab.gnome.org/GNOME/gimp/-/blob/b0547a72c6b6fae3630ed089a9090ff43dc94af2/libgimpbase/gimputils.c#L1354 We have on one side a process calling gdb, on the other side a process waiting on the other side of the pipe to read. The writing side seems to be blocked on the execvp() calling gdb, whereas the reading side is blocked on the read(). Note that I tried to protect the read() with a select() or poll() first, even with a timeout. But somehow these get blocked too (as though there was no timeout, the calls just hang).
(In reply to Jehan from comment #3) > > Since GDB is blocking on a write, my first question would be: is the thing on the other side of GDB's stdout consuming the data? > > Yes, unless there is a bug somewhere. > > See here: > https://gitlab.gnome.org/GNOME/gimp/-/blob/ > b0547a72c6b6fae3630ed089a9090ff43dc94af2/libgimpbase/gimputils.c#L1354 > > We have on one side a process calling gdb, on the other side a process > waiting on the other side of the pipe to read. Sounds good. > The writing side seems to be blocked on the execvp() calling gdb, whereas > the reading side is blocked on the read(). I don't understand what you mean here, you can't block on an execvp call. > Note that I tried to protect the read() with a select() or poll() first, > even with a timeout. But somehow these get blocked too (as though there was > no timeout, the calls just hang). As a sanity check, once things are stuck, can you validate (by checking /proc/$pid/fd) that the fd GDB is blocked writing on points to the same thing as the fd that your consumer process is blocked reading on?
> I don't understand what you mean here, you can't block on an execvp call. Sorry, my wording was wrong. I just meant that excvp() succeeds (i.e. never returns), runs gdb as it should, but simply the new gdb process doesn't end. I can see the gdb process with a `ps` for instance. So the reading end awaits forever too (even with a poll() or a select() with timeout; something just locks everything). > As a sanity check, once things are stuck, can you validate (by checking /proc/$pid/fd) that the fd GDB is blocked writing on points to the same thing as the fd that your consumer process is blocked reading on? $ ps -ef |grep 'gimp\|gdb' jehan 32689 23033 12 18:13 pts/0 00:00:02 app/gimp-2.99 jehan 32879 32689 21 18:13 pts/0 00:00:02 gdb -batch -ex info threads -ex thread apply all backtrace full -p 32689 [… and others …] $ ls /proc/32689/fd/17 -ltr lr-x------. 1 jehan jehan 64 Dec 4 18:14 /proc/32689/fd/17 -> 'pipe:[195141]' (I did check with some stupid printf that out_fd[0] is 17, hence that's the fd the main process is blocking on) $ ls /proc/32879/fd/1 -ltr l-wx------. 1 jehan jehan 64 Dec 4 18:13 /proc/32879/fd/1 -> 'pipe:[195141]' (where gdb is indeed outputting)