Bug 28640 - gdb hangs when running "thread apply all backtrace full"
Summary: gdb hangs when running "thread apply all backtrace full"
Status: UNCONFIRMED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: 10.1
: P2 critical
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-01 19:03 UTC by Jehan
Modified: 2021-12-04 17:18 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jehan 2021-12-01 19:03:49 UTC
Distribution: Fedora 33
gdb --version: GNU gdb (GDB) Fedora 10.2-4.fc33

The problem was also reported to happen on GNU gdb (Gentoo 11.1 vanilla) 11.1.

# Description of the issue

For a few years now, we have a debugging procedure in GIMP, when a warning or critical error occurs, GIMP forks and runs gdb in the separate process to grab a backtrace (popping up in a dialog, allowing people to report such bugs) calling "thread apply all backtrace full". It has worked well so far.

Recently some people (me included) started to experience very weird hanging of gdb. The forked process would not return. Even worse, even using select() with a timeout (hoping not to block the main process waiting forever for gdb), the select() call would not return either, which is very weird and bad.

The report: https://gitlab.gnome.org/GNOME/gimp/-/issues/7539

# Workaround

Running just "backtrace full" works fine. gdb returns, GIMP gets the trace and can continue normally. So that's my workaround so far. I am not getting the traces of all threads, just the main one. But it's a bit annoying (we do have some processing happening in threads, and even though most of the time, the bugs are in the main thread, inspecting other threads is useful).

# More hints

Another of GIMP developers, the one who experience this on Gentoo said that the problem happens only in gdb with python support, in their case:

> When I installed gdb with -python USE flag in Gentoo, there was no hang (window with backtrace was displayed).

# Backtrace

I attached another gdb on the blocked gdb and this is the backtrace I got each time:

Using host libthread_db library "/lib64/libthread_db.so.1".
b__GI___libc_write (nbytes=20480, buf=0x55f6489c8bcf, fd=1) at ../sysdeps/unix/sysv/linux/write.c:26
26	  return SYSCALL_CANCEL (write, fd, buf, nbytes);
(gdb) bt full
#0  __GI___libc_write (nbytes=32768, buf=0x55d460e0e06e, fd=1) at ../sysdeps/unix/sysv/linux/write.c:26
        sc_ret = 12288
        sc_cancel_oldtype = 0
        __arg3 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __value = <optimized out>
        sc_ret = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg2 = <optimized out>
        _a1 = <optimized out>
#1  __GI___libc_write (fd=1, buf=0x55d460e0e06e, nbytes=32768) at ../sysdeps/unix/sysv/linux/write.c:24
No locals.
#2  0x00007f6a67300d0d in _IO_new_file_write (f=0x7f6a67444520 <_IO_2_1_stdout_>, data=0x55d460e0e06e, n=32768) at fileops.c:1181
        count = <optimized out>
        to_do = 32768
#3  0x00007f6a67300066 in new_do_write (fp=fp@entry=0x7f6a67444520 <_IO_2_1_stdout_>, 
    data=data@entry=0x55d460e0e06e "ssage_log_func>\n        domain_fatal_mask = <optimized out>\n        masquerade_fatal = 0\n        test_level = G_LOG_LEVEL_WARNING\n        was_fatal = <optimized out>\n        was_recursion = <optimized"..., to_do=to_do@entry=32768) at libioP.h:948
        count = <optimized out>
#4  0x00007f6a6730142e in _IO_new_file_xsputn (n=36481, data=<optimized out>, f=<optimized out>) at fileops.c:1255
        block_size = <optimized out>
        do_write = <optimized out>
        s = 0x55d460e0e06e "ssage_log_func>\n        domain_fatal_mask = <optimized out>\n        masquerade_fatal = 0\n        test_level = G_LOG_LEVEL_WARNING\n        was_fatal = <optimized out>\n        was_recursion = <optimized"...
        to_do = 33235
        must_flush = <optimized out>
        count = <optimized out>
        s = <optimized out>
        to_do = <optimized out>
        must_flush = <optimized out>
        count = <optimized out>
        p = <optimized out>
        block_size = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        do_write = <optimized out>
#5  _IO_new_file_xsputn (f=0x7f6a67444520 <_IO_2_1_stdout_>, data=<optimized out>, n=36481) at fileops.c:1197
        s = <optimized out>
        to_do = 36481
        must_flush = 0
        count = 0
        p = <optimized out>
        block_size = <optimized out>
        do_write = <optimized out>
#6  0x00007f6a672f56bb in __GI__IO_fputs (
    str=0x55d460e0d3c0 "#0  __libc_read (nbytes=255, buf=0x7ffd92a1fe30, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n        sc_ret = -512\n        sc_cancel_oldtype = 0\n        __arg3 = <optimized out>\n        _a2 = <opti"..., fp=0x7f6a67444520 <_IO_2_1_stdout_>) at libioP.h:948
        _IO_acquire_lock_file = 0x7f6a67444520 <_IO_2_1_stdout_>
        len = 36481
        result = -1
#7  0x000055d453cdcc99 in vfprintf_styled_no_gdbfmt(ui_file*, ui_file_style const&, bool, char const*, __va_list_tag*) ()
No symbol table info available.
#8  0x000055d453cc53a7 in ui_out::call_do_message(ui_file_style const&, char const*, ...) ()
No symbol table info available.
#9  0x000055d453cc5aa9 in ui_out::vmessage(ui_file_style const&, char const*, __va_list_tag*) ()
No symbol table info available.
#10 0x000055d453cdb98a in vfprintf_maybe_filtered(ui_file*, char const*, __va_list_tag*, bool, bool) ()
No symbol table info available.
#11 0x000055d453cdbf37 in printf_filtered(char const*, ...) ()
No symbol table info available.
#12 0x000055d453c9118b in thr_try_catch_cmd(thread_info*, char const*, int, qcs_flags const&) ()
No symbol table info available.
#13 0x000055d453c930ab in thread_apply_all_command(char const*, int) ()
No symbol table info available.
#14 0x000055d45395243a in cmd_func(cmd_list_element*, char const*, int) ()
No symbol table info available.
#15 0x000055d453c97871 in execute_command(char const*, int) ()
No symbol table info available.
#16 0x000055d453ae2aa0 in catch_command_errors(void (*)(char const*, int), char const*, int) ()
No symbol table info available.
#17 0x000055d453ae4dd3 in captured_main_1(captured_main_args*) ()
No symbol table info available.
#18 0x000055d453ae531f in gdb_main(captured_main_args*) ()
No symbol table info available.
#19 0x000055d453886b60 in main ()
No symbol table info available.


So it seems it hangs in a SYSCALL_CANCEL (write, fd, buf, nbytes); call in __libc_write().

Lastly when I killed GIMP when it hangs, most of the time, I don't have much more on stdout/stderr, but one time, this message appeared too:

Exception ignored in: <module 'threading' from '/usr/lib64/python3.9/threading.py'>
Traceback (most recent call last):
  File "/usr/lib64/python3.9/threading.py", line 1408, in _shutdown
    def _shutdown():
KeyboardInterrupt:

Since Python and threading seem involved in this issue, I am guessing it is linked to the bug.
Comment 1 Jehan 2021-12-01 19:09:16 UTC
Actually this is a better backtrace of gdb after I added its debug data. Hopefully it's more useful:

#0  __GI___libc_write (nbytes=32768, buf=0x55a40bc38f81, fd=1) at ../sysdeps/unix/sysv/linux/write.c:26
        sc_ret = 12288
        sc_cancel_oldtype = 0
        __arg3 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __value = <optimized out>
        sc_ret = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg2 = <optimized out>
        _a1 = <optimized out>
#1  __GI___libc_write (fd=1, buf=0x55a40bc38f81, nbytes=32768) at ../sysdeps/unix/sysv/linux/write.c:24
No locals.
#2  0x00007f1050caed0d in _IO_new_file_write (f=0x7f1050df2520 <_IO_2_1_stdout_>, data=0x55a40bc38f81, n=32768) at fileops.c:1181
        count = <optimized out>
        to_do = 32768
#3  0x00007f1050cae066 in new_do_write (fp=fp@entry=0x7f1050df2520 <_IO_2_1_stdout_>, 
    data=data@entry=0x55a40bc38f81 "d out>\n        log_func = 0x4a8dd0 <gimp_message_log_func>\n        domain_fatal_mask = <optimized out>\n        masquerade_fatal = 0\n        test_level = G_LOG_LEVEL_WARNING\n        was_fatal = <optimi"..., to_do=to_do@entry=32768) at libioP.h:948
        count = <optimized out>
#4  0x00007f1050caf42e in _IO_new_file_xsputn (n=36516, data=<optimized out>, f=<optimized out>) at fileops.c:1255
        block_size = <optimized out>
        do_write = <optimized out>
        s = 0x55a40bc38f81 "d out>\n        log_func = 0x4a8dd0 <gimp_message_log_func>\n        domain_fatal_mask = <optimized out>\n        masquerade_fatal = 0\n        test_level = G_LOG_LEVEL_WARNING\n        was_fatal = <optimi"...
        to_do = 33283
        must_flush = <optimized out>
        count = <optimized out>
        s = <optimized out>
        to_do = <optimized out>
        must_flush = <optimized out>
        count = <optimized out>
        p = <optimized out>
        block_size = <optimized out>
       do_write = <optimized out>
#5  _IO_new_file_xsputn (f=0x7f1050df2520 <_IO_2_1_stdout_>, data=<optimized out>, n=36516) at fileops.c:1197
        s = <optimized out>
        to_do = 36516
        must_flush = 0
        count = 0
        p = <optimized out>
        block_size = <optimized out>
        do_write = <optimized out>
#6  0x00007f1050ca36bb in __GI__IO_fputs (
    str=0x55a40bc382e0 "#0  __libc_read (nbytes=255, buf=0x7fffec289870, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n        sc_ret = -512\n        sc_cancel_oldtype = 0\n        __arg3 = <optimized out>\n        _a2 = <opti"..., fp=0x7f1050df2520 <_IO_2_1_stdout_>) at libioP.h:948
        _IO_acquire_lock_file = 0x7f1050df2520 <_IO_2_1_stdout_>
        len = 36516
        result = -1
#7  0x000055a3fcbc6c99 in vfprintf_styled_no_gdbfmt (stream=0x55a3ff4e0190, style=..., filter=<optimized out>, format=<optimized out>, args=<optimized out>)
    at /usr/include/c++/10/bits/basic_string.h:186
        str = "#0  __libc_read (nbytes=255, buf=0x7fffec289870, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n        sc_ret = -512\n        sc_cancel_oldtype = 0\n        __arg3 = <optimized out>\n        _a2 = <opti"...
#8  0x000055a3fcbaf3a7 in ui_out::call_do_message (this=this@entry=0x7ffcf61a09c0, style=..., format=<optimized out>) at ../../gdb/ui-out.c:597
        args = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7ffcf61a0900, reg_save_area = 0x7ffcf61a0840}}
#9  0x000055a3fcbafaa9 in ui_out::vmessage (this=this@entry=0x7ffcf61a09c0, in_style=..., format=<optimized out>, format@entry=0x55a3fced6ccf "%s", 
    args=args@entry=0x7ffcf61a0a70) at ../../gdb/ui-out.c:766
        current_substring = <optimized out>
        intvals = {0, 0}
        str = <optimized out>
        val = <optimized out>
        piece = @0x55a40b8d4050: {string = 0x55a40ae48171 "%s", argclass = string_arg, n_int_args = 0}
        __for_range = @0x7ffcf61a0950: {m_pieces = std::vector of length 1, capacity 1 = {{string = 0x55a40ae48171 "%s", argclass = string_arg, 
              n_int_args = 0}}, m_storage = std::unique_ptr<char> = {get() = 0x55a40ae48170 ""}}
Python Exception <class 'gdb.error'> value has been optimized out: 
        __for_begin = 
Python Exception <class 'gdb.error'> value has been optimized out: 
        __for_end = 
        fpieces = {m_pieces = std::vector of length 1, capacity 1 = {{string = 0x55a40ae48171 "%s", argclass = string_arg, n_int_args = 0}}, 
          m_storage = std::unique_ptr<char> = {get() = 0x55a40ae48170 ""}}
        style = {m_foreground = {m_simple = true, {m_value = -1, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 255 '\377'}}}, m_background = {
            m_simple = true, {m_value = -1, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 255 '\377'}}}, m_intensity = ui_file_style::NORMAL, 
         m_reverse = false}
        __PRETTY_FUNCTION__ = "void ui_out::vmessage(const ui_file_style&, const char*, __va_list_tag*)"
#10 0x000055a3fcbc598a in vfprintf_maybe_filtered(ui_file *, const char *, typedef __va_list_tag __va_list_tag *, bool, bool) (stream=0x55a3ff4e0190, 
    format=0x55a3fced6ccf "%s", args=0x7ffcf61a0a70, filter=<optimized out>, gdbfmt=<optimized out>) at ../../gdb/utils.c:2064
        flags = {m_enum_value = <optimized out>}
#11 0x000055a3fcbc5f37 in vfprintf_filtered (args=0x7ffcf61a0a70, format=0x55a3fced6ccf "%s", stream=<optimized out>) at ../../gdb/utils.c:2077
No locals.
#12 printf_filtered (format=format@entry=0x55a3fced6ccf "%s") at ../../gdb/utils.c:2207
        args = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7ffcf61a0b50, reg_save_area = 0x7ffcf61a0a90}}
#13 0x000055a3fcb7b18b in thr_try_catch_cmd (thr=0x7ffcf61a0b60, cmd=0x7ffcf61a30a5 "backtrace full", from_tty=0, flags=...)
    at /usr/include/c++/10/bits/basic_string.h:186
        cmd_result = "#0  __libc_read (nbytes=255, buf=0x7fffec289870, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26\n        sc_ret = -512\n        sc_cancel_oldtype = 0\n        __arg3 = <optimized out>\n        _a2 = <opti"...
        thr_header = "\nThread 1 (Thread 0x7f8cf6deef00 (LWP 152870) \"gimp-2.99\"):\n"
#14 0x000055a3fcb7d0ab in thread_apply_all_command (cmd=<optimized out>, from_tty=0) at ../../gdb/thread.c:1649
        thr = 0x55a3ff4e5640
        __for_range = std::vector of length 21, capacity 21 = {0x55a3ff4ec0b0, 0x55a3ff4ebd70, 0x55a3ff4eba30, 0x55a3ff4eb6f0, 0x55a3ff4eb3b0, 
          0x55a3ff4eb070, 0x55a3ff4ead30, 0x55a3ff4e95c0, 0x55a3ff4e9280, 0x55a3ff4e8f40, 0x55a3ff4e8c00, 0x55a3ff4e88c0, 0x55a3ff4e8580, 0x55a3ff4e8240, 
          0x55a3ff4e7f00, 0x55a3ff4e7bc0, 0x55a3ff4e7350, 0x55a3ff4e7010, 0x55a3ff4e6cb0, 0x55a3ff4e5980, 0x55a3ff4e5640}
Python Exception <class 'gdb.error'> value has been optimized out: 
        __for_begin = 
Python Exception <class 'gdb.error'> value has been optimized out: 
        __for_end = 
        thr_list_cpy = std::vector of length 21, capacity 21 = {0x55a3ff4ec0b0, 0x55a3ff4ebd70, 0x55a3ff4eba30, 0x55a3ff4eb6f0, 0x55a3ff4eb3b0, 
          0x55a3ff4eb070, 0x55a3ff4ead30, 0x55a3ff4e95c0, 0x55a3ff4e9280, 0x55a3ff4e8f40, 0x55a3ff4e8c00, 0x55a3ff4e88c0, 0x55a3ff4e8580, 0x55a3ff4e8240, 
          0x55a3ff4e7f00, 0x55a3ff4e7bc0, 0x55a3ff4e7350, 0x55a3ff4e7010, 0x55a3ff4e6cb0, 0x55a3ff4e5980, 0x55a3ff4e5640}
        inc_dec_ref = {m_thrds = <optimized out>}
        sorter = <optimized out>
        restore_thread = {m_dont_restore = false, m_thread = {m_obj = 0x55a3ff4e5640}, m_inf = {m_obj = 0x55a3ff4a49d0}, m_selected_frame_id = {
            stack_addr = 0, code_addr = 0, special_addr = 0, stack_status = FID_STACK_INVALID, code_addr_p = 0, special_addr_p = 0, artificial_depth = 0}, 
          m_selected_frame_level = -1, m_was_stopped = true, m_lang = language_c}
        ascending = false
        flags = {quiet = false, cont = false, silent = false}
        group = {_M_elems = {{options = {m_array = 0x55a3fd225700 <ascending_option_def>, m_size = 1}, ctx = 0x7ffcf61a0c24}, {options = {
                m_array = 0x55a3fd225600 <thr_qcs_flags_option_defs>, m_size = 3}, ctx = 0x7ffcf61a0c25}}}
        tc = <optimized out>
        __PRETTY_FUNCTION__ = "void thread_apply_all_command(const char*, int)"
#15 0x000055a3fc83c43a in cmd_func (cmd=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at ../../gdb/cli/cli-decode.c:2181
        restore_suppress = {{m_dummy = {<No data fields>}, m_item = {<scoped_restore_base> = {m_saved_var = <optimized out>}, 
              m_saved_value = <optimized out>}, dont_use = <optimized out>}, m_instantiated = <optimized out>}
#16 0x000055a3fcb81871 in execute_command (p=<optimized out>, p@entry=<error reading variable: value has been optimized out>, from_tty=0, 
    from_tty@entry=<error reading variable: value has been optimized out>) at ../../gdb/top.c:668
        default_args_and_arg = ""
        without_whitespace = ""
        cmd = 0x7ffcf61a30a5 "backtrace full"
        arg = 0x7ffcf61a30a5 "backtrace full"
        default_args = ""
        was_sync = 0
        save_repeat_args = <optimized out>
        args_pointer = 0x7ffcf61a30a5 "backtrace full"
        c = 0x55a3ff485060
        line = 0x7ffcf61a3094 "thread apply all backtrace full"
        cmd_start = 0x7ffcf61a3094 "thread apply all backtrace full"
        cleanup_if_error = {<scope_exit_base<scope_exit<void (*)()> >> = {m_released = false}, m_exit_function = <optimized out>}
        cleanup = {m_value = 0x55a40095b100}
        __PRETTY_FUNCTION__ = "void execute_command(const char*, int)"
#17 0x000055a3fc9ccaa0 in catch_command_errors (command=<optimized out>, arg=<optimized out>, from_tty=<optimized out>) at ../../gdb/main.c:457
        was_sync = 0
#18 0x000055a3fc9cedd3 in captured_main_1 (context=0x7ffcf61a0ff0) at ../../gdb/main.c:1218
        cmdarg_p = <optimized out>
        argc = <optimized out>
        argv = <optimized out>
        quiet = 1
        set_args = 0
        inhibit_home_gdbinit = 0
        symarg = 0x0
        execarg = 0x0
        pidarg = 0x7ffcf61a30b7 "152870"
        corearg = 0x0
        pid_or_core_arg = 0x0
        cdarg = 0x0
        ttyarg = 0x0
        print_help = 0
        print_version = 0
        print_configuration = 0
        cmdarg_vec = std::vector of length 2, capacity 2 = {{type = CMDARG_COMMAND, string = 0x7ffcf61a3083 "info threads"}, {type = CMDARG_COMMAND, 
            string = 0x7ffcf61a3094 "thread apply all backtrace full"}}
        dirarg = std::vector of length 0, capacity 0
        i = 1
        save_auto_load = <optimized out>
        ret = <optimized out>
        stat_reporter = {m_msg_type = false, m_time_enabled = -1, m_space_enabled = -1, m_symtab_enabled = -1, m_start_cpu_time = {__d = {__r = 33491}}, 
          m_start_wall_time = {__d = {__r = 117800711570014}}, m_start_space = 0, m_start_nr_symtabs = 0, m_start_nr_compunit_symtabs = 0, 
          m_start_nr_blocks = 0}
        tmp_warn_preprint = std::unique_ptr<char> = {get() = 0x0}
        signal_stack = {m_stack = std::unique_ptr<char> = {get() = 0x55a3ff1ba4d0 "\002"}, m_old_stack = {ss_sp = 0x0, ss_flags = 2, ss_size = 0}}
        system_gdbinit = std::vector of length 1, capacity 1 = {"/etc/gdbinit"}
        home_gdbinit = ""
        local_gdbinit = ""
        OPT_SE = OPT_SE
        OPT_CD = OPT_CD
        OPT_ANNOTATE = OPT_ANNOTATE
        OPT_STATISTICS = OPT_STATISTICS
        OPT_TUI = OPT_TUI
        OPT_NOWINDOWS = OPT_NOWINDOWS
        OPT_WINDOWS = OPT_WINDOWS
        OPT_IX = OPT_IX
        OPT_IEX = OPT_IEX
        OPT_READNOW = OPT_READNOW
        OPT_READNEVER = OPT_READNEVER
#19 0x000055a3fc9cf31f in captured_main (data=<optimized out>) at ../../gdb/main.c:1243
        context = <optimized out>
        context = <optimized out>
        ex = <optimized out>
#20 gdb_main (args=<optimized out>) at ../../gdb/main.c:1268
No locals.
#21 0x000055a3fc770b60 in main (argc=<optimized out>, argv=<optimized out>) at ../../gdb/gdb.c:40
        args = {argc = 8, argv = 0x7ffcf61a1288, interpreter_p = 0x55a3fcd7f004 "console"}
Comment 2 Simon Marchi 2021-12-02 00:03:17 UTC
Since GDB is blocking on a write, my first question would be: is the thing on the other side of GDB's stdout consuming the data?  Because at first glance it looks like this would happen if the stdout pipe buffer was full.
Comment 3 Jehan 2021-12-02 00:20:14 UTC
> Since GDB is blocking on a write, my first question would be: is the thing on the other side of GDB's stdout consuming the data?

Yes, unless there is a bug somewhere.

See here: https://gitlab.gnome.org/GNOME/gimp/-/blob/b0547a72c6b6fae3630ed089a9090ff43dc94af2/libgimpbase/gimputils.c#L1354

We have on one side a process calling gdb, on the other side a process waiting on the other side of the pipe to read.

The writing side seems to be blocked on the execvp() calling gdb, whereas the reading side is blocked on the read().

Note that I tried to protect the read() with a select() or poll() first, even with a timeout. But somehow these get blocked too (as though there was no timeout, the calls just hang).
Comment 4 Simon Marchi 2021-12-02 02:05:58 UTC
(In reply to Jehan from comment #3)
> > Since GDB is blocking on a write, my first question would be: is the thing on the other side of GDB's stdout consuming the data?
> 
> Yes, unless there is a bug somewhere.
> 
> See here:
> https://gitlab.gnome.org/GNOME/gimp/-/blob/
> b0547a72c6b6fae3630ed089a9090ff43dc94af2/libgimpbase/gimputils.c#L1354
> 
> We have on one side a process calling gdb, on the other side a process
> waiting on the other side of the pipe to read.

Sounds good.
 
> The writing side seems to be blocked on the execvp() calling gdb, whereas
> the reading side is blocked on the read().

I don't understand what you mean here, you can't block on an execvp call.
 
> Note that I tried to protect the read() with a select() or poll() first,
> even with a timeout. But somehow these get blocked too (as though there was
> no timeout, the calls just hang).

As a sanity check, once things are stuck, can you validate (by checking /proc/$pid/fd) that the fd GDB is blocked writing on points to the same thing as the fd that your consumer process is blocked reading on?
Comment 5 Jehan 2021-12-04 17:18:03 UTC
> I don't understand what you mean here, you can't block on an execvp call.

Sorry, my wording was wrong.
I just meant that excvp() succeeds (i.e. never returns), runs gdb as it should, but simply the new gdb process doesn't end. I can see the gdb process with a `ps` for instance. So the reading end awaits forever too (even with a poll() or a select() with timeout; something just locks everything).

> As a sanity check, once things are stuck, can you validate (by checking /proc/$pid/fd) that the fd GDB is blocked writing on points to the same thing as the fd that your consumer process is blocked reading on?

$ ps -ef |grep 'gimp\|gdb'
jehan      32689   23033 12 18:13 pts/0    00:00:02 app/gimp-2.99
jehan      32879   32689 21 18:13 pts/0    00:00:02 gdb -batch -ex info threads -ex thread apply all backtrace full -p 32689
[… and others …]

$ ls /proc/32689/fd/17 -ltr
lr-x------. 1 jehan jehan 64 Dec  4 18:14 /proc/32689/fd/17 -> 'pipe:[195141]'

(I did check with some stupid printf that out_fd[0] is 17, hence that's the fd the main process is blocking on)

$ ls /proc/32879/fd/1 -ltr
l-wx------. 1 jehan jehan 64 Dec  4 18:13 /proc/32879/fd/1 -> 'pipe:[195141]'

(where gdb is indeed outputting)