See https://github.com/mozilla/rr/issues/1677 Ref https://sourceware.org/bugzilla/show_bug.cgi?id=16440 Ref https://sourceware.org/bugzilla/show_bug.cgi?id=17546 This only happens when I press extra enter when debugging julia with rr replay. Backtrace below. Please let me know if other info is needed. ``` (gdb) bt #0 0x00007ffff61de2a8 in raise () from /usr/lib/libc.so.6 #1 0x00007ffff61df72a in abort () from /usr/lib/libc.so.6 #2 0x00007ffff7bbcedf in rl_callback_read_char () from /usr/lib/libreadline.so.6 #3 0x00000000005c7377 in rl_callback_read_char_wrapper (client_data=<optimized out>) at event-top.c:153 #4 0x00000000005c6f49 in stdin_event_handler (error=<optimized out>, client_data=0x0) at event-top.c:409 #5 0x00000000005c5eed in handle_file_event (file_ptr=0x1401bf0, ready_mask=<optimized out>) at event-loop.c:708 #6 0x00000000005c6382 in gdb_wait_for_event (block=block@entry=0) at event-loop.c:834 #7 0x00000000005c66e8 in gdb_do_one_event () at event-loop.c:298 #8 0x00000000005c678b in start_event_loop () at event-loop.c:347 #9 0x00000000005c74fe in cli_command_loop (data=<optimized out>) at event-top.c:168 #10 0x00000000005c0425 in current_interp_command_loop () at interps.c:317 #11 0x00000000005c0e6a in captured_command_loop (data=data@entry=0x0) at main.c:318 #12 0x00000000005be4a0 in catch_errors (func=func@entry=0x5c0e57 <captured_command_loop>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x7827ba "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240 #13 0x00000000005c21ce in captured_main (data=data@entry=0x7fffffffdde0) at main.c:1156 #14 0x00000000005be4a0 in catch_errors (func=func@entry=0x5c14bb <captured_main>, func_args=func_args@entry=0x7fffffffdde0, errstring=errstring@entry=0x7827ba "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240 #15 0x00000000005c21eb in gdb_main (args=args@entry=0x7fffffffdde0) at main.c:1164 #16 0x0000000000462e1e in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32 ```
Sounds like something is calling target_terminal_ours and not then missing calling target_terminal_inferior, to unregister stdin from the event loop while the program is running. I wonder whether the users/palves/immediate_quit branch: https://sourceware.org/ml/gdb-patches/2016-03/msg00351.html fixes this, since patches #20 -> #26 fix some cases like that.
Hmmm, I tried to see when are target_terminal_inferior and target_terminal_ours called (debug session[1] prompt (gdb) is the outer debugger, (rr) is the debugger to be debugged...) and it seems that the last one called before the readline error is actually target_terminal_inferior. [1] https://gist.github.com/yuyichao/d2484c6c9e5a53233f35
Try tracking calls to gdb_rl_callback_handler_remove / gdb_rl_callback_handler_install then.
Only install/remove [1]. Combined [2]. [1] https://gist.github.com/yuyichao/6b77277e9d21b0cc9e97 [2] https://gist.github.com/yuyichao/377b388115768bf8d290
Thanks. Still can't tell why this is happening, but I think we'll be getting there soon enough. Could you track calls to add_file_handler / delete_file_handler, with condition "fd == 0" as well? Because it looks like that for some reason, while target_terminal_inferior is in effect, still stdin ends up registered in the event loop somehow.
All 6 https://gist.github.com/yuyichao/03db6cc574ba683b1b9c Looks like delete_file_handler was not called on 0 at all?.....
For a program that doesn't have this problem https://gist.github.com/yuyichao/6d370d68acfdaa40213a
Huh. If you put a breakpoint on remote_terminal_inferior, and then step through it, does it not reach the delete_file_handler call? If it does reach it, is input_fd not 0? If it isn't 0, can you set a watchpoint, so we can try to understand why not?
OK, finally back to this issue. A few things that I discovered, 1. This happens when replaying python too, it is likely that the initial slow jit in julia makes the window that this can happen bigger. 2. The only call to `add_file_handler(fd=0, ...)` is not from any of the (remote|target)_terminal_(ours|inferior) it is called from `gdb_setup_readline` (see bt). ``` #0 add_file_handler (fd=0, proc=proc@entry=0x5c6f5e <stdin_event_handler>, client_data=client_data@entry=0x0) at event-loop.c:389 #1 0x00000000005c7a3e in gdb_setup_readline () at event-top.c:1032 #2 0x00000000004f00ce in tui_resume (data=<optimized out>) at ./tui/tui-interp.c:177 #3 0x00000000005c0701 in interp_set (interp=0xd2f840, top_level=top_level@entry=1) at interps.c:187 #4 0x00000000005c1d91 in captured_main (data=data@entry=0x7fffffffdda0) at main.c:973 #5 0x00000000005be514 in catch_errors (func=func@entry=0x5c152f <captured_main>, func_args=func_args@entry=0x7fffffffdda0, errstring=errstring@entry=0x78283a "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240 #6 0x00000000005c225f in gdb_main (args=args@entry=0x7fffffffdda0) at main.c:1164 #7 0x0000000000462e1e in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32 ``` 3. There are calls to `remote_terminal_inferior` however `target_async_permitted` is always `0` so it never does anything. The value is set to `0` with bt ``` Old value = 1 New value = 0 maint_set_target_async_command (args=<optimized out>, from_tty=<optimized out>, c=<optimized out>) at target.c:3890 3890 } (gdb) bt #0 maint_set_target_async_command (args=<optimized out>, from_tty=<optimized out>, c=<optimized out>) at target.c:3890 #1 0x00000000004da294 in do_sfunc (c=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at ./cli/cli-decode.c:121 #2 0x00000000004e1ac7 in do_set_command (arg=<optimized out>, arg@entry=0x148f8c7 "0", from_tty=from_tty@entry=0, c=c@entry=0xc55fd0) at ./cli/cli-setshow.c:455 #3 0x000000000067eb18 in execute_command (p=<optimized out>, p@entry=0x148f8b0 "maint set target-async 0", from_tty=0) at top.c:460 #4 0x00000000005c7612 in command_handler ( command=0x148f8b0 "maint set target-async 0") at event-top.c:463 #5 0x000000000067f7a4 in command_loop () at top.c:546 #6 0x000000000067f7ea in read_command_file (stream=stream@entry=0x14ce000) at top.c:285 #7 0x00000000004de474 in script_from_file (stream=stream@entry=0x14ce000, file=file@entry=0x7fffffffe257 "/proc/20193/fd/4") at ./cli/cli-script.c:1698 #8 0x00000000004deb37 in source_script_from_stream (stream=0x14ce000, file=file@entry=0x7fffffffe257 "/proc/20193/fd/4", file_to_open=<optimized out>) at ./cli/cli-cmds.c:578 #9 0x00000000004e06b5 in source_script_with_search ( file=file@entry=0x7fffffffe257 "/proc/20193/fd/4", from_tty=from_tty@entry=1, search_path=search_path@entry=0) at ./cli/cli-cmds.c:618 #10 0x00000000004e07bc in source_script ( file=file@entry=0x7fffffffe257 "/proc/20193/fd/4", from_tty=from_tty@entry=1) at ./cli/cli-cmds.c:628 #11 0x00000000005c12f4 in catch_command_errors_const ( command=0x4e07ae <source_script>, arg=0x7fffffffe257 "/proc/20193/fd/4", from_tty=1) at main.c:395 #12 0x00000000005c21ab in captured_main (data=data@entry=0x7fffffffdda0) at main.c:1128 #13 0x00000000005be514 in catch_errors ( func=func@entry=0x5c152f <captured_main>, func_args=func_args@entry=0x7fffffffdda0, errstring=errstring@entry=0x78283a "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240 #14 0x00000000005c225f in gdb_main (args=args@entry=0x7fffffffdda0) at main.c:1164 #15 0x0000000000462e1e in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32 ``` Is this something caused by rr?
(In reply to Yichao Yu from comment #9) > OK, finally back to this issue. A few things that I discovered, Thanks. > 3. There are calls to `remote_terminal_inferior` however > `target_async_permitted` is always `0` so it never does anything. > The value is set to `0` with bt > command=0x148f8b0 "maint set target-async 0") at event-top.c:463 Ah... Is there a reason RR disables target-async? Users are not supposed to need this. It's an escape hatch for gdb bugs, so I imagine it was done to paper over some old GDB bug? Changing this setting affects the target backend / event-loop integration, which affects stdin/readline handling -- different code paths. The !target-async paths are getting less tested as time progresses. All hosts except DJGPP do target async when remote debugging by default, I believe... > Is this something caused by rr? So I'd think that stopping RR from using the "maint set target-async 0" command would make this go away. Did you try the users/palves/immediate_quit branch? It may also fix this since https://sourceware.org/ml/gdb-patches/2016-03/msg00358.html removes the target_async_permitted checks from remote_terminal_inferior/remote_terminal_ours.
On the users/palves/immediate_quit branch, it seems that remote_terminal_ours is never called since terminal_state is always terminal_is_ours and therefore remote_async_terminal_ours_p is always 0 and remote_terminal_inferior never bother to do anything.
Thanks. Indeed... I fixed it in the users/palves/immediate_quit_pr19855 branch. Could you give that a try?
Yep, the users/palves/immediate_quit_pr19855 branch seems to have fixed this. Thanks!
Alright, thanks for all the detective work, and for confirming!
I pinged the rr issue tracker about this https://github.com/mozilla/rr/issues/1677#issuecomment-203408556 Is this option deprecated or should it be tested more?
Since it's a maintenance command, users should not rely on it, and it may disappear at any moment without advance notice. It's mainly a convenience for maintainers to try to reproduce issues that would normally only happen on non-GNU/Linux hosts. It changes an internal implementation detail, and other than "maint target-async on" enabling support for more features, like background execution commands and non-stop mode, it should not have user-visible side effects. Once all hosts support "maint set target-async on", the non-target-async code paths in gdb will be removed along with the command. That may take a long while though, so I'm not holding my breath... In sum, if you found a problem that led to the use of "maint set target-async off", please let us know about it, if you haven't already, so we can try to fix the problem properly.
rr sets target-async to 0 because otherwise we get an error continuing under gdb: ../../gdb/target.c:602: internal-error: default_execution_direction: to_execution_direction must be implemented for reverse async rr uses the remote target interface. The remote target interface does not implement to_execution_direction, hence once we started advertising support for reverse execution, we hit this error unless we also set target-async to 0.
Eh, funny coincidence, I just posted a fix for that today. See Bug 19840, and https://sourceware.org/ml/gdb-patches/2016-03/msg00558.html. I'll make sure to include that in 7.11.1 too.
OK, great! I apologize for not reporting that issue much earlier, I definitely should have. At some point I'll update rr to sniff the gdb version (using Python if necessary) so we can stop issuing those target-async commands.
This should be fixed on master now.
(and by "this" I meant the readline abort.)
FYI, Bug 19840 (the "maint target-async on" crash internal-error is now fixed on master and (future) 7.11.1.
The whole immediate_quit/Ctrl-C rework is in (soon to be released) gdb 7.12, so I think we can close this now.