Bug 19855 - readline abort when debugging with rr
Summary: readline abort when debugging with rr
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: server (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 7.12
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-23 04:59 UTC by Yichao Yu
Modified: 2016-08-08 18:41 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Yichao Yu 2016-03-23 04:59:54 UTC
See https://github.com/mozilla/rr/issues/1677
Ref https://sourceware.org/bugzilla/show_bug.cgi?id=16440
Ref https://sourceware.org/bugzilla/show_bug.cgi?id=17546

This only happens when I press extra enter when debugging julia with rr replay.

Backtrace below. Please let me know if other info is needed.

```
(gdb) bt
#0  0x00007ffff61de2a8 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff61df72a in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7bbcedf in rl_callback_read_char () from /usr/lib/libreadline.so.6
#3  0x00000000005c7377 in rl_callback_read_char_wrapper (client_data=<optimized out>) at event-top.c:153
#4  0x00000000005c6f49 in stdin_event_handler (error=<optimized out>, client_data=0x0) at event-top.c:409
#5  0x00000000005c5eed in handle_file_event (file_ptr=0x1401bf0, ready_mask=<optimized out>) at event-loop.c:708
#6  0x00000000005c6382 in gdb_wait_for_event (block=block@entry=0) at event-loop.c:834
#7  0x00000000005c66e8 in gdb_do_one_event () at event-loop.c:298
#8  0x00000000005c678b in start_event_loop () at event-loop.c:347
#9  0x00000000005c74fe in cli_command_loop (data=<optimized out>) at event-top.c:168
#10 0x00000000005c0425 in current_interp_command_loop () at interps.c:317
#11 0x00000000005c0e6a in captured_command_loop (data=data@entry=0x0) at main.c:318
#12 0x00000000005be4a0 in catch_errors (func=func@entry=0x5c0e57 <captured_command_loop>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x7827ba "", 
    mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240
#13 0x00000000005c21ce in captured_main (data=data@entry=0x7fffffffdde0) at main.c:1156
#14 0x00000000005be4a0 in catch_errors (func=func@entry=0x5c14bb <captured_main>, func_args=func_args@entry=0x7fffffffdde0, errstring=errstring@entry=0x7827ba "", 
    mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240
#15 0x00000000005c21eb in gdb_main (args=args@entry=0x7fffffffdde0) at main.c:1164
#16 0x0000000000462e1e in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32
```
Comment 1 Pedro Alves 2016-03-23 10:14:28 UTC
Sounds like something is calling target_terminal_ours and not then missing calling target_terminal_inferior, to unregister stdin from the event loop while the program is running.

I wonder whether the users/palves/immediate_quit branch:
  https://sourceware.org/ml/gdb-patches/2016-03/msg00351.html
fixes this, since patches #20 -> #26 fix some cases like that.
Comment 2 Yichao Yu 2016-03-23 13:04:45 UTC
Hmmm, I tried to see when are target_terminal_inferior and target_terminal_ours called (debug session[1] prompt (gdb) is the outer debugger, (rr) is the debugger to be debugged...) and it seems that the last one called before the readline error is actually target_terminal_inferior.

[1] https://gist.github.com/yuyichao/d2484c6c9e5a53233f35
Comment 3 Pedro Alves 2016-03-23 13:09:09 UTC
Try tracking calls to gdb_rl_callback_handler_remove / 
gdb_rl_callback_handler_install then.
Comment 4 Yichao Yu 2016-03-23 13:32:18 UTC
Only install/remove [1]. Combined [2].


[1] https://gist.github.com/yuyichao/6b77277e9d21b0cc9e97
[2] https://gist.github.com/yuyichao/377b388115768bf8d290
Comment 5 Pedro Alves 2016-03-23 13:48:21 UTC
Thanks.  

Still can't tell why this is happening, but I think we'll be 
getting there soon enough.

Could you track calls to add_file_handler / delete_file_handler,
with condition "fd == 0" as well?

Because it looks like that for some reason, while target_terminal_inferior
is in effect, still stdin ends up registered in the event loop somehow.
Comment 6 Yichao Yu 2016-03-23 14:04:35 UTC
All 6 https://gist.github.com/yuyichao/03db6cc574ba683b1b9c

Looks like delete_file_handler was not called on 0 at all?.....
Comment 7 Yichao Yu 2016-03-23 15:27:12 UTC
For a program that doesn't have this problem https://gist.github.com/yuyichao/6d370d68acfdaa40213a
Comment 8 Pedro Alves 2016-03-23 22:49:14 UTC
Huh.  

If you put a breakpoint on remote_terminal_inferior, and then step through it, does it not reach the delete_file_handler call?  

If it does reach it, is input_fd not 0?

If it isn't 0, can you set a watchpoint, so we can try to understand why not?
Comment 9 Yichao Yu 2016-03-27 21:29:41 UTC
OK, finally back to this issue. A few things that I discovered,

1. This happens when replaying python too, it is likely that the initial slow jit in julia makes the window that this can happen bigger.
2. The only call to `add_file_handler(fd=0, ...)` is not from any of the (remote|target)_terminal_(ours|inferior) it is called from `gdb_setup_readline` (see bt).

    ```
    #0  add_file_handler (fd=0, proc=proc@entry=0x5c6f5e <stdin_event_handler>, client_data=client_data@entry=0x0) at event-loop.c:389
    #1  0x00000000005c7a3e in gdb_setup_readline () at event-top.c:1032
    #2  0x00000000004f00ce in tui_resume (data=<optimized out>) at ./tui/tui-interp.c:177
    #3  0x00000000005c0701 in interp_set (interp=0xd2f840, top_level=top_level@entry=1) at interps.c:187
    #4  0x00000000005c1d91 in captured_main (data=data@entry=0x7fffffffdda0) at main.c:973
    #5  0x00000000005be514 in catch_errors (func=func@entry=0x5c152f <captured_main>, func_args=func_args@entry=0x7fffffffdda0, errstring=errstring@entry=0x78283a "", 
        mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240
    #6  0x00000000005c225f in gdb_main (args=args@entry=0x7fffffffdda0) at main.c:1164
    #7  0x0000000000462e1e in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32
```

3. There are calls to `remote_terminal_inferior` however `target_async_permitted` is always `0` so it never does anything.


    The value is set to `0` with bt

    ```
    Old value = 1
    New value = 0
    maint_set_target_async_command (args=<optimized out>, from_tty=<optimized out>, 
        c=<optimized out>) at target.c:3890
    3890    }
    (gdb) bt
    #0  maint_set_target_async_command (args=<optimized out>, 
        from_tty=<optimized out>, c=<optimized out>) at target.c:3890
    #1  0x00000000004da294 in do_sfunc (c=<optimized out>, args=<optimized out>, 
        from_tty=<optimized out>) at ./cli/cli-decode.c:121
    #2  0x00000000004e1ac7 in do_set_command (arg=<optimized out>, 
        arg@entry=0x148f8c7 "0", from_tty=from_tty@entry=0, c=c@entry=0xc55fd0)
        at ./cli/cli-setshow.c:455
    #3  0x000000000067eb18 in execute_command (p=<optimized out>, 
        p@entry=0x148f8b0 "maint set target-async 0", from_tty=0) at top.c:460
    #4  0x00000000005c7612 in command_handler (
        command=0x148f8b0 "maint set target-async 0") at event-top.c:463
    #5  0x000000000067f7a4 in command_loop () at top.c:546
    #6  0x000000000067f7ea in read_command_file (stream=stream@entry=0x14ce000)
        at top.c:285
    #7  0x00000000004de474 in script_from_file (stream=stream@entry=0x14ce000, 
        file=file@entry=0x7fffffffe257 "/proc/20193/fd/4") at ./cli/cli-script.c:1698
    #8  0x00000000004deb37 in source_script_from_stream (stream=0x14ce000, 
        file=file@entry=0x7fffffffe257 "/proc/20193/fd/4", 
        file_to_open=<optimized out>) at ./cli/cli-cmds.c:578
    #9  0x00000000004e06b5 in source_script_with_search (
        file=file@entry=0x7fffffffe257 "/proc/20193/fd/4", 
        from_tty=from_tty@entry=1, search_path=search_path@entry=0)
        at ./cli/cli-cmds.c:618
    #10 0x00000000004e07bc in source_script (
        file=file@entry=0x7fffffffe257 "/proc/20193/fd/4", from_tty=from_tty@entry=1)
        at ./cli/cli-cmds.c:628
    #11 0x00000000005c12f4 in catch_command_errors_const (
        command=0x4e07ae <source_script>, arg=0x7fffffffe257 "/proc/20193/fd/4", 
        from_tty=1) at main.c:395
    #12 0x00000000005c21ab in captured_main (data=data@entry=0x7fffffffdda0)
        at main.c:1128
    #13 0x00000000005be514 in catch_errors (
        func=func@entry=0x5c152f <captured_main>, 
        func_args=func_args@entry=0x7fffffffdda0, 
        errstring=errstring@entry=0x78283a "", mask=mask@entry=RETURN_MASK_ALL)
        at exceptions.c:240
    #14 0x00000000005c225f in gdb_main (args=args@entry=0x7fffffffdda0)
        at main.c:1164
    #15 0x0000000000462e1e in main (argc=<optimized out>, argv=<optimized out>)
        at gdb.c:32
    ```

    Is this something caused by rr?
Comment 10 Pedro Alves 2016-03-30 10:15:58 UTC
(In reply to Yichao Yu from comment #9)
> OK, finally back to this issue. A few things that I discovered,

Thanks.

> 3. There are calls to `remote_terminal_inferior` however
> `target_async_permitted` is always `0` so it never does anything.

>     The value is set to `0` with bt

>         command=0x148f8b0 "maint set target-async 0") at event-top.c:463

Ah...  Is there a reason RR disables target-async?  Users are not supposed to need this.  It's an escape hatch for gdb bugs, so I imagine it was done to paper over some old GDB bug?  Changing this setting affects the target backend / event-loop integration, which affects stdin/readline handling -- different code paths.

The !target-async paths are getting less tested as time progresses.  All 
hosts except DJGPP do target async when remote debugging by default, I believe...

>     Is this something caused by rr?

So I'd think that stopping RR from using the "maint set target-async 0" command would make this go away.

Did you try the users/palves/immediate_quit branch?  It may also fix this
since https://sourceware.org/ml/gdb-patches/2016-03/msg00358.html removes the target_async_permitted checks from remote_terminal_inferior/remote_terminal_ours.
Comment 11 Yichao Yu 2016-03-30 12:43:24 UTC
On the users/palves/immediate_quit branch, it seems that remote_terminal_ours is never called since terminal_state is always terminal_is_ours and therefore remote_async_terminal_ours_p is always 0 and remote_terminal_inferior never bother to do anything.
Comment 12 Pedro Alves 2016-03-30 13:10:17 UTC
Thanks.  Indeed...  I fixed it in the users/palves/immediate_quit_pr19855 branch.  Could you give that a try?
Comment 13 Yichao Yu 2016-03-30 13:21:26 UTC
Yep, the users/palves/immediate_quit_pr19855 branch seems to have fixed this. Thanks!
Comment 14 Pedro Alves 2016-03-30 14:34:35 UTC
Alright, thanks for all the detective work, and for confirming!
Comment 15 Yichao Yu 2016-03-30 14:43:10 UTC
I pinged the rr issue tracker about this https://github.com/mozilla/rr/issues/1677#issuecomment-203408556

Is this option deprecated or should it be tested more?
Comment 16 Pedro Alves 2016-03-30 14:53:12 UTC
Since it's a maintenance command, users should not rely on it, and it may disappear at any moment without advance notice.

It's mainly a convenience for maintainers to try to reproduce issues that would normally only happen on non-GNU/Linux hosts.

It changes an internal implementation detail, and other than "maint target-async on" enabling support for more features, like background execution commands and non-stop mode, it should not have user-visible side effects.

Once all hosts support "maint set target-async on", the non-target-async code paths in gdb will be removed along with the command.  That may take a long while though, so I'm not holding my breath...

In sum, if you found a problem that led to the use of "maint set target-async off", please let us know about it, if you haven't already, so we can try to fix the problem properly.
Comment 17 Robert O'Callahan 2016-03-30 19:28:13 UTC
rr sets target-async to 0 because otherwise we get an error continuing under gdb:
../../gdb/target.c:602: internal-error: default_execution_direction: to_execution_direction must be implemented for reverse async

rr uses the remote target interface. The remote target interface does not implement to_execution_direction, hence once we started advertising support for reverse execution, we hit this error unless we also set target-async to 0.
Comment 18 Pedro Alves 2016-03-30 20:30:54 UTC
Eh, funny coincidence, I just posted a fix for that today.  See Bug 19840, and https://sourceware.org/ml/gdb-patches/2016-03/msg00558.html.  I'll make sure to include that in 7.11.1 too.
Comment 19 Robert O'Callahan 2016-03-30 21:17:26 UTC
OK, great! I apologize for not reporting that issue much earlier, I definitely should have.

At some point I'll update rr to sniff the gdb version (using Python if necessary) so we can stop issuing those target-async commands.
Comment 20 Pedro Alves 2016-04-13 09:52:34 UTC
This should be fixed on master now.
Comment 21 Pedro Alves 2016-04-13 09:54:27 UTC
(and by "this" I meant the readline abort.)
Comment 22 Pedro Alves 2016-04-13 13:40:06 UTC
FYI, Bug 19840 (the "maint target-async on" crash internal-error is now fixed on master and (future) 7.11.1.
Comment 23 Pedro Alves 2016-08-08 18:41:35 UTC
The whole immediate_quit/Ctrl-C rework is in (soon to be released) gdb 7.12, so I think we can close this now.