Bug 26614 - AddressSanitizer: heap-use-after-free of extended_remote_target in remote_async_inferior_event_handler
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: remote
Version: HEAD
Importance: P2 normal
Target Milestone: 10.2
Assignee: Not yet assigned to anyone
Reported: 2020-09-14 14:50 UTC by Tom de Vries
Modified: 2021-02-04 18:43 UTC
CC: 3 users



Attachments
gdb.sum.gz (2.77 KB, application/gzip)
2020-09-14 14:52 UTC, Tom de Vries
gdb.log.gz (10.46 KB, application/gzip)
2020-09-14 14:52 UTC, Tom de Vries
gdb.log.gz from tmp.4 (14.09 KB, application/gzip)
2020-09-15 09:42 UTC, Tom de Vries
Tentative patch, implementing the "listener" approach (933 bytes, patch)
2020-11-30 12:06 UTC, Tom de Vries

Description Tom de Vries 2020-09-14 14:50:37 UTC
[ On x86_64 laptop, with openSUSE Leap 15.2. ]

With this patch (not sure yet whether it's relevant) in place:
...
diff --git a/gdb/testsuite/lib/gdbserver-support.exp b/gdb/testsuite/lib/gdbserver-support.
exp
index a2cc80f28d..7b9c0eef6e 100644
--- a/gdb/testsuite/lib/gdbserver-support.exp
+++ b/gdb/testsuite/lib/gdbserver-support.exp
@@ -451,8 +451,10 @@ proc gdbserver_exit { is_mi } {
            # We use expect rather than gdb_expect because
            # we want to suppress printing exception messages, otherwise,
            # remote_expect, invoked by gdb_expect, prints the exceptions.
+           set read_prompt 0
            expect {
                -i "$gdb_spawn_id" -re "$gdb_prompt $" {
+                   set read_prompt 1
                    exp_continue
                }
                -i "$server_spawn_id" eof {
@@ -463,6 +465,7 @@ proc gdbserver_exit { is_mi } {
                    warning "Timed out waiting for EOF in server after $monitor_exit"
                }
            }
+           gdb_assert {$read_prompt}
        }
     }
     close_gdbserver
...
and running in parallel with:
...
$ stress -c 5
...
I ran into:
...
(gdb) PASS: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 2
Remote debugging from host ::1, port 34088^M
Process build/gdb/testsuite/outputs/gdb.multi/multi-target/multi-target created; pid = 8649^M
monitor exit^M
(gdb) Killing process(es): 8649^M
PASS: gdb.multi/multi-target.exp: continue: non-stop=on: $read_prompt
ERROR: GDB process no longer exists
GDB process exited with wait status 8627 exp14 0 0 CHILDKILLED SIGABRT SIGABRT
UNRESOLVED: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 5
...
Comment 1 Tom de Vries 2020-09-14 14:52:35 UTC
Created attachment 12839 [details]
gdb.sum.gz
Comment 2 Tom de Vries 2020-09-14 14:52:49 UTC
Created attachment 12840 [details]
gdb.log.gz
Comment 3 Tom de Vries 2020-09-14 14:56:22 UTC
Also spotted on OBS build with gdb 10.0.90 snapshot:
...
binaries-testsuite.openSUSE_Factory_PPC.ppc/usr/share/doc/packages/gdb-testresults/gdb-ppc-suse-linux-m32.sum:UNRESOLVED: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 5
binaries-testsuite.openSUSE_Leap_42.3.x86_64/usr/share/doc/packages/gdb-testresults/gdb-x86_64-suse-linux-m32.sum:UNRESOLVED: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 5
...
Comment 4 Tom de Vries 2020-09-15 09:41:03 UTC
Did the same as in comment 0, but built gdb with -fsanitize=address and gcc-11, and started a loop on the test-case, saving logs after each run.  After a bit of running we have:
...
$ grep -ic AddressSanitizer tmp.*/gdb.log
tmp.10/gdb.log:2
tmp.11/gdb.log:0
tmp.12/gdb.log:0
tmp.13/gdb.log:0
tmp.14/gdb.log:2
tmp.15/gdb.log:0
tmp.1/gdb.log:0
tmp.2/gdb.log:0
tmp.3/gdb.log:0
tmp.4/gdb.log:2
tmp.5/gdb.log:0
tmp.6/gdb.log:0
tmp.7/gdb.log:0
tmp.8/gdb.log:0
tmp.9/gdb.log:0
...

The tmp.4 one in more detail:
...
(gdb) PASS: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 2
Remote debugging from host ::1, port 39192
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target/multi-target created; pid = 7222
monitor exit
Killing process(es): 7222
FAIL: gdb.multi/multi-target.exp: continue: non-stop=on: $read_prompt
inferior 5
(gdb) PASS: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 5
Remote debugging from host ::1, port 45000
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target/multi-target created; pid = 7234
=================================================================
==7196==ERROR: AddressSanitizer: heap-use-after-free on address 0x6170000bf258 at pc 0x000001481755 bp 0x7fff05b20840 sp 0x7fff05b20838
READ of size 8 at 0x6170000bf258 thread T0
    #0 0x1481754 in std::_Hashtable<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::allocator<std::pair<gdbarch* const, remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_bucket_index(unsigned long) const /usr/include/c++/11/bits/hashtable.h:719
    #1 0x147c8ab in std::_Hashtable<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::allocator<std::pair<gdbarch* const, remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::find(gdbarch* const&) /usr/include/c++/11/bits/hashtable.h:1500
    #2 0x147852c in std::unordered_map<gdbarch*, remote_arch_state, std::hash<gdbarch*>, std::equal_to<gdbarch*>, std::allocator<std::pair<gdbarch* const, remote_arch_state> > >::find(gdbarch* const&) /usr/include/c++/11/bits/unordered_map.h:869
    #3 0x14306db in remote_state::get_remote_arch_state(gdbarch*) /home/vries/gdb_versions/devel/src/gdb/remote.c:1203
    #4 0x14309dc in remote_target::get_remote_state() /home/vries/gdb_versions/devel/src/gdb/remote.c:1232
    #5 0x1470c08 in remote_async_inferior_event_handler /home/vries/gdb_versions/devel/src/gdb/remote.c:14169
    #6 0xaa9f6b in check_async_event_handlers() /home/vries/gdb_versions/devel/src/gdb/async-event.c:295
    #7 0x1e93ab4 in gdb_do_one_event() /home/vries/gdb_versions/devel/src/gdbsupport/event-loop.cc:194
    #8 0x118f5f9 in start_event_loop /home/vries/gdb_versions/devel/src/gdb/main.c:356
    #9 0x118f8ed in captured_command_loop /home/vries/gdb_versions/devel/src/gdb/main.c:416
    #10 0x1192d6a in captured_main /home/vries/gdb_versions/devel/src/gdb/main.c:1253
    #11 0x1192dfa in gdb_main(captured_main_args*) /home/vries/gdb_versions/devel/src/gdb/main.c:1268
    #12 0x97b380 in main /home/vries/gdb_versions/devel/src/gdb/gdb.c:32
    #13 0x7f550c211349 in __libc_start_main ../csu/libc-start.c:308
    #14 0x97b199 in _start (/home/vries/gdb_versions/devel/build/gdb/gdb+0x97b199)

0x6170000bf258 is located 600 bytes inside of 648-byte region [0x6170000bf000,0x6170000bf288)
freed by thread T0 here:
    #0 0x7f550f516a57 in operator delete(void*, unsigned long) (/usr/lib64/libasan.so.6+0xaea57)
    #1 0x148b1fe in extended_remote_target::~extended_remote_target() /home/vries/gdb_versions/devel/src/gdb/remote.c:958
    #2 0x143b483 in remote_target::close() /home/vries/gdb_versions/devel/src/gdb/remote.c:4074
    #3 0x16cb90f in target_close(target_ops*) /home/vries/gdb_versions/devel/src/gdb/target.c:3230
    #4 0x16a2635 in decref_target(target_ops*) /home/vries/gdb_versions/devel/src/gdb/target.c:557
    #5 0x16a2abb in target_stack::unpush(target_ops*) /home/vries/gdb_versions/devel/src/gdb/target.c:645
    #6 0x16d01ef in inferior::unpush_target(target_ops*) /home/vries/gdb_versions/devel/src/gdb/inferior.h:356
    #7 0x16a2877 in unpush_target(target_ops*) /home/vries/gdb_versions/devel/src/gdb/target.c:607
    #8 0x16a2adf in unpush_target_and_assert /home/vries/gdb_versions/devel/src/gdb/target.c:655
    #9 0x16a2c57 in pop_all_targets_at_and_above(strata) /home/vries/gdb_versions/devel/src/gdb/target.c:678
    #10 0x1442749 in remote_unpush_target /home/vries/gdb_versions/devel/src/gdb/remote.c:5522
    #11 0x1458c16 in remote_target::readchar(int) /home/vries/gdb_versions/devel/src/gdb/remote.c:9137
    #12 0x145b25b in remote_target::getpkt_or_notif_sane_1(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int, int, int*) /home/vries/gdb_versions/devel/src/gdb/remote.c:9683
    #13 0x145bc9a in remote_target::getpkt_sane(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int) /home/vries/gdb_versions/devel/src/gdb/remote.c:9790
    #14 0x145b040 in remote_target::getpkt(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int) /home/vries/gdb_versions/devel/src/gdb/remote.c:9623
    #15 0x145780b in remote_target::remote_read_bytes_1(unsigned long, unsigned char*, unsigned long, int, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/remote.c:8860
    #16 0x145805e in remote_target::remote_read_bytes(unsigned long, unsigned char*, unsigned long, int, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/remote.c:8987
    #17 0x146113a in remote_target::xfer_partial(target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/remote.c:10987
    #18 0x16a4004 in raw_memory_xfer_partial(target_ops*, unsigned char*, unsigned char const*, unsigned long, long, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/target.c:918
    #19 0x16a4fcf in target_xfer_partial(target_ops*, target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/target.c:1156
    #20 0x16a5d65 in target_read_partial /home/vries/gdb_versions/devel/src/gdb/target.c:1387
    #21 0x16a5f19 in target_read(target_ops*, target_object, char const*, unsigned char*, unsigned long, long) /home/vries/gdb_versions/devel/src/gdb/target.c:1427
    #22 0x16a5666 in target_read_raw_memory(unsigned long, unsigned char*, long) /home/vries/gdb_versions/devel/src/gdb/target.c:1260
    #23 0xd22f2a in dcache_read_line /home/vries/gdb_versions/devel/src/gdb/dcache.c:336
    #24 0xd232b7 in dcache_peek_byte /home/vries/gdb_versions/devel/src/gdb/dcache.c:403
    #25 0xd23845 in dcache_read_memory_partial(target_ops*, dcache_struct*, unsigned long, unsigned char*, unsigned long, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/dcache.c:484
    #26 0x16a47da in memory_xfer_partial_1 /home/vries/gdb_versions/devel/src/gdb/target.c:1041
    #27 0x16a4a1e in memory_xfer_partial /home/vries/gdb_versions/devel/src/gdb/target.c:1084
    #28 0x16a4f44 in target_xfer_partial(target_ops*, target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/vries/gdb_versions/devel/src/gdb/target.c:1141
    #29 0x18203d4 in read_value_memory(value*, long, int, unsigned long, unsigned char*, unsigned long) /home/vries/gdb_versions/devel/src/gdb/valops.c:956

previously allocated by thread T0 here:
    #0 0x7f550f515c37 in operator new(unsigned long) (/usr/lib64/libasan.so.6+0xadc37)
    #1 0x14429f0 in remote_target::open_1(char const*, int, int) /home/vries/gdb_versions/devel/src/gdb/remote.c:5562
    #2 0x14405e6 in extended_remote_target::open(char const*, int) /home/vries/gdb_versions/devel/src/gdb/remote.c:4907
    #3 0x16a0f3c in open_target /home/vries/gdb_versions/devel/src/gdb/target.c:242
    #4 0xc19ff5 in do_sfunc /home/vries/gdb_versions/devel/src/gdb/cli/cli-decode.c:111
    #5 0xc221db in cmd_func(cmd_list_element*, char const*, int) /home/vries/gdb_versions/devel/src/gdb/cli/cli-decode.c:2181
    #6 0x16feda6 in execute_command(char const*, int) /home/vries/gdb_versions/devel/src/gdb/top.c:668
    #7 0xee9dc9 in command_handler(char const*) /home/vries/gdb_versions/devel/src/gdb/event-top.c:588
    #8 0xeea6a8 in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /home/vries/gdb_versions/devel/src/gdb/event-top.c:773
    #9 0xee8a12 in gdb_rl_callback_handler /home/vries/gdb_versions/devel/src/gdb/event-top.c:219
    #10 0x7f550f24aead in rl_callback_read_char (/lib64/libreadline.so.7+0x31ead)

SUMMARY: AddressSanitizer: heap-use-after-free /usr/include/c++/11/bits/hashtable.h:719 in std::_Hashtable<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::allocator<std::pair<gdbarch* const, remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_bucket_index(unsigned long) const
Shadow bytes around the buggy address:
  0x0c2e8000fdf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000fe00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000fe10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000fe20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000fe30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c2e8000fe40: fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd
  0x0c2e8000fe50: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000fe60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000fe70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000fe80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000fe90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==7196==ABORTING
Killing process(es): 7234
FAIL: gdb.multi/multi-target.exp: continue: non-stop=on: $read_prompt
...
Comment 5 Tom de Vries 2020-09-15 09:42:00 UTC
Created attachment 12842 [details]
gdb.log.gz from tmp.4
Comment 6 Tom de Vries 2020-09-15 10:14:35 UTC
My understanding of what's going on:
- gdb is executing remote_target::readchar
- a SERIAL_EOF is read
- consequently, "remote_unpush_target (this)" is called, destroying
  the associated extended_remote_target object
- subsequently an event is handled that has a pointer to the destroyed object
- the destroyed object is accessed, and address sanitizer triggers

Pedro, could you take a look?  This looks connected to the multi-target support.
Comment 7 Pedro Alves 2020-09-25 23:58:35 UTC
I've been trying to reproduce this, but without success so far...

I'm using 7361f908da9fbeceb0b65ac89b3dcecc4d8c57c2, just before your patch to handle the prompt in gdbserver_exit.  Then I applied the patch in this bug's description, the one with "gdb_assert {$read_prompt}", since I see from the tmp.4 logs you attached that you're using that.  Then I'm running the multi-target.exp testcase in a loop, and using "stress -c $NUM_CORES" at the same time.  I double check that all CPUs are pegged at 100%.

I can't figure out how this happens by looking at the code or the logs either.

remote_target::~remote_target gets rid of the async event handler:

 remote_target::~remote_target ()
 {
 ...
   if (rs->remote_async_inferior_event_token)
     delete_async_event_handler (&rs->remote_async_inferior_event_token);
 ...

So I don't understand how come we later end up in:

...
    #3 0x14306db in remote_state::get_remote_arch_state(gdbarch*) /home/vries/gdb_versions/devel/src/gdb/remote.c:1203
    #4 0x14309dc in remote_target::get_remote_state() /home/vries/gdb_versions/devel/src/gdb/remote.c:1232
    #5 0x1470c08 in remote_async_inferior_event_handler /home/vries/gdb_versions/devel/src/gdb/remote.c:14169
...

for that target.  That shouldn't ever happen.  There's an early return in
'remote_target::~remote_target()' which could explain this, but that shouldn't be taken.

Looking at the tmp.4 log, just before the ASAN ERROR we see:

inferior 2
[Switching to inferior 2 [process 7222] (/home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target/multi-target)]
[Switching to thread 2.1 (Thread 7222.7222)]
#0  function1 () at /home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.multi/multi-target.c:51
51	  while (wait_for_gdb)
(gdb) PASS: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 2
Remote debugging from host ::1, port 39192
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target/multi-target created; pid = 7222
monitor exit
Killing process(es): 7222
FAIL: gdb.multi/multi-target.exp: continue: non-stop=on: $read_prompt
inferior 5
(gdb) PASS: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 5
Remote debugging from host ::1, port 45000
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target/multi-target created; pid = 7234
=================================================================
==7196==ERROR: AddressSanitizer: heap-use-after-free on address 


This was multi-target.exp's cleanup_gdbservers being called.  The inferiors with gdbserver connections are inferiors 2 and 5.  So first gdb switches to inferior 2, and issues "monitor exit".  Then it switches to inferior 5, and seemingly that's what causes the bad code path.

This frame:

 #29 0x18203d4 in read_value_memory(value*, long, int, unsigned long, unsigned char*, unsigned long) /home/vries/gdb_versions/devel/src/gdb/valops.c:956

is most probably GDB printing the selected frame after switching to inferior 5.

It's almost as if while reading the current frame for inferior 5, in remote_target::readchar() GDB gets the SERIAL_EOF for the file descriptor of the remote connection of inferior 2, the one we had called "monitor exit" for (!).  How can that be?

Totally confused, and I don't think I can do better without being able to reproduce and sprinkle debug logs throughout.  :-/
Comment 8 Tom de Vries 2020-09-29 13:25:08 UTC
(In reply to Pedro Alves from comment #7)

I now also have trouble reproducing this.
Comment 9 Tom de Vries 2020-11-29 09:19:51 UTC
(In reply to Tom de Vries from comment #8)
> (In reply to Pedro Alves from comment #7)
> 
> I now also have trouble reproducing this.

Ok, I managed to reproduce this again using:
- build from gdb-10.1-release tag
- CFLAGS/CXXFLAGS -O0 -g
- gcc 7.5.0
- target board unix/-m32 on a no-PIE-by-default platform
- stress -c 10

Triggered once (run 271) in a run of 1000.
Comment 10 Tom de Vries 2020-11-29 10:20:33 UTC
I did another series under the same circumstances, but with gdb set up to generate a backtrace after running:
...
diff --git a/gdb/testsuite/gdb.multi/multi-target.exp.tcl b/gdb/testsuite/gdb.multi
/multi-target.exp.tcl
index 8dcd413f58..e2c78214c4 100644
--- a/gdb/testsuite/gdb.multi/multi-target.exp.tcl
+++ b/gdb/testsuite/gdb.multi/multi-target.exp.tcl
@@ -106,11 +106,13 @@ proc cleanup_gdbservers { } {
 
 proc setup {non-stop} {
     global gcorefile gcore_created
-    global binfile
+    global binfile GDB
 
     cleanup_gdbservers
-    clean_restart ${binfile}
-
+    save_vars GDB {
+       set GDB "/usr/bin/gdb -iex \"set trace-commands on\" -batch -ex run -ex backtrace -ex quit --args $GDB"
+       clean_restart ${binfile}
+    }
     # multi-target depends on target running in non-stop mode.  Force
     # it on for remote targets, until this is the default.
     gdb_test_no_output "maint set target-non-stop on"
...

At run 28, I ran into:
...
(gdb) PASS: gdb.multi/multi-target-continue.exp: continue: non-stop=on: inferior 2
Remote debugging from host ::1, port 42094^M
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target-continue/multi-target-continue created; pid = 25075^M
monitor exit^M
(gdb) Killing process(es): 25075^M
^M
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.^M
0x00000000004aa8fd in mark_async_event_handler (async_handler_ptr=0x0) at /home/vries/gdb_versions/devel/src/gdb/async-event.c:269^M
269       async_handler_ptr->ready = 1;^M
+backtrace^M
#0  0x00000000004aa8fd in mark_async_event_handler (async_handler_ptr=0x0) at /home/vries/gdb_versions/devel/src/gdb/async-event.c:269^M
#1  0x00000000008da4c9 in remote_async_inferior_event_handler (data=0x1f59990) at /home/vries/gdb_versions/devel/src/gdb/remote.c:14179^M
#2  0x00000000004aa95e in check_async_event_handlers () at /home/vries/gdb_versions/devel/src/gdb/async-event.c:295^M
#3  0x0000000000d0ead1 in gdb_do_one_event () at /home/vries/gdb_versions/devel/src/gdbsupport/event-loop.cc:194^M
#4  0x0000000000795ec7 in start_event_loop () at /home/vries/gdb_versions/devel/src/gdb/main.c:356^M
#5  0x0000000000795fe7 in captured_command_loop () at /home/vries/gdb_versions/devel/src/gdb/main.c:416^M
#6  0x0000000000797662 in captured_main (data=0x7fffffffd310) at /home/vries/gdb_versions/devel/src/gdb/main.c:1253^M
...
Comment 11 Tom de Vries 2020-11-29 11:24:01 UTC
As experiment, I tried:
...
diff --git a/gdb/remote.c b/gdb/remote.c
index 59075cb09f..4dd165255e 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -4084,6 +4084,7 @@ remote_target::~remote_target ()
     return;
 
   serial_close (rs->remote_desc);
+  rs->remote_desc = nullptr;
 
   /* We are destroying the remote target, so we should discard
      everything of this target.  */
@@ -4093,6 +4094,7 @@ remote_target::~remote_target ()
     delete_async_event_handler (&rs->remote_async_inferior_event_token);
 
   delete rs->notif_state;
+  rs->notif_state = nullptr;
 }
 
 /* Query the remote side for the text, data and bss offsets.  */
...

And got (again at run 28):
...
(gdb) PASS: gdb.multi/multi-target-continue.exp: continue: non-stop=on: inferior 2
Remote debugging from host ::1, port 42904
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target-continue/multi-target-continue created; pid = 27519
monitor exit
(gdb) Killing process(es): 27519

FAIL: gdb.multi/multi-target-continue.exp: continue: non-stop=on: inferior 5 (timeout)
Remote debugging from host ::1, port 43110
Process /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.multi/multi-target-continue/multi-target-continue created; pid = 27531
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x00000000008e5a69 in std::equal_to<gdbarch*>::operator() (this=0x1f59be0, __x=@0x7fffffffd100: 0x2264740, __y=<error reading variable>) at /usr/include/c++/7/bits/stl_function.h:356
356           { return __x == __y; }
+backtrace
#0  0x00000000008e5a69 in std::equal_to<gdbarch*>::operator() (this=0x1f59be0, __x=@0x7fffffffd100: 0x2264740, __y=<error reading variable>) at /usr/include/c++/7/bits/stl_function.h:356
#1  0x00000000008e5085 in std::__detail::_Equal_helper<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::__detail::_Select1st, std::equal_to<gdbarch*>, unsigned long, false>::_S_equals (__eq=..., __extract=..., __k=@0x7fffffffd100: 0x2264740, __n=0x60) at /usr/include/c++/7/bits/hashtable_policy.h:1444
#2  0x00000000008e4424 in std::__detail::_Hashtable_base<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Hashtable_traits<false, false, true> >::_M_equals (this=0x1f59be0, __k=@0x7fffffffd100: 0x2264740, __c=36063040, __n=0x60) at /usr/include/c++/7/bits/hashtable_policy.h:1815
#3  0x00000000008e3077 in std::_Hashtable<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::allocator<std::pair<gdbarch* const, remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node (this=0x1f59be0, __n=0, __k=@0x7fffffffd100: 0x2264740, __code=36063040) at /usr/include/c++/7/bits/hashtable.h:1551
#4  0x00000000008e167a in std::_Hashtable<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::allocator<std::pair<gdbarch* const, remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_node (this=0x1f59be0, __bkt=0, __key=@0x7fffffffd100: 0x2264740, __c=36063040) at /usr/include/c++/7/bits/hashtable.h:642
#5  0x00000000008df5ac in std::_Hashtable<gdbarch*, std::pair<gdbarch* const, remote_arch_state>, std::allocator<std::pair<gdbarch* const, remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>, std::hash<gdbarch*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::find (this=0x1f59be0, __k=@0x7fffffffd100: 0x2264740) at /usr/include/c++/7/bits/hashtable.h:1425
#6  0x00000000008ddc6b in std::unordered_map<gdbarch*, remote_arch_state, std::hash<gdbarch*>, std::equal_to<gdbarch*>, std::allocator<std::pair<gdbarch* const, remote_arch_state> > >::find (this=0x1f59be0, __x=@0x7fffffffd100: 0x2264740) at /usr/include/c++/7/bits/unordered_map.h:920
#7  0x00000000008be731 in remote_state::get_remote_arch_state (this=0x1f599a8, gdbarch=0x2264740) at /home/vries/gdb_versions/devel/src/gdb/remote.c:1203
#8  0x00000000008be84f in remote_target::get_remote_state (this=0x1f59990) at /home/vries/gdb_versions/devel/src/gdb/remote.c:1232
#9  0x00000000008da485 in remote_async_inferior_event_handler (data=0x1f59990) at /home/vries/gdb_versions/devel/src/gdb/remote.c:14171
#10 0x00000000004aa95e in check_async_event_handlers () at /home/vries/gdb_versions/devel/src/gdb/async-event.c:295
#11 0x0000000000d0eaef in gdb_do_one_event () at /home/vries/gdb_versions/devel/src/gdbsupport/event-loop.cc:194
#12 0x0000000000795ec7 in start_event_loop () at /home/vries/gdb_versions/devel/src/gdb/main.c:356
#13 0x0000000000795fe7 in captured_command_loop () at /home/vries/gdb_versions/devel/src/gdb/main.c:416
#14 0x0000000000797662 in captured_main (data=0x7fffffffd310) at /home/vries/gdb_versions/devel/src/gdb/main.c:1253
#15 0x00000000007976c8 in gdb_main (args=0x7fffffffd310) at /home/vries/gdb_versions/devel/src/gdb/main.c:1268
#16 0x0000000000415a9e in main (argc=5, argv=0x7fffffffd418) at /home/vries/gdb_versions/devel/src/gdb/gdb.c:32
+quit
...
Comment 12 Simon Marchi 2020-11-29 14:53:03 UTC
It looks very similar to a problem I hit while doing some unrelated changes.  I think the reason is:

static void
remote_async_inferior_event_handler (gdb_client_data data)
{
  inferior_event_handler (INF_REG_EVENT);  <--- remote target can get destroyed anywhere here

  remote_target *remote = (remote_target *) data;
  remote_state *rs = remote->get_remote_state ();  <--- we dereference the remote target here

  /* inferior_event_handler may have consumed an event pending on the
     infrun side without calling target_wait on the REMOTE target, or
     may have pulled an event out of a different target.  Keep trying
     for this remote target as long it still has either pending events
     or unacknowledged notifications.  */

  if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
      || !rs->stop_reply_queue.empty ())
    mark_async_event_handler (rs->remote_async_inferior_event_token);
}


inferior_event_handler can lead to the remote target being destroyed.  An "exit" stop reply from the target can cause it, but a communication failure is another trigger.

The reason why it is difficult to reproduce is that inferior_event_handler is also called from remote_async_serial_handler, which doesn't have the same code that dereferences the remote target.  So the bug can only trigger when inferior_event_handler is called through remote_async_inferior_event_handler, which likely happens less often than being called through remote_async_serial_handler.

I wrote the patch below to make it so inferior_event_handler is always called through remote_async_inferior_event_handler, which automatically triggers the bug in the simplest cases (the inferior exiting with "target remote").  It makes it so remote_async_serial_handler no longer calls inferior_event_handler directly, but just marks the remote target's async event handler.

What I like about this patch is that there's now a single path from the remote target to inferior_event_handler, so fewer variations to consider, more determinism.  In the patch log, I talk about an exit event, but you can replace that in your head with "communication failure", which leads to the remote target being destroyed, and which can really happen at any time.


From 5c358d0604e0ac3e2bd9aca02234cb20f1caea42 Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Fri, 27 Nov 2020 13:32:55 -0500
Subject: [PATCH] gdb: make remote_async_serial_handler just mark remote
 target's async event

The purpose of this patch is to uncover an existing bug that is
difficult to trigger otherwise.  But it makes the remote target flow a
bit simpler to understand, which is in my opinion a welcome change.

The bug is a possible use-after-free in
remote_async_inferior_event_handler:

    static void
    remote_async_inferior_event_handler (gdb_client_data data)
    {
      inferior_event_handler (INF_REG_EVENT);

      remote_target *remote = (remote_target *) data;
      remote_state *rs = remote->get_remote_state ();

      ...
    }

These are the steps that lead to it:

1. inferior_event_handler is called, deep down that calls
   remote_target::wait, then infrun handles the returned event
2. the event returned by remote_target::wait is an exit event
3. remote_target::mourn_inferior gets called, which leads to the remote
   target getting unpushed and deleted, because its refcount drops to 0
4. back to remote_async_inferior_event_handler, the
   `remote->get_remote_state ()` line uses the remote target object
   after it was deleted.

The reason why we don't see this bug on master (at least not often) is
that inferior_event_handler is most often called through
remote_async_inferior_event_handler, as a reaction of the file
descriptor used to talk to the remote side being readable:

    static void
    remote_async_serial_handler (struct serial *scb, void *context)
    {
      /* Don't propogate error information up to the client.  Instead let
         the client find out about the error by querying the target.  */
      inferior_event_handler (INF_REG_EVENT);
    }

I don't know how exactly, but I am pretty sure (as I hit it on some rare
occasions) that some particular scheduling and event sequence can lead to
the exit event being handled through the inferior_event_handler call in
remote_async_inferior_event_handler and to the bug described above.

Change-Id: If8d0daab97aa5338fe60d9e9b5a4ba3b5746eaea
---
 gdb/remote.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 71f814efb365..53ff8b63a1dc 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -14161,14 +14161,12 @@ remote_target::is_async_p ()
    will be able to delay notifying the client of an event until the
    point where an entire packet has been received.  */
 
-static serial_event_ftype remote_async_serial_handler;
-
 static void
 remote_async_serial_handler (struct serial *scb, void *context)
 {
-  /* Don't propogate error information up to the client.  Instead let
-     the client find out about the error by querying the target.  */
-  inferior_event_handler (INF_REG_EVENT);
+  remote_state *rs = (remote_state *) context;
+
+  mark_async_event_handler (rs->remote_async_inferior_event_token);
 }
 
 static void
-- 
2.29.2
Comment 13 Simon Marchi 2020-11-29 15:15:35 UTC
If that's indeed the problem, I can see two solutions:

1. Have some kind of listener that the function can install locally to get notified if the target gets destroyed, and skip the following code if so.
2. That code exists to re-mark the async event handler if we still have things to report, because it is automatically cleared by check_async_event_handlers.  Instead, we can make check_async_event_handlers leave the async event handler marked and let the target clear it in its ::wait implementation when it no longer has anything to report.  Or, as a stop-gap solution, have the remote_async_inferior_event_handler re-mark the handler before calling inferior_event_handler and clear it in ::wait if there's nothing else to report.

I have a patch that does that (in the middle of a big work-in-progress series); as you can see, it gets rid of the problematic code:

   https://review.lttng.org/c/binutils-gdb/+/4050/11
Comment 14 Tom de Vries 2020-11-30 10:32:08 UTC
(In reply to Simon Marchi from comment #12)
> It looks very similar to a problem I hit while doing some unrelated changes.
> I think the reason is:
> 
> static void
> remote_async_inferior_event_handler (gdb_client_data data)
> {
>   inferior_event_handler (INF_REG_EVENT);  <--- remote target can get
> destroyed anywhere here
> 
>   remote_target *remote = (remote_target *) data;
>   remote_state *rs = remote->get_remote_state ();  <--- we dereference the
> remote target here
> 
>   /* inferior_event_handler may have consumed an event pending on the
>      infrun side without calling target_wait on the REMOTE target, or
>      may have pulled an event out of a different target.  Keep trying
>      for this remote target as long it still has either pending events
>      or unacknowledged notifications.  */
> 
>   if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
>       || !rs->stop_reply_queue.empty ())
>     mark_async_event_handler (rs->remote_async_inferior_event_token);
> }
> 
> 
> inferior_event_handler can lead to the remote target being destroyed.  An
> "exit" stop reply from the target can lead to it, but communication failure
> is another one.
> 

Agreed, it's the same problem.  Thanks for helping me out here.

> The reason why it is difficult to reproduce is that inferior_event_handler
> is also called from remote_async_serial_handler, which doesn't have the same
> code that dereferences the remote target.  So the bug can only trigger when
> inferior_event_handler is called through
> remote_async_inferior_event_handler, which likely happens less often than
> being called through remote_async_serial_handler.
> 

I see.

> I did write this patch below to make it so inferior_event_handler is always
> called through remote_async_inferior_event_handler, that automatically
> triggers the bug in the simplest cases (the inferior exits with "target
> remote").  It makes it so remote_async_serial_handler doesn't call
> remote_async_inferior_event_handler anymore, but just marks the remote
> target's async handler.
> 

Yep, works for me to trigger the problem.  Using this patch, I trigger the original failure with -m32:
...
Running src/gdb/testsuite/gdb.multi/multi-target-continue.exp ...
ERROR: GDB process no longer exists
...
and this one with -m64:
...
Running src/gdb/testsuite/gdb.multi/multi-target-no-resumed.exp ...
ERROR: GDB process no longer exists
...
when running gdb.multi/*.exp.

> What I like from this patch is that there's now a single path from the
> remote target to inferior_event_handler, so less variations to consider,
> more determinism.

Agreed, that's a good thing :)
Comment 15 Tom de Vries 2020-11-30 11:02:42 UTC
(In reply to Simon Marchi from comment #13)
> If that's indeed the problem, I can see two solutions:
> 
> 1. Have some kind of listener that the function can install locally to get
> notified if the target gets destroyed, and skip the following code if so.

OK, I see how that would work.

> 2. That code exists to re-mark the async event handler if we still have
> things to report, because it is automatically cleared by
> check_async_event_handlers.  Instead, we can make check_async_event_handlers
> leave the async event handler marked and let the target clear it in its
> ::wait implementation when it no longer has anything to report.  Or, as a
> stop-gap solution, have the remote_async_inferior_event_handler re-mark the
> handler before calling inferior_event_handler and clear it in ::wait if
> there's nothing else to report.
> 
> I have a patch that does that (in the middle of a big work-in-progress
> series), as you can see it gets rid of the problematic code:
> 
>    https://review.lttng.org/c/binutils-gdb/+/4050/11

I've looked at this briefly, but unfortunately I'm unfamiliar with this code, so I'm not able to comment.

FWIW, I wrote an adhoc patch to test my understanding of the problem.  It indeed fixes the ERRORs triggered by the trigger patch:
...
diff --git a/gdb/async-event.c b/gdb/async-event.c
index ffc0edcde8..702b434365 100644
--- a/gdb/async-event.c
+++ b/gdb/async-event.c
@@ -327,3 +327,18 @@ delete_async_event_handler
   xfree (*async_handler_ptr);
   *async_handler_ptr = NULL;
 }
+
+extern bool has_async_event_handler (gdb_client_data data);
+
+bool
+has_async_event_handler (gdb_client_data data)
+{
+  async_event_handler *ptr;
+  for (ptr = async_event_handler_list.first_handler;
+       ptr != NULL;
+       ptr = ptr->next_handler)
+    if (ptr->client_data == data)
+      return true;
+
+  return false;
+}
diff --git a/gdb/remote.c b/gdb/remote.c
index 59075cb09f..8e42c32249 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -14155,15 +14155,19 @@ 
 static void
 remote_async_inferior_event_handler (gdb_client_data data)
 {
+  extern bool has_async_event_handler (gdb_client_data data);
+  gdb_assert (has_async_event_handler (data));
   inferior_event_handler (INF_REG_EVENT);
+  if (!has_async_event_handler (data))
+    return;
 
   remote_target *remote = (remote_target *) data;
   remote_state *rs = remote->get_remote_state ();
...
and indeed that works.

AFAIU, this patch is not good enough because it tests for equality with a pointer to a deleted object.
Comment 16 Tom de Vries 2020-11-30 12:06:52 UTC
Created attachment 13008 [details]
Tentative patch, implementing the "listener" approach
Comment 17 Simon Marchi 2020-11-30 14:11:06 UTC
(In reply to Tom de Vries from comment #15)
> diff --git a/gdb/async-event.c b/gdb/async-event.c
> index ffc0edcde8..702b434365 100644
> --- a/gdb/async-event.c
> +++ b/gdb/async-event.c
> @@ -327,3 +327,18 @@ delete_async_event_handler
>    xfree (*async_handler_ptr);
>    *async_handler_ptr = NULL;
>  }
> +
> +extern bool has_async_event_handler (gdb_client_data data);
> +
> +bool
> +has_async_event_handler (gdb_client_data data)
> +{
> +  async_event_handler *ptr;
> +  for (ptr = async_event_handler_list.first_handler;
> +       ptr != NULL;
> +       ptr = ptr->next_handler)
> +    if (ptr->client_data == data)
> +      return true;
> +
> +  return false;
> +}
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 59075cb09f..8e42c32249 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -14155,15 +14155,19 @@ 
>  static void
>  remote_async_inferior_event_handler (gdb_client_data data)
>  {
> +  extern bool has_async_event_handler (gdb_client_data data);
> +  gdb_assert (has_async_event_handler (data));
>    inferior_event_handler (INF_REG_EVENT);
> +  if (!has_async_event_handler (data))
> +    return;
>  
>    remote_target *remote = (remote_target *) data;
>    remote_state *rs = remote->get_remote_state ();
> ...
> and indeed that works.
> 
> AFAIU, this patch is not good enough because it tests for equality with a
> pointer to a deleted object.

Indeed, another listener could have been allocated in the mean time with the same address, causing a false test result.  But the idea is the right one.

I would prefer the second approach though, because it fixes the problem by reducing overall complexity, rather than adding more things.  And it's something I would have proposed for merging anyway, as a base for some other work; I would just send it a bit earlier than expected.
Comment 18 Tom de Vries 2020-11-30 14:35:15 UTC
(In reply to Simon Marchi from comment #17)
> (In reply to Tom de Vries from comment #15)
> > diff --git a/gdb/async-event.c b/gdb/async-event.c
> > index ffc0edcde8..702b434365 100644
> > --- a/gdb/async-event.c
> > +++ b/gdb/async-event.c
> > @@ -327,3 +327,18 @@ delete_async_event_handler
> >    xfree (*async_handler_ptr);
> >    *async_handler_ptr = NULL;
> >  }
> > +
> > +extern bool has_async_event_handler (gdb_client_data data);
> > +
> > +bool
> > +has_async_event_handler (gdb_client_data data)
> > +{
> > +  async_event_handler *ptr;
> > +  for (ptr = async_event_handler_list.first_handler;
> > +       ptr != NULL;
> > +       ptr = ptr->next_handler)
> > +    if (ptr->client_data == data)
> > +      return true;
> > +
> > +  return false;
> > +}
> > diff --git a/gdb/remote.c b/gdb/remote.c
> > index 59075cb09f..8e42c32249 100644
> > --- a/gdb/remote.c
> > +++ b/gdb/remote.c
> > @@ -14155,15 +14155,19 @@ 
> >  static void
> >  remote_async_inferior_event_handler (gdb_client_data data)
> >  {
> > +  extern bool has_async_event_handler (gdb_client_data data);
> > +  gdb_assert (has_async_event_handler (data));
> >    inferior_event_handler (INF_REG_EVENT);
> > +  if (!has_async_event_handler (data))
> > +    return;
> >  
> >    remote_target *remote = (remote_target *) data;
> >    remote_state *rs = remote->get_remote_state ();
> > ...
> > and indeed that works.
> > 
> > AFAIU, this patch is not good enough because it tests for equality with a
> > pointer to a deleted object.
> 
> Indeed, another listener could have been allocated in the mean time with the
> same address, causing a false test result.  But the idea is the right one.
> 
> I would prefer the second approach though, because it fixes the problem by
> reducing overall complexity, rather than adding more things.  And it's
> something I would have proposed for merging for another as a base for some
> other work, I would just send it a bit earlier than expected.

I agree with that, for master.

I was thinking though in the direction of having a fix on top of 10.1, possibly as part of 10.2, otherwise as part of the openSUSE package.  Such a patch would have to be the minimal-fix kind, and there, AFAIU the listener approach seems more appropriate.
Comment 19 Simon Marchi 2020-11-30 14:57:27 UTC
(In reply to Tom de Vries from comment #18)
> I was thinking though in the direction of having a fix on top of 10.1,
> possibly as part of 10.2, otherwise as part of the openSUSE package.  Such a
> patch would have to be the minimal-fix kind, and there, AFAIU the listener
> approach seems more appropriate.

Yeah that makes sense.
Comment 20 Pedro Alves 2020-11-30 15:00:57 UTC
target_ops is reference counted.  Wouldn't this patch fix the issue?

diff --git c/gdb/remote.c w/gdb/remote.c
index 71f814efb36..5a5ac1ee92f 100644
--- c/gdb/remote.c
+++ w/gdb/remote.c
@@ -14174,9 +14174,13 @@ remote_async_serial_handler (struct serial *scb, void *context)
 static void
 remote_async_inferior_event_handler (gdb_client_data data)
 {
+  remote_target *remote = (remote_target *) data;
+  /* Hold a strong reference to the remote target while handling an
+     event, since that could result in closing the connection.  */
+  auto remote_ref = target_ops_ref::new_reference (remote);
+
   inferior_event_handler (INF_REG_EVENT);
 
-  remote_target *remote = (remote_target *) data;
   remote_state *rs = remote->get_remote_state ();
 
   /* inferior_event_handler may have consumed an event pending on the
Comment 21 Simon Marchi 2020-11-30 15:10:48 UTC
(In reply to Pedro Alves from comment #20)
> target_ops is reference counted.  Wouldn't this patch fix the issue?
> 
> diff --git c/gdb/remote.c w/gdb/remote.c
> index 71f814efb36..5a5ac1ee92f 100644
> --- c/gdb/remote.c
> +++ w/gdb/remote.c
> @@ -14174,9 +14174,13 @@ remote_async_serial_handler (struct serial *scb,
> void *context)
>  static void
>  remote_async_inferior_event_handler (gdb_client_data data)
>  {
> +  remote_target *remote = (remote_target *) data;
> +  /* Hold a strong reference to the remote target while handling an
> +     event, since that could result in closing the connection.  */
> +  auto remote_ref = target_ops_ref::new_reference (remote);
> +
>    inferior_event_handler (INF_REG_EVENT);
>  
> -  remote_target *remote = (remote_target *) data;
>    remote_state *rs = remote->get_remote_state ();
>  
>    /* inferior_event_handler may have consumed an event pending on the

Hmm, right.  I have my patch series almost ready to send, and I think it's a desirable change in any case, so I'll still send it for review.
Comment 22 Tom de Vries 2021-01-07 12:27:02 UTC
(In reply to Tom de Vries from comment #14)
> (In reply to Simon Marchi from comment #12)
> > I did write this patch below to make it so inferior_event_handler is always
> > called through remote_async_inferior_event_handler, that automatically
> > triggers the bug in the simplest cases (the inferior exits with "target
> > remote").  It makes it so remote_async_serial_handler doesn't call
> > remote_async_inferior_event_handler anymore, but just marks the remote
> > target's async handler.
> > 
> 
> Yep, works for me to trigger the problem.  Using this patch, I trigger the
> original failure with -m32:
> ...
> Running src/gdb/testsuite/gdb.multi/multi-target-continue.exp ...
> ERROR: GDB process no longer exists
> ...
> and this one with -m64:
> ...
> Running src/gdb/testsuite/gdb.multi/multi-target-no-resumed.exp ...
> ERROR: GDB process no longer exists
> ...
> when running gdb.multi/*.exp.
> 
> > What I like from this patch is that there's now a single path from the
> > remote target to inferior_event_handler, so less variations to consider,
> > more determinism.
> 
> Agreed, that's a good thing :)

Using the trigger patch on trunk no longer reproduces the problem.

I've bisected this to commit 79952e69634 "Make scoped_restore_current_thread's cdtors exception free (RFC)".
Comment 23 Tom de Vries 2021-01-07 12:34:52 UTC
(In reply to Tom de Vries from comment #22)
> (In reply to Tom de Vries from comment #14)
> > (In reply to Simon Marchi from comment #12)
> > > I did write this patch below to make it so inferior_event_handler is always
> > > called through remote_async_inferior_event_handler, that automatically
> > > triggers the bug in the simplest cases (the inferior exits with "target
> > > remote").  It makes it so remote_async_serial_handler doesn't call
> > > remote_async_inferior_event_handler anymore, but just marks the remote
> > > target's async handler.
> > > 
> > 
> > Yep, works for me to trigger the problem.  Using this patch, I trigger the
> > original failure with -m32:
> > ...
> > Running src/gdb/testsuite/gdb.multi/multi-target-continue.exp ...
> > ERROR: GDB process no longer exists
> > ...
> > and this one with -m64:
> > ...
> > Running src/gdb/testsuite/gdb.multi/multi-target-no-resumed.exp ...
> > ERROR: GDB process no longer exists
> > ...
> > when running gdb.multi/*.exp.
> > 
> > > What I like from this patch is that there's now a single path from the
> > > remote target to inferior_event_handler, so less variations to consider,
> > > more determinism.
> > 
> > Agreed, that's a good thing :)
> 
> Using the trigger patch on trunk no longer reproduces the problem.
> 
> I've bisected this to commit 79952e69634 "Make
> scoped_restore_current_thread's cdtors exception free (RFC)".

OK, I read here ( https://sourceware.org/pipermail/gdb-patches/2020-July/170276.html ):
...
The last patch is a follow up that avoids swallowing exceptions in
scoped_restore_current_thread's dtor that I'm thinking would be a bit
too invasive to put in GDB 10, I think it could do with a longer
baking period in master.
...

So, that's not a good candidate to fix gdb 10.
Comment 24 Tom de Vries 2021-01-07 13:40:30 UTC
(In reply to Pedro Alves from comment #20)
> target_ops is reference counted.  Wouldn't this patch fix the issue?
> 
> diff --git c/gdb/remote.c w/gdb/remote.c
> index 71f814efb36..5a5ac1ee92f 100644
> --- c/gdb/remote.c
> +++ w/gdb/remote.c
> @@ -14174,9 +14174,13 @@ remote_async_serial_handler (struct serial *scb,
> void *context)
>  static void
>  remote_async_inferior_event_handler (gdb_client_data data)
>  {
> +  remote_target *remote = (remote_target *) data;
> +  /* Hold a strong reference to the remote target while handling an
> +     event, since that could result in closing the connection.  */
> +  auto remote_ref = target_ops_ref::new_reference (remote);
> +
>    inferior_event_handler (INF_REG_EVENT);
>  
> -  remote_target *remote = (remote_target *) data;
>    remote_state *rs = remote->get_remote_state ();
>  
>    /* inferior_event_handler may have consumed an event pending on the

Submitted: https://sourceware.org/pipermail/gdb-patches/2021-January/174760.html
Comment 25 Tom de Vries 2021-01-07 16:10:20 UTC
(In reply to Simon Marchi from comment #21)
> (In reply to Pedro Alves from comment #20)
> > target_ops is reference counted.  Wouldn't this patch fix the issue?
> > 
> > diff --git c/gdb/remote.c w/gdb/remote.c
> > index 71f814efb36..5a5ac1ee92f 100644
> > --- c/gdb/remote.c
> > +++ w/gdb/remote.c
> > @@ -14174,9 +14174,13 @@ remote_async_serial_handler (struct serial *scb,
> > void *context)
> >  static void
> >  remote_async_inferior_event_handler (gdb_client_data data)
> >  {
> > +  remote_target *remote = (remote_target *) data;
> > +  /* Hold a strong reference to the remote target while handling an
> > +     event, since that could result in closing the connection.  */
> > +  auto remote_ref = target_ops_ref::new_reference (remote);
> > +
> >    inferior_event_handler (INF_REG_EVENT);
> >  
> > -  remote_target *remote = (remote_target *) data;
> >    remote_state *rs = remote->get_remote_state ();
> >  
> >    /* inferior_event_handler may have consumed an event pending on the
> 
> Hmm, right.  I have my patch series almost ready to send, and I think it's a
> desirable change in any case, so I'll still send it for review.

Which was submitted here: https://sourceware.org/pipermail/gdb-patches/2020-November/173633.html
Comment 26 cvs-commit@gcc.gnu.org 2021-01-07 20:41:06 UTC
The gdb-10-branch branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f498adf03ff6c9ee76a8bdb8078ece11c176798c

commit f498adf03ff6c9ee76a8bdb8078ece11c176798c
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Jan 7 21:41:03 2021 +0100

    [gdb/remote] Fix invalid pointer in remote_async_serial_handler
    
    On rare occasions, we run into this ERROR/UNRESOLVED on gdb-10-branch:
    ...
    (gdb) PASS: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 2
    Remote debugging from host ::1, port 34088^M
    Process outputs/gdb.multi/multi-target/multi-target created; pid = 8649^M
    monitor exit^M
    (gdb) Killing process(es): 8649^M
    ERROR: GDB process no longer exists
    GDB process exited with wait status 8627 exp14 0 0 CHILDKILLED SIGABRT SIGABRT
    UNRESOLVED: gdb.multi/multi-target.exp: continue: non-stop=on: inferior 5
    ...
    
    A trigger patch makes the crash happen all the time:
    ...
    diff --git a/gdb/remote.c b/gdb/remote.c
    index 71f814efb365..53ff8b63a1dc 100644
    --- a/gdb/remote.c
    +++ b/gdb/remote.c
    @@ -14161,14 +14161,12 @@ remote_target::is_async_p ()
        will be able to delay notifying the client of an event until the
        point where an entire packet has been received.  */
    
    -static serial_event_ftype remote_async_serial_handler;
    -
     static void
     remote_async_serial_handler (struct serial *scb, void *context)
     {
    -  /* Don't propogate error information up to the client.  Instead let
    -     the client find out about the error by querying the target.  */
    -  inferior_event_handler (INF_REG_EVENT);
    +  remote_state *rs = (remote_state *) context;
    +
    +  mark_async_event_handler (rs->remote_async_inferior_event_token);
     }
    
     static void
    ...
    
    And using -fsanitize=address we can get a more elaborate error message:
    ...
    ==7196==ERROR: AddressSanitizer: heap-use-after-free on address \
      0x6170000bf258 at pc 0x000001481755 bp 0x7fff05b20840 sp 0x7fff05b20838
    READ of size 8 at 0x6170000bf258 thread T0
        #0 0x1481754 in std::_Hashtable<gdbarch*, std::pair<gdbarch* const,
        remote_arch_state>, std::allocator<std::pair<gdbarch* const,
        remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>,
        std::hash<gdbarch*>, std::__detail::_Mod_range_hashing,
        std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy,
        std::__detail::_Hashtable_traits<false, false, true>
        >::_M_bucket_index(unsigned long) const
        /usr/include/c++/11/bits/hashtable.h:719
        #1 0x147c8ab in std::_Hashtable<gdbarch*, std::pair<gdbarch* const,
        remote_arch_state>, std::allocator<std::pair<gdbarch* const,
        remote_arch_state> >, std::__detail::_Select1st, std::equal_to<gdbarch*>,
        std::hash<gdbarch*>, std::__detail::_Mod_range_hashing,
        std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy,
        std::__detail::_Hashtable_traits<false, false, true> >::find(gdbarch*
        const&) /usr/include/c++/11/bits/hashtable.h:1500
        #2 0x147852c in std::unordered_map<gdbarch*, remote_arch_state,
        std::hash<gdbarch*>, std::equal_to<gdbarch*>,
        std::allocator<std::pair<gdbarch* const, remote_arch_state> >
        >::find(gdbarch* const&) /usr/include/c++/11/bits/unordered_map.h:869
        #3 0x14306db in remote_state::get_remote_arch_state(gdbarch*)
        src/gdb/remote.c:1203
        #4 0x14309dc in remote_target::get_remote_state() src/gdb/remote.c:1232
        #5 0x1470c08 in remote_async_inferior_event_handler src/gdb/remote.c:14169
        #6 0xaa9f6b in check_async_event_handlers() src/gdb/async-event.c:295
        #7 0x1e93ab4 in gdb_do_one_event() src/gdbsupport/event-loop.cc:194
        #8 0x118f5f9 in start_event_loop src/gdb/main.c:356
        #9 0x118f8ed in captured_command_loop src/gdb/main.c:416
        #10 0x1192d6a in captured_main src/gdb/main.c:1253
        #11 0x1192dfa in gdb_main(captured_main_args*) src/gdb/main.c:1268
        #12 0x97b380 in main src/gdb/gdb.c:32
        #13 0x7f550c211349 in __libc_start_main ../csu/libc-start.c:308
        #14 0x97b199 in _start (build/gdb/gdb+0x97b199)
    
    0x6170000bf258 is located 600 bytes inside of 648-byte region \
      [0x6170000bf000,0x6170000bf288)
    freed by thread T0 here:
        #0 0x7f550f516a57 in operator delete(void*, unsigned long)
        (/usr/lib64/libasan.so.6+0xaea57)
        #1 0x148b1fe in extended_remote_target::~extended_remote_target()
        src/gdb/remote.c:958
        #2 0x143b483 in remote_target::close() src/gdb/remote.c:4074
        #3 0x16cb90f in target_close(target_ops*) src/gdb/target.c:3230
        #4 0x16a2635 in decref_target(target_ops*) src/gdb/target.c:557
        #5 0x16a2abb in target_stack::unpush(target_ops*) src/gdb/target.c:645
        #6 0x16d01ef in inferior::unpush_target(target_ops*)
        src/gdb/inferior.h:356
        #7 0x16a2877 in unpush_target(target_ops*) src/gdb/target.c:607
        #8 0x16a2adf in unpush_target_and_assert src/gdb/target.c:655
        #9 0x16a2c57 in pop_all_targets_at_and_above(strata) src/gdb/target.c:678
        #10 0x1442749 in remote_unpush_target src/gdb/remote.c:5522
        #11 0x1458c16 in remote_target::readchar(int) src/gdb/remote.c:9137
        #12 0x145b25b in remote_target::getpkt_or_notif_sane_1(std::vector<char,
        gdb::default_init_allocator<char, std::allocator<char> > >*, int, int,
        int*) src/gdb/remote.c:9683
        #13 0x145bc9a in remote_target::getpkt_sane(std::vector<char,
        gdb::default_init_allocator<char, std::allocator<char> > >*, int)
        src/gdb/remote.c:9790
        #14 0x145b040 in remote_target::getpkt(std::vector<char,
        gdb::default_init_allocator<char, std::allocator<char> > >*, int)
        src/gdb/remote.c:9623
        #15 0x145780b in remote_target::remote_read_bytes_1(unsigned long,
        unsigned char*, unsigned long, int, unsigned long*) src/gdb/remote.c:8860
        #16 0x145805e in remote_target::remote_read_bytes(unsigned long,
        unsigned char*, unsigned long, int, unsigned long*) src/gdb/remote.c:8987
        #17 0x146113a in remote_target::xfer_partial(target_object, char const*,
        unsigned char*, unsigned char const*, unsigned long, unsigned long,
        unsigned long*) src/gdb/remote.c:10987
        #18 0x16a4004 in raw_memory_xfer_partial(target_ops*, unsigned char*,
        unsigned char const*, unsigned long, long, unsigned long*)
        src/gdb/target.c:918
        #19 0x16a4fcf in target_xfer_partial(target_ops*, target_object, char
        const*, unsigned char*, unsigned char const*, unsigned long, unsigned
        long, unsigned long*) src/gdb/target.c:1156
        #20 0x16a5d65 in target_read_partial src/gdb/target.c:1387
        #21 0x16a5f19 in target_read(target_ops*, target_object, char const*,
        unsigned char*, unsigned long, long) src/gdb/target.c:1427
        #22 0x16a5666 in target_read_raw_memory(unsigned long, unsigned char*,
        long) src/gdb/target.c:1260
        #23 0xd22f2a in dcache_read_line src/gdb/dcache.c:336
        #24 0xd232b7 in dcache_peek_byte src/gdb/dcache.c:403
        #25 0xd23845 in dcache_read_memory_partial(target_ops*, dcache_struct*,
        unsigned long, unsigned char*, unsigned long, unsigned long*) src/gdb/dcache.c:484
        #26 0x16a47da in memory_xfer_partial_1 src/gdb/target.c:1041
        #27 0x16a4a1e in memory_xfer_partial src/gdb/target.c:1084
        #28 0x16a4f44 in target_xfer_partial(target_ops*, target_object,
        char const*, unsigned char*, unsigned char const*, unsigned long,
        unsigned long, unsigned long*) src/gdb/target.c:1141
        #29 0x18203d4 in read_value_memory(value*, long, int, unsigned long,
        unsigned char*, unsigned long) src/gdb/valops.c:956
    
    previously allocated by thread T0 here:
        #0 0x7f550f515c37 in operator new(unsigned long)
        (/usr/lib64/libasan.so.6+0xadc37)
        #1 0x14429f0 in remote_target::open_1(char const*, int, int)
        src/gdb/remote.c:5562
        #2 0x14405e6 in extended_remote_target::open(char const*, int)
        src/gdb/remote.c:4907
        #3 0x16a0f3c in open_target src/gdb/target.c:242
        #4 0xc19ff5 in do_sfunc src/gdb/cli/cli-decode.c:111
        #5 0xc221db in cmd_func(cmd_list_element*, char const*, int)
        src/gdb/cli/cli-decode.c:2181
        #6 0x16feda6 in execute_command(char const*, int) src/gdb/top.c:668
        #7 0xee9dc9 in command_handler(char const*) src/gdb/event-top.c:588
        #8 0xeea6a8 in command_line_handler(std::unique_ptr<char,
        gdb::xfree_deleter<char> >&&) src/gdb/event-top.c:773
        #9 0xee8a12 in gdb_rl_callback_handler src/gdb/event-top.c:219
        #10 0x7f550f24aead in rl_callback_read_char
        (/lib64/libreadline.so.7+0x31ead)
    ...
    
    The problem is here in remote_async_inferior_event_handler:
    ...
    static void
    remote_async_inferior_event_handler (gdb_client_data data)
    {
      inferior_event_handler (INF_REG_EVENT);
    
      remote_target *remote = (remote_target *) data;
      remote_state *rs = remote->get_remote_state ();
    ...
    
    The remote target (passed in the data argument) can be destroyed during the
    call to inferior_event_handler.  If so, the call to remote->get_remote_state
    () is done using a dangling pointer.
    
    Fix this by increasing the reference count on the remote target before calling
    inferior_event_handler, such that it won't get destroyed until right before
    returning from remote_async_inferior_event_handler.
    
    Tested on x86_64-linux.
    
    Intended for gdb-10-branch.
    
    The problem has stopped reproducing with the trigger patch since master commit
    79952e69634 "Make scoped_restore_current_thread's cdtors exception free
    (RFC)".  We could still apply this to master though.
    
    gdb/ChangeLog:
    
    2021-01-07  Pedro Alves  <pedro@palves.net>
                Simon Marchi  <simon.marchi@polymtl.ca>
                Tom de Vries  <tdevries@suse.de>
    
            PR remote/26614
            * remote.c (remote_async_inferior_event_handler):  Hold a strong
            reference to the remote target while handling an event.
Comment 27 Joel Brobecker 2021-01-31 06:16:47 UTC
Hello!

Can we confirm whether Pedro's patch which was pushed to gdb-10 is sufficient to fix this PR (in which case we can close it), or are there further fixes needed?

Thank you!
Comment 28 Tom de Vries 2021-01-31 06:58:21 UTC
(In reply to Joel Brobecker from comment #27)
> Hello!
> 
> Can we confirm whether Pedro's patch which was pushed to gdb-10 is
> sufficient to fix this PR (in which case we can close it), or are there 
> further fixes needed?

Well, the reason I didn't close this PR when committing to gdb-10-branch, is because at the time the PR still existed on master.

A patch series has been submitted for master ( at https://sourceware.org/pipermail/gdb-patches/2020-November/173633.html ) but is still in review AFAICT.
Comment 29 Tom de Vries 2021-01-31 07:16:29 UTC
(In reply to Tom de Vries from comment #28)
> A patch series has been submitted for master ( at
> https://sourceware.org/pipermail/gdb-patches/2020-November/173633.html ) but
> is still in review AFAICT.

And, patch 2 out of that series has a "PR gdb/26614" line in the ChangeLog, so when it's committed it should show up here.
Comment 30 Simon Marchi 2021-02-04 18:43:41 UTC
Oops, the version I pushed doesn't have the PR tag.  But the series is pushed, so we can close this now.