Bug 31380 - [gdb/dap] Ensure responses are flushed to client before exiting
Summary: [gdb/dap] Ensure responses are flushed to client before exiting
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: dap (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 15.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-14 11:23 UTC by Tom de Vries
Modified: 2024-02-22 10:40 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tom de Vries 2024-02-14 11:23:14 UTC
In dap, we want to flush responses to the client before exiting, as stated here in main_loop:
...
        # Got the terminate request.  This is handled by the                                                          
        # JSON-writing thread, so that we can ensure that all                                                         
	# responses are flushed to the client before exiting.                                                         
        self.write_queue.put(None)
...

Now that we print "<dap thread name>: terminating" message, we can check for this (by proxy) by doing:
...
diff --git a/gdb/testsuite/gdb.dap/eof.exp b/gdb/testsuite/gdb.dap/eof.exp
index 9c17725c0d0..575a98d6042 100644
--- a/gdb/testsuite/gdb.dap/eof.exp
+++ b/gdb/testsuite/gdb.dap/eof.exp
@@ -35,3 +35,4 @@ catch "wait -i $gdb_spawn_id"
 unset gdb_spawn_id
 
 dap_check_log_file
+dap_check_log_file_re "JSON writer: terminating"
diff --git a/gdb/testsuite/lib/dap-support.exp b/gdb/testsuite/lib/dap-support.exp
index 979dfa2cd73..b56d8bee244 100644
--- a/gdb/testsuite/lib/dap-support.exp
+++ b/gdb/testsuite/lib/dap-support.exp
@@ -358,6 +358,17 @@ proc dap_check_log_file {} {
     }
 }
 
+proc dap_check_log_file_re { re } {
+    global dap_gdb_instance
+
+    set fd [open [standard_output_file "dap.log.$dap_gdb_instance"]]
+    set contents [read $fd]
+    close $fd
+
+    set ok [regexp $re $contents]
+    gdb_assert {$ok} "log file matched $re"
+}
+
 # Cleanly shut down gdb.  TERMINATE is passed as the terminateDebuggee
 # parameter to the request.
 proc dap_shutdown {{terminate false}} {
...

This currently fails more often than not:
...
FAIL: gdb.dap/eof.exp: log file matched JSON writer: terminating
...

This ( https://sourceware.org/pipermail/gdb-patches/2024-February/206416.html ) mentions:
...
However, I suppose this approach does not really work, because main_loop then proceeds to return.

One way to fix this would be to have start_json_writer return the thread
object, and then have main_loop join the thread to terminate after
writing the None.  This would mean making this particular thread
non-daemon though.
...

I posted a first attempt at fixing this here ( https://sourceware.org/pipermail/gdb-patches/2024-February/206568.html ):
...
diff --git a/gdb/python/lib/gdb/dap/io.py b/gdb/python/lib/gdb/dap/io.py
index 5149edae977..491cf208ae3 100644
--- a/gdb/python/lib/gdb/dap/io.py
+++ b/gdb/python/lib/gdb/dap/io.py
@@ -65,10 +65,7 @@ def start_json_writer(stream, queue):
         while True:
             obj = queue.get()
             if obj is None:
-                # This is an exit request.  The stream is already
-                # flushed, so all that's left to do is request an
-                # exit.
-                send_gdb("quit")
+                queue.task_done()
                 break
             obj["seq"] = seq
             seq = seq + 1
@@ -79,5 +76,6 @@ def start_json_writer(stream, queue):
             stream.write(header_bytes)
             stream.write(body_bytes)
             stream.flush()
+            queue.task_done()

     start_thread("JSON writer", _json_writer)
diff --git a/gdb/python/lib/gdb/dap/server.py b/gdb/python/lib/gdb/dap/server.py
index 7cc5a4681ee..ca4465ff207 100644
--- a/gdb/python/lib/gdb/dap/server.py
+++ b/gdb/python/lib/gdb/dap/server.py
@@ -229,6 +229,8 @@ class Server:
         # JSON-writing thread, so that we can ensure that all
         # responses are flushed to the client before exiting.
         self.write_queue.put(None)
+        self.write_queue.join()
+        send_gdb("quit")

     @in_dap_thread
     def send_event_later(self, event, body=None):
...

But I still run into the same fail.

All that this patch does is guarantee that the dap main thread doesn't finish before the JSON writer thread.

But there's a race with the SIGHUP that's handled by GDB's main thread.  AFAIU there's nothing guaranteeing that the dap main thread finishes before GDB's main thread.

If we're going to add such a constraint, it could make sense to stop dap earlier, before GDB's main thread looses the ability to handle posted (python) events.
Comment 1 Tom Tromey 2024-02-15 18:19:27 UTC
+dap_check_log_file_re "JSON writer: terminating"

I suspect the issue is that the writer thread notifies
the queue -- but then the log is written when the thread
actually stops.  However, that means there is a race:
gdb may stop before the thread exits.

I believe that adding a log just before the final
call to task_done in the writer, then checking for this
would work reliably.
Comment 2 Tom Tromey 2024-02-15 18:21:15 UTC
Also:

+        send_gdb("quit")

I wonder if this is really needed after all.
I applied the queue changes (which IMO should go in ASAP)
but left this line out, and it all seems to work.
The only reason io.py had that send_quit was to try
to deal with the flushing issue, which as you found out
was broken all along.
Comment 3 Tom de Vries 2024-02-16 11:46:23 UTC
FYI, I have a fix as part of this ( https://github.com/vries/gdb/pull/new/dap/exiting-race-4 ) branch.  To be submitted once cleaned up.
Comment 4 Tom de Vries 2024-02-19 08:25:33 UTC
Fix submitted as part of this ( https://sourceware.org/pipermail/gdb-patches/2024-February/206652.html ) series.
Comment 5 Tom de Vries 2024-02-19 08:43:26 UTC
(In reply to Tom Tromey from comment #1)
> +dap_check_log_file_re "JSON writer: terminating"
> 
> I suspect the issue is that the writer thread notifies
> the queue -- but then the log is written when the thread
> actually stops.  However, that means there is a race:
> gdb may stop before the thread exits.
> 
> I believe that adding a log just before the final
> call to task_done in the writer, then checking for this
> would work reliably.

I've addressed this by waiting in the DAP main thread for the JSON writer thread to exit.  This ensures that the "JSON writer: terminating" is in the log.
Comment 6 Sourceware Commits 2024-02-21 09:46:38 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7c34de9efdb13fff482ed0c4de0dffee1e1880f3

commit 7c34de9efdb13fff482ed0c4de0dffee1e1880f3
Author: Tom de Vries <tdevries@suse.de>
Date:   Wed Feb 21 10:46:08 2024 +0100

    [gdb/dap] Join JSON writer thread with DAP thread
    
    The DAP interpreter runs in its own thread, and starts a few threads:
    - the JSON reader thread,
    - the JSON writer thread, and
    - the inferior output reader thread.
    
    As part of the DAP shutdown, both the JSON reader thread and the JSON writer
    thread, as well as the DAP main thread run to exit, but these exits are not
    ordered in any way.
    
    Wait in the main DAP thread for the exit of the JSON writer thread.
    
    This makes sure that the responses are flushed to the DAP client before DAP
    shutdown.
    
    An earlier version of this patch used Queue.task_done() to accomplish the
    same, but that didn't guarantee writing the "<thread name>: terminating"
    log entry from thread_wrapper before DAP shutdown.
    
    Tested on aarch64-linux.
    
    Approved-By: Tom Tromey <tom@tromey.com>
    
    PR dap/31380
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31380
Comment 7 Sourceware Commits 2024-02-22 10:35:45 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=706c6624c26b2c954713c872021594bd3a156dd3

commit 706c6624c26b2c954713c872021594bd3a156dd3
Author: Tom de Vries <tdevries@suse.de>
Date:   Thu Feb 22 11:35:26 2024 +0100

    [gdb/dap] Fix race between dap exit and gdb exit
    
    When DAP shuts down due to an EOF event, there's a race between:
    - gdb's main thread handling a SIGHUP, and
    - the DAP main thread exiting.
    
    Fix this by waiting for DAP's main thread exit during the gdb_exiting event.
    
    Tested on aarch64-linux.
    
    Approved-By: Tom Tromey <tom@tromey.com>
    
    PR dap/31380
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31380
Comment 8 Tom de Vries 2024-02-22 10:40:15 UTC
Fixed.