14618 – gdbserver, scheduler-locking, thread exits and TARGET_WAITKIND_NO_RESUMED

Bug 14618 - gdbserver, scheduler-locking, thread exits and TARGET_WAITKIND_NO_RESUMED

Summary: gdbserver, scheduler-locking, thread exits and TARGET_WAITKIND_NO_RESUMED

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	threads (show other bugs)
Version:	7.4

Importance:	P2 critical
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-09-25 09:18 UTC by Tom Tan
Modified:	2015-11-30 22:06 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Tom Tan 2012-09-25 09:18:44 UTC

I got this when running the gdb test case "gdb.threads/no-unwaited-for-left" with remote target(gdbserver).

Everytime if I "set scheduler-locking on" in gdb and then resume from a thread, the other threads will also be resumed. I think the "on" mode should only resume the current thread.

Comment 1 Pedro Alves 2012-09-25 11:42:03 UTC

scheduler-locking does work with gdbserver, otherwise schedlock.exp would fail.

... but only as long as the thread being debugged doesn't exit.
This is the "TARGET_WAITKIND_NO_RESUMED / graceful leader-exits" point in

http://sourceware.org/gdb/wiki/LocalRemoteFeatureParity

The GDB side fix (that added TARGET_WAITKIND_NO_RESUMED and those
testcases was <http://sourceware.org/ml/gdb-patches/2011-10/msg00704.html>.

When the (only) running thread exits, I think gdbserver ends up resuming the whole inferior...

Comment 2 Tom Tan 2012-09-26 07:58:29 UTC

Sorry, the title really doesn't describe the problem accurately. And thanks Pedro for the wiki page, I missed that doc.

In no-unwaited-for-left.exp, if scheduler-locking is set to be "on", then after the running thread exits, GDB server exits instead of the inferior is resumed(my former description is not exact either). I set a break point after the new thread exits and it doesn't execute to that code. 

I followed the code a bit and found that the last_status of gdbserver is TARGET_WAITKIND_EXITED after the thread exits, then the process is mourned.

Comment 3 Pedro Alves 2012-09-26 17:14:29 UTC

Right.  As I said, GDBserver ends up resuming all threads of the inferior if the single thread that had been resumed exits.

In this case, after the inferior has been resumed, it naturally exits.
And when the inferior exits, then GDBserver exits as well; that much is normal.

The bit that decides to resume the inferior has this comment, in linux-low.c:

  /* If we were only supposed to resume one thread, only wait for
     that thread - if it's still alive.  If it died, however - which
     can happen if we're coming from the thread death case below -
     then we need to make sure we restart the other threads.  We could
     pick a thread at random or restart all; restarting all is less
     arbitrary.  */

This is of course, totally bogus.

The right fix it to add support for the new TARGET_WAITKIND_NO_RESUMED event type to the remote protocol, and teach gdbserver to use it.

Comment 4 Pedro Alves 2014-02-27 15:11:40 UTC

The hangs themselves are now fixed on GDBserver mainline.

  https://sourceware.org/ml/gdb-patches/2014-02/msg00826.html

GDBserver's Linux core now knows about TARGET_WAITKIND_NO_RESUMED, but
I've left out adding that to the RSP ("[PATCH 5/5] Add TARGET_WAITKIND_NO_RESUMED support to the RSP" at https://sourceware.org/ml/gdb-patches/2014-01/msg00897.html), because I worried about races in non-stop mode.

Assume N is the new stop reply for TARGET_WAITKIND_NO_RESUMED, like in the patch above and say, the traffic goes like this:

(no thread was resumed before #1)

GDB sends two steps in a row.

 #1 -> vCont;s:555
 #2 <- OK

(555 exits)

 #3 <- %Stop:N   (no more resumed threads left)

 #4 -> vCont;s:666
 #5 <- OK

And now GDB goes and processes #3, and mistakenly believes no thread is left running on the target, while in fact, 666 is left running.  Sounds like we need something else in addition to TARGET_WAITKIND_NO_RESUMED, maybe thread exit events (active only for stepped threads, by default, for efficiency).

I'll leave this open until this is fully sorted out.

Comment 5 Sourceware Commits 2015-04-02 12:53:11 UTC

The master branch has been updated by Yao Qi <qiyao@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cafda5977a98aef514ff86daca2fa94205bdd34e

commit cafda5977a98aef514ff86daca2fa94205bdd34e
Author: Yao Qi <yao.qi@linaro.org>
Date:   Thu Apr 2 13:51:31 2015 +0100

    kfail two tests in no-unwaited-for-left.exp for remote target
    
    I see these two fails in no-unwaited-for-left.exp in remote testing
    for aarch64-linux target.
    
    ...
    continue
    Continuing.
    warning: Remote failure reply: E.No unwaited-for children left.
    
    [Thread 1084] #2 stopped.
    (gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when thread 2 exits
    
    ....
    continue
    Continuing.
    warning: Remote failure reply: E.No unwaited-for children left.
    
    [Thread 1081] #1 stopped.
    (gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when the main thread exits
    
    I checked the gdb.log on buildbot, and find that these two fails also
    appear on Debian-i686-native-extended-gdbserver and Fedora-ppc64be-native-gdbserver-m64.
    I recall that they are about local/remote parity, and related RSP is missing.
    There has been already a PR 14618 about it.  This patch is to kfail them
    on remote target.
    
    gdb/testsuite:
    
    2015-04-02  Yao Qi  <yao.qi@linaro.org>
    
    	* gdb.threads/no-unwaited-for-left.exp: Set up kfail if target
    	is remote.

Comment 6 Sourceware Commits 2015-11-30 19:56:58 UTC

The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f4836ba964a96364f39c7eab8b8b2f8656d14d05

commit f4836ba964a96364f39c7eab8b8b2f8656d14d05
Author: Pedro Alves <palves@redhat.com>
Date:   Mon Nov 30 16:05:24 2015 +0000

    infrun: Fix TARGET_WAITKIND_NO_RESUMED handling in non-stop mode
    
    Running the testsuite against gdbserver with "maint set target-non-stop on"
    stumbled on a set of problems.  See code comments for details.
    
    This handles my concerns expressed in PR14618.
    
    gdb/ChangeLog:
    2015-11-30  Pedro Alves  <palves@redhat.com>
    
    	PR 14618
    	* infrun.c (handle_no_resumed): New function.
    	(handle_inferior_event_1) <TARGET_WAITKIND_NO_RESUMED>: Defer to
    	handle_no_resumed.

Comment 7 Sourceware Commits 2015-11-30 19:57:04 UTC

The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f2faf941ae49653ff6e1485adfee299313d47c91

commit f2faf941ae49653ff6e1485adfee299313d47c91
Author: Pedro Alves <palves@redhat.com>
Date:   Mon Nov 30 16:05:25 2015 +0000

    Implement TARGET_WAITKIND_NO_RESUMED in the remote protocol
    
    Testing with "maint set target-non-stop on" causes regressions in
    tests that rely on TARGET_WAITKIND_NO_RESUMED, which isn't modelled on
    the RSP.  In real all-stop, gdbserver detects the situation and
    reporst error to GDB, and so the tests (e.g.,
    gdb.threads/no-unwaited-for-left.exp) at fail quickly.  But with
    "maint set target-non-stop on", GDB instead hangs forever waiting for
    a stop reply that never comes, and so the tests take longer to time
    out.
    
    This adds a new "N" stop reply packet that maps 1-1 to
    TARGET_WAITKIND_NO_RESUMED.
    
    gdb/ChangeLog:
    2015-11-30  Pedro Alves  <palves@redhat.com>
    
    	PR 14618
    	* NEWS (New remote packets): Mention the N stop reply.
    	* remote.c (remote_protocol_features): Add "no-resumed" entry.
    	(remote_query_supported): Report no-resumed+ support.
    	(remote_parse_stop_reply): Handle 'N'.
    	(process_stop_reply): Handle TARGET_WAITKIND_NO_RESUMED.
    	(remote_wait_as): Handle 'N' / TARGET_WAITKIND_NO_RESUMED.
    	(_initialize_remote): Register "set/show remote
    	no-resumed-stop-reply" commands.
    
    gdb/doc/ChangeLog:
    2015-11-30  Pedro Alves  <palves@redhat.com>
    
    	PR 14618
    	* gdb.texinfo (Stop Reply Packets): Document the N stop reply.
    	(Remote Configuration): Add the "set/show remote
    	no-resumed-stop-reply" to the available settings table.
    	(General Query Packets): Document the "no-resumed" qSupported
    	feature.
    
    gdb/gdbserver/ChangeLog:
    2015-11-30  Pedro Alves  <palves@redhat.com>
    
    	PR 14618
    	* linux-low.c (linux_wait_1): If the last resumed thread is gone,
    	report TARGET_WAITKIND_NO_RESUMED.
    	* remote-utils.c (prepare_resume_reply): Handle
    	TARGET_WAITKIND_NO_RESUMED.
    	* server.c (report_no_resumed): New global.
    	(handle_query) <qSupported>: Handle "no-resumed+".  Report
    	"no-resumed+" support.
    	(resume): When the target reports TARGET_WAITKIND_NO_RESUMED, only
    	return error if the client doesn't support no-resumed events.
    	(push_stop_notification): New function.
    	(handle_target_event): Use it.  Report TARGET_WAITKIND_NO_RESUMED
    	events if the client supports them.
    
    gdb/testsuite/ChangeLog:
    2015-11-30  Pedro Alves  <palves@redhat.com>
    
    	* gdb.threads/no-unwaited-for-left.exp: Remove setup_kfail calls.

Comment 8 Pedro Alves 2015-11-30 22:06:07 UTC

This is now fixed.  gdb.threads/no-unwaited-for-left.exp passes with gdbserver too now.