This is the mail archive of the gdb-patches at sourceware dot org
mailing list for the GDB project.
Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)
- From: Pedro Alves <palves at redhat dot com>
- To: Patrick Palka <patrick at parcs dot ath dot cx>
- Cc: Sergio Durigan Junior <sergiodj at redhat dot com>, "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>
- Date: Mon, 17 Aug 2015 15:02:57 +0100
- Subject: Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)
- Authentication-results: sourceware.org; auth=none
- References: <1433878062-23560-1-git-send-email-patrick at parcs dot ath dot cx> <1434466413-28892-1-git-send-email-patrick at parcs dot ath dot cx> <87mvym67i5 dot fsf_-_ at redhat dot com> <CA+C-WL-Ujsn1ccGNS-wD=RUdbJF1DcGxVYSLU6ubcGThdCQXYg at mail dot gmail dot com> <87zj1un7ro dot fsf at redhat dot com> <55CD27AE dot 2090002 at redhat dot com> <CA+C-WL_RNgONqtRRgzdiCJm0UF0278uKs82LLK43b6c12MKBJQ at mail dot gmail dot com> <CA+C-WL-CEpAHZr_aVx+ReoX_rEk=4qfoHYckJo7Ldv4fZm5A5Q at mail dot gmail dot com>
On 08/17/2015 02:28 PM, Patrick Palka wrote:
> Ah, you already addressed this: a warning is not emitted because
> stdout is closed..
> But because the problem only occurs under extended-gdbserver, I'm
> inclined to think the issue is with the testsuite driver, in
> particular with the gdb_exit implementation in
> lib/gdbserver-support.exp. One potential issue I notice in this proc
> is that when we send "monitor exit" to GDB, we don't necessarily wait
> for the command to finish (i.e. for the gdb prompt to get printed).
> As soon as the server is observed to get killed, we continue with
> exiting. Dunno if that's substantial..
That's very plausible, at least.
Maybe that prompt got stuck in the expect buffer, and it confused
something else later on?
Another related theory is that the new GDB started just as the
previous gdb was saving history and had momentarily renamed the
history file to gdbinit-history.gdb_history-gdb-$PID~.
But AFAICS, that shouldn't happen because that gdb_exit calls
gdbserver_orig_gdb_exit at the end, which only returns after
the previous gdb exits...
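
To make that rename-window theory concrete, here is a rough Python sketch of
what a reader would observe if it looked for the history file while a writer
had it moved aside. This is not GDB's code. The helper names (load_history,
demo_race_window) and the file contents are made up for illustration; only
the "-gdb-$PID~" naming is taken from the theory above.

```python
import os
import tempfile


def load_history(path):
    """Return the history lines, or None if the file is not there."""
    try:
        with open(path) as f:
            return f.read().splitlines()
    except FileNotFoundError:
        return None


def demo_race_window():
    """Show what a reader sees before, during, and after the rename."""
    with tempfile.TemporaryDirectory() as d:
        history = os.path.join(d, "gdbinit-history.gdb_history")
        with open(history, "w") as f:
            f.write("print 1\nprint 2\n")

        before = load_history(history)

        # Writer step 1: move the file aside under a per-process name.
        aside = "%s-gdb-%d~" % (history, os.getpid())
        os.rename(history, aside)

        # A second process starting right now finds no history file.
        during = load_history(history)

        # Writer step 2: move the file back into place.
        os.rename(aside, history)

        after = load_history(history)
        return before, during, after
```

If the second gdb really only starts after the first one has exited, it can
never land in the "during" window, which is why this theory looks unlikely.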
Did anyone ever manage to reproduce this?
One thing I'd try is making dejagnu's local_exec (close_wait_program in master)
print the result of the "wait -i". That will show whether gdb exited
due to a normal exit, or whether it was killed by SIGTERM or SIGKILL.
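
As an illustration of the distinction that output would give us, here is a
small Python analogue (POSIX only). It is not the DejaGnu/Expect code, and
describe_exit and run_child are hypothetical helpers. It relies on Python's
subprocess convention that a negative return code means the child was
terminated by that signal number.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # Negative means "terminated by signal -returncode" on POSIX.
        return "killed by " + signal.Signals(-returncode).name
    return "exited normally with status %d" % returncode


def run_child(kill_with=None):
    """Spawn a child, optionally signal it, and describe how it ended."""
    # Sleep long enough to be signalled; otherwise exit right away.
    body = "import time; time.sleep(30)" if kill_with is not None else "pass"
    child = subprocess.Popen([sys.executable, "-c", body])
    if kill_with is not None:
        child.send_signal(kill_with)
    return describe_exit(child.wait())
```

A "killed by SIGTERM" or "killed by SIGKILL" result for gdb would point at
the testsuite driver rather than at gdb's own exit path.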
And then I'd try hacking gdb_safe_append_history to output debug logs
to a file instead of stdout (e.g., /tmp/gdb-log).
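
The logging idea, sketched in Python rather than in GDB's C: append one
line per event to a file, tagged with the PID, so that output from the
exiting gdb and the newly started one can be told apart even when stdout is
already closed. debug_log is a hypothetical helper, and the timestamp and
PID format are my own choice; only the /tmp/gdb-log path comes from the
suggestion above.

```python
import os
import time


def debug_log(message, path="/tmp/gdb-log"):
    """Append one timestamped, PID-tagged line to the log file."""
    line = "%.6f pid=%d %s\n" % (time.time(), os.getpid(), message)
    # O_APPEND makes each write land at the end of the file, so several
    # processes can log to the same file without overwriting each other.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)
```

Writing through a freshly opened descriptor each time is slow, but it does
not depend on any stream the process may already have closed.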
Another thing to try would be to add a "show history filename" to the test,
to make sure that the gdb that fails to load the previous history actually
tried to read the file we expect it to read.
Also, I think it's time to try to get all the buildslaves to use
dejagnu master, to pick up http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00005.html.
Who knows, maybe that race/rogue kill could also explain this problem.
The x86_64 Fedora slaves have been running with that fix for a while, and
we no longer see attach-many-short-lived-threads.exp failures there, while
we keep seeing them on the other slaves (which don't have it).