This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)
- From: Pedro Alves <palves at redhat dot com>
- To: Sergio Durigan Junior <sergiodj at redhat dot com>, Patrick Palka <patrick at parcs dot ath dot cx>
- Cc: "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>
- Date: Fri, 14 Aug 2015 00:26:38 +0100
- Subject: Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)
- Authentication-results: sourceware.org; auth=none
- References: <1433878062-23560-1-git-send-email-patrick at parcs dot ath dot cx> <1434466413-28892-1-git-send-email-patrick at parcs dot ath dot cx> <87mvym67i5 dot fsf_-_ at redhat dot com> <CA+C-WL-Ujsn1ccGNS-wD=RUdbJF1DcGxVYSLU6ubcGThdCQXYg at mail dot gmail dot com> <87zj1un7ro dot fsf at redhat dot com>
On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
> On Thursday, August 13 2015, Patrick Palka wrote:
>
>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>> <sergiodj@redhat.com> wrote:
>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>
>>>> We still do not handle "set history size unlimited" correctly. In
>>>> particular, after writing to the history file, we truncate the history
>>>> even if it is unlimited.
>>>>
>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>> history is not stifled (i.e. if it's unlimited). This bug causes the
>>>> history file to be truncated to zero on exit when one has "set history
>>>> size unlimited" in their gdbinit file. Although this code exists in GDB
>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>> GDB 7.9 (PR gdb/17820).
>>>
>>> Hey Patrick,
>>>
>>> Looking at the BuildBot logs today, I found that this new test is
>>> failing occasionally on native-extended-gdbserver testing. Take a look
>>> at the following build:
>>>
>>> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>
>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>
>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>
>>> The gdb.log is here:
>>>
>>> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>
>>> I haven't really investigated to determine what's going on here, but let
>>> me know if you need any help with this.
>>
>> Hi Sergio,
>>
>> Have you seen these spurious FAILs pop up recently? I wonder if the
>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>> as well.
>
> Hey Patrick,
>
> Yeah, I still see those FAIL's from time to time:
>
> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
>
> I may be wrong, but I noticed that the FAIL's happen only on the PPC
> buildslaves.
>
Here's a theory:
(gdb) server show commands
1 set height 0
2 set width 0
3 target extended-remote localhost:2348
4 print 1
5 monitor exit
6 set height 0
7 set width 0
8 target extended-remote localhost:2349
(gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
Remote debugging from host 127.0.0.1
monitor exit
(gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
GNU gdb (GDB) 7.10.50.20150812-cvs
...
(gdb) set height 0
(gdb) set width 0
(gdb) spawn ../gdbserver/gdbserver --once --multi :2350
Listening on port 2350
target extended-remote localhost:2350
Remote debugging using localhost:2350
(gdb) server show commands
1 set height 0
2 set width 0
3 target extended-remote localhost:2350
(gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
Observations:
#1 - In the FAILing case, the history only shows commands invoked in that
gdb session. That means that either that GDB invocation failed to
load the history file, or the previous invocation failed to write it.
#2 - dejagnu's standard_close, what is ultimately used to exit gdb,
does this:
2.1 - send gdb a SIGINT. This might help by interrupting any command
that gdb might be stuck in, getting it back to
the event loop processing input.
2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
2.3 - if gdb doesn't exit after 5 seconds, kills it
with SIGTERM. If it still doesn't exit after another 5
seconds, dejagnu kill it with SIGKILL.
Unless you're running dejagnu git master, 2.1 and 2.2 happen
in parallel.
I'm assuming that 2.3 is not happening. It's 2.2 that normally
causes gdb to exit. See event-top.c:stdin_event_handler.
- Because, due to 2.2, stdin/stdout are already closed when gdb is
running the teardown code (quit_force, what saves history), any
warning/output that gdb might output goes nowhere, it won't
appear in gdb.log.
#3 - If you'll recall, not long ago we made the code that appends
the commands to the history file roughly:
- mv old-history-file old-history-file.tmp
- append to old-history-file.tmp
- mv old-history-file.tmp old-history-file
That is top.c:gdb_safe_append_history.
AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens
if the SIGINT lands exactly just when gdb is about to call
rename here, in gdb_safe_append_history:
ret = rename (local_history_filename, history_filename);
saved_errno = errno;
if (ret < 0 && saved_errno != EEXIST)
warning (_("Could not rename %s to %s: %s"),
local_history_filename, history_filename,
safe_strerror (saved_errno));
}
and rename fails with EINTR. Then the following gdb invocation
would not find any previous history file to load, which would
perfectly explain that FAIL.
If we still have the build/test directory of that test run,
if the theory is right, I think we'd find a
leftover gdbinit-history.gdb_history-gdb-$PID~ file.
What doesn't look right in the theory is that the rename man
does not document rename as failing with EINTR... But
maybe it does?
Thanks,
Pedro Alves