This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)
- From: Patrick Palka <patrick at parcs dot ath dot cx>
- To: Pedro Alves <palves at redhat dot com>
- Cc: Sergio Durigan Junior <sergiodj at redhat dot com>, "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>
- Date: Mon, 17 Aug 2015 09:15:44 -0400
- Subject: Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)
- Authentication-results: sourceware.org; auth=none
- References: <1433878062-23560-1-git-send-email-patrick at parcs dot ath dot cx> <1434466413-28892-1-git-send-email-patrick at parcs dot ath dot cx> <87mvym67i5 dot fsf_-_ at redhat dot com> <CA+C-WL-Ujsn1ccGNS-wD=RUdbJF1DcGxVYSLU6ubcGThdCQXYg at mail dot gmail dot com> <87zj1un7ro dot fsf at redhat dot com> <55CD27AE dot 2090002 at redhat dot com>
On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote:
> On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
>> On Thursday, August 13 2015, Patrick Palka wrote:
>>
>>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>>> <sergiodj@redhat.com> wrote:
>>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>>
>>>>> We still do not handle "set history size unlimited" correctly. In
>>>>> particular, after writing to the history file, we truncate the history
>>>>> even if it is unlimited.
>>>>>
>>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>>> history is not stifled (i.e. if it's unlimited). This bug causes the
>>>>> history file to be truncated to zero on exit when one has "set history
>>>>> size unlimited" in their gdbinit file. Although this code exists in GDB
>>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>>> GDB 7.9 (PR gdb/17820).
>>>>
>>>> Hey Patrick,
>>>>
>>>> Looking at the BuildBot logs today, I found that this new test is
>>>> failing occasionally on native-extended-gdbserver testing. Take a look
>>>> at the following build:
>>>>
>>>> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>>
>>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>>
>>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>>
>>>> The gdb.log is here:
>>>>
>>>> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>>
>>>> I haven't really investigated to determine what's going on here, but let
>>>> me know if you need any help with this.
>>>
>>> Hi Sergio,
>>>
>>> Have you seen these spurious FAILs pop up recently? I wonder if the
>>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>>> as well.
>>
>> Hey Patrick,
>>
>> Yeah, I still see those FAIL's from time to time:
>>
>> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
>> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
>>
>> I may be wrong, but I noticed that the FAIL's happen only on the PPC
>> buildslaves.
>>
>
> Here's a theory:
>
> (gdb) server show commands
> 1 set height 0
> 2 set width 0
> 3 target extended-remote localhost:2348
> 4 print 1
> 5 monitor exit
> 6 set height 0
> 7 set width 0
> 8 target extended-remote localhost:2349
> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
> Remote debugging from host 127.0.0.1
> monitor exit
> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
> GNU gdb (GDB) 7.10.50.20150812-cvs
> ...
> (gdb) set height 0
> (gdb) set width 0
> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350
> Listening on port 2350
> target extended-remote localhost:2350
> Remote debugging using localhost:2350
> (gdb) server show commands
> 1 set height 0
> 2 set width 0
> 3 target extended-remote localhost:2350
> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>
> Observations:
>
> #1 - In the FAILing case, the history only shows commands invoked in that
> gdb session. That means that either that GDB invocation failed to
> load the history file, or the previous invocation failed to write it.
>
> #2 - dejagnu's standard_close, what is ultimately used to exit gdb,
> does this:
>
> 2.1 - send gdb a SIGINT. This might help by interrupting any command
> that gdb might be stuck in, getting it back to
> the event loop processing input.
>
> 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
>
> 2.3 - if gdb doesn't exit after 5 seconds, kills it
> with SIGTERM. If it still doesn't exit after another 5
> seconds, dejagnu kill it with SIGKILL.
>
> Unless you're running dejagnu git master, 2.1 and 2.2 happen
> in parallel.
>
> I'm assuming that 2.3 is not happening. It's 2.2 that normally
> causes gdb to exit. See event-top.c:stdin_event_handler.
>
> - Because, due to 2.2, stdin/stdout are already closed when gdb is
> running the teardown code (quit_force, what saves history), any
> warning/output that gdb might output goes nowhere, it won't
> appear in gdb.log.
>
> #3 - If you'll recall, not long ago we made the code that appends
> the commands to the history file roughly:
>
> - mv old-history-file old-history-file.tmp
> - append to old-history-file.tmp
> - mv old-history-file.tmp old-history-file
>
> That is top.c:gdb_safe_append_history.
>
> AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens
> if the SIGINT lands exactly just when gdb is about to call
> rename here, in gdb_safe_append_history:
>
> ret = rename (local_history_filename, history_filename);
> saved_errno = errno;
> if (ret < 0 && saved_errno != EEXIST)
> warning (_("Could not rename %s to %s: %s"),
> local_history_filename, history_filename,
> safe_strerror (saved_errno));
> }
>
> and rename fails with EINTR. Then the following gdb invocation
> would not find any previous history file to load, which would
> perfectly explain that FAIL.
>
> If we still have the build/test directory of that test run,
> if the theory is right, I think we'd find a
> leftover gdbinit-history.gdb_history-gdb-$PID~ file.
>
> What doesn't look right in the theory is that the rename man
> does not document rename as failing with EINTR... But
> maybe it does?
Interesting theory. I had assumed that the problem lies in the
testsuite driver, not in the GDB source code, because the problem only
manifests itself under native-extended-gdbserver. If I understand
correctly, your explanation is also plausible under native or
native-gdbserver, yet the problem was not observed to ever manifest
itself under these targets.
Also, if rename fails with EINTR, wouldn't a "Could not rename %s to
%s: %s" warning be emitted in the gdb output log?