This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)


On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote:
> On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
>> On Thursday, August 13 2015, Patrick Palka wrote:
>>
>>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>>> <sergiodj@redhat.com> wrote:
>>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>>
>>>>> We still do not handle "set history size unlimited" correctly.  In
>>>>> particular, after writing to the history file, we truncate the history
>>>>> even if it is unlimited.
>>>>>
>>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>>> history file to be truncated to zero on exit when one has "set history
>>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>>> GDB 7.9 (PR gdb/17820).
>>>>
>>>> Hey Patrick,
>>>>
>>>> Looking at the BuildBot logs today, I found that this new test is
>>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>>> at the following build:
>>>>
>>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>>
>>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>>
>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>>
>>>> The gdb.log is here:
>>>>
>>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>>
>>>> I haven't really investigated to determine what's going on here, but let
>>>> me know if you need any help with this.
>>>
>>> Hi Sergio,
>>>
>>> Have you seen these spurious FAILs pop up recently?  I wonder if the
>>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>>> as well.
>>
>> Hey Patrick,
>>
>> Yeah, I still see those FAIL's from time to time:
>>
>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
>>
>> I may be wrong, but I noticed that the FAIL's happen only on the PPC
>> buildslaves.
>>
>
> Here's a theory:
>
> (gdb) server show commands
>     1  set height 0
>     2  set width 0
>     3  target extended-remote localhost:2348
>     4  print 1
>     5  monitor exit
>     6  set height 0
>     7  set width 0
>     8  target extended-remote localhost:2349
> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
> Remote debugging from host 127.0.0.1
> monitor exit
> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
> GNU gdb (GDB) 7.10.50.20150812-cvs
> ...
> (gdb) set height 0
> (gdb) set width 0
> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350
> Listening on port 2350
> target extended-remote localhost:2350
> Remote debugging using localhost:2350
> (gdb) server show commands
>     1  set height 0
>     2  set width 0
>     3  target extended-remote localhost:2350
> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>
> Observations:
>
> #1 - In the FAILing case, the history only shows commands invoked in that
>   gdb session.  That means that either that GDB invocation failed to
>   load the history file, or the previous invocation failed to write it.
>
> #2 - dejagnu's standard_close, what is ultimately used to exit gdb,
>   does this:
>
>  2.1 - send gdb a SIGINT.  This might help by interrupting any command
>    that gdb might be stuck in, getting it back to
>    the event loop processing input.
>
>  2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
>
>  2.3 - if gdb doesn't exit after 5 seconds, kills it
>      with SIGTERM.  If it still doesn't exit after another 5
>      seconds, dejagnu kill it with SIGKILL.
>
>  Unless you're running dejagnu git master, 2.1 and 2.2 happen
>  in parallel.
>
>  I'm assuming that 2.3 is not happening.  It's 2.2 that normally
>  causes gdb to exit.  See event-top.c:stdin_event_handler.
>
> - Because, due to 2.2, stdin/stdout are already closed when gdb is
>   running the teardown code (quit_force, what saves history), any
>   warning/output that gdb might output goes nowhere, it won't
>   appear in gdb.log.
>
> #3 - If you'll recall, not long ago we made the code that appends
>      the commands to the history file roughly:
>
>    - mv old-history-file old-history-file.tmp
>    - append to old-history-file.tmp
>    - mv old-history-file.tmp old-history-file
>
> That is top.c:gdb_safe_append_history.
>
> AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
> if the SIGINT lands exactly just when gdb is about to call
> rename here, in gdb_safe_append_history:
>
>       ret = rename (local_history_filename, history_filename);
>       saved_errno = errno;
>       if (ret < 0 && saved_errno != EEXIST)
>         warning (_("Could not rename %s to %s: %s"),
>                  local_history_filename, history_filename,
>                  safe_strerror (saved_errno));
>     }
>
> and rename fails with EINTR.  Then the following gdb invocation
> would not find any previous history file to load, which would
> perfectly explain that FAIL.
>
> If we still have the build/test directory of that test run,
> if the theory is right, I think we'd find a
> leftover gdbinit-history.gdb_history-gdb-$PID~ file.
>
> What doesn't look right in the theory is that the rename man
> does not document rename as failing with EINTR...  But
> maybe it does?

Interesting theory.  I had assumed that the problem lies in the
testsuite driver, not in the GDB source code, because the problem only
manifests itself under native-extended-gdbserver.  If I understand
correctly, your explanation is also plausible under native or
native-gdbserver, yet the problem was not observed to ever manifest
itself under these targets.

Also, if rename fails with EINTR, wouldn't a "Could not rename %s to
%s: %s" warning be emitted in the gdb output log?


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]