This is the mail archive of the mailing list for the GDB project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)

On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
> On Thursday, August 13 2015, Patrick Palka wrote:
>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>> <> wrote:
>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>> We still do not handle "set history size unlimited" correctly.  In
>>>> particular, after writing to the history file, we truncate the history
>>>> even if it is unlimited.
>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>> history file to be truncated to zero on exit when one has "set history
>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>> GDB 7.9 (PR gdb/17820).
>>> Hey Patrick,
>>> Looking at the BuildBot logs today, I found that this new test is
>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>> at the following build:
>>>   <>
>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>> The gdb.log is here:
>>>   <>
>>> I haven't really investigated to determine what's going on here, but let
>>> me know if you need any help with this.
>> Hi Sergio,
>> Have you seen these spurious FAILs pop up recently?  I wonder if the
>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>> as well.
> Hey Patrick,
> Yeah, I still see those FAIL's from time to time:
>  <>
>  <>
> I may be wrong, but I noticed that the FAIL's happen only on the PPC
> buildslaves.

Here's a theory:

(gdb) server show commands
    1  set height 0
    2  set width 0
    3  target extended-remote localhost:2348
    4  print 1
    5  monitor exit
    6  set height 0
    7  set width 0
    8  target extended-remote localhost:2349
(gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
Remote debugging from host
monitor exit
(gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
GNU gdb (GDB)
(gdb) set height 0
(gdb) set width 0
(gdb) spawn ../gdbserver/gdbserver --once --multi :2350
Listening on port 2350
target extended-remote localhost:2350
Remote debugging using localhost:2350
(gdb) server show commands
    1  set height 0
    2  set width 0
    3  target extended-remote localhost:2350
(gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands


#1 - In the FAILing case, the history only shows commands invoked in that
  gdb session.  That means that either that GDB invocation failed to
  load the history file, or the previous invocation failed to write it.

#2 - dejagnu's standard_close, what is ultimately used to exit gdb,
  does this:

 2.1 - send gdb a SIGINT.  This might help by interrupting any command
   that gdb might be stuck in, getting it back to
   the event loop processing input.

 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.

 2.3 - if gdb doesn't exit after 5 seconds, kills it
     with SIGTERM.  If it still doesn't exit after another 5
     seconds, dejagnu kill it with SIGKILL.

 Unless you're running dejagnu git master, 2.1 and 2.2 happen
 in parallel.

 I'm assuming that 2.3 is not happening.  It's 2.2 that normally
 causes gdb to exit.  See event-top.c:stdin_event_handler.

- Because, due to 2.2, stdin/stdout are already closed when gdb is
  running the teardown code (quit_force, what saves history), any
  warning/output that gdb might output goes nowhere, it won't
  appear in gdb.log.

#3 - If you'll recall, not long ago we made the code that appends
     the commands to the history file roughly:

   - mv old-history-file old-history-file.tmp
   - append to old-history-file.tmp
   - mv old-history-file.tmp old-history-file

That is top.c:gdb_safe_append_history.

AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
if the SIGINT lands exactly just when gdb is about to call
rename here, in gdb_safe_append_history:

      ret = rename (local_history_filename, history_filename);
      saved_errno = errno;
      if (ret < 0 && saved_errno != EEXIST)
        warning (_("Could not rename %s to %s: %s"),
		 local_history_filename, history_filename,
		 safe_strerror (saved_errno));

and rename fails with EINTR.  Then the following gdb invocation
would not find any previous history file to load, which would
perfectly explain that FAIL.

If we still have the build/test directory of that test run,
if the theory is right, I think we'd find a
leftover gdbinit-history.gdb_history-gdb-$PID~ file.

What doesn't look right in the theory is that the rename man
does not document rename as failing with EINTR...  But
maybe it does?

Pedro Alves

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]