This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64)


On Mon, Aug 17, 2015 at 9:15 AM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote:
>> On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
>>> On Thursday, August 13 2015, Patrick Palka wrote:
>>>
>>>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>>>> <sergiodj@redhat.com> wrote:
>>>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>>>
>>>>>> We still do not handle "set history size unlimited" correctly.  In
>>>>>> particular, after writing to the history file, we truncate the history
>>>>>> even if it is unlimited.
>>>>>>
>>>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>>>> history file to be truncated to zero on exit when one has "set history
>>>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>>>> GDB 7.9 (PR gdb/17820).
>>>>>
>>>>> Hey Patrick,
>>>>>
>>>>> Looking at the BuildBot logs today, I found that this new test is
>>>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>>>> at the following build:
>>>>>
>>>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>>>
>>>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>>>
>>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>>>
>>>>> The gdb.log is here:
>>>>>
>>>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>>>
>>>>> I haven't really investigated to determine what's going on here, but let
>>>>> me know if you need any help with this.
>>>>
>>>> Hi Sergio,
>>>>
>>>> Have you seen these spurious FAILs pop up recently?  I wonder if the
>>>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>>>> as well.
>>>
>>> Hey Patrick,
>>>
>>> Yeah, I still see those FAIL's from time to time:
>>>
>>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
>>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
>>>
>>> I may be wrong, but I noticed that the FAIL's happen only on the PPC
>>> buildslaves.
>>>
>>
>> Here's a theory:
>>
>> (gdb) server show commands
>>     1  set height 0
>>     2  set width 0
>>     3  target extended-remote localhost:2348
>>     4  print 1
>>     5  monitor exit
>>     6  set height 0
>>     7  set width 0
>>     8  target extended-remote localhost:2349
>> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>> Remote debugging from host 127.0.0.1
>> monitor exit
>> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
>> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
>> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
>> GNU gdb (GDB) 7.10.50.20150812-cvs
>> ...
>> (gdb) set height 0
>> (gdb) set width 0
>> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350
>> Listening on port 2350
>> target extended-remote localhost:2350
>> Remote debugging using localhost:2350
>> (gdb) server show commands
>>     1  set height 0
>>     2  set width 0
>>     3  target extended-remote localhost:2350
>> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>
>> Observations:
>>
>> #1 - In the FAILing case, the history only shows commands invoked in that
>>   gdb session.  That means that either that GDB invocation failed to
>>   load the history file, or the previous invocation failed to write it.
>>
>> #2 - dejagnu's standard_close, what is ultimately used to exit gdb,
>>   does this:
>>
>>  2.1 - send gdb a SIGINT.  This might help by interrupting any command
>>    that gdb might be stuck in, getting it back to
>>    the event loop processing input.
>>
>>  2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
>>
>>  2.3 - if gdb doesn't exit after 5 seconds, kills it
>>      with SIGTERM.  If it still doesn't exit after another 5
>>      seconds, dejagnu kill it with SIGKILL.
>>
>>  Unless you're running dejagnu git master, 2.1 and 2.2 happen
>>  in parallel.
>>
>>  I'm assuming that 2.3 is not happening.  It's 2.2 that normally
>>  causes gdb to exit.  See event-top.c:stdin_event_handler.
>>
>> - Because, due to 2.2, stdin/stdout are already closed when gdb is
>>   running the teardown code (quit_force, what saves history), any
>>   warning/output that gdb might output goes nowhere, it won't
>>   appear in gdb.log.
>>
>> #3 - If you'll recall, not long ago we made the code that appends
>>      the commands to the history file roughly:
>>
>>    - mv old-history-file old-history-file.tmp
>>    - append to old-history-file.tmp
>>    - mv old-history-file.tmp old-history-file
>>
>> That is top.c:gdb_safe_append_history.
>>
>> AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
>> if the SIGINT lands exactly just when gdb is about to call
>> rename here, in gdb_safe_append_history:
>>
>>       ret = rename (local_history_filename, history_filename);
>>       saved_errno = errno;
>>       if (ret < 0 && saved_errno != EEXIST)
>>         warning (_("Could not rename %s to %s: %s"),
>>                  local_history_filename, history_filename,
>>                  safe_strerror (saved_errno));
>>     }
>>
>> and rename fails with EINTR.  Then the following gdb invocation
>> would not find any previous history file to load, which would
>> perfectly explain that FAIL.
>>
>> If we still have the build/test directory of that test run,
>> if the theory is right, I think we'd find a
>> leftover gdbinit-history.gdb_history-gdb-$PID~ file.
>>
>> What doesn't look right in the theory is that the rename man
>> does not document rename as failing with EINTR...  But
>> maybe it does?
>
> Interesting theory.  I had assumed that the problem lies in the
> testsuite driver, not in the GDB source code, because the problem only
> manifests itself under native-extended-gdbserver.  If I understand
> correctly, your explanation is also plausible under native or
> native-gdbserver, yet the problem was not observed to ever manifest
> itself under these targets.
>
> Also, if rename fails with EINTR, wouldn't a "Could not rename %s to
> %s: %s" warning be emitted in the gdb output log?

Ah, you already addressed this: a warning is not emitted because
stdout is closed..

But because the problem only occurs under extended-gdbserver, I'm
inclined to think the issue is with the testsuite driver, in
particular with the gdb_exit implementation in
lib/gdbserver-support.exp.  One potential issue I notice in this proc
is that when we send "monitor exit" to GDB, we don't necessarily wait
for the command to finish (i.e. for the gdb prompt to get printed).
As soon as the server is observed to get killed, we continue with
exiting.  Dunno if that's substantial..


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]