This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] Improve analysis of racy testcases

From: Pedro Alves <palves at redhat dot com>
To: Sergio Durigan Junior <sergiodj at redhat dot com>, GDB Patches <gdb-patches at sourceware dot org>
Date: Thu, 25 Feb 2016 17:01:28 +0000
Subject: Re: [PATCH] Improve analysis of racy testcases
Authentication-results: sourceware.org; auth=none
References: <87r3gcgm91 dot fsf at redhat dot com>
On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
> This patch is a proposal to introduce some mechanisms to identify racy
> testcases present in our testsuite.  As can be seen in previous
> discussions, racy tests are really bothersome and cause our BuildBot to
> pollute the gdb-testers mailing list with hundreds of false-positives
> messages every month.  Hopefully, by identifying these racy tests in
> advance (and automatically) will contribute to the reduction of noise
> traffic to gdb-testers, maybe to the point where we will be able to send
> the failure messages directly to the authors of the commits.
> 
> I spent some time trying to decide the best way to tackle this problem,
> and decided that there is no silver bullet.  Racy tests are tricky and
> it is difficult to catch them, so the best solution I could find (for
> now?) is to run our testsuite a number of times in a row, and then
> compare the results (i.e., the gdb.sum files generated during each run).
> The more times you run the tests, the more racy tests you are likely to
> detect (at the expense of waiting longer and longer).  You can also run
> the tests in parallel, which makes things faster (and contribute to
> catching more racy tests, because your machine will have less resources
> for each test and some of them are likely to fail when this happens).  I
> did some tests in my machine (8-core i7, 16GB RAM), and running the
> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
> 
> In order to run the racy test machinery, you need to specify the
> RACY_ITER environment variable.  You will assign a number to this
> variable, which represents the number of times you want to run the
> tests.  So, for example, if you want to run the whole testsuite 3 times
> in parallel (using 2 cores), you will do:
> 
>   make check RACY_ITER=3 -j2
> 
> It is also possible to use the TESTS variable and specify which tests
> you want to run:
> 
>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2

I did this, and noticed:

$ ls testsuite/racy_outputs/
gdb1.log  gdb1.sum  gdb2.log  gdb2.sum  gdb3.log  gdb3.sum

I think it'd make more sense, and it'd be easier to copy
test run results around, diff result dirs etc., if these
were put in subdirs instead:

 testsuite/racy_outputs/1/gdb.sum
 testsuite/racy_outputs/1/gdb.sum
 testsuite/race_outputs/1/gdb.log
 testsuite/race_outputs/2/gdb.sum
 testsuite/race_outputs/2/gdb.log

etc.  Any reason not to?

> 
> And so on.  The output files will be put at the directory
> gdb/testsuite/racy_outputs/.
> 
> After make invokes the necessary rules to run the tests, it finally runs
> a Python script that will analyze the resulting gdb.sum files.  This
> Python script will read each file, and construct a series of sets based
> on the results of the tests (one set for FAIL's, one for PASS'es, one
> for KFAIL's, etc.).  It will then do some set operations and come up
> with a list of unique, sorted testcases that are racy.  The algorithm
> behind this is:
> 
>   for state in PASS, FAIL, XFAIL, XPASS...; do
>     if a test's state in every sumfile is $state; then
>       it is not racy
>     else
>       it is racy
> 
> IOW, a test must have the same state in every sumfile.

Is it easy to start with preexisting dirs with gdb.sum/gdb.log
files, and run this script on them?

E.g., say I've already run the testsuite three times, and now
I want to check for racy tests, using the test results I already
have:

 testruns/2015-02-21/1/gdb.sum
 testruns/2015-02-21/2/gdb.sum
 testruns/2015-02-21/3/gdb.sum

I think we should document that.

> 
> After processing everything, the script prints the racy tests it could
> identify on stdout.  I am redirecting this to a file named racy.sum.

I tried:
 make check TESTS='gdb.base/default.exp' RACY_ITER=3

and was surprised to see nothing output at the end.  Then I found that
the racy.sum file is completely empty.  How about we instead make sure
that something is already printed?  Maybe follow dejagnu's output style,
and print a summary at the end?

I've now run "make check -j8 RACY_ITER=3" and got this:

$ cat testsuite/racy.sum
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork

The gdb.threads/process-dies-while-handling-bp.exp ones are actually:

 -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
 +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)

Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.


> 
> Something else that I wasn't sure how to deal with was non-unique
> messages in our testsuite.  I decided to do the same thing I do in our
> BuildBot: include a unique identifier in the end of message, like:
> 
>   gdb.base/xyz.exp: non-unique message
>   gdb.base/xyz.exp: non-unique message <<2>>
>   
> This means that you will have to be careful about them when you use the
> racy.sum file.
> 
> I ran the script several times here, and it did a good job catching some
> well-known racy tests.  Overall, I am satisfied with this approach and I
> think it will be helpful to have it upstream'ed. 

I think so too.  We don't have to fix everything before its put in either.
And having this doesn't preclude exploring other approaches either.

The only thing I do wish we should do, is use the fruits of this
to somehow mark racy tests in the testsuite itself, instead of only
making the buildbot ignore them, so that local development benefits
as well.

> I also intend to
> extend our BuildBot and create new, specialized builders that will be
> responsible for detecting the racy tests every X number of days, but
> that will only be done when this patch is accepted.

That'll be great.

> Comments are welcome, as usual.  Thanks.

On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
> index 38c3052..4bf6191 100644
> --- a/gdb/testsuite/Makefile.in
> +++ b/gdb/testsuite/Makefile.in
> @@ -52,6 +52,8 @@ RUNTESTFLAGS =
>
>  FORCE_PARALLEL =
>
> +DEFAULT_RACY_ITER = 3

Are users supposed to be able to trigger the default?  I didn't
see it documented.


> +Racy testcases
> +**************
> +
> +Unfortunately, GDB has a number of testcases that can randomly pass or
> +fail.  We call them "racy testcases", and they can be bothersome when
> +one is comparing different testsuite runs.  In order to help

I'd suggest future-proofing this, by rephrasing it as if we were at a
stage where the testsuite is stable, and new racy tests occasionally are
added to the testsuite.

>
> diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
> new file mode 100755
> index 0000000..4f05185
> --- /dev/null
> +++ b/gdb/testsuite/analyze-racy-logs.py
> @@ -0,0 +1,131 @@
> +#!/usr/bin/env python
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +#
> +# This file is part of GDB.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +
> +# This program is used to analyze the test results (i.e., *.sum files)
> +# generated by GDB's testsuite, and print the testcases that are found
> +# to be racy.
> +#
> +# Racy testcases are considered as being testcases which can
> +# intermitently FAIL (or PASS) when run two or more times

intermittently

I think you should put this in as is, and we can keep working on
it on top.

Thanks,
Pedro Alves
Follow-Ups:
- Re: [PATCH] Improve analysis of racy testcases
  - From: Sergio Durigan Junior
References:
- [PATCH] Improve analysis of racy testcases
  - From: Sergio Durigan Junior
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]