dg-extract-results's bad sorting behavior (Re: Update dg-extract-results.* from gcc)
Pedro Alves
palves@redhat.com
Mon Oct 1 09:36:00 GMT 2018
On 08/08/2018 04:24 PM, Pedro Alves wrote:
> On 08/08/2018 03:36 PM, Rainer Orth wrote:
>
>>> On 07/20/2018 12:02 PM, Rainer Orth wrote:
>>>> When looking at the gdb.sum file produced by dg-extract-results.sh on
>>>> Solaris 11/x86, I noticed some wrong sorting, like this:
>>>>
>>>> PASS: gdb.ada/addr_arith.exp: print something'address + 0
>>>> PASS: gdb.ada/addr_arith.exp: print 0 + something'address
>>>> PASS: gdb.ada/addr_arith.exp: print something'address - 0
>>>> PASS: gdb.ada/addr_arith.exp: print 0 - something'address
>>>>
>>>> Looking closer, I noticed that while dg-extract-results.sh had been
>>>> copied over from contrib in the gcc repo, the corresponding
>>>> dg-extract-results.py file had not. The latter not only fixes the
>>>> sorting problem I'd observed, but is also way faster than the shell
>>>> version (like a factor of 50 faster).
>>>
>>> We used to have the dg-extract-results.py file, but we deleted it
>>> because it caused (funnily enough, sorting) problems. See:
>>>
>>> https://sourceware.org/ml/gdb-patches/2015-02/msg00333.html
>>>
>>> Has that sorting stability issue been meanwhile fixed upstream?
>>
>> not that I can see: between the version of dg-extract-results.py removed
>> in early 2015 and the one in current gcc trunk, there's only added
>> handling for DejaGnu ERRORs and another minor change to do with
>> summaries that doesn't seem to change anything wrt. sorting on first
>> blush.
>
> OK.
>
> >> However, I've just run make -j16 check three times in a row on
> >> amd64-pc-solaris2.11, followed by make -j48 check, and the only
> >> differences were due to 200+ racy tests, the vast majority of them in
>> gdb.threads.
>
> Thanks for testing.
>
>> Maybe the prior problems have been due to bugs in older
>> versions of python?
> Might be.
>
> IIRC, the sorting that would change would be the order that the different
> individual gdb.sum results would be merged (the per-gdb.sum order was
> stable). So comparing two runs, you'd get something like, in one run this:
>
> gdb.base/foo.exp:PASS: test1
> gdb.base/foo.exp:PASS: test2
> gdb.base/foo.exp:PASS: test3
> gdb.base/bar.exp:PASS: testA
> gdb.base/bar.exp:PASS: testB
> gdb.base/bar.exp:PASS: testC
>
> and another run this:
>
> gdb.base/bar.exp:PASS: testA
> gdb.base/bar.exp:PASS: testB
> gdb.base/bar.exp:PASS: testC
> gdb.base/foo.exp:PASS: test1
> gdb.base/foo.exp:PASS: test2
> gdb.base/foo.exp:PASS: test3
>
> which would result in all those tests spuriously showing up in a
> gdb.sum old/new diff.
>
> I'm not sure whether we were seeing that if you compared runs
> of the same tree multiple times. It could be that it only happened
> when comparing the results of different trees, which contained
> a slightly different set of tests and testcases, like for example
> comparing testresults of a patched master against the testresults
> of master from a month or week ago, which is something I frequently
> do, for example.
>
> *time passes*
>
> Wait wait wait, can you clarify what you meant by wrong sorting in:
>
> PASS: gdb.ada/addr_arith.exp: print something'address + 0
> PASS: gdb.ada/addr_arith.exp: print 0 + something'address
> PASS: gdb.ada/addr_arith.exp: print something'address - 0
> PASS: gdb.ada/addr_arith.exp: print 0 - something'address
>
> ?
>
> Why do you think those results _should_ be sorted? And in what order?
>
> Typically, the order/sequence in which the tests of a given exp
> file are executed is important. The order in the gdb.sum file must
> be the order in which the fail/pass calls are written/issued in the .exp file.
> It'd be absolutely incorrect to alphabetically sort the gdb.sum output.
> Is that what the .py version does? That's not what I recall, though.
> I guess I may be confused.

Getting back to this, because I just diffed testresults between
runs of different vintage, and got bitten by the sorting problems.
I'm diffing testresults between a run on 20180713 and a run
against today's gdb, and I got a _ton_ of spurious diffs like these:

-PASS: gdb.ada/complete.exp: complete p my_glob
-PASS: gdb.ada/complete.exp: complete p insi
-PASS: gdb.ada/complete.exp: complete p inner.insi
-PASS: gdb.ada/complete.exp: complete p pck.inne
-PASS: gdb.ada/complete.exp: complete p pck__inner__ins
-PASS: gdb.ada/complete.exp: complete p pck.inner.ins
-PASS: gdb.ada/complete.exp: complete p side
-PASS: gdb.ada/complete.exp: complete p exported
+PASS: gdb.ada/complete.exp: complete break ada
PASS: gdb.ada/complete.exp: complete p <Exported
-PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
-PASS: gdb.ada/complete.exp: p Exported_Capitalized
-PASS: gdb.ada/complete.exp: p exported_capitalized
-PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
-PASS: gdb.ada/complete.exp: complete p some
+PASS: gdb.ada/complete.exp: complete p <pck__my
+PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
+PASS: gdb.ada/complete.exp: complete p ambig
+PASS: gdb.ada/complete.exp: complete p ambiguous_f
+PASS: gdb.ada/complete.exp: complete p ambiguous_func
+PASS: gdb.ada/complete.exp: complete p exported
+PASS: gdb.ada/complete.exp: complete p external_ident
+PASS: gdb.ada/complete.exp: complete p inner.insi
+PASS: gdb.ada/complete.exp: complete p insi
+PASS: gdb.ada/complete.exp: complete p local_ident
+PASS: gdb.ada/complete.exp: complete p my_glob
PASS: gdb.ada/complete.exp: complete p not_in_sco
-PASS: gdb.ada/complete.exp: complete p pck.ins
-PASS: gdb.ada/complete.exp: complete p pck.my
+PASS: gdb.ada/complete.exp: complete p pck
+PASS: gdb.ada/complete.exp: complete p pck.
+PASS: gdb.ada/complete.exp: complete p pck.inne
PASS: gdb.ada/complete.exp: complete p pck.inne
PASS: gdb.ada/complete.exp: complete p pck.inner.
-PASS: gdb.ada/complete.exp: complete p local_ident
+PASS: gdb.ada/complete.exp: complete p pck.inner.ins
+PASS: gdb.ada/complete.exp: complete p pck.ins
PASS: gdb.ada/complete.exp: complete p pck.local_ident
+PASS: gdb.ada/complete.exp: complete p pck.my
+PASS: gdb.ada/complete.exp: complete p pck__inner__ins
PASS: gdb.ada/complete.exp: complete p pck__local_ident
-PASS: gdb.ada/complete.exp: complete p external_ident
-PASS: gdb.ada/complete.exp: complete p pck
-PASS: gdb.ada/complete.exp: complete p pck.
-PASS: gdb.ada/complete.exp: complete p <pck__my
+PASS: gdb.ada/complete.exp: complete p side
+PASS: gdb.ada/complete.exp: complete p some
PASS: gdb.ada/complete.exp: interactive complete 'print some'
-PASS: gdb.ada/complete.exp: complete p ambig
-PASS: gdb.ada/complete.exp: complete p ambiguous_f
-PASS: gdb.ada/complete.exp: complete p ambiguous_func
+PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
+PASS: gdb.ada/complete.exp: p Exported_Capitalized
+PASS: gdb.ada/complete.exp: p exported_capitalized
PASS: gdb.ada/complete.exp: set max-completions unlimited
-PASS: gdb.ada/complete.exp: complete break ada
Given the earlier discussions about sorting, I could
immediately recognize what is wrong. It's that while
testsuite/outputs/gdb.ada/complete/gdb.sum lists the
test results in chronological order, preserving
execution sequence:
Running src/gdb/testsuite/gdb.ada/complete.exp ...
PASS: gdb.ada/complete.exp: compilation foo.adb
PASS: gdb.ada/complete.exp: complete p my_glob
PASS: gdb.ada/complete.exp: complete p insi
PASS: gdb.ada/complete.exp: complete p inner.insi
PASS: gdb.ada/complete.exp: complete p pck.inne
PASS: gdb.ada/complete.exp: complete p pck__inner__ins
PASS: gdb.ada/complete.exp: complete p pck.inner.ins
PASS: gdb.ada/complete.exp: complete p side
PASS: gdb.ada/complete.exp: complete p exported
PASS: gdb.ada/complete.exp: complete p <Exported
PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
PASS: gdb.ada/complete.exp: p Exported_Capitalized
PASS: gdb.ada/complete.exp: p exported_capitalized
PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
PASS: gdb.ada/complete.exp: complete p some
PASS: gdb.ada/complete.exp: complete p not_in_sco
PASS: gdb.ada/complete.exp: complete p pck.ins
PASS: gdb.ada/complete.exp: complete p pck.my
PASS: gdb.ada/complete.exp: complete p pck.inne
PASS: gdb.ada/complete.exp: complete p pck.inner.
PASS: gdb.ada/complete.exp: complete p local_ident
PASS: gdb.ada/complete.exp: complete p pck.local_ident
PASS: gdb.ada/complete.exp: complete p pck__local_ident
PASS: gdb.ada/complete.exp: complete p external_ident
PASS: gdb.ada/complete.exp: complete p pck
PASS: gdb.ada/complete.exp: complete p pck.
PASS: gdb.ada/complete.exp: complete p <pck__my
PASS: gdb.ada/complete.exp: interactive complete 'print some'
PASS: gdb.ada/complete.exp: complete p ambig
PASS: gdb.ada/complete.exp: complete p ambiguous_f
PASS: gdb.ada/complete.exp: complete p ambiguous_func
PASS: gdb.ada/complete.exp: set max-completions unlimited
PASS: gdb.ada/complete.exp: complete break ada
... the squashed testsuite/gdb.sum ended up with those tests above
sorted lexically:
PASS: gdb.ada/complete.exp: compilation foo.adb
PASS: gdb.ada/complete.exp: complete break ada
PASS: gdb.ada/complete.exp: complete p <Exported
PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
PASS: gdb.ada/complete.exp: complete p <pck__my
PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
PASS: gdb.ada/complete.exp: complete p ambig
PASS: gdb.ada/complete.exp: complete p ambiguous_f
PASS: gdb.ada/complete.exp: complete p ambiguous_func
PASS: gdb.ada/complete.exp: complete p exported
PASS: gdb.ada/complete.exp: complete p external_ident
PASS: gdb.ada/complete.exp: complete p inner.insi
PASS: gdb.ada/complete.exp: complete p insi
PASS: gdb.ada/complete.exp: complete p local_ident
PASS: gdb.ada/complete.exp: complete p my_glob
PASS: gdb.ada/complete.exp: complete p not_in_sco
PASS: gdb.ada/complete.exp: complete p pck
PASS: gdb.ada/complete.exp: complete p pck.
PASS: gdb.ada/complete.exp: complete p pck.inne
PASS: gdb.ada/complete.exp: complete p pck.inne
PASS: gdb.ada/complete.exp: complete p pck.inner.
PASS: gdb.ada/complete.exp: complete p pck.inner.ins
PASS: gdb.ada/complete.exp: complete p pck.ins
PASS: gdb.ada/complete.exp: complete p pck.local_ident
PASS: gdb.ada/complete.exp: complete p pck.my
PASS: gdb.ada/complete.exp: complete p pck__inner__ins
PASS: gdb.ada/complete.exp: complete p pck__local_ident
PASS: gdb.ada/complete.exp: complete p side
PASS: gdb.ada/complete.exp: complete p some
PASS: gdb.ada/complete.exp: interactive complete 'print some'
PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
PASS: gdb.ada/complete.exp: p Exported_Capitalized
PASS: gdb.ada/complete.exp: p exported_capitalized
PASS: gdb.ada/complete.exp: set max-completions unlimited
... which is clearly incorrect.
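
To make the expected behavior concrete, here is a minimal Python sketch of a merge that orders results by .exp file but keeps each file's results in execution order. It is only an illustration: merge_results and RESULT_RE are hypothetical names, and this is not the code in dg-extract-results.py.

```python
import re

# Hypothetical pattern for DejaGnu result lines such as
# "PASS: gdb.ada/complete.exp: complete p my_glob".
RESULT_RE = re.compile(
    r'^(?:PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED)'
    r': (\S+\.exp)')

def merge_results(lines):
    """Group result lines by .exp file, then emit the groups sorted by
    file name.  Lines within a group keep the order in which they
    were seen, i.e. the execution order."""
    groups = {}
    for line in lines:
        m = RESULT_RE.match(line)
        if m:
            groups.setdefault(m.group(1), []).append(line)
    merged = []
    for exp in sorted(groups):       # stable order across .exp files
        merged.extend(groups[exp])   # no sorting inside a file
    return merged
```

With this approach the order of whole .exp files is deterministic between runs, while a sequence such as "complete p my_glob", "complete p insi" stays as the .exp file issued it.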
So you won't see the problem if you compare test results of
two runs that both postdate the dg-extract-results update
and were both run in parallel mode. I assume the problem
is also visible if you compare a parallel mode run against
a serial mode run, since the latter won't sort.
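
As a stopgap when comparing a sorted gdb.sum against an unsorted one, the two files can be compared as multisets of result lines, so that pure reordering produces no output. The following is a rough sketch under that assumption; compare_unordered, result_lines and STATUSES are hypothetical names, not part of any existing script.

```python
from collections import Counter

# Result prefixes treated as test results; other lines
# ("Running ...", summaries) are ignored.
STATUSES = ("PASS", "FAIL", "XPASS", "XFAIL", "KPASS", "KFAIL",
            "UNRESOLVED", "UNTESTED", "UNSUPPORTED")

def result_lines(lines):
    """Yield only the lines that look like 'STATUS: name'."""
    for line in lines:
        line = line.rstrip("\n")
        status, sep, _ = line.partition(": ")
        if sep and status in STATUSES:
            yield line

def compare_unordered(old_lines, new_lines):
    """Return (removed, added) result lines, ignoring their order.
    Counter keeps duplicates, so a test name that legitimately
    appears twice is still counted twice."""
    old = Counter(result_lines(old_lines))
    new = Counter(result_lines(new_lines))
    removed = sorted((old - new).elements())
    added = sorted((new - old).elements())
    return removed, added
```

This hides ordering changes entirely, so it does not replace fixing the merge itself; it only keeps real PASS/FAIL changes from drowning in reordering noise.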
Is this something that can be easily fixed?
Thanks,
Pedro Alves