dg-extract-results's bad sorting behavior (Re: Update dg-extract-results.* from gcc)

Mon Oct 1 09:36:00 GMT 2018

On 08/08/2018 04:24 PM, Pedro Alves wrote:
> On 08/08/2018 03:36 PM, Rainer Orth wrote:
> 
>>> On 07/20/2018 12:02 PM, Rainer Orth wrote:
>>>> When looking at the gdb.sum file produced by dg-extract-results.sh on
>>>> Solaris 11/x86, I noticed some wrong sorting, like this:
>>>>
>>>> PASS: gdb.ada/addr_arith.exp: print something'address + 0
>>>> PASS: gdb.ada/addr_arith.exp: print 0 + something'address
>>>> PASS: gdb.ada/addr_arith.exp: print something'address - 0
>>>> PASS: gdb.ada/addr_arith.exp: print 0 - something'address
>>>>
>>>> Looking closer, I noticed that while dg-extract-results.sh had been
>>>> copied over from contrib in the gcc repo, the corresponding
>>>> dg-extract-results.py file had not.  The latter not only fixes the
>>>> sorting problem I'd observed, but is also way faster than the shell
>>>> version (like a factor of 50 faster).
>>>
>>> We used to have the dg-extract-results.py file, but we deleted it
>>> because it caused (funnily enough, sorting) problems.  See:
>>>
>>>   https://sourceware.org/ml/gdb-patches/2015-02/msg00333.html
>>>
>>> Has that sorting stability issue been meanwhile fixed upstream?
>>
>> not that I can see: between the version of dg-extract-results.py removed
>> in early 2015 and the one in current gcc trunk, there's only added
>> handling for DejaGnu ERRORs and another minor change to do with
>> summaries that doesn't seem to change anything wrt. sorting on first
>> blush.
> 
> OK.
> 
>> Howver, I've just run make -j16 check three times in a row on
>> amd64-pc-solaris2.11, followed by make -j48 check, and the only
>> differences were to to 200+ racy tests, the vast majority of them in
>> gdb.threads.  
> 
> Thanks for testing.
> 
>> Maybe the prior problems have been due to bugs in older
>> versions of python?
> Might be.
> 
> IIRC, the sorting that would change would be the order that the different
> individual gdb.sum results would  be merged (the per-gdb.sum order was
> stable).  So comparing two runs, you'd get something like, in one run this:
> 
>  gdb.base/foo.exp:PASS: test1
>  gdb.base/foo.exp:PASS: test2
>  gdb.base/foo.exp:PASS: test3
>  gdb.base/bar.exp:PASS: testA
>  gdb.base/bar.exp:PASS: testB
>  gdb.base/bar.exp:PASS: testC
> 
> and another run this:
> 
>  gdb.base/bar.exp:PASS: testA
>  gdb.base/bar.exp:PASS: testB
>  gdb.base/bar.exp:PASS: testC
>  gdb.base/foo.exp:PASS: test1
>  gdb.base/foo.exp:PASS: test2
>  gdb.base/foo.exp:PASS: test3
> 
> which would result in all those tests spuriously showing up in a
> gdb.sum old/new diff.
> 
> I'm not sure whether we were seeing that if you compared runs
> of the same tree multiple times.  It could be that it only happened
> when comparing the results of different trees, which contained
> a slightly different set of tests and testcases, like for example
> comparing testresults of a patched master against the testresults
> of master from a month or week ago, which is something I frequently
> do, for example.
> 
> *time passes*
> 
> Wait wait wait, can you clarify what you meant by wrong sorting in:
> 
>  PASS: gdb.ada/addr_arith.exp: print something'address + 0
>  PASS: gdb.ada/addr_arith.exp: print 0 + something'address
>  PASS: gdb.ada/addr_arith.exp: print something'address - 0
>  PASS: gdb.ada/addr_arith.exp: print 0 - something'address
> 
> ?
> 
> Why do you think those results _should_ be sorted?  And in what order?
> 
> Typically, the order/sequence in which the tests of a given exp
> file is executed is important.  The order in the gdb.sum file must
> be the order in which the fail/pass calls are written/issued in the .exp file.
> It'd be absolutely incorrect to alphabetically sort the gdb.sum output.
> Is that what the .py version does?  That's not what I recall, though.
> I guess I may be confused.

Getting back to this, because I just diffed testresults between
runs of different vintage, and got bitten by the sorting problems.

I'm diffing testresults between a run on 20180713 and a run
against today's gdb, and I got a _ton_ of spurious diffs like these:

 -PASS: gdb.ada/complete.exp: complete p my_glob 
 -PASS: gdb.ada/complete.exp: complete p insi 
 -PASS: gdb.ada/complete.exp: complete p inner.insi 
 -PASS: gdb.ada/complete.exp: complete p pck.inne 
 -PASS: gdb.ada/complete.exp: complete p pck__inner__ins 
 -PASS: gdb.ada/complete.exp: complete p pck.inner.ins 
 -PASS: gdb.ada/complete.exp: complete p side 
 -PASS: gdb.ada/complete.exp: complete p exported 
 +PASS: gdb.ada/complete.exp: complete break ada 
  PASS: gdb.ada/complete.exp: complete p <Exported
 -PASS: gdb.ada/complete.exp: p <Exported_Capitalized> 
 -PASS: gdb.ada/complete.exp: p Exported_Capitalized 
 -PASS: gdb.ada/complete.exp: p exported_capitalized 
 -PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra 
  PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
 -PASS: gdb.ada/complete.exp: complete p some 
 +PASS: gdb.ada/complete.exp: complete p <pck__my 
 +PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra 
 +PASS: gdb.ada/complete.exp: complete p ambig 
 +PASS: gdb.ada/complete.exp: complete p ambiguous_f 
 +PASS: gdb.ada/complete.exp: complete p ambiguous_func 
 +PASS: gdb.ada/complete.exp: complete p exported 
 +PASS: gdb.ada/complete.exp: complete p external_ident 
 +PASS: gdb.ada/complete.exp: complete p inner.insi 
 +PASS: gdb.ada/complete.exp: complete p insi 
 +PASS: gdb.ada/complete.exp: complete p local_ident 
 +PASS: gdb.ada/complete.exp: complete p my_glob 
  PASS: gdb.ada/complete.exp: complete p not_in_sco
 -PASS: gdb.ada/complete.exp: complete p pck.ins 
 -PASS: gdb.ada/complete.exp: complete p pck.my 
 +PASS: gdb.ada/complete.exp: complete p pck 
 +PASS: gdb.ada/complete.exp: complete p pck. 
 +PASS: gdb.ada/complete.exp: complete p pck.inne 
  PASS: gdb.ada/complete.exp: complete p pck.inne
  PASS: gdb.ada/complete.exp: complete p pck.inner.
 -PASS: gdb.ada/complete.exp: complete p local_ident 
 +PASS: gdb.ada/complete.exp: complete p pck.inner.ins 
 +PASS: gdb.ada/complete.exp: complete p pck.ins 
  PASS: gdb.ada/complete.exp: complete p pck.local_ident
 +PASS: gdb.ada/complete.exp: complete p pck.my 
 +PASS: gdb.ada/complete.exp: complete p pck__inner__ins 
  PASS: gdb.ada/complete.exp: complete p pck__local_ident
 -PASS: gdb.ada/complete.exp: complete p external_ident 
 -PASS: gdb.ada/complete.exp: complete p pck 
 -PASS: gdb.ada/complete.exp: complete p pck. 
 -PASS: gdb.ada/complete.exp: complete p <pck__my 
 +PASS: gdb.ada/complete.exp: complete p side 
 +PASS: gdb.ada/complete.exp: complete p some 
  PASS: gdb.ada/complete.exp: interactive complete 'print some'
 -PASS: gdb.ada/complete.exp: complete p ambig 
 -PASS: gdb.ada/complete.exp: complete p ambiguous_f 
 -PASS: gdb.ada/complete.exp: complete p ambiguous_func 
 +PASS: gdb.ada/complete.exp: p <Exported_Capitalized> 
 +PASS: gdb.ada/complete.exp: p Exported_Capitalized 
 +PASS: gdb.ada/complete.exp: p exported_capitalized 
  PASS: gdb.ada/complete.exp: set max-completions unlimited
 -PASS: gdb.ada/complete.exp: complete break ada 

Given the earlier discussions about sorting, I could
immediately recognize what is wrong.  It's that while
testsuite/outputs/gdb.ada/complete/gdb.sum lists the
test results in chronological order, preserving
execution sequence:

 Running src/gdb/testsuite/gdb.ada/complete.exp ...
 PASS: gdb.ada/complete.exp: compilation foo.adb
 PASS: gdb.ada/complete.exp: complete p my_glob
 PASS: gdb.ada/complete.exp: complete p insi
 PASS: gdb.ada/complete.exp: complete p inner.insi
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck__inner__ins
 PASS: gdb.ada/complete.exp: complete p pck.inner.ins
 PASS: gdb.ada/complete.exp: complete p side
 PASS: gdb.ada/complete.exp: complete p exported
 PASS: gdb.ada/complete.exp: complete p <Exported
 PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
 PASS: gdb.ada/complete.exp: p Exported_Capitalized
 PASS: gdb.ada/complete.exp: p exported_capitalized
 PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
 PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
 PASS: gdb.ada/complete.exp: complete p some
 PASS: gdb.ada/complete.exp: complete p not_in_sco
 PASS: gdb.ada/complete.exp: complete p pck.ins
 PASS: gdb.ada/complete.exp: complete p pck.my
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck.inner.
 PASS: gdb.ada/complete.exp: complete p local_ident
 PASS: gdb.ada/complete.exp: complete p pck.local_ident
 PASS: gdb.ada/complete.exp: complete p pck__local_ident
 PASS: gdb.ada/complete.exp: complete p external_ident
 PASS: gdb.ada/complete.exp: complete p pck
 PASS: gdb.ada/complete.exp: complete p pck.
 PASS: gdb.ada/complete.exp: complete p <pck__my
 PASS: gdb.ada/complete.exp: interactive complete 'print some'
 PASS: gdb.ada/complete.exp: complete p ambig
 PASS: gdb.ada/complete.exp: complete p ambiguous_f
 PASS: gdb.ada/complete.exp: complete p ambiguous_func
 PASS: gdb.ada/complete.exp: set max-completions unlimited
 PASS: gdb.ada/complete.exp: complete break ada

... the squashed testsuite/gdb.sum ended up with those tests above
sorted lexically:

 PASS: gdb.ada/complete.exp: compilation foo.adb
 PASS: gdb.ada/complete.exp: complete break ada
 PASS: gdb.ada/complete.exp: complete p <Exported
 PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
 PASS: gdb.ada/complete.exp: complete p <pck__my
 PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
 PASS: gdb.ada/complete.exp: complete p ambig
 PASS: gdb.ada/complete.exp: complete p ambiguous_f
 PASS: gdb.ada/complete.exp: complete p ambiguous_func
 PASS: gdb.ada/complete.exp: complete p exported
 PASS: gdb.ada/complete.exp: complete p external_ident
 PASS: gdb.ada/complete.exp: complete p inner.insi
 PASS: gdb.ada/complete.exp: complete p insi
 PASS: gdb.ada/complete.exp: complete p local_ident
 PASS: gdb.ada/complete.exp: complete p my_glob
 PASS: gdb.ada/complete.exp: complete p not_in_sco
 PASS: gdb.ada/complete.exp: complete p pck
 PASS: gdb.ada/complete.exp: complete p pck.
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck.inner.
 PASS: gdb.ada/complete.exp: complete p pck.inner.ins
 PASS: gdb.ada/complete.exp: complete p pck.ins
 PASS: gdb.ada/complete.exp: complete p pck.local_ident
 PASS: gdb.ada/complete.exp: complete p pck.my
 PASS: gdb.ada/complete.exp: complete p pck__inner__ins
 PASS: gdb.ada/complete.exp: complete p pck__local_ident
 PASS: gdb.ada/complete.exp: complete p side
 PASS: gdb.ada/complete.exp: complete p some
 PASS: gdb.ada/complete.exp: interactive complete 'print some'
 PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
 PASS: gdb.ada/complete.exp: p Exported_Capitalized
 PASS: gdb.ada/complete.exp: p exported_capitalized
 PASS: gdb.ada/complete.exp: set max-completions unlimited

... which is clearly incorrect.

So you won't see the problem if you compare test results of
two runs that both postdate the dg-extract-results update,
and if they're both run in parallel mode.  I assume the problem
is visible if you compare a parallel mode run against
a serial mode run, since the latter won't sort.

Is this something that can be easily fixed?

Thanks,
Pedro Alves