Frank Ch. Eigler [Tue, 28 Jun 2022 20:15:08 +0000 (16:15 -0400)]
g-dejagnu-cluster-entropy rework
Change logic so that freshness is tracked in a separate new table.
This allows full caching even if no cluster/expfile data would be
computed (due to no expfiles in a particular cluster, e.g.), and
also happens to reduce storage requirements compared to the previous v2.
The previous entropy payload schema is restored.
Frank Ch. Eigler [Thu, 23 Jun 2022 23:12:02 +0000 (19:12 -0400)]
pipeline: disable entropy calculation again
On rhel8's old sqlite 3.26, a pretty pessimal query evaluation
strategy is chosen to run the inner query, and it slows things down by
orders of magnitude compared to f35's sqlite 3.36. Disable again
for a while.
Frank Ch. Eigler [Thu, 23 Jun 2022 22:21:17 +0000 (18:21 -0400)]
omnibus g-engine changes
- add "--update" option to g-* engines to refresh rather than start over, if possible
- pipeline uses --update for them
- pipeline process exits with rc != 0 in case of engine/etc. errors
- g-dejagnu-cluster-entropy: API BREAK table schema change, add a cluster membership-hash
to output table, to know if elements need to be recomputed (if clusterfinder changed
members); saves 85% time for incremental work situations; this engine could hypothetically
also join the i-* series, being given only the new testrun hashes to focus on, except
that it consumes the clusterfinder's output
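A sketch of how a cluster membership-hash can gate recomputation. The
function names and the choice of SHA-256 over sorted testrun identifiers are
assumptions for illustration; the engine's real hash may be built
differently.

```python
import hashlib

def membership_hash(testrun_ids):
    """Order-independent digest of a cluster's member testrun identifiers."""
    h = hashlib.sha256()
    for t in sorted(set(testrun_ids)):
        h.update(t.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab","c") differs from ("a","bc")
    return h.hexdigest()

def clusters_to_recompute(clusters, stored_hashes):
    """Names of clusters whose membership differs from what was stored.

    clusters: dict of cluster name -> list of member testrun ids
    stored_hashes: dict of cluster name -> hash saved by the previous run
    """
    return [name for name, members in clusters.items()
            if stored_hashes.get(name) != membership_hash(members)]
```

With --update, only the clusters returned by clusters_to_recompute would
need their elements recomputed; unchanged clusters are skipped.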
Frank Ch. Eigler [Mon, 20 Jun 2022 23:43:35 +0000 (19:43 -0400)]
r-httpd-browse: add cluster navigation to testrun view
Add an arrow and a delta (plus a count) for metadata-induced
clustering, for prev, this, and next clusters. This allows the user
to navigate between and within testrun clusters, and also run
testsuite-diff operations between the current testrun and related
clusters of testruns.
For better or for worse, some of these clusters can be quite large,
which means the diff operations can be quite expensive & produce
large results. Some throttling limits will be needed shortly.
Serhei Makarov [Tue, 14 Jun 2022 15:38:02 +0000 (11:38 -0400)]
r-httpd-browse icebreaker: report empty sets of things
A minor thing, but this was a factor in my confusion when I ran
r-httpd-browse on an old sqlite db with keyval authored_day
and got an empty query that was trying to sort on testrun.authored.day.
As keiths & serhei have long ago discovered and we rediscovered now,
make -j check
type dejagnu runs can regularly scramble .sum / .log file segments so
they no longer parallel. One offender is gcc's
contrib/dg-extract-results.sh that assembles .log/.sum files from
parallel-executed pieces .... and proceeds to SORT rows within .sum
file .exp segments (only). Why? Why? WHYYYYYY?! y tho
That resulted in many logfile cursor = NULL results. Tweak
i-dejagnu-parser to tolerate this case better by pre-parsing segments
of the .exp file loglines. The logic here may break if some wacky
tool reorders the sequence of .exp files, but this has not been
observed.
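A rough sketch of the segment pre-parsing idea, assuming .log segments are
introduced by dejagnu's "Running <path>.exp ..." lines. The function names
are hypothetical and this is not the parser's actual code. Searching only
within the matching .exp segment means the order of rows inside the segment
no longer matters.

```python
import re

# Assumed marker for the start of each .exp segment in a dejagnu .log file.
RUNNING_RE = re.compile(r"^Running (\S+\.exp) \.\.\.")

def index_exp_segments(loglines):
    """Map each .exp name to its (start, end) line range, end exclusive.

    If an .exp name occurs more than once, the last occurrence wins.
    """
    segments = {}
    current = None
    for i, line in enumerate(loglines):
        m = RUNNING_RE.match(line)
        if m:
            if current is not None:
                # Close the previous segment at this line.
                segments[current] = (segments[current][0], i)
            current = m.group(1)
            segments[current] = (i, len(loglines))
    return segments

def find_in_segment(loglines, segments, expfile, needle):
    """Index of the first line containing needle inside expfile's segment."""
    start, end = segments.get(expfile, (0, 0))
    for i in range(start, end):
        if needle in loglines[i]:
            return i
    return None  # caller records a NULL cursor
```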
As keiths has long ago discovered and we rediscovered now,
make -j check
type dejagnu runs can regularly scramble .sum / .log file segments so
they no longer parallel. That resulted in many logfile cursor = NULL
results. Tweak i-dejagnu-parser to tolerate this case better by
being willing to restart a search for a .log file snippet from the top of
the file.
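The restart-from-the-top behaviour can be sketched as a wrap-around search.
The function name and the substring match are hypothetical simplifications,
not the parser's actual code.

```python
def find_from_cursor(loglines, needle, cursor):
    """Find needle at or after cursor; if absent, retry from the top.

    Returns the matching line index, or None if the snippet is not in
    the file at all.
    """
    for i in range(cursor, len(loglines)):
        if needle in loglines[i]:
            return i
    # Not found ahead of the cursor: the .sum and .log orders disagree,
    # so search the part of the file that was already passed.
    for i in range(0, min(cursor, len(loglines))):
        if needle in loglines[i]:
            return i
    return None
```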
- add a predicate-filtering frontend for testrun searches
- add tests
- coincidentally (sorry), do some toolshedding cleanup on i-testrun-indexer
metadata extraction; add back testrun.git_describe for the nickname
Serhei Makarov [Mon, 16 May 2022 14:19:36 +0000 (10:19 -0400)]
fixes for --refspec option
1. dead commits aren't in the repo, so there's no refspec to check.
2. excluded commits should be subtracted from len(new)
or else the global analysis will run even when nothing has changed.
Serhei Makarov [Mon, 16 May 2022 14:05:30 +0000 (10:05 -0400)]
add --refspec option to filter commits from an indexed repo
Adding --refspec="*testlogs*" will limit the processed commits from
the testlog git repo to those whose containing branch according to
'git describe --always --contains' includes 'testlogs'. This is needed
on my Bunsen setup where I already have a testlog repo being regularly
updated with branches 'testlogs' containing valid SystemTap testlogs
and 'testruns' containing JSON index data which this branch of Bunsen
doesn't need to process. This option saved around 3 hours on initial
build from my SystemTap data.
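A sketch of the filtering step, assuming the 'git describe --always
--contains' result for each commit has already been collected into a dict.
The function name, the dict shape, and the use of fnmatch for glob matching
are assumptions for illustration, not necessarily what the option does
internally.

```python
from fnmatch import fnmatchcase

def filter_commits(described, refspec):
    """Keep commits whose describe string matches the refspec glob.

    described: dict of commit id -> describe output for that commit
    refspec: shell-style glob such as "*testlogs*"
    """
    return [commit for commit, name in described.items()
            if fnmatchcase(name, refspec)]
```

For example, with refspec "*testlogs*" a commit described as "testlogs~2"
is kept, while one described as "testruns~5" is dropped before any
processing happens.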