Bug 25091 - analysis: which regressions are truly new?
Summary: analysis: which regressions are truly new?
Status: NEW
Alias: None
Product: bunsen
Classification: Unclassified
Component: bunsen
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Serhei Makarov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-09 17:00 UTC by Serhei Makarov
Modified: 2022-09-16 17:15 UTC
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Description Serhei Makarov 2019-10-09 17:00:01 UTC
Due to flaky tests flipping back and forth, the regression info generated by +diff_commits is very noisy. In my presentation I described the next step as 'identifying and filtering out flaky testcases', but a simpler and more useful approach might be to search the entire history (per regression) and identify when a new regression *first* appears (i.e. when it is 'truly new').

This could be done either over the entire history, or over an n-month or n-commit sliding window (i.e. show a regression only if it hasn't appeared during the previous n months/commits).
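The sliding-window idea above can be sketched as follows. This is an illustrative prototype, not the actual +new_regressions implementation; the `history` input (a sequence of (commit, regressions) pairs) and the function name are assumptions for the sake of the example.

```python
from collections import deque

def truly_new_regressions(history, window_size=100):
    """Yield (commit, regression) pairs for regressions that have not
    appeared during the previous window_size commits.

    history: iterable of (commit, list_of_regressions), oldest first.
    """
    recent = deque()  # per-commit regression lists inside the window
    seen = {}         # regression -> occurrence count within the window
    for commit, regressions in history:
        for r in regressions:
            if seen.get(r, 0) == 0:
                yield commit, r  # first appearance within the window
        # Slide the window forward: add the current commit, drop the oldest.
        recent.append(list(regressions))
        for r in regressions:
            seen[r] = seen.get(r, 0) + 1
        if len(recent) > window_size:
            for r in recent.popleft():
                seen[r] -= 1
```

Note that a flaky regression which vanishes and then re-appears after more than window_size commits is reported again, which is the intended "show it if it hasn't appeared recently" behaviour.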
Comment 1 Serhei Makarov 2019-11-20 22:11:05 UTC
Committed a basic version of +new_regressions, now that I'm satisfied it runs in a reasonable amount of memory (roughly constant in the length of the history) and finishes in an extravagant (but not indefinitely increasing) amount of time.

There may be some bugs to fix, not closing the PR yet.

Because memory stays constant as the algorithm computes *forward* over the history, the next logical step is to cache the algorithm state and reuse it as new testruns are added for more recent commits. That would solve the extravagant-amount-of-time problem in practical use.
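Because the scan only moves forward, the window state is all that needs to be persisted between runs. A minimal sketch of that checkpoint-and-resume idea, with hypothetical names (this is not the actual Bunsen code), could look like:

```python
import pickle

class NoveltyState:
    """Forward-scan state for the sliding-window novelty check."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.recent = []  # per-commit regression lists inside the window
        self.seen = {}    # regression -> occurrence count within the window

    def advance(self, commit, regressions):
        """Process one more commit; return its truly-new regressions."""
        new = [r for r in regressions if self.seen.get(r, 0) == 0]
        self.recent.append(list(regressions))
        for r in regressions:
            self.seen[r] = self.seen.get(r, 0) + 1
        if len(self.recent) > self.window_size:
            for r in self.recent.pop(0):
                self.seen[r] -= 1
        return new

def save_state(state, path):
    """Checkpoint the scan state so a later run can resume from here."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

With this shape, processing n new testruns costs n calls to advance() rather than a rescan of the whole history.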
Comment 2 Serhei Makarov 2020-01-15 19:40:03 UTC
Figuring out how to cache the +new_regressions analysis in a reasonable way. The problem is that +new_regressions can be run with different key=* and window_size/novelty_threshold=* arguments, so the cached data must be stored in a way that allows recomputation when those arguments change.

My current thought is to store:
- a list of single_change tuples: (name, subtest, outcome_pair)
- a map of commit_pair -> list of key or '*'
- a map of commit_pair -> list of (indices of) single_change entries
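The three tables above could be sketched as follows. Field and method names here are illustrative, not the actual cache schema; the interning map is an assumption added so each distinct single_change is stored once and referenced by index.

```python
class RegressionCache:
    def __init__(self):
        self.single_changes = []  # list of (name, subtest, outcome_pair)
        self._change_index = {}   # reverse map used for interning
        self.keys_for = {}        # commit_pair -> list of key or '*'
        self.changes_for = {}     # commit_pair -> list of indices into single_changes

    def intern_change(self, name, subtest, outcome_pair):
        """Store each distinct single_change once; return its index."""
        change = (name, subtest, outcome_pair)
        idx = self._change_index.get(change)
        if idx is None:
            idx = len(self.single_changes)
            self.single_changes.append(change)
            self._change_index[change] = idx
        return idx

    def record(self, commit_pair, key, changes):
        """Record the single_changes observed for one commit_pair/key run."""
        self.keys_for.setdefault(commit_pair, []).append(key)
        self.changes_for.setdefault(commit_pair, []).extend(
            self.intern_change(*c) for c in changes)
```

Keeping the per-commit-pair data as indices rather than full tuples keeps the cache compact, while the keys_for map records which key arguments the cached entries cover (with '*' for all keys), so a run with different arguments can tell what needs recomputation.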
Comment 3 Serhei Makarov 2022-09-16 17:15:34 UTC
I did a lot of experimentation with the regression-filtering view on the prototype branch earlier this spring, but it is still far from producing accurate, easy-to-read results. Eventually I will try implementing something similar to the prototype/2022:scripts_main/show_changes*.py scripts on the new master branch, so this PR is still relevant going forward.

For the time being the 'best practice' for getting an overview of regressions is to generate a set of grid views with R-show-testcases ( / show_testcases.py ), grab a coffee, and page through the results.