Due to flaky tests flipping back and forth, the regression info generated by +diff_commits is very noisy. In my presentation I described the next step as 'identifying and filtering out flaky testcases', but a simpler and more useful approach might be to search the entire history (by regression) and then identify when a new regression *first* appears (is 'truly new'). This could be done either over the entire history, or over an n-month or n-commit sliding window (i.e. show a regression only if it hasn't appeared during the previous n months/commits).
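A minimal sketch of the sliding-window variant, assuming regressions can be reduced to hashable keys and that history is iterated oldest-first; `history` and the key format are illustrative, not the actual +diff_commits output:

```python
from collections import deque

def truly_new_regressions(history, window_commits=1000):
    """Yield (commit, regression) pairs for regressions not seen within
    the previous `window_commits` commits. `history` is assumed to be an
    iterable of (commit, regressions) pairs in oldest-first order."""
    window = deque()   # (commit_index, regression_key) pairs inside the window
    counts = {}        # regression_key -> occurrences currently in the window
    for index, (commit, regressions) in enumerate(history):
        # Evict entries that have fallen out of the sliding window.
        while window and window[0][0] <= index - window_commits:
            _, old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        for reg in regressions:
            if reg not in counts:
                yield commit, reg  # first appearance within the window
            counts[reg] = counts.get(reg, 0) + 1
            window.append((index, reg))
```

Memory use is bounded by the window size rather than the full history, which is what makes the forward scan feasible.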
Committed a basic version of +new_regressions, now that I'm satisfied it runs in reasonable memory (roughly constant in the length of the history) and finishes in an extravagant (but not indefinitely-increasing) amount of time. There may still be some bugs to fix, so I'm not closing the PR yet. Because the memory stays constant as the algorithm computes *forward* over the history, the next logical step is to cache the algorithm state and reuse it as new testruns are added for more recent commits. That would solve the extravagant-amount-of-time problem in practical use.
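A hedged sketch of what that resume logic could look like, assuming the per-commit step only depends on the accumulated window state; `step`, the state file name, and the state layout are all hypothetical, not the current implementation:

```python
import pickle

def run_incremental(commits, step, state_path="new_regressions.state"):
    """Run `step(commit, window)` only for commits newer than the last
    cached one, reusing the saved algorithm state between invocations.
    `step` stands in for the per-commit analysis; it mutates `window`."""
    try:
        with open(state_path, "rb") as f:
            state = pickle.load(f)
    except FileNotFoundError:
        # No cache yet: start from the beginning of history.
        state = {"last_commit": None, "window": {}}

    # Resume just past the last commit we already processed, if any.
    start = (commits.index(state["last_commit"]) + 1
             if state["last_commit"] in commits else 0)

    for commit in commits[start:]:
        step(commit, state["window"])
        state["last_commit"] = commit

    with open(state_path, "wb") as f:
        pickle.dump(state, f)
```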
Figuring out how to cache the +new_regressions analysis in a reasonable way. The problem is that +new_regressions can be run with different key=* and window_size/novelty_threshold=* arguments. So the cached data must be stored in a way that allows recomputation if necessary. My current thought is to store (sketched in code below):
- list of single_change: (name, subtest, outcome_pair)
- map of commit_pair -> list of key or '*'
- map of commit_pair -> list of (index of) single_change
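Roughly, as a data structure; the type and field names here are illustrative only, not a committed layout:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SingleChange = Tuple[str, str, Tuple[str, str]]  # (name, subtest, outcome_pair)
CommitPair = Tuple[str, str]                     # (older_commit, newer_commit)

@dataclass
class NewRegressionsCache:
    # Deduplicated list of observed changes; the maps below index into it.
    single_changes: List[SingleChange] = field(default_factory=list)
    # Which key=... values (or '*') each commit pair was analyzed under,
    # so the cache can detect when recomputation is needed for other args.
    keys_by_pair: Dict[CommitPair, List[str]] = field(default_factory=dict)
    # Indices into single_changes for each commit pair.
    changes_by_pair: Dict[CommitPair, List[int]] = field(default_factory=dict)
```

Storing indices into a shared single_change list keeps the per-pair maps small, since the same change typically recurs across many commit pairs.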
I did a lot of experimentation with the regression-filtering view on the prototype branch earlier this spring, but it's still far from perfect in terms of accurate & easy-to-read results. Eventually I will try implementing something similar to the prototype/2022:scripts_main/show_changes*.py scripts on the new master branch, so this PR is still relevant going forward. For the time being, the 'best practice' for getting an overview of regressions is to generate a set of grid views with R-show-testcases (/ show_testcases.py), grab a coffee, and page through the results.