Due to flaky tests flipping back and forth, the regression info generated by +diff_commits is very noisy. In my presentation I described the next step as 'identifying and filtering out flaky testcases', but a simpler and more useful approach might be to search the entire history (by regression) and then identify when a new regression *first* appears (is 'truly new'). This could be done either over the entire history, or over an n-month or n-commit sliding window (i.e. show a regression only if it hasn't appeared during the previous n months/commits).
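A minimal sketch of the sliding-window variant, assuming regressions can be reduced to hashable keys and that history is iterated oldest-first; `history` and the key format are illustrative, not the actual +diff_commits output:

```python
from collections import deque

def truly_new_regressions(history, window_commits=1000):
    """Yield (commit, regression) pairs for regressions not seen within
    the previous `window_commits` commits. `history` is assumed to be an
    iterable of (commit, regressions) pairs in oldest-first order."""
    window = deque()   # (commit_index, regression_key) pairs inside the window
    counts = {}        # regression_key -> occurrences currently in the window
    for index, (commit, regressions) in enumerate(history):
        # Evict entries that have fallen out of the sliding window.
        while window and window[0][0] <= index - window_commits:
            _, old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        for reg in regressions:
            if reg not in counts:
                yield commit, reg  # first appearance within the window
            counts[reg] = counts.get(reg, 0) + 1
            window.append((index, reg))
```

Memory use is bounded by the window size rather than the full history, which is what makes the forward scan feasible.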
Committed a basic version of +new_regressions, now that I'm satisfied it runs in reasonable memory (roughly constant in the length of the history) and finishes in an extravagant (but not indefinitely-increasing) amount of time. There may still be some bugs to fix, so I'm not closing the PR yet. Because the memory stays constant as the algorithm computes *forward* over the history, the next logical step is to cache the algorithm state and reuse it as new testruns are added for more recent commits. That would solve the extravagant-amount-of-time problem in practical use.
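A hedged sketch of what that resume logic could look like, assuming the per-commit step only depends on the accumulated window state; `step`, the state file name, and the state layout are all hypothetical, not the current implementation:

```python
import pickle

def run_incremental(commits, step, state_path="new_regressions.state"):
    """Run `step(commit, window)` only for commits newer than the last
    cached one, reusing the saved algorithm state between invocations.
    `step` stands in for the per-commit analysis; it mutates `window`."""
    try:
        with open(state_path, "rb") as f:
            state = pickle.load(f)
    except FileNotFoundError:
        # No cache yet: start from the beginning of history.
        state = {"last_commit": None, "window": {}}

    # Resume just past the last commit we already processed, if any.
    start = (commits.index(state["last_commit"]) + 1
             if state["last_commit"] in commits else 0)

    for commit in commits[start:]:
        step(commit, state["window"])
        state["last_commit"] = commit

    with open(state_path, "wb") as f:
        pickle.dump(state, f)
```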
Figuring out how to cache the +new_regressions analysis in a reasonable way. The problem is that +new_regressions can be run with different key=* and window_size/novelty_threshold=* arguments. So the cached data must be stored in a way that allows recomputation if necessary. My current thought is to store (sketched in code below):
- list of single_change: (name, subtest, outcome_pair)
- map of commit_pair -> list of key or '*'
- map of commit_pair -> list of (index of) single_change
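Roughly, as a data structure; the type and field names here are illustrative only, not a committed layout:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SingleChange = Tuple[str, str, Tuple[str, str]]  # (name, subtest, outcome_pair)
CommitPair = Tuple[str, str]                     # (older_commit, newer_commit)

@dataclass
class NewRegressionsCache:
    # Deduplicated list of observed changes; the maps below index into it.
    single_changes: List[SingleChange] = field(default_factory=list)
    # Which key=... values (or '*') each commit pair was analyzed under,
    # so the cache can detect when recomputation is needed for other args.
    keys_by_pair: Dict[CommitPair, List[str]] = field(default_factory=dict)
    # Indices into single_changes for each commit pair.
    changes_by_pair: Dict[CommitPair, List[int]] = field(default_factory=dict)
```

Storing indices into a shared single_change list keeps the per-pair maps small, since the same change typically recurs across many commit pairs.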
I did a lot of experimentation with the regression-filtering view on the prototype branch earlier this spring, but it's still far from perfect in terms of accurate & easy-to-read results. Eventually I will try implementing something similar to the prototype/2022:scripts_main/show_changes*.py scripts on the new master branch, so this PR is still relevant going forward. For the time being, the 'best practice' for getting an overview of regressions is to generate a set of grid views with R-show-testcases (/ show_testcases.py), grab a coffee, and page through the results.