Bug 29778 - [gdb/testsuite] Revise untested usage
Summary: [gdb/testsuite] Revise untested usage
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: testsuite (show other bugs)
Version: HEAD
: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-12 10:42 UTC by Tom de Vries
Modified: 2023-01-22 17:31 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tom de Vries 2022-11-12 10:42:09 UTC
The definitions/explanations for UNTESTED in the dejagnu manual are:
...
A test case is not yet complete, and in particular cannot yet produce a PASS or FAIL. You can also use this outcome in dummy “tests” that note explicitly the absence of a real test case for a particular property.
...
and:
...
untested
prints a message for an test case that isn’t run for some technical reason. 
...
and:
...
untested
prints a message for an test case that isn’t run for some reason.
...
and:
...
UNTESTED
A test was not run. This is a placeholder used when there is no real test case yet.
...
and:
...
For example, you might use this in a dummy test whose only role is to record that a test does not yet exist for some feature.
...

I don't think the definition is very clear, but as I understand it the valid uses of UNTESTED are:
...
gdb.base/features.exp: PASS: feature a
gdb.base/features.exp: UNTESTED: feature b
...
or:
...
gdb.base/feature-b: UNTESTED: <feature b in more detail>
...

In the gdb testsuite there's an interpretation of UNTESTED as "not having progressed sufficiently in executing an otherwise complete test-case to generate a PASS or FAIL", which explains why it's used for say failure to compile.

I suspect this usage is wrong, but again given the vague definition I'm not sure.

Anyway, perhaps given the vague definition, I think we should eliminate the usage of untested, by redefining it to:
...
proc untested { message } {
  error "untested: $message"
}
...
and fixing the errors.

Note that out of a total of:
...
$ egrep -c UNTESTED: gdb.sum 
35
...
I have:
...
$ egrep -c UNTESTED:.*(not supported|unsupported) gdb.sum 
18
...
which probably should use unsupported instead of untested.
Comment 1 Tom de Vries 2022-11-15 10:53:10 UTC
Hmm, reading the definition of UNSUPPORTED:
...
There is no support for the tested case. This may mean that a conditional
feature of an operating system, or of a compiler, is not implemented. DejaGnu
also uses this message when a testing environment (often a “bare board” target)
lacks basic support for compiling or running the test case. For example, a test
for the system subroutine gethostname would never work on a target board
running only a boot monitor.
...
and seeing how like XFAIL it's focused on the environment, I start to wonder if the role of UNTESTED was intended to be that it's to UNSUPPORTED what KFAIL is to XFAIL.

In other words:
- xfail: fail due to problem in environment
- kfail: fail due to problem in gdb
- unsupported: not run due to missing optional feature in environment
- untested: not run due to missing optional feature in gdb

That way, each one implies one type of action to achieve a passing test-case:
- xfail: fix environment
- kfail: fix gdb
- unsupported: add missing feature to environment
- untested: build gdb with missing feature enabled
Comment 2 Tom de Vries 2022-11-15 11:49:08 UTC
And then there are the test-cases that use proc skip_ variants which largely seem happy to skip only doing "verbose $msg" or just silently.
Comment 3 Simon Marchi 2022-11-15 14:35:51 UTC
(In reply to Tom de Vries from comment #1)
> Hmm, reading the definition of UNSUPPORTED:
> ...
> There is no support for the tested case. This may mean that a conditional
> feature of an operating system, or of a compiler, is not implemented. DejaGnu
> also uses this message when a testing environment (often a “bare board”
> target)
> lacks basic support for compiling or running the test case. For example, a
> test
> for the system subroutine gethostname would never work on a target board
> running only a boot monitor.
> ...
> and seeing how like XFAIL it's focused on the environment, I start to wonder
> if the role of UNTESTED was intended to be that it's to UNSUPPORTED what
> KFAIL is to XFAIL.
> 
> In other words:
> - xfail: fail due to problem in environment
> - kfail: fail due to problem in gdb
> - unsupported: not run due to missing optional feature in environment
> - untested: not run due to missing optional feature in gdb

I'm struggling a bit with the difference between unsupported and untested, especially because unsupported sounds like a specific case of untested.  And except because of what the Dejagnu documentation says, unsupported really sounds like it could be used for both "missing feature in environment" and "missing feature in gdb".  Using unsupported just for the former one seems arbitrary.
 
> And then there are the test-cases that use proc skip_ variants which largely seem happy to skip only doing "verbose $msg" or just silently.

I think all tests should at least record at least one untested or unsupported, so that there's a trace in gdb.sum.
Comment 4 Tom de Vries 2022-11-18 15:33:09 UTC
(In reply to Simon Marchi from comment #3)
> > And then there are the test-cases that use proc skip_ variants which largely seem happy to skip only doing "verbose $msg" or just silently.
> 
> I think all tests should at least record at least one untested or
> unsupported, so that there's a trace in gdb.sum.

I think that's an ok principle, but once the volume grows the usefulness is reduced.

I think it's fine to avoid generating noise.  For instance, I would consider it noise to record that a ppc64-specific test is not run on x86_64.
Comment 5 Tom de Vries 2022-11-18 15:39:06 UTC
(In reply to Simon Marchi from comment #3)
> > and seeing how like XFAIL it's focused on the environment, I start to wonder
> > if the role of UNTESTED was intended to be that it's to UNSUPPORTED what
> > KFAIL is to XFAIL.
> > 
> > In other words:
> > - xfail: fail due to problem in environment
> > - kfail: fail due to problem in gdb
> > - unsupported: not run due to missing optional feature in environment
> > - untested: not run due to missing optional feature in gdb
> 
> I'm struggling a bit with the difference between unsupported and untested,
> especially because unsupported sounds like a specific case of untested.  And
> except because of what the Dejagnu documentation says, unsupported really
> sounds like it could be used for both "missing feature in environment" and
> "missing feature in gdb".  Using unsupported just for the former one seems
> arbitrary.
>  

I know some struggle with the difference between kfail and xfail and feel it's arbitrary.  For me the difference is clear: for a kfail you need to record a gdb PR.  For an xfail, you don't.

Between unsupported and untested, I agree what I proposed here can bee seen as arbitrary, it's just an attempt to make sense of the available incomplete information.

All I care about is not having to wonder each time whether to use untested or unsupported, either by:
- having clear rules when to use them, or
- eliminated the use of one of them by generating an error, or
- simply mapping one onto the other.
Comment 6 Tom de Vries 2022-11-18 15:41:33 UTC
(In reply to Tom de Vries from comment #5)
> - simply mapping one onto the other.

Say:
...
proc unsupported { message } {
   untested "unsupported: $message"
}
...
Comment 7 Tom de Vries 2022-11-18 15:51:54 UTC
(In reply to Tom de Vries from comment #4)
> (In reply to Simon Marchi from comment #3)
> > > And then there are the test-cases that use proc skip_ variants which largely seem happy to skip only doing "verbose $msg" or just silently.
> > 
> > I think all tests should at least record at least one untested or
> > unsupported, so that there's a trace in gdb.sum.
> 
> I think that's an ok principle, but once the volume grows the usefulness is
> reduced.
> 
> I think it's fine to avoid generating noise.  For instance, I would consider
> it noise to record that a ppc64-specific test is not run on x86_64.

And we could also avoid noise by emitting one untested/unsupported per skip group.  I don't need to know that gdb.server/*.exp is skipped for each test-case, if gdbserver is unavailable.
Comment 8 Tom Tromey 2023-01-18 19:02:17 UTC
I wasn't aware of this bug when I wrote the "require" series.

I also stumbled over the weird text in dejagnu that made me
that "untested" was essentially useless.  So, I changed require
to use unsupported.

On reflection though, I think what dejagnu says isn't super important.
We can just pick the one we like for gdb.

I tend to think the noise is fine.  My reasoning is that these
procs give us a way to see not just that a test didn't run, but
also why it didn't run.  Occasionally this reason could be surprising
and lead to some kind of bug fix.  So my view is that the silent
"return"s in the test cases should be converted.

Centralizing these decisions with "require" or other "do task and
return on failure" procs makes this simpler to achieve.
(Here I was thinking of prepare_for_testing but it looks like
build_executable can call untested.)

One other way centralizing is nice is that if we decide to remove
the noise, we can just remove these calls and be done.
Comment 9 Tom Tromey 2023-01-22 17:31:28 UTC
I found a case where the noisier output would be actively helpful.
For me, dw2-ranges.exp started silently doing nothing.
Investigating it:

/home/tromey/gdb/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.dwarf2/dw2-ranges-3.c: warning: STABS debugging information is obsolete and not supported anymore

So, a new compiler warning means we started skipping a test.

If every early return had some annotation that showed up in the .sum,
we could detect this scenario... not saying there's anything to do
about it, but it seems better to flag it an intentionally ignore
it than to have some tests simply silently stop running and never
really find out.