Does glibc have complete test coverage?

Wed Mar 24 12:31:36 GMT 2021

On 24/03/2021 00:13, Peng Yu via Libc-help wrote:
> On Tue, Mar 23, 2021 at 8:13 PM Mike Frysinger <vapier@gentoo.org> wrote:
> 
>> On 23 Mar 2021 17:02, Jeffrey Walton wrote:
>>> On Tue, Mar 23, 2021 at 4:43 PM Mike Frysinger wrote:
>>>> On 23 Mar 2021 11:39, Peng Yu via Libc-help wrote:
>>>>> https://www.kernel.org/doc/man-pages/missing_pages.html
>>>>>
>>>>> "... quite a few kernel and glibc bugs have been uncovered while
>>>>> writing test programs during the preparation of man pages. "
>>>>>
>>>>> I see the above text. It doesn't make too much sense, as it indicates
>>>>> that glibc does not have complete test coverage.
>>>>>
>>>>> Why not taking an approach of always accompanying each line of source
>>>>> code with appropriate test cases? If this approach is taken, then most
>>>>> bugs should have been eliminated beforehand?
>>>>
>>>> ignoring the legacy aspect (code that's in the tree now but lacks tests),
>>>> you have diminishing returns when it comes to writing unittests, and, as
>>>> can be seen in a recent discussion, glibc is pretty tightly coupled to
>>>> the runtime environment (i.e. the host kernel).  so getting an env that
>>>> matches all the different code paths is challenging.
>>>>
>>>> plus it comes down a bit to this being an open source project for many
>>>> of us, not a job, and you have to be respectful of balancing quality
>>>> and developer time with any requests you make on other volunteers.
>>>>
>>>> along those lines, this is an open source project where "patches are
>>>> welcome", so if you wanted to spend your time improving the frameworks
>>>> and coverage of our tests, we'd welcome you.
>>>
>>> Interns are usually a good choice for writing test cases. It gets them
>>> familiar with the code, frees up a senior developer's time, and helps
>>> avoid the developer's bias.
>>>
>>> Test cases are monkey work that should be delegated. When delegation
>>> does not occur it usually points back to shortcomings in project
>>> management.
>>
>> many are good for delegation, but that doesn't mean quantity is the same as
>> quality.  if we could get 100% coverage but it took weeks to run, but 90%
>> coverage took <1 hour, is that 10% worth it ?  this isn't exactly hyperbole
>> when we have targets that run on simulators or FPGAs and have <<1GHz CPUs.
>>
>> for example, how much of LTP should be part of glibc ?  they have over 1000
>> "syscall" tests which mostly go through the C library's APIs and can catch
>> bugs, but they also take a long time to run.
>>
>> how much should glibc be exercising different kernel versions ?  a lot of
>> our work & APIs depend heavily on the kernel working correctly.  should
>> we be running against every Linux release since 3.2 ?  do we test the many
>> different ways kernels can be compiled ?  do we workaround kernel bugs ?
>> https://sourceware.org/pipermail/libc-alpha/2021-March/123486.html
>> https://sourceware.org/pipermail/libc-alpha/2021-March/123582.html
>>
>> glibc has a matrix of build tools that it can utilize and significantly
>> affects its behavior & output.  do we try every combo of GCC & binutils
>> that we support ?
>>
>> glibc runs on like 20 diff architectures, and many of those have ISA
>> specific optimizations (like x86_64 SSE/AVX/etc...).  that's another
>> huge multiplier.
> 
> 
> You mentioned ”balance” in another email. But isn’t it a balance to not to
> support so many architecture? It sounds like supporting so many
> architectures can cause bugs. Alternatively, it is better to assume certain
> things, that the underlying architecture must meet. If not, add glue in
> between, which should be separate from glibc. In this way, it should be
> much easier to isolate bugs out of glibc.

The extra architectures do add a burden, which is why I am pushing a lot of
implementation consolidation to compartmentalize the architecture bits and
minimize duplicated code.  The idea is that architecture-specific code
should be added only for optimizations (for instance string or memory
routines), arch-specific glue (such as relocation handling), or
arch-specific features (such as Intel CET or ARM PAC/BTI).
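
As a rough sketch of that layering (this is not glibc code; count_byte,
generic_count_byte, arch_count_byte and HAVE_ARCH_COUNT_BYTE are all
hypothetical names): a generic C implementation is always present, and an
architecture only supplies an override when it has an optimized variant.

```c
#include <stddef.h>

/* Generic implementation in plain C.  Every architecture gets this
   unless it provides something better.  */
static size_t
generic_count_byte (const unsigned char *buf, size_t len, unsigned char c)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    n += (buf[i] == c);
  return n;
}

/* Hypothetical hook: an architecture would define HAVE_ARCH_COUNT_BYTE
   and provide arch_count_byte only for an optimized variant.  */
#ifdef HAVE_ARCH_COUNT_BYTE
size_t arch_count_byte (const unsigned char *, size_t, unsigned char);
# define count_byte arch_count_byte
#else
# define count_byte generic_count_byte
#endif
```

With this shape, a test written against count_byte exercises whichever
variant the build selected, and the generic version stays available as a
reference to compare against.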

This kind of refactoring does not yield immediate gains for the code
base, so architecture maintainers do not focus on these changes.
For instance, I sent a long patchset [1] that aims to simplify syscall
generation across multiple architectures, and it has not seen any review
so far.

> 
> Also, the test cases should be white boxed instead of black boxed. If the
> test cases can be made white boxed, it is much less likely to have bugs in
> them than based black boxed strategies.
> 
> The current test cases do not seem to be mostly white boxed?

No one is really against writing whitebox tests; the *main* problem for the
glibc project is *workforce*.  We have a very limited number of developers
working actively and a lot of features and long-standing fixes to work on.
We now have a backlog of 564 patches that need review.

But we did improve testing a *lot* over the years: the current policy is to
add tests for each new feature and bug fix; we added an internal library
(libsupport) that aims to simplify test creation; we added a minimal container
test infrastructure to test pieces that require root or change system state;
and the most senior developers actively and constantly work on new tests.
The problem, again, is that we need extra engagement to move this forward.

So if you are willing to work on whitebox support for glibc, I can help you
devise a strategy and the required internal bits.

> 
> Also, using a white boxed approach, the original programmers should also
> write the test cases. But the current way of waiting others to add test
> cases making it hard to use the white boxed approach. The code complexities
> can not be reduce in the black boxed approach. Therefore, I don’t think
> just adding more patches is an efficient way to eliminate the bugs.

That's not the current practice for glibc development: each new feature
or bugfix is required to add a testcase.  The problem is that we have a large
code base where a lot of features were added without proper testcases.  We are
trying to improve on this front, but there is a lot of work to do.

> 
> BTW, is there a way to know which part of the code is not covered? Also,
> even a line is covered,how well is tested against corner cases?

There are a lot of features and code that are not covered by any testing;
I just sent a patchset that adds some tests for missing interfaces [2].
I guess there are a lot of corner cases not handled.
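
To illustrate the corner-case point: coverage only says that a line ran, not
that the interesting inputs were checked.  A minimal sketch, assuming a
hypothetical wrapper parse_long around strtol (parse_long is not a glibc
interface, only an example of code one might want to test):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long.
   Returns 0 and stores the value in *out on success.
   Returns -1 if there are no digits, if there are trailing
   characters, or if the value does not fit in a long.  */
int
parse_long (const char *s, long *out)
{
  char *end;
  errno = 0;
  long v = strtol (s, &end, 10);
  if (end == s || *end != '\0')
    return -1;                  /* Empty input or trailing garbage.  */
  if (errno == ERANGE)
    return -1;                  /* Overflow or underflow.  */
  *out = v;
  return 0;
}
```

A single call with "42" runs most of this function, yet says nothing about
the empty string, input such as "12x", or a value too large for long.  Each
of those needs its own test case.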

[1] https://patchwork.sourceware.org/project/glibc/list/?series=1153
[2] https://patchwork.sourceware.org/project/glibc/list/?series=1893