
bzip2 test suite (Was: bzip2 1.0.7 released)

On Fri, 2019-06-28 at 13:10 +0200, Mark Wielaard wrote:
> >    In particular, the three test files in the tarball merely serve to verify
> >    that the build didn't fail in some obvious way.  They are in no way a
> >    comprehensive test set.
> Yes, having a more comprehensive test set would be great.
> Especially having file encoded with other bzip2 encoders.

I don't claim it is comprehensive at all, but I have gone through all
the other bzip2 encoder/decoder projects I could find and collected the
samples/tests they used. To avoid cluttering the main repo I have set
up a new repository with a test driver.

git clone git://sourceware.org/git/bzip2-tests.git

It only has one commit:

commit 3e71fa8acda490a6b288833d759eb4129ad573e2
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Jun 30 20:56:33 2019 +0200

    Initial bzip2 test suite.
    Contains test files from the commons-compress, dotnetzip, go, lbzip2
    and pyflate projects. Each test file has either an associated md5 sum
    to check it decompressed correctly. Or it is named bz2.bad to indicate
    it cannot be decompressed (or might need --force to decompress).
    The run-tests.sh test wrapper runs a bzip2 binary in a couple of
    configurations, optionally under valgrind, on all the test files in
    the tree.

See the attached README to see how to run it.

Even though it might not be comprehensive, it is a start. And you can
use the ./run-tests.sh test runner to run over a full directory tree.
See the end of the README file.

It already contains a testcase (from lbzip2) that shows the original
issue with bzip2 1.0.6
(compiled with gcc -fsanitize=undefined -fno-sanitize-recover):

Processing lbzip2/32767.bz2
decompress.c:299:24: runtime error: index 18002 out of bounds for type 'UChar [18002]'
!!! bad decompress result 1

And with bzip2 1.0.7 it fails as follows:

Processing lbzip2/32767.bz2

bzip2: Data integrity error when decompressing.
!!! bad decompress result 2

Sadly it also shows that the fix we have isn't complete.
Even with that fix it fails as above.

I think I have a fix for that though. Will post in a minute.

I'll try to integrate the test suite with the buildbots (although some
will have to at least use --without-valgrind because testing is really,
really, really slow under valgrind).


= BZ2 test file collection =

This is a collection of "interesting" .bz2 files that can be used to
test that bzip2 works correctly. They come from different projects.

Each directory should contain a README file explaining where the .bz2
files originally came from. Plus a reference to the (Free Software)
license that the project files were distributed under.

Some files are deliberately bad, and are used to see how bzip2 handles
corrupt files. They are explicitly not intended to decompress correctly,
but to catch errors in bzip2 trying to deal with deliberately bad data.
All such files have a name ending in .bz2.bad.

All non-bad files end in .bz2 and should come with a .md5 file for
the original input file. The .md5 file is used to check that bzip2
could correctly decompress the file. The original (non-compressed)
files are deliberately not checked in.

A .md5 sum is generated by:
  md5sum < file > file.md5

This generates a .md5 file that doesn't carry a file name (but just "-").
They can then be checked again with:
  md5sum --check file.md5 < file
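
As a small sketch of how the two commands above fit together (the
file name sample.txt and the temporary directory are hypothetical and
only used for illustration):

```shell
#!/bin/bash
# Sketch only: "sample.txt" and the temporary directory are hypothetical.
workdir=$(mktemp -d)
printf 'hello bzip2\n' > "$workdir/sample.txt"

# Read from stdin so the recorded file name is "-" rather than a path.
md5sum < "$workdir/sample.txt" > "$workdir/sample.txt.md5"

# Because the recorded name is "-", md5sum --check reads the data from
# stdin as well.
if md5sum --check "$workdir/sample.txt.md5" < "$workdir/sample.txt"; then
  md5_status=ok
else
  md5_status=mismatch
fi
echo "md5 check: $md5_status"
```

Recording "-" instead of a file name means the .md5 file stays valid
no matter what the decompressed output ends up being called.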

Note: do NOT give a file a name ending in .testfilecopy or
.testfilecopy.bz2. Those will automatically be cleaned up by the test
framework.

There is a simple bash script to run the tests:

run-tests.sh [--bzip2=bzip2-command] [--without-valgrind]
             [--ignore-md5] [--tests-dir=/path/to/bzip2-tests-dir]

By default it tests the command 'bzip2', running it under valgrind
(if installed on the system), and checks the md5 sum files after
decompression. It searches the current directory (".") for any .bz2
or .bz2.bad files (and .md5 files, if they are checked).

Each .bz2 file found is decompressed, recompressed and decompressed
again, once with the default bzip2 settings and once in --small (-s)
mode.
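
The cycle described above could be sketched roughly as follows. This
is not the actual run-tests.sh code: the helper name roundtrip, the
BZIP2 variable and the temporary file names are assumptions made for
illustration. It only uses the standard bzip2 options -d, -c and -s.

```shell
#!/bin/bash
# Sketch of a decompress / recompress / decompress cycle.
# NOT taken from run-tests.sh; "roundtrip" is a hypothetical helper.
BZIP2=${BZIP2:-bzip2}

roundtrip() {
  local file=$1 mode=$2   # mode is "" (default) or "-s" (small mode)
  local tmp rc=0
  tmp=$(mktemp -d) || return 1

  # $mode is left unquoted on purpose so an empty mode adds no argument.
  # 1. Decompress the original test file.
  "$BZIP2" $mode -d -c "$file" > "$tmp/out1" || rc=1
  # 2. Recompress the result.
  [ "$rc" -eq 0 ] && { "$BZIP2" $mode -c "$tmp/out1" > "$tmp/re.bz2" || rc=1; }
  # 3. Decompress again.
  [ "$rc" -eq 0 ] && { "$BZIP2" $mode -d -c "$tmp/re.bz2" > "$tmp/out2" || rc=1; }
  # Both decompressed outputs must be byte-for-byte identical.
  [ "$rc" -eq 0 ] && { cmp -s "$tmp/out1" "$tmp/out2" || rc=1; }

  rm -rf "$tmp"
  return "$rc"
}
```

A caller would run it once per mode, for example
roundtrip some-file.bz2 "" followed by roundtrip some-file.bz2 -s,
and treat a non-zero return as a test failure.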

For each .bz2.bad file decompression is also tried twice, in
default mode and in small mode. The bzip2 binary is expected to
return either 1 or 2 as exit status. Any other exit code is
interpreted as failure.
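
The exit status rule for .bz2.bad files can be sketched as a small
check like the one below. The function name bad_file_ok is
hypothetical and not part of run-tests.sh. Per the bzip2 manual page,
1 means an environmental problem (such as an I/O error) and 2 means a
corrupt compressed file; anything else, including 0, an internal
error (3) or death by signal, counts as a failure here.

```shell
#!/bin/bash
# Sketch only: "bad_file_ok" is a hypothetical helper, not the actual
# run-tests.sh implementation.
bad_file_ok() {
  case $1 in
    1|2) return 0 ;;   # bzip2 rejected the bad input cleanly
    *)   return 1 ;;   # 0 (accepted bad data), 3, crash, sanitizer abort
  esac
}
```

It would be used right after the decompression attempt, for example:
bzip2 -d -c file.bz2.bad > /dev/null; bad_file_ok $? || echo "FAIL"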

If you just want to check a directory (and any subdirectories)
full of (known good) .bz2 files you can invoke the script as:

  ./run-tests.sh --ignore-md5 --tests-dir=/dir/full/of/bz2/files