
bzip2 next steps - goals for 1.0.9



Hi all,

After the fire drills that were necessary to get the bzip2 1.0.7 "Help,
a security issue!" and bzip2 1.0.8 "Oh, wait, maybe that was a little
too secure!" releases out, I hope we will get a lot more time to do a
bzip2 1.0.9 release.

But having been forced to do two releases was also good. We got the old
website, including all old releases, on sourceware.org. There is now a
git repository, also including the code for older releases. The whole
release process is now automated. Updates to the code, website and
manual are now all synchronized. We have the start of a more
comprehensive testsuite now, with buildbots on various architectures
running it on every commit. bzip2 is now part of oss-fuzz, which pulls from
the new git repository. And we did manage to integrate several changes
from distros and other forks/downstream back into the upstream bzip2
sources.

If we have more time (which I think we do; there is no rush to push out
1.0.9 quickly), then I think we want to do the following things:

- Extend the testsuite with more bz2 files that show interesting corner
  cases (also as new seeds for fuzzers).

- Add more tests than just plain compress and decompress targets.
  In particular I believe -f has some subtle behavior.
  But it would also be good to have at least tests for all the libbz2
  interfaces (some of which are not official/documented, see below).

- Add a Windows (cross) build and test (under Wine) to the buildbot,
  since it is the major non-Unix build that is supported.

- Provide more fuzzer targets and make them part of the upstream code
  with a small wrapper so they can double as regression tests.
  At least create targets for the low level BZ2_bzCompress,
  BZ2_bzDecompress, high-level BZ2_bzRead (BZ2_bzReadGetUnused),
  BZ2_bzWrite (BZ2_bzWriteFinish) and utility BZ2_bzBuffToBuffCompress,
  BZ2_bzBuffToBuffDecompress functions in various configurations
  and in compress/decompress mode to double check we can decompress
  anything we compress ourselves.

- Update the manual to at least include documentation for the zlib
  compatibility functions (see below). And double check it for any
  other changes we made since the project moved to sourceware.

- Figure out how to produce the pdf version of the manual on more
  setups. Currently it works perfectly on my RHEL7 setup, but some of
  the buildbots cannot do a make dist because they don't produce a
  correct pdf version (the html variant seems fine though).

- Some distros have fixes for the man pages, mainly symlinks for
  some binaries. Check whether those should be upstreamed.

- Related, it would be good to generate the man page from the manual
  again. Or find some other arrangement so that one or the other
  is the main copy from which the other is generated/imported.

- We didn't pick up the Debian patch to add O_EXCL/O_CLOEXEC because
  it seemed not portable (if we had a cross windows builder it would
  probably have shown that the BZ_UNIX guards were incorrect). This
  patch should probably be split in two.
  - The O_EXCL part for bzip2recover should be easy to make work exactly
    as it does in bzip2. It would be good not to overwrite the output
    file if it already exists.
  - The O_CLOEXEC part, done through the fopen "e" mode, is trickier though.
    It is only for the zlib compatible functions BZ2_bzopen and
    BZ2_bzdopen. But those are not officially supported. Or as the
    manual says: " These functions are not (yet) officially part of the
    library, and are minimally documented here.  If they break, you get
    to keep all the pieces." And, worse, they do not really seem to be
    API compatible with the zlib versions. In particular the zlib
    variants actually pass the "e" mode through (!) so you don't need
    to do that inside those functions themselves...

- We really should come up with a good path forward for the SONAME
  mess. Currently Debian based or freedesktop SDK based distros/builds
  use the current upstream libbz2.so.1.0 name. While Fedora and
  openSUSE based distros will use the "sane" libbz2.so.1 name.
  This means programs built against libbz2.so on one of those variants
  don't run on the other. And so updating the upstream default
  will break backward compatibility for one or the other.
  I hope we can come up with an upgrade path that makes it possible
  to run both binaries against a future libbz2.
  My idea is simply to switch to the "sane" libbz2.so.1 name,
  but also provide a wrapper library with the old libbz2.so.1.0 name.
  Which might be as simple as:
  gcc -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0 -lbz2
  (where that -lbz2 linked against is the "sane" libbz2.so.1)
  But there might be subtle issues with that I haven't thought about.

- Related, we probably also should tweak the symbol visibility so
  libbz2 only exports functions/symbols that are actually supported.

- Nobody seems to really like our current build system and the Makefile
  (fragments) you might have to hand edit. It is really basic and
  simple and should work almost everywhere. But some of the above might
  be helped with a slightly better build system. The best candidate
  seems to be the autotools system that is already in use by various
  distros and by the Nix Packages collection (including for building cross
  MinGW packages) since it was explicitly created to be integrated
  upstream:
  http://ftp.suse.com/pub/people/sbrabec/bzip2/README.autotools
  (We will have to update the manual though, which currently says that
   autotools aren't necessary, but I think just making cross compiling
   to Windows easy proves the original author was wrong, sorry Julian!)
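
To make the fuzzer item above a bit more concrete, here is a rough sketch
of the "small wrapper" idea: a helper that reads one corpus file and feeds
it to a libFuzzer-style entry point, so the same target can also be run as
a plain regression test. The names replay_corpus_file and
count_bytes_target are made up for illustration, and the stand-in target
does not call libbz2 at all. A real target would pass the bytes on to
something like BZ2_bzBuffToBuffDecompress.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Signature of a libFuzzer-style entry point (LLVMFuzzerTestOneInput). */
typedef int (*fuzz_target_fn)(const uint8_t *data, size_t size);

/* Hypothetical wrapper: read one corpus file into memory and hand it to
   `target`. Returns the target's result, or -1 if the file cannot be
   read or memory runs out. */
static int replay_corpus_file(const char *path, fuzz_target_fn target)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    uint8_t *buf = NULL;
    size_t len = 0, cap = 0;
    for (;;) {
        if (len == cap) {
            /* Grow the buffer geometrically; start at 4 KiB. */
            size_t new_cap = cap ? cap * 2 : 4096;
            uint8_t *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return -1;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t n = fread(buf + len, 1, cap - len, fp);
        len += n;
        if (n == 0)   /* end of file or read error */
            break;
    }
    int read_failed = ferror(fp);
    fclose(fp);

    int rc = read_failed ? -1 : target(buf, len);
    free(buf);
    return rc;
}

/* Stand-in target for illustration only: it just counts the bytes it is
   given. A real target would exercise libbz2 with `data` instead. */
static size_t bytes_seen;

static int count_bytes_target(const uint8_t *data, size_t size)
{
    (void)data;
    bytes_seen += size;
    return 0;
}
```

A test driver could then loop over the bz2 files in the testsuite and fail
if any call returns non-zero.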

Please let me know if I missed any other obvious goals for bzip2 1.0.9.

Cheers,

Mark