*.cwp file recognistion patch for setup

Robert Collins robert.collins@itdomain.com.au
Sat Sep 8 06:48:00 GMT 2001


Hi Warren, 
  thanks for submitting a patch. Nice to see some code coming in (not
meaning to imply that assisting on the support lists/documenting/package
management etc aren't contributions as well :}).

I've got a few comments about the patch though :[.

In overview, I don't thinks its ready to put into setup. It would work,
but it also would become an impediment to other later changes, and thus
make more work ... If you're willing to put in a little more time, I
think we'd have a real good solution. I hope my critique below parses as
constructive criticism - thats all it's meant as.

1) the #if 1...#else...#endif's. Either the code is good or bad - CVS
is for tracking changes, not compiler directives! If your intention was
to prevent clobbering the existing code, don't worry about that :]. CVS
will let use recover any version, so it's best to have only the
production code in the files. Under rare circumstances, such as when a
feature is needed for some folk, but not others, then keeping it
#ifdef'd makes sense... but then it would be usual to give it a symbolic
name - ie #ifdef CWP_SUPPORT.
2) Your code is very C-like. That is, it doesn't make use of some the
nice things C++ can do for us. See my suggested solution below for
somewhat more detailed approach. 
3) You've got magic numbers in the code (*groan*). By this I mean that
there are constant strings/numbers (ie gz, bz2, cwp, BZh) in the code
with a) no explanation (and while it's trivial code at the moment, these
things have habit of growing, and before we know it they are in much
less readable code :}) and b) not defined for reuse. A better way to
code that sort of thing is via #defines or enums. (ie
#define GZ_EXT ".gz"
#define BZ2_EXT ".bz2"
#define CWP_EXT ".cwp"
and then use strncmp (ac, CWP_EXT, "4"). and so on. This may seem like
work for the sake of work, bu I assure you it's not. Those #defines
accomplish several things: the magic string tests become more clear (ie
GZ_MAGIC vs \037\213; they become centralised (easily accessible list of
known identifiers, even if the code is more spread out), and they can
get reused, if the same test needs to be made in different contexts and
a function isn't appropriate.  (I'm not claiming to be guilt free on
this point btw :}).
4) filename comparisons should use strncmp not strcmp on win32. tsk tsk.
5) in find_tar_ext, you shouldn't need to copy the path, strrchr is non
dsestructive.
6) You have a robustness bug in the code: a cwb that is a renamed gz,
not a renamed .tar.gz or .tar.bz2 will die. IIRC we have an explicit
test for .gz on its own now - and that needs to be moved from choose.cc
to the tar.cc code - because we are now content dependent. 

mmm thats about it on the patch you submitted, now for how I'd suggest
it's done to get the maximum benefit from the change. This approach is
IMO actually easier to code and maintain, but theres a little more
legwork to do to get it up and running...

1) just _ignore_ the file extention. For choose.cc accept anything from
the .ini. If scanning local non-ini listed files, call something like
magic_is_archive(path), which will attempt to magic_open_archive(path)
and then close the resulting stream. if the open fails it would return a
error. 
2) When we go to use the file we feed the path into a thing called
magic_open_archive, which we expect to return NULL (file was not an
archive) or an archive object (not currently existing theoretical parent
of tar object). (We don't need to create an archive class for this
patch, I'm just pointing out that the next step down the
rpm/deb/whatever track requires multiple archive types, but they all
have the characteristic of containing multiple files, and possibly being
a compressed file themselves.
magic_open_archive first makes a stream from the path, and then does a
magic_open_archive_stream (probably just overload magic_open_archive -
it's easier to read :]).

magic_open_archive_stream detects the file type by magic number
(tar|gz|bz) and then 
a) fails if it's not recognised or has too little content for magic
detection.
b) returns an archive object (again here that can simply be a tar object
- one step at a time) if the magic number was for a tar file. 
c) if it's a known compression type, gets a stream from it (ie calls
bz() _and then feeds that back into magic_open_archive_stream_ (ie
return magic_open_archive_stream(compressed_stream)). This recursion
means that we support compressed archives within other compressed files
- or other archives.

bz() and gz() need to be tweaked to support streams as well as paths.

For ease of use we probably want we probably want a magic constructor
that takes a stream, checks the magic and returns a stream that handles
the contents. (ie raw for unknown, gz() for gz, bz() for bz, tar for tar
file.) That makes magic_open_archive_stream even easier, all it has to
do is test if the streamable object is an archive, not a compressed file
- so a couple of virtual functions is_compressed and is_archive will
solve that. 

This has several advantages: files that are not tar archives are
detected be exclusion, not by inclusion and without reliance on knowing
the extension. (This is beneficial - consider when if add ar support. We
would otherwise have to alter every function that tests for .tar :-/ )

Secondly its easily expandable. The file name no longer plays a part in
how we handle the files. Even setup.ini could be tweaked :].

It's probably a bit more work :] but I think that like the hierarchy and
category stuff (not that they are in use yet!) it's well worth it. It
will make dealing with rpms and other multi-archive formats much easier
(just pass the outer stream to magic_open_archive_stream at the
beginning of the embedded archive, and keep reading after that embedded
archive is processed).

Rob
-- 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEABECAAYFAjuaIdMACgkQf4izsFjHGB4a3wCguuVGprLkcD8t/lZVC6CN9vlA
Xx4An0gi7WPRWPjYzHG80rc6SP2yrd1j
=6Xh1
-----END PGP SIGNATURE-----


More information about the Cygwin mailing list