Compressing hippos really fast
Phil Betts
Phil.Betts@ascribe.com
Tue Mar 4 18:35:00 GMT 2008
Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:
> Hi,
>
>
> does anybody know about a compression tool which is above all capable
> of compressing really fast? The compression ratio is only a mild
> concern; it's more important that the tool does not act as a
> bottleneck when compressing files which are badly compressible.
> Unfortunately, the usual compression tools care more about a good
> compression ratio than about good speed when streaming lots of data.
>
> Here are a couple of disks which are supposed to be backed up. Right
> now this is done using a script which creates tar.gz archives of all
> disks. Some of these disks are quite big and contain many files which
> are already compressed. It turns out that gzipping these disks is
> *the* bottleneck when backing up.
>
> When not compressing, tar creates archives at 37MB/s. When creating
> tar.gz archives, the compression takes so much time that the speed
> goes down to 6MB/s. Using gzip --fast doesn't help much. bzip is a
> lot slower than gzip.
>
> So the question is, does anybody know a compression tool which can be
> used with tar, which doesn't slow down the backup by a factor of 6?
> It would be cool to have a tool which is as quick as the hardware
> compression used in modern tape drives, but that's just dreaming...
>
>
> May the hippos be with you,
> Corinna
I had this problem ages ago. My solution was to run two backups:
one uncompressed, including only files matching the globs *.gz,
*.t[bg]z, *.[zZ], *.bz2, *.zip etc., and one for the remainder, which
was piped through gzip.
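The split into two file lists could be sketched roughly like this in
Python. This is only an illustration, not the script I used back then:
the function names are made up, and the pattern list is just the globs
mentioned above, so extend it for your own data.

```python
import fnmatch
import os

# Glob patterns for files assumed to be compressed already. This list is
# illustrative (taken from the globs named in the text), not exhaustive.
COMPRESSED_PATTERNS = ["*.gz", "*.t[bg]z", "*.[zZ]", "*.bz2", "*.zip"]


def is_precompressed(name):
    """Return True if the file name matches one of the patterns above."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in COMPRESSED_PATTERNS)


def split_file_lists(root):
    """Walk root and return (precompressed, remainder) lists of file paths."""
    precompressed, remainder = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_precompressed(name):
                precompressed.append(path)
            else:
                remainder.append(path)
    return precompressed, remainder
```

Each list can then be written to a file, one path per line, and handed
to GNU tar with -T (--files-from): plain "tar -cf" for the first list,
"tar -czf" for the second.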
Even a fast compression algorithm is just wasting time trying to
compress previously compressed files. And as most compressors work
on some variant of Lempel-Ziv, if they're fed a mixture of
compressible and incompressible data, the incompressible data
flushes the dictionary, making the compression of the compressible
part worse.
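The first point (no gain on data that is already compressed) is easy
to check for yourself. A small sketch using Python's zlib module at its
fastest level; the helper name "ratio" is made up, and random bytes
are used here as a stand-in for an already compressed file, which is an
assumption of the example rather than a measurement on real archives.
It does not try to show the dictionary effect on mixed data.

```python
import os
import zlib


def ratio(data, level=1):
    """Compressed size divided by original size at the given zlib level."""
    return len(zlib.compress(data, level)) / len(data)


# Highly repetitive text: should shrink to a small fraction of its size.
text = b"the quick brown hippo jumps over the lazy backup\n" * 2000

# Random bytes: a stand-in for previously compressed data (assumption).
blob = os.urandom(len(text))

print("text ratio:", ratio(text))
print("blob ratio:", ratio(blob))
```

The ratio for the random block should come out at about 1 or slightly
above, i.e. the compressor spent its time and saved nothing.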
Phil