Compressing hippos really fast
Lee D. Rothstein
l1ee057@veritech.com
Tue Mar 4 18:57:00 GMT 2008
Sounds like he needs data-dedupe. Google "data de-duplication" for an
array of vendors.
Phil Betts wrote:
> Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:
>
>
>> Hi,
>>
>>
>> does anybody know of a compression tool whose main strength is
>> compressing really fast? The compression ratio is only a mild
>> concern; it's more important that the tool doesn't act as a
>> bottleneck when compressing files that compress poorly.
>> Unfortunately, the usual compression tools care more about a good
>> compression ratio than about speed when streaming lots of data.
>>
>> Here are a couple of disks which are supposed to be backed up. Right
>> now this is done using a script which creates tar.gz archives of all
>> disks. Some of these disks are quite big and contain many files which
>> are already compressed. It turns out that gzipping these disks is
>> *the* bottleneck when backing up.
>>
>> When not compressing, tar creates archives at 37MB/s. When creating
>> tar.gz archives, the compression takes so much time that the speed
>> goes down to 6MB/s. Using gzip --fast doesn't help much. bzip is a
>> lot slower than gzip.
>>
>> So the question is, does anybody know a compression tool which can be
>> used with tar, which doesn't slow down the backup by a factor of 6?
>> It would be cool to have a tool which is as quick as the hardware
>> compression used in modern tape drives, but that's just dreaming...
>>
>>
>> May the hippos be with you,
>> Corinna
>>
>
> I had this problem ages ago. My solution was to run two backups:
> one uncompressed, including only files matching *.gz, *.t[bg]z,
> *.[zZ], *.bz2, *.zip etc., and one for the remainder, which was
> piped through gzip.
>
> Even a fast compression algorithm is just wasting time trying to
> compress previously compressed files, and as most compressors work
> on some variant of Lempel-Ziv, if they're fed a mixture of
> compressible and incompressible data, the incompressible data
> flushes the dictionary, making the compression of the compressible
> part worse.
>
> Phil
>
>
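The two-backup split Phil describes could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the script Phil actually ran: the glob patterns are taken from his message, but the function names (is_precompressed, split_files, backup), the archive paths passed in, and the choice of gzip level 1 are all hypothetical.

```python
import fnmatch
import os
import tarfile

# Glob patterns from the message above; extend for other formats as needed.
COMPRESSED_PATTERNS = ["*.gz", "*.t[bg]z", "*.[zZ]", "*.bz2", "*.zip"]


def is_precompressed(name):
    """Return True if the file name matches one of the 'already compressed' globs."""
    base = os.path.basename(name)
    # fnmatchcase keeps the match case-sensitive on every platform.
    return any(fnmatch.fnmatchcase(base, pat) for pat in COMPRESSED_PATTERNS)


def split_files(root):
    """Walk root and return (precompressed_paths, other_paths)."""
    stored, to_compress = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            (stored if is_precompressed(name) else to_compress).append(path)
    return stored, to_compress


def backup(root, plain_tar, gzip_tar):
    """Write precompressed files to a plain tar and the rest to a tar.gz."""
    stored, to_compress = split_files(root)
    # Already-compressed files: plain tar, no second compression pass.
    with tarfile.open(plain_tar, "w") as tar:
        for path in stored:
            tar.add(path)
    # Everything else: gzip at its fastest level (assumed, favouring speed).
    with tarfile.open(gzip_tar, "w:gz", compresslevel=1) as tar:
        for path in to_compress:
            tar.add(path)
    return len(stored), len(to_compress)
```

Calling backup("/some/disk", "stored.tar", "rest.tar.gz") would then produce the two archives, provided the archive paths lie outside the directory being walked. Matching on file names alone misses compressed formats with other extensions (JPEG, MP3 and so on), so the pattern list would need tuning for real data.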