[RFC] Allow parallel multifile with -p -e

Tom de Vries tdevries@suse.de
Fri Mar 26 16:55:16 GMT 2021


On 3/26/21 5:47 PM, Jakub Jelinek wrote:
> On Fri, Mar 26, 2021 at 05:40:51PM +0100, Tom de Vries wrote:
>> This gives us reproducible compression:
>> ...
>> $ ls -la j1/*
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/1
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j1/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j1/4
>> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j1/5
>> $ ls -la j4/*
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/1
>> -rwxr-xr-x 1 vries users  11432 Mar 26 17:16 j4/2
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/3
>> -rwxr-xr-x 1 vries users 807376 Mar 26 17:16 j4/4
>> -rw-r--r-- 1 vries users  64543 Mar 26 17:16 j4/5
>> ...
>>
>> But it doesn't give reproducible results:
>> ...
>> $ md5sum j1/*
>> e6e655f7b5d1078672c8b0da99ab8c41  j1/1
>> e6e655f7b5d1078672c8b0da99ab8c41  j1/2
>> d833aa3ad6ad35597e1b7d0635b401cf  j1/3
>> d833aa3ad6ad35597e1b7d0635b401cf  j1/4
>> d5282aa9d065f1d00fd7a46c54ebde8d  j1/5
>> $ md5sum j4/*
>> de1645ce60bba6f345b2334825deb01f  j4/1
>> de1645ce60bba6f345b2334825deb01f  j4/2
>> ac2f16c50cf3d31be1f42f35ced4a091  j4/3
>> ac2f16c50cf3d31be1f42f35ced4a091  j4/4
>> 7fc3cd2c2514c8bf1f23348a27025b8d  j4/5
>> ...
>>
>> The temporary multifile section contributions happen in random
>> order, so the multifile layout will be different, and the
>> files referring to the multifile will be different.
> 
> What I meant is that each fork should use different temporary filenames
> for the multifiles, and once all children are done, merge them.  How to
> merge depends on how exactly the work is distributed among the forks:
> if e.g. with 4 forks the first fork gets the first quarter of the files,
> the second the second quarter, etc., then just merge them in that order;
> otherwise more work would be needed to make the merging reproducible.

Hi,

yes, I understood your comments in bugzilla.  I just wanted to see how
far I could get _without_ solving the reproducibility problem.
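
For reference, the merge scheme described above could be sketched roughly
as follows.  This is Python rather than dwz's C, and split_contiguous,
contribution and build_multifile are hypothetical names made up for the
illustration, not dwz functions.  The per-file contribution is a
placeholder string, not real DWARF data, and the sketch assumes the
contiguous work split from the example (first fork gets the first chunk
of files, and so on).

```python
import os
import tempfile


def split_contiguous(files, nforks):
    """Give fork i the i-th contiguous slice of the input list."""
    base, extra = divmod(len(files), nforks)
    chunks, start = [], 0
    for i in range(nforks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(files[start:end])
        start = end
    return chunks


def contribution(name):
    # Hypothetical stand-in for what one input file adds to the multifile.
    return ("<" + name + ">").encode()


def build_multifile(files, nforks):
    chunks = split_contiguous(files, nforks)
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, "multifile.%d" % i)
                 for i in range(nforks)]
        pids = []
        for chunk, path in zip(chunks, paths):
            pid = os.fork()
            if pid == 0:
                # Child: write only to its own temporary file.
                status = 1
                try:
                    with open(path, "wb") as out:
                        for name in chunk:
                            out.write(contribution(name))
                    status = 0
                finally:
                    os._exit(status)
            pids.append(pid)

        # Children may finish in any order; wait for all of them.
        failed = False
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            failed = failed or status != 0
        if failed:
            raise RuntimeError("a child failed")

        # Merge in fork index order, not in completion order.
        merged = bytearray()
        for path in paths:
            with open(path, "rb") as f:
                merged += f.read()
        return bytes(merged)
```

Because the slices are contiguous and the merge follows the fork index,
the result should not depend on the number of forks or on which child
finishes first.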

> Then generate the multifile in a single process, and then again
> in multiple forks work on the individual files against the multifile.

Yeah, that bit I haven't gotten to yet, but that doesn't look very
difficult.

Thanks,
- Tom

