Dedup x86/x86_64 --> noarch

Jon Turney
Sat Apr 23 10:51:00 GMT 2016

On 16/04/2016 11:03, Achim Gratz wrote:
> After a discussion on IRC about de-duping the noarch content out of
> package files (where I was told this would be too difficult), I've just

I think it was more along the lines of 'not yet' :)

In any case, we need noarch support in calm before it's useful to dedup 
arch packages to noarch.

I think I have implemented the changes to calm to support all-or-nothing 
noarch (i.e. where all packages produced from a source package must be 
noarch), so if you can nominate a suitable, unimportant perl package, we 
can test it with that, initially.

(This wasn't quite as straightforward as just looking in another 
directory for packages, because upload validation becomes more complex: 
we must check that consistent package sets result for both x86 and 
x86_64 before we can move noarch packages.)
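
As a rough illustration of that check (a sketch only, not calm's actual code): merge the uploaded noarch packages into each arch's package set and accept the upload only if every merged set still validates. The names ARCHES, deps_resolvable and can_move_noarch, and the simplified "name -> set of dependencies" data model, are assumptions made up for this example.

```python
# Hypothetical model: a package set maps package name -> set of required names.
ARCHES = ("x86", "x86_64")

def deps_resolvable(packages):
    """Stand-in validator: every dependency must exist in the same set."""
    return all(dep in packages for deps in packages.values() for dep in deps)

def can_move_noarch(per_arch, noarch_upload, validate=deps_resolvable):
    """Return True only if the upload yields a valid set for every arch."""
    for arch in ARCHES:
        merged = dict(per_arch[arch])   # copy, so the current state is untouched
        merged.update(noarch_upload)    # noarch packages are visible to each arch
        if not validate(merged):
            return False                # one failing arch rejects the whole move
    return True
```

The point of the sketch is the all-or-nothing shape: nothing is moved unless both arches come out consistent.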

To make full use of this, cygport upload will need a feature to upload 
noarch packages from dist/ to noarch/ rather than <arch>/.

On 18/04/2016 20:44, Achim Gratz wrote:
>> Looking at the current repo content we'd save about 30GB from the dedup
>> of the src and doc packages alone and probably about 20GB from dedup in
>> the remaining packages.
> I've implemented some POC code and deduped my Cygwin mirror (it is
> missing most of KDE and the cross-Cygwin compilation toolchains).  This
> took a solid 12 hours of flat out 400% CPU load on my SandyBridge laptop
> and ballooned the page file to 21GiB.  But it also removed almost
> exactly a third from the repo's size (going from 81.2GiB to 51.4GiB), so
> projected to the full repo it's slightly more than my original estimate.

Thanks.  It's very useful to have some numbers.
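
For reference, the quoted sizes work out to a bit more than a third. A quick check, using only the two figures given above:

```python
before_gib = 81.2   # mirror size before dedup, as quoted
after_gib = 51.4    # mirror size after dedup, as quoted

saved_gib = before_gib - after_gib
saved_fraction = saved_gib / before_gib

# Prints: saved 29.8 GiB (36.7% of the original size)
print(f"saved {saved_gib:.1f} GiB ({saved_fraction:.1%} of the original size)")
```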

I don't think this distinguishes between packages which are (or should 
be) marked ARCH="noarch" in the cygport, and those where the build 
products happen to be identical and can be deduped?
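
To make the distinction concrete: a purely content-based comparison, sketched below, can only find files that are byte-identical across the two arch trees. It says nothing about whether the cygport declares ARCH="noarch". This is an assumed approach for illustration, not the POC code discussed above; file_digest and identical_across_arches are hypothetical names.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large archives fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def identical_across_arches(root_a, root_b):
    """Relative paths present under both roots with byte-identical content."""
    root_a, root_b = Path(root_a), Path(root_b)
    same = []
    for path_a in sorted(p for p in root_a.rglob("*") if p.is_file()):
        rel = path_a.relative_to(root_a)
        path_b = root_b / rel
        if path_b.is_file() and file_digest(path_a) == file_digest(path_b):
            same.append(rel.as_posix())
    return same
```

Anything this reports is only a dedup candidate; whether the package ought to be noarch is a separate question that has to be answered from the cygport.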

I would guess that this saving is dominated by some very large, 
data-only noarch packages, but who knows?

(Also, looking forward, perhaps cygport needs a separate command to 
build the source package, rather than building it for each arch and then 
deduping it?)

More information about the Cygwin-apps mailing list