Bug 22432 - Non-deterministic build
Summary: Non-deterministic build
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: build (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: 2.27
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-13 19:23 UTC by Juro Bystricky
Modified: 2017-11-30 21:27 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Juro Bystricky 2017-11-13 19:23:42 UTC
Building various libraries can be non-deterministic because they may be be built with two different versions of intl/plural.c. in two separate builds. 
We may or may not regenerate the file from thefile plural.y, based on modification times of those two files, as the Makefile contains:
    
    plural.c: plural.y
    	$(BISON) $(BISONFLAGS) $@ $^

GIT does not preserve modification times of files on checkout, so I don't see how the above rule can work reliably (deterministically).
The libraries (with or without the above rule firing) will build correctly, so we may not notice they differ until we try to binary compare them.

(Please see http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/commit/?h=jurob/reproducible_binaries_4&id=936494cf34f003860352472bcdc43ec21433d27c)
Comment 1 Dmitry V. Levin 2017-11-13 19:57:35 UTC
(In reply to Juro Bystricky from comment #0)
> GIT does not preserve modification times of files on checkout, so I don't
> see how the above rule can work reliably (deterministically).

Consider running the git-set-file-times script
(https://git.samba.org/?p=rsync.git;a=blob_plain;f=support/git-set-file-times)
to set each file's last-modified time based on its last commit.
Comment 2 Juro Bystricky 2017-11-13 20:22:50 UTC
(In reply to Dmitry V. Levin from comment #1)
> (In reply to Juro Bystricky from comment #0)
> > GIT does not preserve modification times of files on checkout, so I don't
> > see how the above rule can work reliably (deterministically).
> 
> Consider running the git-set-file-times script
> (https://git.samba.org/?p=rsync.git;a=blob_plain;f=support/git-set-file-
> times)
> to set each file's last-modified time based on its last commit.


Wouldn't it make more sense to remove the rule entirely? Once you are aware of the problem, it is easy to avoid it even without the script. However, I think most people are not aware of the issue and will not use the script (by default) when building glibc. So IMHO this should be either fixed (i.e. remove the rule entirely) or at least mentioned somewhere in README.

I can send in a patch if needed.
Comment 3 jsm-csl@polyomino.org.uk 2017-11-13 23:22:03 UTC
Well, the rule is required to regenerate plural.c if it's out of date (for 
example, the copyright date update process involves updating *source* 
files and then running a build so checked-in generated files get updated 
accordingly.

You could make a proposal to add a requirement for Bison as a build tool 
for building glibc, in which case the checked-in plural.c could be 
removed.
Comment 4 Juro Bystricky 2017-11-14 14:56:39 UTC
(In reply to joseph@codesourcery.com from comment #3)
> Well, the rule is required to regenerate plural.c if it's out of date (for 
> example, the copyright date update process involves updating *source* 
> files and then running a build so checked-in generated files get updated 
> accordingly.
> 
> You could make a proposal to add a requirement for Bison as a build tool 
> for building glibc, in which case the checked-in plural.c could be 
> removed.

I like the idea. I will propose a fix along those lines.
Comment 5 Sourceware Commits 2017-11-30 21:22:49 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  1faaf7035cabda101e1d6653bff7a539f201db91 (commit)
      from  bd6ea9edd1708c7b0166af685a676b91e5b5950d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1faaf7035cabda101e1d6653bff7a539f201db91

commit 1faaf7035cabda101e1d6653bff7a539f201db91
Author: Juro Bystricky <juro.bystricky@intel.com>
Date:   Thu Nov 30 21:21:15 2017 +0000

    plural.c: improve reproducibility
    
    There is a subtle non-determinism when building glibc.
    This depends on whether the glibc is built using the distibuted
    file intl/plural.c or built using the generated file intl/plural.c.
    These two files (intl/plural.c generated vs. distributed) are slightly
    different, hence we may end up with slightly different libraries.
    
    Originally, having "bison" installed was optional. So if "bison" was
    not present, we always built libraries with the distributed plural.c.
    If bison was installed, we *** may have *** replaced the distributed
    file plural.c with a new plural.c generated from plural.y. if the
    timestamps triggered this rule:
    
    plural.c plural.y
    	$(BISON) $(BISONFLAGS) $@ $^
    
    Given that timestamps are not preserved in GIT repositories, the above
    rule is not reliable without explicitly touching plural.c or plural.y.
    In other words, the rule may or may not have fired.
    
    In summary: there are two distinct sources of non-determinism:
    
    1. Having "bison" installed or not
    2. Having "bison" installed but timestamps poorly defined.
    
    This patch fixes this by requiring "bison" being installed
    and by always generating intl/plural.c from intl/plural.y.
    (This is achieved by simply removing checked-in intl/plural.c)
    
    	[BZ #22432]
    	* configure.ac (BISON): Require to be present.
    	* configure: Regenerated.
    	* intl/Makefile (generated): Add plural.c.
    	[$(BISON) != no]: Make code unconditional.
    	(plural.c): Change rule to $(objpfx)plural.c.
    	($(objpfx)plural.o): Depend on $(objpfx)plural.c.
    	* intl/plural.c: Remove.
    	* manual/install.texi (Tools for Compilation): Document bison as
    	required.
    	* INSTALL: Regenerated.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog           |   14 +
 INSTALL             |   10 +-
 NEWS                |    3 +-
 configure           |  128 ++--
 configure.ac        |    7 +-
 intl/Makefile       |    9 +-
 intl/plural.c       | 2011 ---------------------------------------------------
 manual/install.texi |   14 +-
 8 files changed, 98 insertions(+), 2098 deletions(-)
 delete mode 100644 intl/plural.c
Comment 6 Joseph Myers 2017-11-30 21:27:12 UTC
Fixed for 2.27.