Bug 23466 - Issues with Windows reproducible builds starting with commit 13e570f80cbfb299a8858ce6830e91a6cb40ab7b
Summary: Issues with Windows reproducible builds starting with commit 13e570f80cbfb299...
Status: RESOLVED INVALID
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-30 16:27 UTC by Nicolas Vigier
Modified: 2019-01-16 11:37 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Nicolas Vigier 2018-07-30 16:27:28 UTC
We are building Tor Browser for Windows using binutils and mingw-w64. We are doing reproducible builds (http://reproducible-builds.org/) and with binutils 2.25 and later versions we have issues to reproduce our builds. After git-bisecting, we start seeing this issue with commit 13e570f80cbfb299a8858ce6830e91a6cb40ab7b. We are currently using binutils 2.26.1 with a patch reverting 13e570f80cbfb299a8858ce6830e91a6cb40ab7b, as the commit cannot be easily reverted on later versions.

When running objdump -x on 2 builds of the same dll file, we have this diff (followed by many other reloc lines):

2,3c2,3
< 1/libEGL.dll:     file format pei-i386
< 1/libEGL.dll
---
> 2/libEGL.dll:     file format pei-i386
> 2/libEGL.dll
39c39
< CheckSum              003422cf
---
> CheckSum              00342445
55c55
< Entry 5 0032c000 00019b70 Base Relocation Directory [.reloc]
---
> Entry 5 0032c000 00019b74 Base Relocation Directory [.reloc]
27642,27649c27642,27649
<       reloc 27130 offset  cc3 [30b13cf4] HIGHLOW
<       reloc 27131 offset  cd0 [30b13d01] HIGHLOW
<       reloc 27132 offset  cd7 [30b13d08] HIGHLOW
<       reloc 27133 offset  d81 [30b13db2] HIGHLOW
<       reloc 27134 offset  dd2 [30b13e03] HIGHLOW
<       reloc 27135 offset  dd9 [30b13e0a] HIGHLOW
<       reloc 27136 offset  de0 [30b13e11] HIGHLOW
<       reloc 27137 offset  e83 [30b13eb4] HIGHLOW
---
>       reloc 27130 offset  cc2 [30b13cf3] HIGHLOW
>       reloc 27131 offset  cc9 [30b13cfa] HIGHLOW
>       reloc 27132 offset  cd0 [30b13d01] HIGHLOW
>       reloc 27133 offset  d73 [30b13da4] HIGHLOW
>       reloc 27134 offset  dc3 [30b13df4] HIGHLOW
>       reloc 27135 offset  dd0 [30b13e01] HIGHLOW
>       reloc 27136 offset  dd7 [30b13e08] HIGHLOW
>       reloc 27137 offset  e81 [30b13eb2] HIGHLOW
30163,30166c30163,30166
<       reloc 29656 offset  d2f [30b13d60] HIGHLOW
<       reloc 29657 offset  d42 [30b13d73] HIGHLOW
<       reloc 29658 offset  dbf [30b13df0] HIGHLOW
<       reloc 29659 offset  dd2 [30b13e03] HIGHLOW
---
>       reloc 29656 offset  cff [30b13d30] HIGHLOW
>       reloc 29657 offset  d12 [30b13d43] HIGHLOW
>       reloc 29658 offset  d8f [30b13dc0] HIGHLOW
>       reloc 29659 offset  da2 [30b13dd3] HIGHLOW
36415,36416c36415,36416
<       reloc 35916 offset  dd2 [30b13e03] HIGHLOW
<       reloc 35917 offset  dda [30b13e0b] HIGHLOW


Our ticket where we discussed this issue:
https://trac.torproject.org/projects/tor/ticket/16472#comment:74
Comment 1 Alan Modra 2018-09-10 07:12:59 UTC
The commit message for 13e570f80cb says "See the comment which I'm removing from elf_link_add_archive_symbols."  That's this comment:

 /* Add symbols from an ELF archive file to the linker hash table.  We
-   don't use _bfd_generic_link_add_archive_symbols because of a
-   problem which arises on UnixWare.  The UnixWare libc.so is an
-   archive which includes an entry libc.so.1 which defines a bunch of
-   symbols.  The libc.so archive also includes a number of other
-   object files, which also define symbols, some of which are the same
-   as those defined in libc.so.1.  Correct linking requires that we
-   consider each object file in turn, and include it if it defines any
-   symbols we need.  _bfd_generic_link_add_archive_symbols does not do
-   this; it looks through the list of undefined symbols, and includes
-   any object file which defines them.  When this algorithm is used on
-   UnixWare, it winds up pulling in libc.so.1 early and defining a
-   bunch of symbols.  This means that some of the other objects in the
-   archive are not included in the link, which is incorrect since they
-   precede libc.so.1 in the archive.
+   don't use _bfd_generic_link_add_archive_symbols because we need to
+   handle versioned symbols.

So...  Commit 13e570f80cb may change the order in which files are extracted from archives, which in turn can change the set of files extracted from archives.  Is that the case for your build?  Link with -Wl,-t to see.

Also, you say "When running objdump -x on 2 builds of the same dll file".  Were the two builds comparing builds using different versions of the linker?  If so,  it is quite possible that the output will differ.
Comment 2 Nicolas Vigier 2018-09-17 15:21:57 UTC
(In reply to Alan Modra from comment #1)
> The commit message for 13e570f80cb says "See the comment which I'm removing
> from elf_link_add_archive_symbols."  That's this comment:
> 
>  /* Add symbols from an ELF archive file to the linker hash table.  We
> -   don't use _bfd_generic_link_add_archive_symbols because of a
> -   problem which arises on UnixWare.  The UnixWare libc.so is an
> -   archive which includes an entry libc.so.1 which defines a bunch of
> -   symbols.  The libc.so archive also includes a number of other
> -   object files, which also define symbols, some of which are the same
> -   as those defined in libc.so.1.  Correct linking requires that we
> -   consider each object file in turn, and include it if it defines any
> -   symbols we need.  _bfd_generic_link_add_archive_symbols does not do
> -   this; it looks through the list of undefined symbols, and includes
> -   any object file which defines them.  When this algorithm is used on
> -   UnixWare, it winds up pulling in libc.so.1 early and defining a
> -   bunch of symbols.  This means that some of the other objects in the
> -   archive are not included in the link, which is incorrect since they
> -   precede libc.so.1 in the archive.
> +   don't use _bfd_generic_link_add_archive_symbols because we need to
> +   handle versioned symbols.
> 
> So...  Commit 13e570f80cb may change the order in which files are extracted
> from archives, which in turn can change the set of files extracted from
> archives.  Is that the case for your build?  Link with -Wl,-t to see.

Thanks. I will try a build using -Wl,-t to see if I can get more details about the issue.

> 
> Also, you say "When running objdump -x on 2 builds of the same dll file". 
> Were the two builds comparing builds using different versions of the linker?
> If so,  it is quite possible that the output will differ.

I meant two builds done using the same linker version (the same binutils commit), as it is expected that different binutils versions could produce different outputs. When we are doing two Tor Browser builds using a binutils commit before 13e570f80cbfb2, we get the same output in both builds. When we do two Tor Browser builds using 13e570f80cbfb2 or a later binutils commit, we get a different output in the two builds (although we used the same binutils commit in both builds).
Comment 3 Nicolas Vigier 2019-01-16 11:37:14 UTC
After investigating the issue, it seems that what is causing our build to be unreproductible is that an .a archive that we use to link a dll is containing .o files in random order. It seems that before commit 13e570f80cbfb299a8858ce6830e91a6cb40ab7b the order of .o files inside an .a archive did not matter, but not anymore.

During our build, we are building gcc 6.4.0, which is using an old version of libtool, which is adding the .o files inside libstdc++.a in a nondeterministic order. We can easily patch libtool to sort the .o files when adding them to an .a archive, which is solving the issue for us.

I don't know if ld giving different output depending on the order of .o files is a bug or not. As we can easily solve this issue at the gcc/libtool level, I am closing this ticket, but feel free to reopen it if you think otherwise.