This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: build id computation


On Tue, Nov 18, Roland McGrath wrote:

> > To compute the build ID, the following things are used:
> > 
> > - the ELF header, without e_phoff and e_shoff
> > - all segments content
> > - all sections content
> 
> It also includes the phdrs and the shdrs, without sh_offset.
> (bfd/elfcode.h:elf_checksum_contents, rpm/tools/debugedit.c:handle_build_id)

That's correct; I forgot to mention that.
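The inputs listed above can be illustrated with a minimal Python sketch. It assumes a well-formed 64-bit little-endian ELF file and hashes with SHA-1 (the IDs shown in this thread are 160 bits long). The function name sketch_build_id is hypothetical. This is a simplified model, not a transcription of elf_checksum_contents or handle_build_id, so it should not be expected to reproduce the ID that ld or debugedit actually stores; for instance, it does not special-case the bytes of the build-ID note itself.

```python
import hashlib
import struct

# ELF64 little-endian layouts (field order as in <elf.h>).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, 64 bytes
SHT_NOBITS = 8  # e.g. .bss: occupies no bytes in the file


def sketch_build_id(image: bytes) -> str:
    """Hash an ELF64 LE image over the inputs listed above (simplified)."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian files")
    h = hashlib.sha1()

    # ELF header with e_phoff and e_shoff zeroed.
    ehdr = list(EHDR.unpack_from(image, 0))
    phoff, shoff = ehdr[5], ehdr[6]
    phentsize, phnum, shentsize, shnum = ehdr[9], ehdr[10], ehdr[11], ehdr[12]
    ehdr[5] = ehdr[6] = 0
    h.update(EHDR.pack(*ehdr))

    # Program headers (hashed whole) and the file contents of each segment.
    for i in range(phnum):
        phdr = PHDR.unpack_from(image, phoff + i * phentsize)
        h.update(PHDR.pack(*phdr))
        p_offset, p_filesz = phdr[2], phdr[5]
        h.update(image[p_offset:p_offset + p_filesz])

    # Section headers with sh_offset zeroed, plus each section's contents.
    for i in range(shnum):
        shdr = list(SHDR.unpack_from(image, shoff + i * shentsize))
        sh_type, sh_offset, sh_size = shdr[1], shdr[4], shdr[5]
        shdr[4] = 0
        h.update(SHDR.pack(*shdr))
        if sh_type != SHT_NOBITS:
            h.update(image[sh_offset:sh_offset + sh_size])

    return h.hexdigest()
```

Usage: `with open("a", "rb") as f: print(sketch_build_id(f.read()))`. In this sketch, moving the section data or the section header table to a different file offset does not change the digest, as long as no program header records those offsets.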

> The purpose of the build ID is to uniquely identify the binary created by a
> build so that its ID only matches that of a semantically identical binary.
> 
> Using only the allocated sections is exactly wrong.  "Semantics" includes
> the contents of all ELF sections, not only SHF_ALLOC ones.  Consider:
> 
> 	$ echo 'main(){} /* war is peace */' > a.c
> 	$ echo 'main(){} /* kumbaya */' > b.c
> 	$ gcc -o a -g a.c
> 	$ gcc -o b -g b.c
> 
> The allocated sections of a and b are identical, as are their stripped
> versions.  But the important human meaning of the two builds is that they
> came from two different sources, written with two very different motivations.

Err, that is too esoteric. You want to checksum the intentions and motivations
of the programmer?

In my opinion, what counts is what people are doing and saying, i.e. .text and
.data. I'm not even convinced that it matters where they say it, i.e. the
directory or file name.

> (Incidentally, note that there is a (recent?) bug in ld's computation that
> makes it emit identical IDs for the test case above.  At least in Fedora 9's
> binutils-2.18.50.0.6-6.fc9.x86_64, that is; I haven't tried building the cvs
> trunk lately.  cf https://bugzilla.redhat.com/show_bug.cgi?id=472152
> But they should be different because of differing .symtab/.debug_* contents,
> and they do end up so in the rpmbuild/debugedit recomputation, and I presume
> they also do with gold.)

Yes, that happens on openSUSE as well. This is what I get with your example:

jblunck@e179:~/Work/build-id$ eu-readelf -n b | grep "Build ID:"
    Build ID: c2cfed592cdff1348fc6c573cbb5ad8b8255d0fe
jblunck@e179:~/Work/build-id$ eu-readelf -n a | grep "Build ID:"
    Build ID: c2cfed592cdff1348fc6c573cbb5ad8b8255d0fe
jblunck@e179:~/Work/build-id$ 

> The reason rpmbuild's debugedit recomputes the build ID is to preserve
> repeatability at the granularity of the whole RPM build.  If you do two
> different rpmbuild runs in identical environments but with different
> _builddir settings, debugedit rewrites the different source directory names
> in the DWARF info so that they become identical.  But the original build ID
> bits produced by ld are different, because they were computed from DWARF data
> containing the different build directory names.  So, debugedit recomputes the
> build ID based on the contents of the actual binary that will be in the RPM.
> Hence, two runs that produce identical DWARF also have identical build IDs.
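The debugedit rationale can be illustrated with a toy model in which the whole binary is a byte string and the build ID is a SHA-1 over all of it. Both helper names are hypothetical, the paths are made up (/usr/src/debug is only a plausible destination), and real debugedit parses the DWARF data rather than doing a blind byte substitution.

```python
import hashlib


def toy_build_id(blob: bytes) -> str:
    """Stand-in for the real checksum: SHA-1 over every byte of the 'binary'."""
    return hashlib.sha1(blob).hexdigest()


def toy_debugedit(blob: bytes, old_dir: bytes, new_dir: bytes) -> tuple[bytes, str]:
    """Rewrite the build directory, then recompute the ID from the result."""
    rewritten = blob.replace(old_dir, new_dir)  # naive; not how debugedit edits DWARF
    return rewritten, toy_build_id(rewritten)


# Two rpmbuild runs with different _builddir settings (made-up paths).
run1 = b"<code>DW_AT_comp_dir=/home/alice/rpmbuild/BUILD/foo-1.0"
run2 = b"<code>DW_AT_comp_dir=/var/tmp/build/BUILD/foo-1.0"
print(toy_build_id(run1) == toy_build_id(run2))  # False: the link-time IDs differ

fixed1, id1 = toy_debugedit(run1, b"/home/alice/rpmbuild/BUILD", b"/usr/src/debug")
fixed2, id2 = toy_debugedit(run2, b"/var/tmp/build/BUILD", b"/usr/src/debug")
print(fixed1 == fixed2, id1 == id2)  # True True
```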

I understand the motivation behind debugedit, but I doubt that producing two
different build IDs depending on the _builddir setting is correct. I have never
come across a tool that changes its behavior depending on the path it was
compiled from, nor an executable that changes its behavior depending on the
path names embedded in the line-number program of its debuginfo file.
In my opinion, changing the path names embedded in the debuginfo does not
semantically change the ELF executable itself.

Ironically, changing your test case as follows produces different build IDs:

       $ echo 'main(){} /* war is peace */' > a.c
       $ mkdir bdir
       $ echo 'main(){} /* war is peace */' > bdir/a.c
       $ gcc -o a -g a.c
       $ gcc -o b -g bdir/a.c

I would expect the same build ID to be generated whenever eu-elfcmp reports
the two files as equal.

If I understand you correctly, you also want to include the sources in the
checksum. Since that doesn't seem to work today, I think it should be done
externally. Otherwise you would run into problems reproducing builds anyway.
I'm thinking of situations where autogenerated code is involved but the
higher-level source hasn't changed. You probably want a reproducible build ID
in those cases as well.

Regards,
	Jan

-- 
Jan Blunck <jblunck@suse.de>

