Bug 25236

Summary: common symbols ignore size and alignment of symbols in shared libraries
Product: binutils Reporter: Fangrui Song <i>
Component: goldAssignee: Cary Coutant <ccoutant>
Status: NEW ---    
Severity: normal CC: amodra, ian, iant
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed: 2019-12-07 00:00:00

Description Fangrui Song 2019-11-30 22:34:00 UTC
Originally reported at https://bugs.llvm.org//show_bug.cgi?id=43748 but it turns out to be related to openmpi and GNU ld.
This affects the libopenmpi-dev package on Debian (it has patches to enable symbol versioning).

(The Fortran declaration in ompi/include/mpif-f08-types.h is problematic.
character, dimension(1, 1), bind(C, name="mpi_fortran_argvs_null_") :: MPI_ARGVS_NULL)


cat > a.s <<e
 .globl mpi_fortran_argvs_null_
 .comm mpi_fortran_argvs_null_,1,16
e
cat > b.s <<e
 .globl mpi_fortran_argvs_null_
 .comm mpi_fortran_argvs_null_,8,16
e
as a.s -o a.o                # st_size(mpi_fortran_argvs_null_) = 1
gcc -shared b.s -o b.so      # st_size(mpi_fortran_argvs_null_) = 8

# The GNU ld output takes st_size (8) in b.so into consideration.
ld.bfd -shared a.o b.so --version-script =(printf 'OMPI_2.0.0 {global: mpi_fortran_argvs_null_;};') -o a.so
% readelf -W --dyn-syms a.so | grep mpi_fortran_argvs_null_
     1: 0000000000002000     8 OBJECT  GLOBAL DEFAULT    9 mpi_fortran_argvs_null_
# Note that .gnu.version marks  mpi_fortran_argvs_null_ as a local version (VER_NDX_LOCAL).
% readelf -WV a.so

Version symbols section '.gnu.version' contains 3 entries:
 Addr: 00000000000001d6  Offset: 0x0001d6  Link: 3 (.dynsym)
  000:   0 (*local*)       0 (*local*)       2 (OMPI_2.0.0)
...


# gold and lld set st_size to 1, ignoring the definition in b.so
% readelf -W --dyn-syms a.gold.so | grep mpi_fortran_argvs_null_
     5: 0000000000002000     1 OBJECT  GLOBAL DEFAULT   10 mpi_fortran_argvs_null_@@OMPI_2.0.0
% readelf -W --dyn-syms a.lld.so | grep mpi_fortran_argvs_null_
     1: 0000000000003390     1 OBJECT  GLOBAL DEFAULT   10 mpi_fortran_argvs_null_@@OMPI_2.0.0

GNU ld and gold are a bit tolerant in that they allow shared definitions with the local version (VER_NDX_LOCAL).

cat > c.s <<e
 cmp %r13, mpi_fortran_argvs_null_@gotpcrel(%rip)
e
as c.s -o c.o

% ld.bfd a.so c.o
ld.bfd: warning: b.so, needed by a.so, not found (try using -rpath or -rpath-link)
ld.bfd: warning: cannot find entry symbol _start; defaulting to 0000000000401000
% ld.lld a.so c.o
ld.lld: error: corrupt input file: version definition index 0 for symbol mpi_fortran_argvs_null_ is out of bounds
>>> defined in a.so
Comment 1 Fangrui Song 2019-12-01 00:59:21 UTC
symbol versioning is always enabled. openmpi may need a workaround https://github.com/open-mpi/ompi/issues/7209
Comment 2 Alan Modra 2019-12-05 21:55:14 UTC
Let's ignore symbol versioning for now.  What is the correct result when linking ld -shared -o a.so a.o b.so?  In b.so we have a STB_GLOBAL STT_OBJECT mpi_fortran_argvs_null_ defined in .bss, treated by ld as a common symbol.  In a.o we have a STB_GLOBAL STT_OBJECT mpi_fortran_argvs_null_ SHN_COMMON symbol.

The ELF spec says of STT_COMMON symbols: "When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size."

OK, so we don't have STT_COMMON here (ld.bfd was implemented before STT_COMMON was added to the ELF spec), but the behaviour of common symbols is clear.  If at runtime the dynamic linker may choose the common symbol with the largest size, then it's quite correct for ld to do so too.  This is old established behaviour, dating back to at least 1997.  I believe it is a bug that lld and gold do not do this.

I haven't yet looked at why ld.bfd makes mpi_fortran_argvs_null_ versioned local, that does seem odd.
Comment 3 Fangrui Song 2019-12-06 09:07:16 UTC
Maybe we should discuss on the generic ABI mailing list. It is very late here (01:00) so I'll not do that now. If you create a thread, can you CC me?

My feeling is that a STN_COMMON (STT_COMMON) definition in a shared object should be treated as a regular definition, not a common symbol. It is subject to copy relocations. Only STN_COMMON (STT_COMMON) from object files are assigned st_size comparing semantics.

If we choose the interpretation that the common definition from a shared object wins, it will break the rule that object file definitions beat shared definitions.
Comment 4 Alan Modra 2019-12-06 12:29:56 UTC
No, I'm not saying a common in a shared library wins over a common in a regular object, just that the maximum size is taken into account.

See the comment in bfd/elflink.c starting /* NEWDYNCOMMON and OLDDYNCOMMON indicate ..  It's interesting that the comment mentions fortran too.

I believe this is Ian's code from 1997 or so.  See also https://www.airs.com/blog/archives/49
Comment 5 Ian Lance Taylor 2019-12-06 18:19:48 UTC
As I recall the behavior in GNU ld is due to SunOS.  On SunOS when a shared library defined a common symbol it was required for the main executable to define the symbol with the same size.  I think the shared library would then refer to the executable's symbol, so if the size were not the same (or larger) the program would break.  I think this approach was required for stdin/stdout/stderr, but I may be misremembering.

If a common symbol in a shared library is going to be resolved to a common symbol in the main executable, then it seems essential to consider the size of the common symbol in the shared library.  If the two symbols are going to be treated as distinct, then it doesn't matter.  So I think the resolution here hinges on that question.
Comment 6 Fangrui Song 2019-12-06 22:48:37 UTC
To make sure we are on the same page. In the case that both a.o and a.so define the common symbol:

The definition from a.o wins. --version-script should apply versions on the definition. At runtime, the shared object should refer to the executable's symbol.  (This is the regular rule for definitions.)

st_size: largest st_size from all object files and shared objects
alignment: largest st_value (alignment) from all object files and shared objects

In my pasted example, mpi_fortran_argvs_null_ should have the default version OMPI_2.0.0 and st_size=8.

Action items:

gold/lld: respect st_size/st_value in shared objects
ld: fix VER_NDX_LOCAL
Comment 7 Alan Modra 2019-12-07 01:08:17 UTC
Agreed.  I have a patch for the ld.bfd version problem.
Comment 8 Sourceware Commits 2019-12-07 05:52:15 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5fa370e437f39bf73a133cc84c4e6329943522bf

commit 5fa370e437f39bf73a133cc84c4e6329943522bf
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Dec 6 11:21:45 2019 +1030

    PR25236, common sym versioning
    
    In cases where a relocatable object file has a common symbol, no other
    file has a definition, and there is a matching common symbol found in
    a shared library then ld will output a definition using the largest of
    size and alignment for the commons.  This patch fixes a bug in ld that
    ignored common symbols when assigning versions, resulting in such
    symbols being given VER_NDX_LOCAL versions.
    
    	PR 25236
    	* elflink.c (_bfd_elf_link_assign_sym_version): Assign versions
    	for ELF_COMMON_DEF_P symbols.
    	(elf_link_output_extsym, _bfd_elf_add_default_symbol): Adjust to
    	suit.
Comment 9 Alan Modra 2019-12-07 07:03:58 UTC
Fixed for bfd linker.