Originally reported at https://bugs.llvm.org//show_bug.cgi?id=43748 but it turns out to be related to openmpi and GNU ld. This affects the libopenmpi-dev package on Debian (it has patches to enable symbol versioning). (The Fortran declaration in ompi/include/mpif-f08-types.h is problematic. character, dimension(1, 1), bind(C, name="mpi_fortran_argvs_null_") :: MPI_ARGVS_NULL) cat > a.s <<e .globl mpi_fortran_argvs_null_ .comm mpi_fortran_argvs_null_,1,16 e cat > b.s <<e .globl mpi_fortran_argvs_null_ .comm mpi_fortran_argvs_null_,8,16 e as a.s -o a.o # st_size(mpi_fortran_argvs_null_) = 1 gcc -shared b.s -o b.so # st_size(mpi_fortran_argvs_null_) = 8 # The GNU ld output takes st_size (8) in b.so into consideration. ld.bfd -shared a.o b.so --version-script =(printf 'OMPI_2.0.0 {global: mpi_fortran_argvs_null_;};') -o a.so % readelf -W --dyn-syms a.so | grep mpi_fortran_argvs_null_ 1: 0000000000002000 8 OBJECT GLOBAL DEFAULT 9 mpi_fortran_argvs_null_ # Note that .gnu.version marks mpi_fortran_argvs_null_ as a local version (VER_NDX_LOCAL). % readelf -WV a.so Version symbols section '.gnu.version' contains 3 entries: Addr: 00000000000001d6 Offset: 0x0001d6 Link: 3 (.dynsym) 000: 0 (*local*) 0 (*local*) 2 (OMPI_2.0.0) ... # gold and lld set st_size to 1, ignoring the definition in b.so % readelf -W --dyn-syms a.gold.so | grep mpi_fortran_argvs_null_ 5: 0000000000002000 1 OBJECT GLOBAL DEFAULT 10 mpi_fortran_argvs_null_@@OMPI_2.0.0 % readelf -W --dyn-syms a.lld.so | grep mpi_fortran_argvs_null_ 1: 0000000000003390 1 OBJECT GLOBAL DEFAULT 10 mpi_fortran_argvs_null_@@OMPI_2.0.0 GNU ld and gold are a bit tolerant in that they allow shared definitions with the local version (VER_NDX_LOCAL). cat > c.s <<e cmp %r13, mpi_fortran_argvs_null_@gotpcrel(%rip) e as c.s -o c.o % ld.bfd a.so c.o ld.bfd: warning: b.so, needed by a.so, not found (try using -rpath or -rpath-link) ld.bfd: warning: cannot find entry symbol _start; defaulting to 0000000000401000 % ld.lld a.so c.o ld.lld: error: corrupt input file: version definition index 0 for symbol mpi_fortran_argvs_null_ is out of bounds >>> defined in a.so
symbol versioning is always enabled. openmpi may need a workaround https://github.com/open-mpi/ompi/issues/7209
Let's ignore symbol versioning for now. What is the correct result when linking ld -shared -o a.so a.o b.so? In b.so we have a STB_GLOBAL STT_OBJECT mpi_fortran_argvs_null_ defined in .bss, treated by ld as a common symbol. In a.o we have a STB_GLOBAL STT_OBJECT mpi_fortran_argvs_null_ SHN_COMMON symbol. The ELF spec says of STT_COMMON symbols: "When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size." OK, so we don't have STT_COMMON here (ld.bfd was implemented before STT_COMMON was added to the ELF spec), but the behaviour of common symbols is clear. If at runtime the dynamic linker may choose the common symbol with the largest size, then it's quite correct for ld to do so too. This is old established behaviour, dating back to at least 1997. I believe it is a bug that lld and gold do not do this. I haven't yet looked at why ld.bfd makes mpi_fortran_argvs_null_ versioned local, that does seem odd.
Maybe we should discuss on the generic ABI mailing list. It is very late here (01:00) so I'll not do that now. If you create a thread, can you CC me? My feeling is that a STN_COMMON (STT_COMMON) definition in a shared object should be treated as a regular definition, not a common symbol. It is subject to copy relocations. Only STN_COMMON (STT_COMMON) from object files are assigned st_size comparing semantics. If we choose the interpretation that the common definition from a shared object wins, it will break the rule that object file definitions beat shared definitions.
No, I'm not saying a common in a shared library wins over a common in a regular object, just that the maximum size is taken into account. See the comment in bfd/elflink.c starting /* NEWDYNCOMMON and OLDDYNCOMMON indicate .. It's interesting that the comment mentions fortran too. I believe this is Ian's code from 1997 or so. See also https://www.airs.com/blog/archives/49
As I recall the behavior in GNU ld is due to SunOS. On SunOS when a shared library defined a common symbol it was required for the main executable to define the symbol with the same size. I think the shared library would then refer to the executable's symbol, so if the size were not the same (or larger) the program would break. I think this approach was required for stdin/stdout/stderr, but I may be misremembering. If a common symbol in a shared library is going to be resolved to a common symbol in the main executable, then it seems essential to consider the size of the common symbol in the shared library. If the two symbols are going to be treated as distinct, then it doesn't matter. So I think the resolution here hinges on that question.
To make sure we are on the same page. In the case that both a.o and a.so define the common symbol: The definition from a.o wins. --version-script should apply versions on the definition. At runtime, the shared object should refer to the executable's symbol. (This is the regular rule for definitions.) st_size: largest st_size from all object files and shared objects alignment: largest st_value (alignment) from all object files and shared objects In my pasted example, mpi_fortran_argvs_null_ should have the default version OMPI_2.0.0 and st_size=8. Action items: gold/lld: respect st_size/st_value in shared objects ld: fix VER_NDX_LOCAL
Agreed. I have a patch for the ld.bfd version problem.
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5fa370e437f39bf73a133cc84c4e6329943522bf commit 5fa370e437f39bf73a133cc84c4e6329943522bf Author: Alan Modra <amodra@gmail.com> Date: Fri Dec 6 11:21:45 2019 +1030 PR25236, common sym versioning In cases where a relocatable object file has a common symbol, no other file has a definition, and there is a matching common symbol found in a shared library then ld will output a definition using the largest of size and alignment for the commons. This patch fixes a bug in ld that ignored common symbols when assigning versions, resulting in such symbols being given VER_NDX_LOCAL versions. PR 25236 * elflink.c (_bfd_elf_link_assign_sym_version): Assign versions for ELF_COMMON_DEF_P symbols. (elf_link_output_extsym, _bfd_elf_add_default_symbol): Adjust to suit.
Fixed for bfd linker.