I don't know if this is a glibc bug or not. On ppc64le with recent binutils 2.29, applications fail to start with errors like:

./bin/rustc: error while loading shared libraries: ./lib/libstd-c3a1748e15265da7.so: expected localentry:0 `pthread_condattr_destroy'
/usr/bin/cmake: error while loading shared libraries: /lib64/libcurl.so.4: expected localentry:0 `pthread_mutex_destroy'
./dsimpletest: error while loading shared libraries: ../lib/libmumps_common-5.1.1.so: expected localentry:0 `pthread_cond_init'

The common theme seems to be that this only affects symbols which live both in libpthread and libc.
2.29 is the first release supporting PPC64_OPT_LOCALENTRY, so it seems like this is still buggy.
List thread: https://sourceware.org/ml/libc-alpha/2017-07/msg00861.html
It may be that the localentry:0 optimisation needs to be turned off by default in ld. Which is a shame, because it gives a nice speedup for plt calls to many small leaf functions.
The problem is this:

amodra@pike:~/build/glibc$ readelf -s ./libc.so.6 | grep pthread_condattr_destroy
  1344: 000000000012e420   108 FUNC    GLOBAL DEFAULT [<localentry>: 8]    10 pthread_condattr_destroy@@GLIBC_2.17
  5940: 000000000012e420   108 FUNC    GLOBAL DEFAULT [<localentry>: 8]    10 pthread_condattr_destroy
amodra@pike:~/build/glibc$ readelf -s ./nptl/libpthread.so.0 | grep pthread_condattr_destroy
   299: 0000000000011f70    20 FUNC    GLOBAL DEFAULT   11 pthread_condattr_destroy@@GLIBC_2.17
   181: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS pthread_condattr_destroy.
   694: 0000000000011f70    20 FUNC    GLOBAL DEFAULT   11 pthread_condattr_destroy

libpthread.so.0 has a stub implementation of pthread_condattr_destroy which consists of just
  ld r3,0
  blr
while libc.so.6 has the full implementation. Users link against libpthread.so which means ld sees the stub symbol first before finding the real implementation in libc.so. By the usual rules of duplicate symbol resolution involving shared libraries, ld must take the first symbol as the definition. The stub symbol of course doesn't need a toc pointer so the optimised localentry:0 call is used. At runtime, ld.so apparently resolves pthread_condattr_destroy to the libc version, which isn't localentry:0.

I'm not familiar with the reasons for glibc having two conflicting implementations for pthread functions, but I believe the correct fix is to make the libpthread.so stub symbols weak.
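As an aside, here is a minimal sketch of how the [<localentry>: 8] annotation in the readelf output above is derived from a symbol's st_other byte. It assumes the ELFv2 encoding as used in the binutils headers (a three-bit field starting at bit 5, decoded as a power of two). The function name local_entry_offset is made up for illustration and is not a binutils or glibc API.

```python
# Sketch only: decode the PowerPC64 ELFv2 local entry offset from st_other.
# Assumption: the field occupies bits 5..7 of st_other, as in binutils'
# STO_PPC64_LOCAL_BIT / STO_PPC64_LOCAL_MASK definitions.
STO_PPC64_LOCAL_BIT = 5
STO_PPC64_LOCAL_MASK = 7 << STO_PPC64_LOCAL_BIT


def local_entry_offset(st_other: int) -> int:
    """Return the byte offset from the global to the local entry point."""
    field = (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT
    if field == 7:
        # The largest encoding is reserved, so refuse to decode it.
        raise ValueError("reserved localentry encoding")
    # Field values 0 and 1 both decode to offset 0; 2..6 give 4, 8, 16, 32, 64.
    return ((1 << field) >> 2) << 2


# DEFAULT visibility (0) combined with field value 3 gives st_other == 0x60,
# which is what readelf shows as "[<localentry>: 8]".
libc_offset = local_entry_offset(0x60)
# A symbol with no annotation has a zero field, i.e. localentry:0.
libpthread_offset = local_entry_offset(0x00)
```

With these values, the libc.so.6 definition decodes to 8 and the libpthread.so.0 definition to 0, which is the mismatch the loader error reports.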
(In reply to Alan Modra from comment #4)
> The problem is this:
>
> amodra@pike:~/build/glibc$ readelf -s ./libc.so.6 | grep pthread_condattr_destroy
>   1344: 000000000012e420   108 FUNC    GLOBAL DEFAULT [<localentry>: 8]    10 pthread_condattr_destroy@@GLIBC_2.17
>   5940: 000000000012e420   108 FUNC    GLOBAL DEFAULT [<localentry>: 8]    10 pthread_condattr_destroy
> amodra@pike:~/build/glibc$ readelf -s ./nptl/libpthread.so.0 | grep pthread_condattr_destroy
>    299: 0000000000011f70    20 FUNC    GLOBAL DEFAULT   11 pthread_condattr_destroy@@GLIBC_2.17
>    181: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS pthread_condattr_destroy.
>    694: 0000000000011f70    20 FUNC    GLOBAL DEFAULT   11 pthread_condattr_destroy
>
> libpthread.so.0 has a stub implementation of pthread_condattr_destroy which
> consists of just
>   ld r3,0
>   blr

But this is the full implementation. The stub implementation is in libc.so.6. I agree that it is bizarre that the stub implementation does more work than the real implementation, but that's just an artifact of pthread_condattr_destroy. The problem also happens with pthread_mutex_destroy, where there is actually some code in libpthread.

> while libc.so.6 has the full implementation. Users link against
> libpthread.so which means ld sees the stub symbol first before finding the
> real implementation in libc.so. By the usual rules of duplicate symbol
> resolution involving shared libraries, ld must take the first symbol as the
> definition. The stub symbol of course doesn't need a toc pointer so the
> optimised localentry:0 call is used. At runtime, ld.so apparently resolves
> pthread_condattr_destroy to the libc version, which isn't localentry:0.

How does this optimization work? Does the link editor detect that the implementation in libpthread does not require the TOC pointer and then activates the optimization?

What happens if we change the implementation later, so that it needs a TOC pointer?

I still think this optimization has a general issue with symbol interposition.

> I'm not familiar with the reasons for glibc having two conflicting
> implementations for pthread functions, but I believe the correct fix is to
> make the libpthread.so stub symbols weak.

That seems wrong because the libpthread implementation is the real (full) implementation.
> How does this optimization work? Does the link editor detect that the
> implementation in libpthread does not require the TOC pointer and then
> activates the optimization?

Yes.

> What happens if we change the implementation later, so that it needs a TOC
> pointer?

That could be solved by versioning.

> I still think this optimization has a general issue with symbol interposition.

True.

> That seems wrong because the libpthread implementation is the real (full)
> implementation.

OK, so I had it back to front. How is it that an executable or shared library linked against libpthread.so.0 ends up not using the libpthread.so.0 symbols at run time?
(In reply to Alan Modra from comment #6)
> > How does this optimization work? Does the link editor detect that the
> > implementation in libpthread does not require the TOC pointer and then
> > activates the optimization?
>
> Yes.

Note that in that case I think even the exact compiler version and optimization options can affect the ABI. Say with -O0 or -O1 the compiler doesn't optimize away some access to a global variable that is dead but harder to prove dead, while -O2 or -O3 or some other option could optimize it away, so that the function no longer uses r2.

So the localentry change is useful only for development models where any single change causes a rebuild of all the binaries and shared libraries, and then nothing changes until the following whole-world rebuild.

What could work is a shared library providing a hint to the dynamic linker (this function doesn't need or clobber r2), say in some .dynsym flags, with only the dynamic linker then using that information as a hint whether to use a more or less efficient PLT sequence when calling a certain symbol. But it has to be something determined only at runtime; nothing can be decided at static link time, because otherwise some property that is not part of the ABI is inlined into binaries or shared libraries.
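To make the runtime-hint idea above concrete, here is a small Python sketch. Everything in it is hypothetical: no such per-symbol hint flag is described in this report, and the names Definition, needs_r2, PLT_FAST and PLT_SAFE are invented for illustration. The only point is that the choice of call sequence follows whichever definition the lookup actually binds to, so a different search order or an interposed definition cannot leave a stale assumption baked into the caller.

```python
from dataclasses import dataclass

# Hypothetical names for the two call sequences; not real ld.so identifiers.
PLT_FAST = "no-toc-save"   # callee promises not to need or clobber r2
PLT_SAFE = "toc-save"      # ordinary PLT call that saves and restores r2


@dataclass
class Definition:
    library: str
    needs_r2: bool  # hypothetical hint carried by the defining library


def choose_plt_sequence(search_order, name):
    """Bind `name` to the first definition found, then pick the sequence
    from that definition's own hint (decided at bind time, not link time)."""
    for exports in search_order:
        definition = exports.get(name)
        if definition is not None:
            sequence = PLT_SAFE if definition.needs_r2 else PLT_FAST
            return definition.library, sequence
    raise LookupError(name)


# Two libraries exporting the same symbol with different properties,
# mirroring the libc/libpthread situation in this report.
libc = {"pthread_condattr_destroy": Definition("libc.so.6", needs_r2=True)}
libpthread = {"pthread_condattr_destroy": Definition("libpthread.so.0", needs_r2=False)}

bound_libc_first = choose_plt_sequence([libc, libpthread], "pthread_condattr_destroy")
bound_pthread_first = choose_plt_sequence([libpthread, libc], "pthread_condattr_destroy")
```

The two search orders in the usage are arbitrary. The thread leaves open why the libc definition wins at run time, and the sketch does not try to model that.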
Binutils commits 8b5f1ed8 and d44c746a addressed this bug on the binutils side. There is a gold fix yet to be committed.
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=565ed01a4e0e3584f24580177822a5271b1c0c8b

commit 565ed01a4e0e3584f24580177822a5271b1c0c8b
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Aug 28 16:27:33 2017 +0930

    [GOLD] Symbol flag for PowerPC64 localentry:0 tracking

    This patch provides a flag for PowerPC64 ELFv2 use in class Symbol,
    and modifies Sized_target::resolve to return whether the symbol has
    been resolved.  If not, normal processing continues.  I use this for
    PowerPC64 ELFv2 to keep track of whether a symbol has any definition
    with non-zero localentry, in order to disable --plt-localentry for
    that symbol.

        PR 21847
        * powerpc.cc (Target_powerpc::is_elfv2_localentry0): Test
        non_zero_localentry.
        (Target_powerpc::resolve): New function.
        (powerpc_info): Set has_resolve for 64-bit.
        * target.h (Sized_target::resolve): Return bool.
        * resolve.cc (Symbol_table::resolve): Continue with normal
        processing when target resolve returns false.
        * symtab.h (Symbol::non_zero_localentry, set_non_zero_localentry):
        New accessors.
        (Symbol::non_zero_localentry_): New flag bit.
        * symtab.cc (Symbol::init_fields): Init non_zero_localentry_.
The binutils-2_29-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=188f4b19708f85539da85c8cc722589edd32389b

commit 188f4b19708f85539da85c8cc722589edd32389b
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Aug 28 16:27:33 2017 +0930

    [GOLD] Symbol flag for PowerPC64 localentry:0 tracking

    This patch provides a flag for PowerPC64 ELFv2 use in class Symbol,
    and modifies Sized_target::resolve to return whether the symbol has
    been resolved.  If not, normal processing continues.  I use this for
    PowerPC64 ELFv2 to keep track of whether a symbol has any definition
    with non-zero localentry, in order to disable --plt-localentry for
    that symbol.

        PR 21847
        * powerpc.cc (Target_powerpc::is_elfv2_localentry0): Test
        non_zero_localentry.
        (Target_powerpc::resolve): New function.
        (powerpc_info): Set has_resolve for 64-bit.
        * target.h (Sized_target::resolve): Return bool.
        * resolve.cc (Symbol_table::resolve): Continue with normal
        processing when target resolve returns false.
        * symtab.h (Symbol::non_zero_localentry, set_non_zero_localentry):
        New accessors.
        (Symbol::non_zero_localentry_): New flag bit.
        * symtab.cc (Symbol::init_fields): Init non_zero_localentry_.
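The tracking described in the commit message can be illustrated with a short Python sketch. This is not the gold code: the Symbol class and the see_definition and can_use_localentry0_plt helpers are invented here, and the model ignores versioning, weak symbols and everything else real symbol resolution handles. It only shows the idea of remembering that some definition of a symbol had a non-zero localentry, even when that definition is not the one the linker resolves to.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Symbol:
    name: str
    defined_in: Optional[str] = None    # first definition seen wins
    non_zero_localentry: bool = False   # sticky: set by ANY definition seen


def see_definition(symtab: Dict[str, Symbol], name: str,
                   library: str, local_entry: int) -> None:
    """Record one definition of `name` found while scanning `library`."""
    sym = symtab.setdefault(name, Symbol(name))
    if local_entry != 0:
        # Remember this even if the definition loses resolution below.
        sym.non_zero_localentry = True
    if sym.defined_in is None:
        sym.defined_in = library


def can_use_localentry0_plt(sym: Symbol) -> bool:
    """Allow the optimised PLT call only if no definition needs a TOC setup."""
    return sym.defined_in is not None and not sym.non_zero_localentry


symtab: Dict[str, Symbol] = {}
# libpthread is seen first and has localentry 0; libc follows with localentry 8.
see_definition(symtab, "pthread_condattr_destroy", "libpthread.so.0", 0)
see_definition(symtab, "pthread_condattr_destroy", "libc.so.6", 8)
# A made-up symbol with a single localentry:0 definition stays eligible.
see_definition(symtab, "example_leaf", "libexample.so", 0)
```

In this model pthread_condattr_destroy still resolves to the libpthread.so.0 definition, but the sticky flag disables the localentry:0 call for it, while example_leaf remains eligible.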
Fixed.