Bug 21116 - Wrong jump address on ppc64le by using dlopen and dlsym in glibc 2.17 and 2.24
Status: RESOLVED MOVED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.24
Importance: P2 normal
Target Milestone: ---
Assignee: Florian Weimer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-08 16:26 UTC by grisu
Modified: 2017-02-09 12:07 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Minimal Example for the dlopen bug. (1.46 KB, application/x-gzip)
2017-02-08 16:26 UTC, grisu
Description grisu 2017-02-08 16:26:12 UTC
Created attachment 9795 [details]
Minimal Example for the dlopen bug.

While developing a plugin framework for LAPACK functions, I ran into trouble with dlopen/dlsym, which seem to compute a wrong address or jump to a wrong address. The setup is as follows (a short description of the attached code).

I open the LAPACK library (at least version 3.6.1) using dlopen and look up the symbols "dgetrf", "dgetrf2" and "dgetf2" at application start-up using the __attribute__((constructor)) mechanism. For each of the three symbols I have a wrapper function with exactly the same name and the same binary interface. On x86-64 everything works fine. On ppc64le (OpenPower8, triple: powerpc64le-linux-gnu) it crashes, and I get the following backtrace in gdb:

#0  0x000000000074696c in ?? ()
#1  0x00003fffb76af954 in dgetrf2_ () from /home/k/lapacktest/lib64/liblapack.so
#2  0x00003fffb7f60c1c in dgetrf2_ (n=0x3fffffffec14, m=0x3fffffffe3f4, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffe3f0) at liblapack-calls.c:31
#3  0x00003fffb76af8c8 in dgetrf2_ () from /home/k/lapacktest/lib64/liblapack.so
#4  0x00003fffb7f60c1c in dgetrf2_ (n=0x3fffffffec14, m=0x3fffffffe594, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffe590) at liblapack-calls.c:31
#5  0x00003fffb76af8c8 in dgetrf2_ () from /home/k/lapacktest/lib64/liblapack.so
#6  0x00003fffb7f60c1c in dgetrf2_ (n=0x3fffffffec14, m=0x3fffffffe734, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffe730) at liblapack-calls.c:31
#7  0x00003fffb76af8c8 in dgetrf2_ () from /home/k/lapacktest/lib64/liblapack.so
#8  0x00003fffb7f60c1c in dgetrf2_ (n=0x3fffffffec14, m=0x3fffffffe8d4, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffe8d0) at liblapack-calls.c:31
#9  0x00003fffb76af8c8 in dgetrf2_ () from /home/k/lapacktest/lib64/liblapack.so
#10 0x00003fffb7f60c1c in dgetrf2_ (n=0x3fffffffec14, m=0x3fffffffea74, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffea70) at liblapack-calls.c:31
#11 0x00003fffb76af8c8 in dgetrf2_ () from /home/k/lapacktest/lib64/liblapack.so
#12 0x00003fffb7f60c1c in dgetrf2_ (n=0x3fffffffec14, m=0x3fffffffec0c, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffec04) at liblapack-calls.c:31
#13 0x00003fffb76aee54 in dgetrf_ () from /home/k/lapacktest/lib64/liblapack.so
#14 0x00003fffb7f60b8c in dgetrf_ (n=0x3fffffffeda0, m=0x3fffffffeda0, A=0x10046830, lda=0x3fffffffeda0, ipiv=0x1005a0c0, info=0x3fffffffeda4) at liblapack-calls.c:25
#15 0x0000000010000d70 in main (argc=2, argv=0x3ffffffff1d8) at lapack-test.c:58


Here, in frame #0, the address 0x000000000074696c is wrong; the correct one would be 0x00003fffb7f60c1c, as in frame #2. After some digging, it seems that a wrong address is computed inside the PLT when the call is performed.
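
For readers without the attachment, the setup described above (resolve the backend symbols once in a constructor, then forward through wrappers) can be sketched roughly as follows. This is not the attached code: libm's cos() stands in for a LAPACK routine, and the names load_backend, real_cos and wrapped_cos are hypothetical. Unlike the wrappers in the report, the wrapper here does not share the name of the symbol it forwards to.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Signature of the backend routine. Assumption: cos() from libm is used
 * as a stand-in for a LAPACK routine such as dgetrf_. */
typedef double (*unary_fn)(double);

static void *backend_handle = NULL;  /* handle returned by dlopen */
static unary_fn real_cos = NULL;     /* address resolved by dlsym */

/* Runs before main(): open the backend library and resolve the symbol once. */
__attribute__((constructor))
static void load_backend(void)
{
    backend_handle = dlopen("libm.so.6", RTLD_NOW | RTLD_LOCAL);
    if (backend_handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        abort();
    }

    /* Store through a void ** to avoid a direct object-to-function
     * pointer cast; this is the idiom shown in the dlsym documentation. */
    *(void **)(&real_cos) = dlsym(backend_handle, "cos");
    if (real_cos == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        abort();
    }
}

/* Hypothetical wrapper: forwards the call through the resolved pointer. */
double wrapped_cos(double x)
{
    return real_cos(x);
}
```

On glibc older than 2.34 this needs to be linked with -ldl.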

I tried the code with two different versions of glibc and gcc. The first was CentOS 7.3 with glibc 2.17:
gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/ppc64le-redhat-linux/4.8.5/lto-wrapper
Target: ppc64le-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-ppc64le-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-ppc64le-redhat-linux/cloog-install --enable-gnu-indirect-function --enable-secureplt --with-long-double-128 --enable-targets=powerpcle-linux --disable-multilib --with-cpu-64=power8 --with-tune-64=power8 --build=ppc64le-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) 

ld -v: 
GNU ld version 2.25.1-22.base.el7 

The second was Debian Stretch with glibc 2.24:
gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc64le-linux-gnu/6/lto-wrapper
Target: powerpc64le-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-5' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=powerpc64le-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-ppc64el/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-ppc64el --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-ppc64el --with-arch-directory=ppc64le --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc=auto --enable-secureplt --with-cpu=power8 --enable-targets=powerpcle-linux --disable-multilib --enable-multiarch --with-long-double-128 --enable-checking=release --build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu --target=powerpc64le-linux-gnu
Thread model: posix
gcc version 6.3.0 20170124 (Debian 6.3.0-5) 

ld -v:
GNU ld (GNU Binutils for Debian) 2.27.90.20170124

Both systems are running Linux 4.8.6-300.el7.centos.ppc64le. A second test confirmed that it also happens on 3.11, which originally shipped with CentOS.


The attached tarball contains a minimal example that fails on my systems. Running "make" compiles LAPACK 3.6.1 and the example, "make run" executes the example, and "make gdb" starts it in gdb.
Comment 1 grisu 2017-02-08 16:29:30 UTC
One additional note: the dgetrf2 function in LAPACK is recursive.
Comment 2 grisu 2017-02-08 16:34:05 UTC
One further correction to the description: the correct address in frame #0 is 0x3fffb77212d8, which points to the dlaswp function inside LAPACK.
Comment 3 Rajalakshmi 2017-02-09 04:43:09 UTC
I was able to run the test after setting LD_PRELOAD.
Comment 4 grisu 2017-02-09 08:17:39 UTC
The LD_PRELOAD mechanism is exactly what I want to avoid, because I want to be able to exchange the pointers and the dlopen handle to switch LAPACK at runtime. Furthermore, on x86 and x86-64 it works without any problems.
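
To illustrate what exchanging the pointers and the dlopen handle at runtime might look like, here is a minimal sketch. It is an assumption about the general shape of such a framework, not the reporter's code: switch_backend and call_backend are hypothetical names, a single unary function stands in for the LAPACK routines, and the swap is not thread-safe.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed signature of the one routine this sketch forwards to. */
typedef double (*unary_fn)(double);

static void *current_handle = NULL;  /* handle of the active backend */
static unary_fn current_fn = NULL;   /* routine resolved from that backend */

/* Load `library`, resolve `symbol`, and make it the active backend.
 * Returns 0 on success. On failure returns -1 and leaves the previously
 * active backend untouched. Not thread-safe. */
int switch_backend(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    unary_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    /* Install the new backend first, then release the old handle. */
    void *old_handle = current_handle;
    current_handle = handle;
    current_fn = fn;
    if (old_handle != NULL)
        dlclose(old_handle);
    return 0;
}

/* Forward a call to the active backend. The caller must have completed
 * a successful switch_backend() beforehand. */
double call_backend(double x)
{
    return current_fn(x);
}
```

For example, calling switch_backend("libm.so.6", "cos") and later switch_backend("libm.so.6", "sin") changes what call_backend forwards to without restarting the process, which LD_PRELOAD cannot do.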
Comment 5 Florian Weimer 2017-02-09 11:03:16 UTC
Thanks for mentioning the recursion, it was helpful for identifying the root cause.

This is a GCC code generation issue, affecting the POWER ELFv2 ABI:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79439
Comment 6 grisu 2017-02-09 11:25:57 UTC
It seems that you are right. I compiled the same code again with the PGI compiler and it works as expected. Then let's wait and see what happens on the GCC side, and thank you for your even shorter example.