This is the mail archive of the
binutils@sources.redhat.com
mailing list for the binutils project.
Re: Incompatibility between GNU-ld and SUN's ld.so.1
- From: Geoff Keating <geoffk at geoffk dot org>
- To: "Christian Ehrhardt" <ehrhardt at mathematik dot uni-ulm dot de>
- Cc: binutils at sources dot redhat dot com
- Date: 24 Sep 2002 12:21:49 -0700
- Subject: Re: Incompatibility between GNU-ld and SUN's ld.so.1
- References: <20020924162651.26384.qmail@thales.mathematik.uni-ulm.de>
"Christian Ehrhardt" <ehrhardt@mathematik.uni-ulm.de> writes:
> Hi,
>
> First: I'd appreciate being CC'ed on replies, but I'll try to follow
> the thread in the archives.
>
> [ I still think this is a problem in Sun's ld.so.1 and I have an open
> call with Sun. However, as this problem is triggered by libstdc++
> and libgcc_s, and the Sun behaviour dates back to Solaris 7 (or even
> earlier), it would be helpful if GNU ld could work around this problem.
> ]
>
> Here's the relevant part of my report sent to Sun (I guess you'd
> prefer to use Makefile instead of Makefile.sun. However, note that
> using -nostdlib will cause a different crash due to a missing exit()):
>
> ----------------- cut here --------------------------------------------
>
> SUMMARY DESCRIPTION: ld.so.1 fails to relocate certain shared libraries
>
> DETAILED DESCRIPTION:
>
> The dynamic runtime linker fails to relocate valid shared libraries
> generated by recent versions of GNU-ld. /usr/local/bin/ld is from
> the GNU binutils-2.13 package:
>
> turing$ /usr/local/bin/ld -v
> GNU ld version 2.13
>
> How to reproduce:
>
> Script started on Fri Sep 20 19:46:43 2002
> turing$ cat t2.c
> struct object {
>     int i;
>     int j;
>     int k;
>     int l;
> };
>
> int func ()
> {
>     static struct object x;
>     struct object * p;
>     p = &x;
>     p->i = 3;
>     return 0;
> }
>
> turing$ cat t3.c
> extern int func();
>
> int main ()
> {
>     func();
>     return 0;
> }
> turing$ cat Makefile.sun
> .PHONY: clean
> all: a.out
> t2.o: t2.c
> 	CC -c -KPIC t2.c
> libt2.so: t2.o
> 	/usr/local/bin/ld -G t2.o -olibt2.so
> t3.o: t3.c
> 	CC -c t3.c
> a.out: libt2.so t3.o
> 	CC -lt2 t3.o -L. -R.
> clean:
> 	rm -f *.so *.o a.out
>
> turing$ cat Makefile
> .PHONY: clean
> all: a.out
> t2.o: t2.c
> 	gcc -c -fPIC t2.c
> libt2.so: t2.o
> 	/usr/local/bin/ld -nostdlib -shared -olibt2.so t2.o
> a.out: libt2.so t3.c
> 	gcc -nostdlib t3.c libt2.so -L. -R.
> clean:
> 	rm -f *.so *.o a.out core
>
> turing$ make -f Makefile.sun clean
> rm -f *.so *.o a.out
> turing$ make -f Makefile.sun
> CC -c -KPIC t2.c
> /usr/local/bin/ld -G t2.o -olibt2.so
> CC -c t3.c
> CC -lt2 t3.o -L. -R.
> turing$ a.out
> Segmentation Fault (core dumped)
> turing$ exit
>
> script done on Fri Sep 20 19:47:32 2002
>
> Note that I compiled everything with /opt/SUNWspro/bin/CC to
> rule out bugs in gcc. This problem can be reproduced using
> the second Makefile and gcc with an even smaller resulting
> executable.
>
>
> Analyzing the core shows the following:
> turing$ pmap core | grep libt2.so
> FF370000 8K read/exec libt2.so
> FF380000 8K read/write/exec libt2.so
>
> Script started on Fri Sep 20 19:53:10 2002
> turing$ gdb a.out core
> GNU gdb 5.0
> [ ... ]
> #0 0xff370318 in __1cEfunc6F_i_ ()
> from /home/thales/ehrhardt/ld.so.1-bug/./libt2.so
> (gdb) disass
> Dump of assembler code for function __1cEfunc6F_i_:
> 0xff3702e0 <__1cEfunc6F_i_>: save %sp, -112, %sp
> 0xff3702e4 <__1cEfunc6F_i_+4>: call 0xff3702ec <__1cEfunc6F_i_+12>
> 0xff3702e8 <__1cEfunc6F_i_+8>: sethi %hi(0), %o1
> 0xff3702ec <__1cEfunc6F_i_+12>: mov %o1, %o1 ! 0x0
> 0xff3702f0 <__1cEfunc6F_i_+16>: add %o7, %o1, %o1
> 0xff3702f4 <__1cEfunc6F_i_+20>: st %o1, [ %fp + -12 ]
> 0xff3702f8 <__1cEfunc6F_i_+24>: sethi %hi(0x10000), %o0
> 0xff3702fc <__1cEfunc6F_i_+28>: or %o0, 0xc4, %o0 ! 0x100c4
> 0xff370300 <__1cEfunc6F_i_+32>: add %o1, %o0, %l7
> 0xff370304 <__1cEfunc6F_i_+36>: sethi %hi(0), %g1
> 0xff370308 <__1cEfunc6F_i_+40>: or %g1, 4, %g1 ! 0x4
> 0xff37030c <__1cEfunc6F_i_+44>: ld [ %l7 + %g1 ], %o0
> 0xff370310 <__1cEfunc6F_i_+48>: st %o0, [ %fp + -8 ]
> 0xff370314 <__1cEfunc6F_i_+52>: mov 3, %o1
> 0xff370318 <__1cEfunc6F_i_+56>: st %o1, [ %o0 ]
> 0xff37031c <__1cEfunc6F_i_+60>: clr [ %fp + -4 ]
> 0xff370320 <__1cEfunc6F_i_+64>: mov %g0, %i0
> 0xff370324 <__1cEfunc6F_i_+68>: ret
> 0xff370328 <__1cEfunc6F_i_+72>: restore
> 0xff37032c <__1cEfunc6F_i_+76>: mov %g0, %i0
> 0xff370330 <__1cEfunc6F_i_+80>: ret
> 0xff370334 <__1cEfunc6F_i_+84>: restore
> ---Type <return> to continue, or q <return> to quit---
> End of assembler dump.
> (gdb) bt
> #0 0xff370318 in __1cEfunc6F_i_ ()
> from /home/thales/ehrhardt/ld.so.1-bug/./libt2.so
> #1 0x10884 in main ()
> (gdb) info reg o0
> o0 0xff370000 -13172736
> (gdb) info reg o1
> o1 0x3 3
> (gdb) info reg l7
> l7 0xff3803a8 -13106264
> (gdb) info reg g1
> g1 0x4 4
> (gdb) turing$ exit
>
> script done on Fri Sep 20 19:54:46 2002
>
> Looking back at function func from t2.c shows:
> int func ()
> {
>     static struct object x;
>     struct object * p;
>     p = &x;
>     p->i = 3;          <====== crash is here.
>     return 0;
> }
>
> The value of the pointer p is obviously in register o0, i.e. it is
> 0xff370000. This is precisely the BASE address where the shared library
> libt2.so has been mapped to. Register l7 contains the base address of
> the .got section (the global offset table of this library). The
> questionable address is loaded from offset 4 in the global offset table.
>
> Looking at the contents of the global offset table in the shared
> library shows the following:
>
> turing$ elfdump -G libt2.so
>
> Global Offset Table: 2 entries
>   ndx     addr      value     reloc             addend    symbol
> [00000]  000103a8  00010338  R_SPARC_NONE      00000000
> [00001]  000103ac  000103b0  R_SPARC_RELATIVE  00000000
> turing$
>
> Note that we have indeed
> %l7(0xff3803a8) = Offset of .got(0x000103a8) + library base address(0xFF370000)
>
> The Solaris Linker and Libraries Guide (freshly downloaded from
> docs.sun.com) has this explanation for R_SPARC_RELATIVE:
>
> |Some relocation types have semantics beyond simple calculation:
> |[ ... ]
> |R_SPARC_RELATIVE
> | Created by the link-editor for dynamic objects. Its offset member
> | gives the location within a shared object that contains a value
> | representing a relative address. The runtime linker computes the
> | corresponding virtual address by adding the virtual address at which
> | the shared object is loaded to the relative address. Relocation
> | entries for this type must specify 0 for the symbol table index.
>
> This means that the value at offset 0x4 in the global offset
> table should be
> library base address + Value in .got
> 0xFF370000 + 0x000103B0 = 0xFF3803B0
> after relocation. However, looking at the value of register o0 we
> see that the .got entry obviously contains the value 0xFF370000
> (the bare library base address) instead.
>
> ----------------- cut here --------------------------------------------
>
> The basic problem is the interpretation of the meaning of
> R_SPARC_RELATIVE. Recall the explanation from above:
>
> [ The same document also states that the calculation performed by
> R_SPARC_RELATIVE is B+A (see Terminology below). IMHO this is
> overruled by the first sentence quoted below.
> ]
>
> |Some relocation types have semantics beyond simple calculation:
> |[ ... ]
> |R_SPARC_RELATIVE
> | Created by the link-editor for dynamic objects. Its offset member
> | gives the location within a shared object that contains a value
> | representing a relative address. The runtime linker computes the
> | corresponding virtual address by adding the virtual address at which
> | the shared object is loaded to the relative address. Relocation
> | entries for this type must specify 0 for the symbol table index.
>
>
> This explanation is obviously derived from the SHT_REL case where
> the ``relative address'' explained above and the implicit addend
> are the same.
>
> Terminology:
> * B is the base address where the library is loaded
> * A is the EXPLICIT addend
> * V is the value stored in the shared library where an implicit addend
> would reside (IMHO this is what ``relative address'' above describes).
>
> The Sun linker used to always calculate V + B + A for R_SPARC_RELATIVE
> relocations; however, starting with Solaris 7 and the advent of
> DT_RELACOUNT it calculates only B+A (ignoring V completely) iff
> DT_RELACOUNT is actually supplied and explicit addends are used.
>
> ld could work around this by always storing the relative address in
> the addend and setting V to 0 if explicit addends are used. This is
> what SUN's linker has done for quite some time.
Beware! IIRC, ld.so on Solaris used (perhaps in 2.5.1?) to always
compute V+B, ignoring A, for R_SPARC_RELATIVE, exactly as the doc
above describes. To be backwards compatible, it might be necessary
instead to suppress output of DT_RELACOUNT when building for an older
Solaris system.
--
- Geoffrey Keating <geoffk@geoffk.org>