ppc64: Call to gettimeofday fails with segfault in __glink_PLTresolve because .plt0 is all zeros.
Carlos O'Donell
carlos@redhat.com
Tue Nov 5 05:56:00 GMT 2013
Alan, Adhemerval,
I've been scratching my head at a problem all day and I'm stuck.
I was hoping that either of you might have an insight into the
problem.
I have a shared library, librpmio.so.1, and that library has a
function named rpmswNow which calls gettimeofday(), but after
upgrading to a newer glibc with an IFUNC-based gettimeofday()
that resolves to a VDSO symbol the call to gettimeofday()
results in a segfault.
The failure looks like this and is 100% reproducible:
in librpmio.so.1:
00000000000292b0 <.rpmswNow>:
...
# Call PLT stub for gettimeofday
29318: 4b fe 8a e5 bl 11dfc <._init+0xf3c>
...
0000000000010f00 <.argiCount-0x2ad0>:
...
# PLT call stub (note that build_plt_stub() in bfd doesn't say much about
# why the stub does what it does, so my analysis might be wrong)...
11dfc: f8 41 00 28 std r2,40(r1)
11e00: e9 62 99 a8 ld r11,-26200(r2)
11e04: 7d 69 03 a6 mtctr r11
# At this point r11 holds the address of the VDSO __kernel_gettimeofday
11e08: e8 42 99 b0 ld r2,-26192(r2)
# At this point r2 is zero (Why?)
11e0c: 28 22 00 00 cmpldi r2,0
# Should be "bnectr+" but my local objdump doesn't seem to know it, though gdb does.
11e10: 4c e2 04 20 .long 0x4ce20420
# And because r2 is zero the plt stub does not jump to r11 but instead
# calls the plt entry for gettimeofday.
11e14: 48 02 0e 3c b 32c50 <gettimeofday@plt>
# PLT entry jumps to the glink call stub:
Dump of assembler code for function gettimeofday@plt:
0x00000fffb1292c50 <+0>: li r0,264
0x00000fffb1292c54 <+4>: b 0xfffb12923d8 <__glink_PLTresolve>
# Enter .glink0 with index 264 in r0.
Dump of assembler code for function __glink_PLTresolve:
0x00000fffb12923d8 <+0>: mflr r12
0x00000fffb12923dc <+4>: bcl 20,4*cr7+so,0xfffb12923e0 <__glink_PLTresolve+8>
0x00000fffb12923e0 <+8>: mflr r11
0x00000fffb12923e4 <+12>: ld r2,-16(r11)
0x00000fffb12923e8 <+16>: mtlr r12
0x00000fffb12923ec <+20>: add r12,r2,r11
fffb12a0000-fffb12b0000 rw-p 00040000 fd:00 2887512 /usr/lib64/librpmio.so.1.0.0
(gdb) x/9g $r12 - 24
0xfffb12a3e88: 0x00000fffb12a0e60 0x00000fffb12aa358
0xfffb12a3e98: 0x00000fffb12aa470 0x0000000000000000
0xfffb12a3ea8: 0x0000000000000000 0x0000000000000000
0xfffb12a3eb8: 0x00000fffb10e90d0 0x00000fffb12577d8
0xfffb12a3ec8: 0x00000fffb10e9100
# The .plt0 entry is all zeros for the ip, toc, and aux pointer.
0x00000fffb12923f0 <+24>: ld r11,0(r12)
# So r11 is zero.
0x00000fffb12923f4 <+28>: ld r2,8(r12)
# So r2 is zero.
0x00000fffb12923f8 <+32>: mtctr r11
# And this is a segfault.
0x00000fffb12923fc <+36>: ld r11,16(r12)
0x00000fffb1292400 <+40>: bctr
0x00000fffb1292404 <+44>: nop
0x00000fffb1292408 <+48>: nop
0x00000fffb129240c <+52>: nop
Could it be that elf_machine_runtime_setup (dl-machine.h) never set up
.plt0 because lazy resolution was not requested, since the entire
DSO used OPDs?
No other function would ever have called __glink_PLTresolve because
they all go through their OPDs, but in this case the VDSO symbol fails
the PLT stub check for a non-zero toc, and the stub attempts a lazy
resolution when the dynamic loader never prepared .plt0 for it.
Does any of that make sense?
All I can say is that I'm seeing a clear failure here and it appears to
have to do with an interaction of a DSO, an IFUNC with a VDSO symbol,
and the dynamic loader never setting up .plt0 because it didn't need it.
I can confirm that under LD_DEBUG=all I don't see "(lazy)" being printed
which means the dynamic loader is not processing relocations lazily so
elf_machine_runtime_setup is called with lazy==0. So that serves to
confirm my suspicion.
I'm putting together a small test case, but I wanted to send this out
before I spent any more time. I wanted to get a feel from either of you
if you've seen something like this before.
Cheers,
Carlos.