This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH] Allocate correct PLT reloc cache size during profiling and auditing.
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>, Matthew LeGendre <legendre1 at llnl dot gov>, Ben Woodard <woodard at redhat dot com>
- Date: Mon, 22 Sep 2014 14:27:49 -0400
- Subject: [PATCH] Allocate correct PLT reloc cache size during profiling and auditing.
- Authentication-results: sourceware.org; auth=none
During auditing or profiling modes the dynamic loader
builds a cache of the relocated PLT entries in order
to reuse them when called again through the same PLT
entry. This way the PLT entry is never completed and
the call into the resolver always results in profiling
or auditing code running.
The problem is that the PLT relocation cache size
is not computed correctly. The size of the cache
should be "Size of a relocation result structure"
x "Number of PLT-related relocations". Instead the
code erroneously computes "Size of a relocation
result" x "Number of bytes worth of PLT-related
relocations". I can only assume this was a mistake
in the understanding of the value of DT_PLTRELSZ
which is the number of bytes of PLT-related relocs.
We do have a DT_RELACOUNT entry, which is a count
for dynamic relative relocs, but we have no
DT_PLTRELCOUNT and thus we need to compute it.
For example, while loading a 64-bit libc.so.6 which
should have only a handful of PLT entries i.e.
...
0x0000000000000002 (PLTRELSZ) 288 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x314481f370
0x0000000000000007 (RELA) 0x3144817b28
0x0000000000000008 (RELASZ) 30792 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
...
Relocation section '.rela.plt' at offset 0x1f370 contains 12 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000003144bb8028 000001e900000007 R_X86_64_JUMP_SLOT 0000003144880410 realloc + 0
0000003144bb8030 0000049600000007 R_X86_64_JUMP_SLOT 000000314487ff10 malloc + 0
0000003144bb8038 0000000600000007 R_X86_64_JUMP_SLOT 0000000000000000 __tls_get_addr + 0
0000003144bb8040 000001bf00000007 R_X86_64_JUMP_SLOT 0000003144880760 memalign + 0
0000003144bb8050 0000000800000007 R_X86_64_JUMP_SLOT 0000000000000000 _dl_find_dso_for_object + 0
0000003144bb8058 000002f700000007 R_X86_64_JUMP_SLOT 00000031448808e0 calloc + 0
0000003144bb8060 000008a300000007 R_X86_64_JUMP_SLOT 0000003144880330 free + 0
0000003144bb8070 0000000000000025 R_X86_64_IRELATIVE 3144889b90
0000003144bb8068 0000000000000025 R_X86_64_IRELATIVE 31448ad120
0000003144bb8048 0000000000000025 R_X86_64_IRELATIVE 314488be80
0000003144bb8020 0000000000000025 R_X86_64_IRELATIVE 31448ad170
0000003144bb8018 0000000000000025 R_X86_64_IRELATIVE 3144889360
Note: The IRELATIVE's are for IFUNCs, but we can't
distinguish them without iterating the list of relocs
so we just count them.
The allocation for the PLT table which should be:
sizeof(l_reloc_result) * 12 = 32 * 12 = 384 bytes.
It is actually computed as:
sizeof(l_reloc_result)
* <DT_PLTRELSZ or sizeof PLT relocations in bytes>
= 32 * 288
= 9216 bytes.
e.g.
#0 calloc (nmemb=nmemb@entry=32, size=288) at dl-minimal.c:102
#1 0x000000314400c6a0 in _dl_relocate_object (scope=0x7ffff7ffce80, reloc_mode=1, consider_profiling=<optimized out>, consider_profiling@entry=1)
at dl-reloc.c:259
#2 0x000000314400389a in dl_main (phdr=<optimized out>, phdr@entry=0x400040, phnum=<optimized out>, phnum@entry=9,
user_entry=user_entry@entry=0x7fffffffdec8, auxv=<optimized out>) at rtld.c:2196
#3 0x00000031440160c5 in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffdfb0, dl_main=dl_main@entry=0x3144001830 <dl_main>)
at ../elf/dl-sysdep.c:249
#4 0x0000003144004b54 in _dl_start_final (arg=0x7fffffffdfb0) at rtld.c:329
#5 _dl_start (arg=0x7fffffffdfb0) at rtld.c:555
#6 0x00000031440011f8 in _start () from /lib64/ld-linux-x86-64.so.2
#7 0x0000000000000001 in ?? ()
#8 0x00007fffffffe2f2 in ?? ()
#9 0x0000000000000000 in ?? ()
This is 24x the memory that we need, which is right
given that a reloc is 24 bytes and we allocate one
reloc_result cache entry per byte. This isn't a huge
memory burden, but on large applictions with tons of
DSOs the burden starts to add up.
With a fixed glibc you see:
Breakpoint 2, calloc (nmemb=nmemb@entry=32, size=12) at dl-minimal.c:102
102 {
(gdb)
Which is the expected usage.
The bug was reported by Matthew LeGendre from LLNL.
Matt sent me an initial patch and I tweaked it up.
Tested on x86_64 with no regressions.
For all the obvious reasons it is very difficult to write
a regression test for this. We just don't have the framework
to write a test like for this without creating the framework
first.
I'll commit this on Friday if nobody objects or sees something wrong.
2014-09-17 Carlos O'Donell <carlos@redhat.com>
Matthew LeGendre <legendre1@llnl.gov>
[BZ #17411]
* elf/dl-reloc.c (_dl_relocate_object): Allocate correct
amount for l_reloc_result.
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index d2c6dac..cc8c8eb 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -270,6 +270,8 @@ _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
relocations. If the shared object lacks a PLT (for example
if it only contains lead function) the l_info[DT_PLTRELSZ]
will be NULL. */
+ size_t sizeofrel;
+ size_t relcount;
if (l->l_info[DT_PLTRELSZ] == NULL)
{
errstring = N_("%s: no PLTREL found in object %s\n");
@@ -279,8 +281,12 @@ _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
l->l_name);
}
- l->l_reloc_result = calloc (sizeof (l->l_reloc_result[0]),
- l->l_info[DT_PLTRELSZ]->d_un.d_val);
+ sizeofrel = l->l_info[DT_PLTREL]->d_un.d_val == DT_RELA
+ ? sizeof(ElfW(Rela))
+ : sizeof(ElfW(Rel));
+ relcount = l->l_info[DT_PLTRELSZ]->d_un.d_val / sizeofrel;
+ l->l_reloc_result = calloc (sizeof (l->l_reloc_result[0]), relcount);
+
if (l->l_reloc_result == NULL)
{
errstring = N_("\
---
Cheers,
Carlos.