This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Allocate correct PLT reloc cache size during profiling and auditing.

During auditing or profiling modes the dynamic loader
builds a cache of the relocated PLT entries in order
to reuse them when called again through the same PLT
entry. This way the PLT entry is never completed and
the call into the resolver always results in profiling
or auditing code running.

The problem is that the PLT relocation cache size
is not computed correctly. The size of the cache
should be "Size of a relocation result structure"
x "Number of PLT-related relocations". Instead the
code erroneously computes "Size of a relocation
result" x "Number of bytes worth of PLT-related
relocations". I can only assume this was a mistake
in the understanding of the value of DT_PLTRELSZ
which is the number of bytes of PLT-related relocs.
We do have a DT_RELACOUNT entry, which is a count
for dynamic relative relocs, but we have no
DT_PLTRELCOUNT and thus we need to compute it.

For example, while loading a 64-bit which
should have only a handful of PLT entries i.e.

 0x0000000000000002 (PLTRELSZ)           288 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x314481f370
 0x0000000000000007 (RELA)               0x3144817b28
 0x0000000000000008 (RELASZ)             30792 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
Relocation section '.rela.plt' at offset 0x1f370 contains 12 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000003144bb8028  000001e900000007 R_X86_64_JUMP_SLOT     0000003144880410 realloc + 0
0000003144bb8030  0000049600000007 R_X86_64_JUMP_SLOT     000000314487ff10 malloc + 0
0000003144bb8038  0000000600000007 R_X86_64_JUMP_SLOT     0000000000000000 __tls_get_addr + 0
0000003144bb8040  000001bf00000007 R_X86_64_JUMP_SLOT     0000003144880760 memalign + 0
0000003144bb8050  0000000800000007 R_X86_64_JUMP_SLOT     0000000000000000 _dl_find_dso_for_object + 0
0000003144bb8058  000002f700000007 R_X86_64_JUMP_SLOT     00000031448808e0 calloc + 0
0000003144bb8060  000008a300000007 R_X86_64_JUMP_SLOT     0000003144880330 free + 0
0000003144bb8070  0000000000000025 R_X86_64_IRELATIVE                        3144889b90
0000003144bb8068  0000000000000025 R_X86_64_IRELATIVE                        31448ad120
0000003144bb8048  0000000000000025 R_X86_64_IRELATIVE                        314488be80
0000003144bb8020  0000000000000025 R_X86_64_IRELATIVE                        31448ad170
0000003144bb8018  0000000000000025 R_X86_64_IRELATIVE                        3144889360

Note: The IRELATIVE's are for IFUNCs, but we can't
distinguish them without iterating the list of relocs
so we just count them.

The allocation for the PLT table which should be:
sizeof(l_reloc_result) * 12 = 32 * 12 = 384 bytes.

It is actually computed as:
* <DT_PLTRELSZ or sizeof PLT relocations in bytes>
= 32 * 288
= 9216 bytes.


#0  calloc (nmemb=nmemb@entry=32, size=288) at dl-minimal.c:102
#1  0x000000314400c6a0 in _dl_relocate_object (scope=0x7ffff7ffce80, reloc_mode=1, consider_profiling=<optimized out>, consider_profiling@entry=1)
    at dl-reloc.c:259
#2  0x000000314400389a in dl_main (phdr=<optimized out>, phdr@entry=0x400040, phnum=<optimized out>, phnum@entry=9, 
    user_entry=user_entry@entry=0x7fffffffdec8, auxv=<optimized out>) at rtld.c:2196
#3  0x00000031440160c5 in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffdfb0, dl_main=dl_main@entry=0x3144001830 <dl_main>)
    at ../elf/dl-sysdep.c:249
#4  0x0000003144004b54 in _dl_start_final (arg=0x7fffffffdfb0) at rtld.c:329
#5  _dl_start (arg=0x7fffffffdfb0) at rtld.c:555
#6  0x00000031440011f8 in _start () from /lib64/
#7  0x0000000000000001 in ?? ()
#8  0x00007fffffffe2f2 in ?? ()
#9  0x0000000000000000 in ?? ()

This is 24x the memory that we need, which is right
given that a reloc is 24 bytes and we allocate one
reloc_result cache entry per byte. This isn't a huge
memory burden, but on large applictions with tons of
DSOs the burden starts to add up.

With a fixed glibc you see:

Breakpoint 2, calloc (nmemb=nmemb@entry=32, size=12) at dl-minimal.c:102
102	{

Which is the expected usage.

The bug was reported by Matthew LeGendre from LLNL.
Matt sent me an initial patch and I tweaked it up.

Tested on x86_64 with no regressions.

For all the obvious reasons it is very difficult to write
a regression test for this. We just don't have the framework
to write a test like for this without creating the framework

I'll commit this on Friday if nobody objects or sees something wrong.

2014-09-17  Carlos O'Donell  <>
            Matthew LeGendre  <>

	[BZ #17411]
	* elf/dl-reloc.c (_dl_relocate_object): Allocate correct
	amount for l_reloc_result.
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index d2c6dac..cc8c8eb 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -270,6 +270,8 @@ _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
 	   relocations.  If the shared object lacks a PLT (for example
 	   if it only contains lead function) the l_info[DT_PLTRELSZ]
 	   will be NULL.  */
+	size_t sizeofrel;
+	size_t relcount;
 	if (l->l_info[DT_PLTRELSZ] == NULL)
 	    errstring = N_("%s: no PLTREL found in object %s\n");
@@ -279,8 +281,12 @@ _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
-	l->l_reloc_result = calloc (sizeof (l->l_reloc_result[0]),
-				    l->l_info[DT_PLTRELSZ]->d_un.d_val);
+	sizeofrel = l->l_info[DT_PLTREL]->d_un.d_val == DT_RELA
+		    ? sizeof(ElfW(Rela))
+		    : sizeof(ElfW(Rel));
+	relcount = l->l_info[DT_PLTRELSZ]->d_un.d_val / sizeofrel;
+	l->l_reloc_result = calloc (sizeof (l->l_reloc_result[0]), relcount);
 	if (l->l_reloc_result == NULL)
 	    errstring = N_("\


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]