Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
Sriraman Tallam
tmsriram@google.com
Sun Jan 1 00:00:00 GMT 2017
On Tue, Apr 25, 2017 at 11:02 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 10:12 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> We identified a problem with PIE executables, more than 5% code size
>> bloat compared to non-PIE and we have a few proposals to reduce the
>> bloat. Please take a look and let us know what you think.
>>
>> * What is the problem?
>>
>> PIE is a security hardening feature that enables ASLR (Address Space
>> Layout Randomization), so the executable is loaded at a random
>> virtual address on every execution. On average, a binary built as PIE
>> is 5% to 9% larger than the same binary built without PIE, as
>> measured on a suite of benchmarks used at Google where the average
>> text size is ~100MB. This is independent of the target architecture;
>> we found it to be true for x86_64, arm64 and power. The primary cause
>> of this code size bloat is the extra dynamic relocations that are
>> generated to make the binary position independent. This proposal
>> introduces new ways to represent these dynamic relocations that can
>> reduce the code size bloat to just a few percent.
>>
>> As an example of the code size bloat, here is the data from one of
>> our larger binaries.
>>
>> Without PIE, the binary’s code size in bytes is this as displayed by
>> the ‘size’ command:
>>
>>      text      data      bss        dec
>> 504663285  16242884  9130248  530036417
>>
>> With PIE, the binary’s code size in bytes is this as displayed by the
>> ‘size’ command:
>>
>>      text      data      bss        dec
>> 539781977  16242900  9130248  565155125
>>
>> The text size of the binary grew by 7% and the total size by 6.6%.
>> Our experiments have shown that binary sizes grow anywhere from 5%
>> to 9% with PIE on almost all benchmarks we looked at. Notice that
>> almost all the code bloat comes from the “text” segment of the binary,
>> which contains the executable code of the application and any
>> read-only data. We looked into this segment to see why this is
>> happening and found that the size of the section that contains the
>> dynamic relocations for a binary explodes with PIE. For instance,
>> without PIE, the dynamic relocation section of the above binary
>> contains 46 entries, whereas with PIE the same section contains
>> 1463325 entries. It takes 24 bytes to store one entry, that is, 3
>> integer values of 8 bytes each. So, the dynamic relocations alone
>> need an extra (1463325 - 46) * 24 bytes, which is about 35 million
>> bytes and accounts for almost all the bloat incurred!
>>
>> * What are these extra dynamic relocations that are created for PIE executables?
>>
>> We noticed that these extra relocations for PIE binaries have a common
>> pattern and are needed because it is not known until run-time where
>> the binary will be loaded. All of these extra dynamic relocations are
>> of the ELF type R_X86_64_RELATIVE. To show what these relocations do,
>> let us take an example of a program that stores the address of a global:
>>
>> #include <stdio.h>
>>
>> const int a = 10;
>> const int *b = &a;
>>
>> int main() {
>>   printf("b = %p\n", (const void *)b);
>>   return 0;
>> }
>>
>> First, let us look at the binary built without PIE. Let’s look at the
>> data section where ‘b’ and ‘a’ are allocated.
>>
>> 00000000004007d0 <a>:
>> 4007d0: 0a 00
>>
>>
>> 0000000000401b10 <b>:
>> 401b10: d0 07
>> 401b12: 40 00 00
>>
>> Variable ‘a’ is allocated at address 0x4007d0, which matches the output
>> when running the binary. ‘b’ is allocated at address 0x401b10 and its
>> contents, in little-endian byte order, are the address of ‘a’.
>>
>> Now, let us examine the contents of the PIE binary:
>>
>> 00000000000008d8 <a>:
>> 8d8: 0a 00
>>
>> 0000000000001c50 <b>:
>> 1c50: d8 08
>> 1c50: R_X86_64_RELATIVE *ABS*+0x8d8
>> 1c52: 00 00
>> 1c54: 00 00
>>
>>
>> Notice there is a dynamic relocation here which tells the dynamic
>> linker that this value needs to be fixed at run-time. This is needed
>> because ASLR can load this binary anywhere in the address space and
>> this relocation fixes the address after it is loaded.
>>
>>
>> * More details about R_X86_64_RELATIVE relocations
>>
>> This relocation takes 24 bytes and has three fields:
>>
>> Offset
>>
>> Type - here it is R_X86_64_RELATIVE
>>
>> Addend (what extra value needs to be added)
>>
>> The offset field of this relocation is the address offset, from the
>> start of the binary, at which this relocation applies. The type field
>> indicates the type of the dynamic relocation, but we are particularly
>> interested in one type, R_X86_64_RELATIVE. This is important because
>> in the motivating example that we presented above, all the extra
>> dynamic relocations were of this type!
>>
>>
>> * We have these proposals to reduce the size of the dynamic relocations section:
>>
>
> There are 3 pieces of run-time relocation information:
>
> 1. Type and symbol. 4 or 8 bytes
> 2. Offset. 4 or 8 bytes
> 3. Addend. 4 or 8 bytes
>
> If we use REL instead of RELA, the addend can be implicit and stored in-place.
> If we limit the type to relative relocations, we only need the offset.
> This is for PIC, not just for PIE. And we can use a special encoding
> scheme for the offset table, which can be placed in DT_GNU_RELATIVE_REL
> with DT_GNU_RELATIVE_RELSZ.
I have not done an intrusive change like this before, so I am
wondering which tools/pieces need to be modified. Pointers on how to
go about this would be really helpful. I can think of these:
* Linker - gold, lld, GNU ld
* Dynamic Linker
* readelf
* objdump
* ABI changes - what is involved here?
Thanks
Sri
>
> --
> H.J.