vtrelocs: large/modular C++ app speedup ...
Michael Meeks
michael.meeks@novell.com
Wed Apr 2 15:06:00 GMT 2008
Hi guys,
I spent a little time recently researching ways to reduce the number of
unique named relocations that must be processed at dlopen time for large
C++ libraries[1]. Apologies for spamming all 3 lists like this, but it
touches all 3 projects.
Since almost all function relocations of this type are inside vtables,
I implemented a new way of relocating vtables. This is a new
'.suse.vtrelocs' section.
As we inherit a class across a shared library boundary we construct new
vtables that are often extremely similar to their parents. However -
this similarity is not exposed - instead we fill the new vtable with
many unique named relocations, one per method. This generates lots
of .rel entries, and emits lots of external symbols; worse these symbols
tend to be duplicated across ~all libraries deriving from the base
class.
Instead, a vtreloc section contains a sorted array of entries:
    struct {
        void **src, **dest;
        int copy_slot_bitmask;
    } vtreloc_entries[] = { ... };
The run-time cost of processing these is insignificant in comparison to
the cost of processing the remaining relocations, giving a pleasant
speed win.
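To make the entry layout concrete, here is a minimal sketch in C of how a loader might process such entries. It is not taken from the attached glibc patch. The assumptions are that 'src' points at the parent's already-relocated vtable slots, that 'dest' points at the derived vtable, and that bit i of copy_slot_bitmask means "copy slot i from src to dest". The names vtreloc_apply and vtreloc_apply_all are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Entry layout as given in the mail; the meaning of each field is assumed. */
struct vtreloc_entry {
    void **src;             /* assumed: parent vtable slots, already relocated */
    void **dest;            /* assumed: derived vtable slots to fill in */
    int copy_slot_bitmask;  /* assumed: bit i set => copy slot i */
};

/* Copy every slot selected by the bitmask from src to dest. */
void vtreloc_apply(const struct vtreloc_entry *e)
{
    unsigned int mask = (unsigned int) e->copy_slot_bitmask;
    size_t slot;

    assert(e->src != NULL && e->dest != NULL);
    for (slot = 0; mask != 0; slot++, mask >>= 1) {
        if (mask & 1u)
            e->dest[slot] = e->src[slot];
    }
}

/* Apply entries in array order. This assumes the sort order places a
 * parent's entry before its children's, so src is valid when it is read. */
void vtreloc_apply_all(const struct vtreloc_entry *entries, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        vtreloc_apply(&entries[i]);
}
```

Under these assumptions each entry costs a handful of pointer copies and no symbol lookup, which would be consistent with the low run-time cost described above.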
A brief slide-deck with the results of my research is here:
http://www.gnome.org/~michael/vtrelocs-gcc.pdf
and has a comparison against the current state of the art wrt. reducing
relocations: -Bsymbolic-functions [ in itself a substantial
optimisation ].
The 3 prototype patches for discussion are attached. There are a number
of trivial hacks in there (of course) - eg. environment variables to
turn the feature on, leaving an empty .vtrelocs section in object files
etc.
The more interesting problems are:
  * glibc - the memory protection semantics need adjusting, since
    we need to fix up relocations in 'init' order: shouldn't be
    impossibly hard to fix, but I just turn off protection ;-)
      + subsequent dlopens can (I think) avoid touching
        already-relocated libraries they don't own, avoiding
        this sort of problem.
  * gcc - the code to generate the vtreloc sections is <cough>
    written for comfort not speed. This is a fall-back from having
    initially tried to integrate the work into
    build_vtbl_initializer & friends, with some success but rather
    a tangling of the code.
  * vtreloc section design - the section should be read-only, and
    probably refer by offset to .bss relocations that can be re-used
    for implementing indirect calls via the parent vtable to virtual
    functions. That should save relocs, but make each entry
    slightly larger.
Of course, apart from the run-time speed wins, some of the nicest
potential size wins come from breaking the ABI[2] & depending on the
vtrelocs to fixup vtables: eg. hiding all thunks (implemented), or
potentially hiding all virtual function symbols & invoking them via
their parent vtable (not implemented).
Wrt. testing, I can build & run an OO.o built with this - clearly not a
unit-test ;-) but perhaps helpful.
Feedback much appreciated,
Thanks,
Michael.
[1] - specifically OpenOffice.org ;-)
[2] - which while bad, can be done in isolated islands like OO.o.
--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
-------------- next part --------------
A non-text attachment was scrubbed...
Name: suse-vtrelocs-binutils.diff
Type: text/x-patch
Size: 4095 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20080402/ab90cf60/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: suse-vtrelocs-gcc.diff
Type: text/x-patch
Size: 27550 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20080402/ab90cf60/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: suse-vtrelocs-glibc.diff
Type: text/x-patch
Size: 9918 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20080402/ab90cf60/attachment-0002.bin>