How to correctly intercept file system calls in the glibc dynamic linker (i.e. implement externally linked functions with access to rtld internals)

Aron Ahmadia aron@ahmadia.net
Fri Oct 7 18:46:00 GMT 2011


Dear glibc help list,

First, a quick introduction (technical question in the following paragraph):
I am a computational scientist working at KAUST and I am working with
Jed Brown at Argonne National Laboratory.  Both of our sites host
IBM BlueGene/P (BG/P) supercomputers, which run a fairly vanilla
glibc-2.4 library as part of the compute node operating system (Open
Toolchain Runtime in this figure
http://wiki.bg.anl-external.org/images/a/a6/Software_stack.png).
glibc is very important to us!  One of the many services that
glibc provides on the BG/P is dynamic loading, which enables us to run
interpreted dynamic programs such as Python at scale on tens of
thousands of cores simultaneously.  Unfortunately, the design
constraints of the BG/P and other large supercomputers do not allow
us to map a storage device to each individual compute node, so these
dynamic loads must go through a distributed file system.  Our
experience so far is that simply loading the scientific libraries
needed for an advanced simulation, a step that takes less than a
minute on a single core, can take as long as 4 hours on 65,536 cores.  We
have shown that if the file system accesses can instead be preloaded
on a single core and broadcast to the remaining cores over the
supercomputer's high-speed network, we can almost completely eliminate
this "loading penalty".  Our requirements are simple: if we can
intercept the dynamic loader's file system accesses, we can provide
high-performance functions that route these requests over the network
instead of to the file system.  One slight hurdle is
that the communication libraries (an implementation of MPI) we would
like to use are dynamically linked, so we will be unable to use them
until they have been loaded.  I have spent some time looking through
the glibc source and tried a few experimental modifications to the dl
source code in an effort to get this interception right, but it is
clear that I did not pay enough attention in my operating systems
courses :)  Our edit-compile-link loop is probably unnecessarily long,
as it takes us about 30 minutes to rebuild glibc, then another 5
minutes or so to request the compute node resources on the
supercomputer to test the program on hardware.
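
The preload-and-broadcast idea above can be illustrated with a minimal
sketch, assuming one core has already read a library into memory and the
bytes have been delivered to every node (the broadcast itself is omitted
here). The names mem_file, mem_read and mem_seek_set are hypothetical and
do not come from glibc or from the collfs patches; the point is only that
read- and seek-style requests can then be served from the buffer without
touching the file system.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory image of one file, e.g. filled in by a broadcast. */
struct mem_file {
  const unsigned char *data;  /* start of the file contents */
  size_t size;                /* total number of bytes available */
  size_t pos;                 /* current read offset */
};

/* read()-like: copy up to count bytes from the current offset.
   Returns the number of bytes copied, 0 at end of file. */
ssize_t mem_read(struct mem_file *f, void *buf, size_t count)
{
  size_t left = f->size - f->pos;
  size_t n = count < left ? count : left;

  memcpy(buf, f->data + f->pos, n);
  f->pos += n;
  return (ssize_t) n;
}

/* lseek(fd, offset, SEEK_SET)-like.  Simplification: unlike lseek, this
   sketch rejects offsets past the end of the buffer. */
off_t mem_seek_set(struct mem_file *f, off_t offset)
{
  if (offset < 0 || (size_t) offset > f->size)
    return (off_t) -1;
  f->pos = (size_t) offset;
  return offset;
}
```

A real replacement would also need mmap-style access, which this sketch
does not attempt.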

Here are our requirements/strategies:
* This implementation is in glibc-2.4 using the ELF binary format on
the 32-bit PowerPC architecture
* We want the I/O function calls in dl-load.c and dl-close.c (*stat,
open, close, read, seek, mmap, munmap) to be rerouted to our own
versions of these functions that are provided in a separate library
(libcollfs.so)
* We plan to implement two functions with external linkage,
collfsinitialize and collfsfinalize, to enable/disable this rerouting.
collfsinitialize accepts a function pointer table that contains our
replacement functions and is responsible for populating a data
structure accessible to the interception points in dl-load and
dl-close; collfsfinalize is responsible for releasing this structure.
* We will also implement several functions with internal (or external,
if this makes it easier) linkage that must be callable from within
dl-load, and also have access to the data structures modified by the
externally linked collfsinitialize and collfsfinalize.  These
functions will not call any other functions aside from the original
function they are intercepting or the high-performance version passed
in via the function pointer table by collfsinitialize.
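
The scheme in the list above can be sketched in ordinary user-space C.
This is a minimal illustration under stated assumptions, not the code in
the repository and not how rtld must do it: the struct collfs_api layout,
the signatures of collfsinitialize and collfsfinalize, and the wrapper
names collfs_open, collfs_close and collfs_read are all hypothetical, and
inside the dynamic linker the fallbacks would be the loader's internal
I/O calls rather than the public open/close/read used here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical table of replacement I/O functions (layout assumed). */
struct collfs_api {
  int (*open)(const char *path, int flags, mode_t mode);
  int (*close)(int fd);
  ssize_t (*read)(int fd, void *buf, size_t count);
};

/* State shared by the external entry points and the intercept wrappers.
   NULL means rerouting is disabled. */
static const struct collfs_api *collfs_active = NULL;

/* External entry point: install the table.  Returns 0 on success,
   -1 if the table is NULL or rerouting is already enabled. */
int collfsinitialize(const struct collfs_api *api)
{
  if (api == NULL || collfs_active != NULL)
    return -1;
  collfs_active = api;
  return 0;
}

/* External entry point: disable rerouting.  Returns -1 if it was
   not enabled. */
int collfsfinalize(void)
{
  if (collfs_active == NULL)
    return -1;
  collfs_active = NULL;
  return 0;
}

/* Intercept wrappers: use the replacement if one is installed,
   otherwise call the original function. */
int collfs_open(const char *path, int flags, mode_t mode)
{
  if (collfs_active != NULL && collfs_active->open != NULL)
    return collfs_active->open(path, flags, mode);
  return open(path, flags, mode);
}

int collfs_close(int fd)
{
  if (collfs_active != NULL && collfs_active->close != NULL)
    return collfs_active->close(fd);
  return close(fd);
}

ssize_t collfs_read(int fd, void *buf, size_t count)
{
  if (collfs_active != NULL && collfs_active->read != NULL)
    return collfs_active->read(fd, buf, count);
  return read(fd, buf, count);
}
```

Until collfsinitialize is called, every wrapper falls through to the
ordinary call, which fits the constraint that the MPI library cannot be
used before the loader has loaded it.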

And of course, our question is: How do we implement functions with
external linkage with access to the same data structures as our
intercept functions available within dl-load and dl-close?

* Where is the proper place to put the externally linked function
definitions and the internally linked function definitions?

* Is there any documentation on how to correctly modify the Versions
file to accommodate this?
It seems like we will need to target the rtld-libc.c source for both
our externally and internally linked functions, and that we will also
need to modify the Versions file to include the two new external
functions.  Is there anything else we need to be mindful of in this
implementation?

Thanks for both your work on glibc and your response.
Aron Ahmadia
Jed Brown

P.S.

If you'd like some idea of the code we've been working on (and that
isn't working), it's available on a github repository under the
minimalist branch:

https://github.com/wscullin/collfs/tree/minimalist/glibc-2.4-bgp-patches

Particularly useful are the blame links for dl-close.c and dl-libc.c

https://github.com/wscullin/collfs/blame/minimalist/glibc-2.4-bgp-patches/elf/dl-close.c
https://github.com/wscullin/collfs/blame/minimalist/glibc-2.4-bgp-patches/elf/dl-libc.c

I apologize for the confusing commit messages; the "giving up on this
branch" commits are the most useful in terms of understanding the
changes we are making.


