This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: Treat RTLD_GLOBAL as unique to namespace when used with dlmopen


On 07/20/2015 02:56 PM, Michael Kerrisk (man-pages) wrote:
> Hi all,
> 
> I'll add some further details to Carlos's points, plus some 
> observations from testing on Solaris.

Thank you.

> On 07/16/2015 06:43 AM, Carlos O'Donell wrote:
>> Michael Kerrisk and I are working on a man page for dlmopen.
>>
>> I have a question, and a proposal for the community.
>>
>> We do not allow dlmopen to use RTLD_GLOBAL. Was this really
>> intended or simply a QoI issue?
> 
> Well, the API comes from Solaris, but does not follow 
> Solaris behavior.

It should, to the extent that it makes sense for our users.
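
For anyone who wants to see which behaviour their own glibc has, here is a
minimal sketch of a probe. It is not part of the patch; the function name
dlmopen_accepts_global is made up for illustration, and the caller picks
whatever library path is convenient.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper, not a glibc API.
   Try to load `path` into a brand-new namespace with RTLD_GLOBAL.
   Returns 1 if the loader accepted the request, 0 if it refused.
   On refusal the dlerror() text is copied into `why`. */
int dlmopen_accepts_global(const char *path, char *why, size_t why_len)
{
    void *handle = dlmopen(LM_ID_NEWLM, path, RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(why, why_len, "%s", msg ? msg : "unknown error");
        return 0;
    }

    /* Accepted: clear the message and drop the namespace again. */
    if (why_len > 0)
        why[0] = '\0';
    dlclose(handle);
    return 1;
}
```

With the check in dlmopen_doit in place I would expect this to return 0 and
leave the loader's complaint in `why`. On older glibc, link with -ldl.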

>> Therefore dlmopen at present serves only as a limited way to
>> load one library in an isolated namespace along with all of
>> the dependent (DT_NEEDED) libraries. It would seem to me that
>> RTLD_LOCAL already provides this functionality with the exception
>> that such a DSO may get promoted to RTLD_GLOBAL if future dlopen
>> calls load a DSO RTLD_GLOBAL that has an implicit dependency
>> on the RTLD_LOCAL DSO (DT_NEEDED). In this case the DSO loaded
>> RTLD_LOCAL is promoted to RTLD_GLOBAL to resolve the dependencies.
>> This breaks the RTLD_LOCAL isolation, and is one of the benefits
>> of loading a DSO with dlmopen since at least *that* copy will
>> never be promoted to RTLD_GLOBAL.
> 
> Correct. And this is not the way that things are on Solaris.

Thanks for checking.

>> The clever developer says "No problem, I will dlmopen a stub
>> that dlopen's my library with RTLD_GLOBAL" under the impression
>> that the global search list is unique per namespace. One expects
>> this allows the dlmopen'd stub to load several conjoined plugin
>> DSOs into the new namespace, having them resolve their symbols
>> against each other in an isolated way. This fails immediately
>> with a SIGSEGV (see Bug 18684[1]).
> 
> This is precisely the use case the Solaris dlmopen() does support:
> isolation of load namespaces, while allowing DSOs inside a namespace
> to share symbols via RTLD_GLOBAL.

I have seen some academic projects that used dlmopen in Solaris to
implement a form of virtualization via the isolation of library
loading. The use of dlmopen solves quite a number of interesting
security and isolation problems within an application.
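
To make the stub trick quoted above concrete, here is a rough sketch of what
such a stub might export. The name stub_load_plugins and its signature are
hypothetical; the only real calls are dlopen() and dlerror(). The idea is
that the application loads the stub with dlmopen(LM_ID_NEWLM, ...) and then
calls this entry point from inside the new namespace.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical entry point exported by the stub library.  It loads each
   plugin with RTLD_GLOBAL so that later plugins can bind to symbols
   defined by earlier ones.  Returns the number of plugins loaded. */
size_t stub_load_plugins(const char *const *paths, size_t count)
{
    size_t loaded = 0;

    for (size_t i = 0; i < count; i++) {
        /* dlopen() loads into the caller's namespace, so when the stub
           itself lives in a dlmopen'd namespace the plugins are meant
           to land there too. */
        void *handle = dlopen(paths[i], RTLD_NOW | RTLD_GLOBAL);
        if (handle == NULL) {
            fprintf(stderr, "stub: %s\n", dlerror());
            continue;
        }
        loaded++;
    }
    return loaded;
}
```

Called from the initial namespace this behaves like any other dlopen loop.
Called from a non-base namespace on current glibc it is the pattern that
runs into the crash described in Bug 18684.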

>> This trick fails for the same reason that calling dlmopen
>> with RTLD_GLOBAL would fail if you removed the check in dlfcn/dlmopen.c
>> (dlmopen_doit). When you go to add the DSO to the global
>> search list you find there is no search list set up. In the case of
>> the application we have rtld set up the global search list.
>>
>> Which raises the question: what should the global search list
>> be for a new namespace? I propose that the global search
>> list for a new namespace should be a copy of the symbol search
>> list (scope) of the first DSO loaded into the namespace with
>> RTLD_GLOBAL, and subsequent RTLD_GLOBAL loads into the namespace
>> add to that list.
> 
> The above is what Solaris appears to provide.

OK.
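
As a way of pinning down the proposed rule, here is a toy model of one
namespace's global search list. It is only a sketch: struct ns_model and
both functions are invented for illustration and do not correspond to the
loader's real data structures, and objects are identified by name only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_GLOBAL 32

/* Toy model of one namespace's global search list (not glibc's types). */
struct ns_model {
    const char *global[MAX_GLOBAL];
    size_t count;
};

/* True if `name` is already on the namespace's global search list. */
bool ns_contains(const struct ns_model *ns, const char *name)
{
    for (size_t i = 0; i < ns->count; i++)
        if (strcmp(ns->global[i], name) == 0)
            return true;
    return false;
}

/* Record an RTLD_GLOBAL load.  `scope` is the search scope of the DSO
   being loaded: the DSO itself followed by its dependencies.  On the
   first global load the list starts out as a copy of that scope; later
   loads append whatever is not already listed.
   Returns false if the model's fixed capacity would be exceeded. */
bool ns_load_global(struct ns_model *ns, const char *const *scope, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ns_contains(ns, scope[i]))
            continue;
        if (ns->count == MAX_GLOBAL)
            return false;
        ns->global[ns->count++] = scope[i];
    }
    return true;
}
```

Each namespace gets its own ns_model, so nothing loaded RTLD_GLOBAL in one
namespace ever shows up on another namespace's list.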

>> The Solaris documentation is silent on exactly what should happen
>> in this case. 
> 
> Yes, but notably the Solaris documentation does not explicitly
> prohibit the use of RTLD_GLOBAL with dlmopen(). The Solaris
> documentation says:
> 
>      The dlmopen() function is identical to dlopen(), except that
>      an identifying link-map ID (lmid) is provided. This link-map
>      ID informs the dynamic linking facilities upon which link-map
>      list to load the object.

Agreed.

>> Since an alternate interpretation could be: All objects,
>> regardless of namespace (link map list) loaded with RTLD_GLOBAL are
>> available for symbol resolution for any objects. In which case
>> dlmopen with RTLD_GLOBAL makes no sense, other than perhaps symmetry
>> with dlopen, because the namespace isolation is lost. This still doesn't
>> solve the most compelling use case of an isolated set of dlmopen/dlopen
>> plugins with their own global search list.
> 
> And, in my testing, the above is *not* what Solaris does.

Good. I proposed the alternatives only inasmuch as they exist, but
I do not believe they are the correct technical solution.

>> The proposed interpretation of RTLD_GLOBAL for dlmopen would allow:
>>
>> * Use dlmopen with RTLD_GLOBAL, making the symbols of the first
>>   object loaded into the namespace immediately available to
>>   subsequent DSOs loaded in constructors or other dlopen implicitly
>>   into the namespace.
>>
>> * Use dlopen RTLD_GLOBAL to make symbols available for resolution
>>   only within the namespace the caller was in.
>>
>> * Allows complete isolation of a group of dependent DSOs, either
>>   via DT_NEEDED dependencies or via dlopen or subsequent dlmopen.
>>   This isolation allows plugin virtualization via dlmopen.
> 
> The above is what Solaris seems to provide.

OK.

>> Attached is a patch that fixes this for master. I still need to write
>> something like a dozen tests to show that this works as expected in
>> all the cases, but so far every test I've written works and doesn't
>> regress anything.
> 
> I've not yet had a chance to test this patch. Carlos, you may wish
> to try my code examples, and check how things look compared to Solaris.
> 
> One other deviation that I note from Solaris. The dlopen() man page
> currently says:
> 
>        If filename is NULL, then the returned handle is for
>        the main program.
> 
> And this is what glibc currently does *regardless* of the namespace
> from which the dlopen(NULL, flags) call is made. But, in the context
> of dlmopen(LM_ID_NEWLM) namespaces, I'd expect this call to return 
> something like "the root of this namespace". And that is what
> Solaris appears to do.

Agreed. We can fix that.

>> Obviously not for 2.22, but 2.23 material, along with Michael's
>> new dlmopen/dlinfo man pages we should be ready to help developers
>> use such a feature more extensively. At present I find almost no
>> code using dlmopen in userspace because it has languished as an
>> unsupported undocumented feature (Bug 15971, Bug 15271, and Bug 15134
>> all need fixing).
> 
> I would say "... because it currently serves no useful purpose".
> The dlmopen() seems to have been added to Solaris to support
> precisely the use cases that Carlos describes, and the glibc
> implementation doesn't support those cases at all.
> 
> The attached tarball contains a short build script that creates a few
> shared libraries from (mostly) simple (and commented) source files.
> 
> The overall structure is as follows:
> 
>     main():
> 
>         1. Loads libabc.so with either dlmopen() or dlopen() and 
>            with either RTLD_GLOBAL or RTLD_LOCAL, depending on the 
>            command-line arguments. If no arguments are provided, the 
>            default is dlmopen(..., RTLD_GLOBAL);
> 
>         2. Invokes abc_start() in libabc.so
> 
>     abc_start():
>         1. Loads some other shared libraries using different
>            combinations of dlmopen() and RTLD_GLOBAL vs RTLD_LOCAL.
> 
>         2. Invokes a function qrs_start() in the libqrs.so
>            library.
> 
>     qrs_start():
>         Looks up (dlsym()) various symbols in the other shared
>         libraries and reports on success or failure of the lookups.
> 
>     main():
>         Control eventually returns to main(), and it then looks up
>         some of the same symbols as qrs_start() and reports on
>         success or failure of the lookups.    
> 
> The program produces log messages that should make the results 
> reasonably easy to interpret. Annotated output from a sample
> run follows.
> 
> ---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
> $ uname -a
> SunOS login 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
> $ sh build.sh && ./main
> main(): lmid from dlopen(NULL) is 0 (handle = 0xff3634d8)
> main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
> main(): lmid from dlopen("libabc.so") is -13222656 (handle = 0xff371560)
> main(): invoking abc_start()
>     Called abc_start()
> # Note in next line that dlopen(NULL) gave us back a handle for something
> # other than initial NS. Linux differs on this point.
>     abc_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
>     abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
>     abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
>     abc_start(): invoking qrs_start()
>         Called qrs_start()
>         qrs_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
>         qrs_start(): lookup of "abc" succeeded   # In this NS, with 
>         qrs_start(): lookup of "def" failed      # Was loaded into initial NS
>         qrs_start(): lookup of "jkl" succeeded
>         qrs_start(): lookup of "mno" failed      # Was loaded with RTLD_LOCAL
>         qrs_start(): lookup of "main" failed     # Is in initial NS
> # Now do some lookups from initial NS
> main(): lookup of "abc" failed                   # In another NS
> main(): lookup of "def" succeeded                # Was loaded into initial NS
> main(): lookup of "jkl" failed                   # In another NS
> main(): lookup of "mno" failed                   # In another NS (+ RTLD_LOCAL)
> ---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
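
The lookup results in the Solaris log above look consistent with a simple
rule: a lookup succeeds only for objects that are in the caller's namespace
and were loaded with RTLD_GLOBAL. The sketch below just encodes that reading
of the log as a table. It is an inference from the output, not a statement
of how either loader is implemented, and the namespace numbers 0 and 1 are
labels chosen for the model (Solaris printed a different lmid value).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One object from the sample run: which namespace it went into and
   whether it was loaded with RTLD_GLOBAL. */
struct loaded {
    const char *symbol;  /* symbol the test program looks up */
    int ns;              /* 0 = initial namespace, 1 = the LM_ID_NEWLM one */
    bool global;
};

static const struct loaded sample_run[] = {
    { "main", 0, true  },  /* the main program itself */
    { "abc",  1, true  },  /* dlmopen LM_ID_NEWLM, RTLD_GLOBAL */
    { "def",  0, true  },  /* dlmopen LM_ID_BASE, RTLD_GLOBAL */
    { "jkl",  1, true  },  /* dlopen RTLD_GLOBAL from abc_start() */
    { "mno",  1, false },  /* dlopen RTLD_LOCAL from abc_start() */
};

/* Rule inferred from the log: a lookup made from namespace `from_ns`
   succeeds only for RTLD_GLOBAL objects in that same namespace. */
bool lookup_succeeds(int from_ns, const char *symbol)
{
    for (size_t i = 0; i < sizeof sample_run / sizeof sample_run[0]; i++)
        if (strcmp(sample_run[i].symbol, symbol) == 0)
            return sample_run[i].ns == from_ns && sample_run[i].global;
    return false;
}
```

Checking lookup_succeeds(1, ...) against the qrs_start() lines and
lookup_succeeds(0, ...) against the final main() lines reproduces every
succeeded/failed result in the log.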

With a few more patches I get *almost* all the way there:

[carlos@athas dlmopen_expt]$ ./main
main(): lmid from dlopen(NULL) is 0 (handle = 0x0x7fa3e58ec188)
main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
main(): lmid from dlopen("libabc.so") is 1 (handle = 0x0x2267030)
main(): invoking abc_start()
    Called abc_start()
    abc_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x2267030)
    abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
    abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
    abc_start(): invoking qrs_start()
        Called qrs_start()
        qrs_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x2267030)
        qrs_start(): lookup of "abc" succeeded
        qrs_start(): lookup of "def" failed
Segmentation fault (core dumped)

There is more work to be done. This failure is from calling free() in
the non-LM_ID_BASE namespace for the first time.

My opinion is that this should all just work, but may require some special
cases in libc.so.6 and ld.so to make sure everything is initialized in the
new namespace and has its own distinct TLS blocks (doesn't use the base
namespace TLS blocks).

The bummer is that gdb can no longer debug anything after the dlmopen
call. We're going to need the gdb developers' help to continue debugging
this after we get the basic patches in place for 2.23.

Cheers,
Carlos.

