Bug 31992 - Dynamic linker auditors need to be able to audit each other.
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.39
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2024-07-18 23:53 UTC by Ben Woodard
Modified: 2024-09-18 23:59 UTC

fweimer: security-


Attachments
reproducer (1.47 KB, application/gzip)
2024-07-18 23:53 UTC, Ben Woodard

Description Ben Woodard 2024-07-18 23:53:12 UTC
Created attachment 15633 [details]
reproducer

Attached a reproducer that demonstrates these problems. To run this reproducer simply:

tar xvzf auditor-namespace-notification-silence.tar.gz
cd auditor-namespace-notification-silence
make

The output will look something like this:
Running test, success requires OK (not FAIL):
LD_AUDIT=./auditor.so:./victim.so ./main
[audit] la_objopen for ./victim.so, OK.
[audit] la_objopen for ./lib-base.so, OK.
[audit] la_symbind binding __cxa_finalize from ./lib-base.so to /lib64/libc.so.6, OK.
[audit] la_symbind binding foo from ./victim.so to ./lib-base.so, OK.
[audit] la_objclose for ./lib-base.so, OK.
[audit] la_symbind binding foo to unknown binary ./lib-victim.so, FAIL.

[audit] Expected la_objsearch for victim.so, FAIL.
[audit] Expected la_objclose for victim.so, FAIL.
[audit] Expected la_objsearch for lib-base.so, FAIL.
[audit] Expected la_objsearch for lib-victim.so, FAIL.
[audit] Expected la_objopen for lib-victim.so, FAIL.
[audit] Expected la_objclose for lib-victim.so, FAIL.

The output takes some explanation. The key command, run after the source is built, is:
LD_AUDIT=./auditor.so:./victim.so ./main

The outer auditor.so should see all dynamic loader events, even those that happen in the second auditor.
The second auditor is called victim.so and it has a void la_preinit(uintptr_t* cookie) function. According to the rtld-audit man page: "The dynamic linker invokes this function after all shared objects have been loaded, before control is passed to the application (i.e., before calling main()).  Note that main()  may still later dynamically load objects using dlopen(3)."

So it runs before the application's main(). That function does some dynamic loading:
1) it loads a library into the process's namespace
2) it loads another library into its own namespace.
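
A sketch of what such an la_preinit() might look like follows. The reproducer's source is not quoted in this report, so this is an assumption rather than the attached code: the helper load_into() is hypothetical, and the use of dlmopen() with LM_ID_BASE for the process's namespace and plain dlopen() for the auditor's own namespace is inferred from the description.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: load `path` into namespace `lmid`.
   Returns the handle, or NULL if loading failed. */
void *load_into(Lmid_t lmid, const char *path) {
  void *handle = dlmopen(lmid, path, RTLD_NOW | RTLD_LOCAL);
  if (handle == NULL)
    fprintf(stderr, "[victim] cannot load %s: %s\n", path, dlerror());
  return handle;
}

/* Every audit library must export la_version(). */
unsigned int la_version(unsigned int version) {
  (void)version;
  return LAV_CURRENT;
}

void la_preinit(uintptr_t *cookie) {
  (void)cookie;

  /* 1) Load a library into the process's (base) namespace. */
  void *base = load_into(LM_ID_BASE, "./lib-base.so");

  /* 2) Load another library into the caller's namespace, which for an
        auditor is its own private namespace. */
  void *own = dlopen("./lib-victim.so", RTLD_NOW | RTLD_LOCAL);
  if (own == NULL)
    fprintf(stderr, "[victim] cannot load ./lib-victim.so: %s\n", dlerror());

  /* Look up a symbol in the private copy, then unload both objects. */
  if (own != NULL) {
    if (dlsym(own, "foo") == NULL)
      fprintf(stderr, "[victim] foo not found: %s\n", dlerror());
    dlclose(own);
  }
  if (base != NULL)
    dlclose(base);
}
```

Under the reporter's expectation, each of these loads and unloads would be visible to an auditor listed earlier in LD_AUDIT.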

Both of these are standard auditor behavior. Dynamic linking is supposed to work as follows: first there is an la_objsearch() call, which allows an auditor to replace the object being loaded with one of its own selection. Then there is an la_objopen() call, which provides the link_map, the namespace, and the cookie for that object.
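
For readers less familiar with the interface, a minimal sketch of the two callbacks described above is given here. The signatures are those documented in rtld-audit(7); the library names ./lib-old.so and ./lib-new.so are hypothetical and only illustrate a replacement.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Every audit library must export la_version(). */
unsigned int la_version(unsigned int version) {
  (void)version;
  return LAV_CURRENT;
}

/* Called while ld.so searches for an object. Returning a different
   path substitutes another object; returning `name` accepts it. */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag) {
  (void)cookie;
  (void)flag;
  fprintf(stderr, "[audit] la_objsearch for %s\n", name);
  if (strcmp(name, "./lib-old.so") == 0) /* hypothetical name */
    return "./lib-new.so";               /* hypothetical replacement */
  return (char *)name;
}

/* Called once the object is loaded. The link_map, the namespace id and
   a per-object cookie are handed to the auditor here. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie) {
  (void)cookie;
  fprintf(stderr, "[audit] la_objopen for %s in namespace %ld\n",
          map->l_name, (long)lmid);
  /* Ask for la_symbind() calls for bindings to and from this object. */
  return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}
```

If la_objsearch() is never invoked for an object, the substitution branch cannot run, and if la_objopen() is never invoked, the auditor has no link_map or cookie for that object.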

The bugs are as follows:
1) The first auditor is never given the opportunity to replace the second auditor because there is no la_objsearch() for the second auditor's library. This is indicated by the output:
[audit] Expected la_objsearch for victim.so, FAIL.

2) When the second auditor loads an object into the process's namespace, the la_objsearch() function in the first auditor doesn't get called, although the la_objopen() function is. This prevents the first auditor from replacing the library that the second auditor is placing in the process's namespace. This is what is indicated by the failures at the end of the output:
[audit] Expected la_objsearch for lib-base.so, FAIL.
[audit] Expected la_objsearch for lib-victim.so, FAIL.

3) When the second auditor loads an object into its own namespace, the situation is even worse. Neither la_objsearch() nor la_objopen() is called. Without the la_objsearch(), the first auditor can't replace the library being loaded by the second auditor. Without the la_objopen(), when the function is later bound through the dlsym() call and ld.so calls la_symbind(), the first auditor doesn't know about the object that the function belongs to. This is what is indicated by:
[audit] la_symbind binding foo to unknown binary ./lib-victim.so, FAIL.

4) When the second auditor closes the library that it loaded into its own namespace, the first auditor doesn't get the expected la_objclose(). This is indicated by the line in the output:
 [audit] Expected la_objclose for lib-victim.so, FAIL.

5) Likewise when the process exits and the second auditor is removed, the first auditor doesn't get notified. The line in the output that indicates this is:
[audit] Expected la_objclose for victim.so, FAIL.

These are all filed together because I believe that all of these problems are very closely related and that the underlying problem is that the necessary callbacks have not been added to the auditor handling code.
Comment 1 Florian Weimer 2024-08-08 14:01:14 UTC
Regarding (1), (3), (4), (5), this is expected behavior. Auditors are not themselves subject to auditing within their own namespaces. For a single auditor, that prevents it from observing its own binding activity, for example. This is not an emergent property of the code; there are explicit checks to suppress audit callbacks for auditor-initiated actions.

The la_symbind call for foo (binding from ./victim.so to ./lib-victim.so) in the second auditor's namespace looks like a bug. It seems that we do not consistently suppress callbacks based on namespace. It's suppressed based on the __RTLD_AUDIT flag in _dl_relocate_object, but it looks like this isn't properly applied to the dlmopen call from an auditing namespace.

Likewise, the la_objopen callback for the second auditor is a bit unexpected.

However, we may need this la_objopen call to identify loaders into non-auditing namespaces, so that we can provide cookies for them, and eventually for la_symbind events in non-auditing namespaces.

This isn't going to be an easy fix. It would be interesting to know more about the intended applications for multiple auditors.
Comment 2 Ben Woodard 2024-08-08 22:11:58 UTC
(In reply to Florian Weimer from comment #1)
> Regarding (1), (3), (4), (5), this is expected behavior. Auditors are not
> themselves subject to auditing within their own namespaces. 

In case [1] (and also in [3] [4] [5]), the first auditor is not expecting to audit its own namespace. It is expecting to be able to audit the second auditor's namespace. The classic example where this comes up is when the user tries to use two incompatible tools. For example, say there is an MPI traffic analysis tool that must be compiled against the system's MPI version. The first tool is reflectively aware that it has been compiled against one flavor of MPI, but then notices that the user has specified, as the second auditor, an auditing tool for a different flavor of MPI. Since this happens quite a lot as users cut and paste things around, the first auditor knows the location of a build of the second tool compiled with the correct flavor of MPI, and could substitute it if it received an la_objsearch() for the second auditor.

You can see that the first auditor is not trying to audit its own namespace. The command line is:
LD_AUDIT=./auditor.so:./victim.so ./main

The first error is:

[audit] Expected la_objsearch for victim.so, FAIL.

./victim.so is the second auditor and it is in its own namespace.

If the first auditor saw an la_objsearch() for ./victim.so when the second audit library was being loaded, it would have recorded that fact when it hit line 11 in auditor.c:

    10	char* la_objsearch(const char* name, uintptr_t* cookie, unsigned int flag) {
    11	  if(strcmp(name, "./victim.so") == 0) victim |= bit_search;
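
The bookkeeping that line 11 performs can be sketched in isolation as follows. Only the names victim and bit_search appear in the excerpt; bit_open, bit_close, note_event() and report_missing() are assumptions made for this illustration and may not match auditor.c.

```c
#include <stdio.h>
#include <string.h>

/* One bit per callback the first auditor expects to see for a library.
   bit_open and bit_close are assumed names. */
enum { bit_search = 1u << 0, bit_open = 1u << 1, bit_close = 1u << 2 };

/* Callbacks seen so far for ./victim.so. */
unsigned int victim = 0;

/* Record that callback `bit` was seen for the object `name`. */
void note_event(const char *name, unsigned int bit) {
  if (strcmp(name, "./victim.so") == 0)
    victim |= bit;
}

/* Print one FAIL line per expected callback that never arrived and
   return how many were missing. */
int report_missing(const char *lib, unsigned int seen) {
  static const struct {
    unsigned int bit;
    const char *callback;
  } expected[] = {
      {bit_search, "la_objsearch"},
      {bit_open, "la_objopen"},
      {bit_close, "la_objclose"},
  };
  int missing = 0;
  for (size_t i = 0; i < sizeof expected / sizeof expected[0]; i++) {
    if ((seen & expected[i].bit) == 0) {
      printf("[audit] Expected %s for %s, FAIL.\n", expected[i].callback, lib);
      missing++;
    }
  }
  return missing;
}
```

With this scheme, a callback that ld.so suppresses simply leaves its bit clear, which is what produces the "Expected ..., FAIL." lines in the output above.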

Each individual auditor is in its own private namespace. All auditors are not in one single namespace. There is a specified ordering to auditors:
LD_AUDIT=<first auditor>:<second auditor>:... application
where the application subsequently can have DT_AUDIT libraries that are loaded after the auditors specified in the environment variable but before the application.
So the ultimate order would be:
<first LD_AUDIT auditor> <second LD_AUDIT auditor> ... <first DT_AUDIT auditor> <second DT_AUDIT auditor> ... application

The expectation is that auditors which are further away from the application in this ordering should see events caused by auditors closer to the application. The final auditor would only see events from the application itself.
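
The ordering rule requested here can be stated as a small predicate. This is a sketch of the reporter's expectation, not of what glibc currently does; split_audit_list() and expected_to_observe() are hypothetical names, and the application is treated as the position just past the last auditor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split a colon-separated LD_AUDIT-style list in place.
   Stores up to `max` entries in `names` and returns the count. */
size_t split_audit_list(char *list, const char *names[], size_t max) {
  size_t n = 0;
  char *p = list;
  while (p != NULL && *p != '\0' && n < max) {
    char *colon = strchr(p, ':');
    if (colon != NULL)
      *colon = '\0';
    if (*p != '\0') /* skip empty entries */
      names[n++] = p;
    p = (colon != NULL) ? colon + 1 : NULL;
  }
  return n;
}

/* Positions count from 0 for the first auditor; the application sits at
   position `count`. An observer is expected to see events caused by any
   actor strictly closer to the application than itself. */
bool expected_to_observe(size_t observer, size_t actor) {
  return observer < actor;
}
```

For LD_AUDIT=./auditor.so:./victim.so, auditor.so is at position 0 and victim.so at position 1, so auditor.so would be expected to observe victim.so but not the reverse, and neither would observe itself.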

It seems that the __RTLD_AUDIT flag is being applied too broadly, preventing auditors from overseeing other auditors closer to the application.

The general direction that the tools community is going is to abandon old tooling interfaces such as LD_PRELOAD which are almost impossible to stack and replace them with stacks of audit libraries. Some of these are going to be linked with DT_AUDIT for always on functionality and others are going to be specified through the LD_AUDIT environment variable.

Some examples are:

hpctoolkit (a performance tool) combined with spindle (an audit library that makes use of multicast-like capabilities of the backend network fabric to more quickly load MPI applications and all their libraries on the hundreds of compute nodes participating in a job) combined with a next generation malloc replacement which is written as an audit library.

Other ideas include an ABI checking library that verifies that all the versions of the libraries found when traversing the user's LD_LIBRARY_PATH are in fact ABI compatible, so that a change in the ABI of some seldom used call in some GPU library between version 6.1.1 and 6.1.2 is not discovered only through a crash after several hours of compute time across hundreds of compute nodes. This ABI checking audit library could be combined with any of the aforementioned audit libraries like hpctoolkit or spindle.

An MPI replacement/thunking layer audit library which substitutes a generic MPI that any application can be compiled against with one that is specific to the supercomputer.

An MPI communication monitoring tool which helps identify communication bottlenecks in an application.

Similarly, an audit library that substitutes a fixed or locally optimized version of a library for one that is known to be buggy or hasn't been optimized, without requiring the user to relink their application.

All of these things may be used individually or combined. One of the deficiencies of older tooling interfaces like LD_PRELOAD was that function wrapping and isolation of symbols were not sufficient to allow multiple different tools to be combined. That is one of the features driving us toward increasing use of LD_AUDIT.

The other feature is that our developer tools group can easily provide recipes to our developers allowing them to link in tools using DT_AUDIT so that these tools can be "always on". This appears to be less error prone than giving developers long LD_AUDIT and LD_LIBRARY_PATH environment variables when asking them to always use a particular tool.
Comment 3 Ben Woodard 2024-09-18 23:59:51 UTC
Probably the most compelling reason for this that has been presented so far is PC sampling. The tool doing PC sampling needs a full representation of the address space. If another tool does a dlopen() into its own namespace, rather than a dlmopen() into the application namespace, then there are loaded modules that the auditor doing PC sampling doesn't know about. So when it gets a PC sample that lands in one of the other audit-based tools, it doesn't know what to do with it.
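
To make the PC sampling problem concrete, here is a minimal sketch of the kind of address-to-module table such a tool might maintain. All names (module_range, record_module(), module_for_pc()) are hypothetical, and it is an assumption that a real tool would fill the table from its la_objopen() callback; the point is only that a module for which no la_objopen() arrives never enters the table.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MODULES 64

/* Half-open address range [start, end) occupied by one loaded module. */
struct module_range {
  uintptr_t start;
  uintptr_t end;
  const char *name;
};

struct module_range modules[MAX_MODULES];
size_t module_count = 0;

/* Remember a module. Returns 0 on success, -1 if the table is full or
   the range is empty. */
int record_module(uintptr_t start, uintptr_t end, const char *name) {
  if (module_count == MAX_MODULES || start >= end)
    return -1;
  modules[module_count++] = (struct module_range){start, end, name};
  return 0;
}

/* Map a sampled program counter to the module that contains it, or
   NULL when the sample falls in a module the tool was never told about. */
const char *module_for_pc(uintptr_t pc) {
  for (size_t i = 0; i < module_count; i++) {
    if (pc >= modules[i].start && pc < modules[i].end)
      return modules[i].name;
  }
  return NULL;
}
```

A sample taken inside a library that another auditor loaded into its private namespace would hit the NULL case, because the sampling auditor never received the notification needed to record that range.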

The other fairly compelling argument is that audit-based tools are not only used to analyze and debug applications; they are also used to monitor, debug, and analyze other audit-based tools. Therefore, auditors need to be able to audit other auditors.