A collection of LD_AUDIT bugs that are important for tools (with better formatting for this list)

John Mellor-Crummey johnmc@rice.edu
Mon Jun 21 19:42:17 GMT 2021


> On Jun 17, 2021, at 3:09 PM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Adhemerval Zanella:
> 
>> I checked the issues you listed below with master (I can't really
>> comment their status regarding the usual HPC distributions).
> 
> Thank you for reviewing the status quo.
> 
>>> ----------------------------------------------------------
>>> HIGH       | la_symbind isn't always called when appropriate.
>>>           | We observed that glibc 2.26 calls la_symbind
>>>           | when appropriate; glibc 2.28 does not.
>> 
>> I am not sure how to proper fix it, maybe an option is just to disable bind-now
>> if we have any audit module enabled. We will need to discuss the possible
>> implications.
>> 
>> (btw your auditor-tests/symbind does not enable bind-now with either
>> LD_BIND_NOW=1 or -Wl,-z,now).
> 
> The issue is that the la_symbind interface is not very good at
> communicating that PLT enter/exit hooks aren't available under these
> circumstances.  

This is a separate issue from the one we reported. The issue we reported
was that la_symbind wasn’t called and LD_BIND_NOW was not used.


> Maybe we can provide another flag and signal an error if
> la_symbind requests enter/exit hooks in BIND_NOW mode.

This may be reasonable, though my opinion would probably be different 
if I wanted PLT auditing. If some tool really needed PLT auditing, then 
being told NO might seem less reasonable.

If an auditor was not going to get the callbacks it wants with BIND_NOW, 
perhaps the auditor should be able to override the BIND_NOW.

> 
>>> ----------------------------------------------------------
>>> HIGH       | glibc does not save all necessary registers
>>>           | (e.g. X8 - the indirect result register, truncated
>>>           | SIMD registers) when auditing on aarch64 since
>>>           | the beginning of time.
>> 
>> This has been already discussed on the maillist:
>> 
>> * NEON support: we have a proposed [1] patch that addresses it by extending the 
>>  export struct used. Although the patch seems to fix the issue described by 
>>  BZ#26643 it is still incomplete: it would required to bump LAV_CURRENT for
>>  aarch64 (and add a arch-specific way to override it), add tests, and check 
>>  on how to handle possible incompatbilities. From a briefly chat with other
>>  glibc maintainer, we can just set that audit version lower LAV_CURRENT are
>>  just unsupported.
>> 
>> * SVE support: as indicated by Szabolcs SVE calls are marked with the 
>>  STO_AARCH64_VARIANT_PCS and thus explicit not supported by dynamic loader. 
>>  To support SVE, it would require save/restore all registers (but pass down 
>>  arg and retval registers to pltentry/exit callbacks according to the base 
>>  PCS). Another option would be to use different LAV_CURRENT version, one for
>>  NEON and SVE with 128-bits and another for SVE larger than 256-bits.
>>  This also has performance implications.
> 
> SVE support overlaps with the la_symbind issue quoted above because
> those bindings are conceptually BIND_NOW.
> 
>> [1] https://patchwork.sourceware.org/project/glibc/patch/20200923160448.2321909-1-woodard@redhat.com/
>> [2] https://sourceware.org/pipermail/libc-alpha/2020-September/117822.html
>> 
>>> ----------------------------------------------------------
>>> LOW        | When auditing, a dlopen of a shared library
>>>           | that uses R_X86_64_TLSDESC causes a SEGV. This
>>>           | is reportedly fixed in glibc 2.34.
>> 
>> It should be fixed by BZ#27137 (8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86),
>> however it has regressed by 03e187a41d9.  We need the following fix and we
>> definitely need a regression tests for this (I will probably use your
>> auditor-test) as base:
>> 
>> diff --git a/elf/dl-open.c b/elf/dl-open.c
>> index d2240d8747..d2638ebf05 100644
>> --- a/elf/dl-open.c
>> +++ b/elf/dl-open.c
>> @@ -771,7 +771,7 @@ dl_open_worker (void *a)
>>     {
>>       struct link_map *libc_map = GL(dl_ns)[args->nsid].libc_map;
>> #ifdef SHARED
>> -      bool initial = libc_map->l_ns == LM_ID_BASE;
>> +      bool initial = libc_map != NULL ? libc_map->l_ns == LM_ID_BASE : false;
>> #else
>>       /* In the static case, there is only one namespace, but it
>>         contains a secondary libc (the primary libc is statically
> 
> Hmm, I'm surprised that this doesn't crash more widely.  Is it because
> DT_NEEDED on ld.so preceeds libc.so if __tls_get_addr is used, or
> something like that?
> 
>> And you are correct about your assessment on the document, we *really* need
>> more extensible tests for audit interface.  It would be really helpful if
>> we could add more tests to exercise the real world usage from tools that
>> rely on audit modules.
> 
> Agreed.  It would also be helpful to verify that the callbacks happened
> in the expected order.  This is always a troublesome aspect of a
> callback-based interface.

We used a state machine in our auditor to keep track of what stage we were in.

enum hpcrun_state {
   state_awaiting, state_found, state_attached,
  state_connecting, state_connected, state_disconnected,
};

In the LD_AUDIT interface, some callbacks inspect the current state of our auditor,
take appropriate action based on the state, and changing the state, if appropriate.

One could use such an approach with a state transition table that specifies the states
where a callback is legal and then perhaps a new state value that should be set before
the callback completes. Such a transition table should enable you to verify that the callbacks
occured in the expected partial order.

> 
>>> -  We want to use LD_AUDIT’s la_symbind32 and la_symbind64 to interpose
>>>   wrappers around key functions, e.g. pthread_create. This enables a
>>>   tool to intercept functions invoked through pointers obtained with
>>>   dlsym, which preloaded wrappers can’t do. (Note: We don’t use
>>>   la_symbind for interposition yet, but we plan to when it works
>>>   everywhere.)
>>> 
>>> -  We need auditing to work when an application or a tool library (e.g.,
>>>   Intel’s GT-Pin) opens a shared library with dlmopen.
>>> 
>>> -  We need auditing to work when opening a dynamic library with TLS
>>>   dialect gnu2 relocations on x86_64 (R_X86_64_TLSDESC). We don’t have
>>>   any special interest in such relocations; at present, they cause a
>>>   SEGV when auditing and that must be avoided.
>> 
>> This should be fixed minus the regression above.
> 
> pthread_create interception becomes more difficult in glibc 2.34 because
> the pthread_create symbol is no longer interposable.

I don’t understand why pthread_create will no longer be interposable in 2.34.
We have a set of other functions that we also need to intercept, shown below:

_Exit
_exit
execl
execle
execlp
execv
execve
execvp
exit
fork
pthread_create
pthread_exit
pthread_sigmask
sigaction
signal
sigprocmask
sigtimedwait
sigwait
sigwaitinfo
system
vfork

If by "pthread_create symbol is no longer interposable", that means we can’t 
insert a wrapper, then that is very bad for performance tools. Our tool and other
performance tools need to intercept thread creation, take appropriate action
in both the calling thread, and arrange for the new thread to call a wrapper as 
well so that the new thread can properly set up performance monitoring before
beginning application code begins to execute in the new thread.

> We could make pthread_create interposable in the same we way do that
> today for malloc.

Should we expect any problems for the other functions listed above in addition to pthread_create?

> 
> pthread_create interposition on glibc 2.0 architectures needs version
> information in la_symbind.  No 64-bit architecture except Alpha is that
> old, so maybe this doesn't matter today.  (libstdc++ currently binds to
> the pthread_create compatibility symbol on these architectures, so the
> issue is quite visible, but a rebuild of GCC against glibc 2.34 will fix
> that, too.  Then the default version will be used.)


> 
> One caveat: We should be able to make la_symbind work on current
> distributions with their hardened build defaults, *however* PLT
> enter/exit hooks will not work.  

	

> For the time being, you would have to
> find a way to wrap the call yourself.  This is theoretically fixable
> even without run-time code generation (Madhavan T. Venkataraman from
> Microsoft has submitted patches in this area for libffi), but it's
> entirely vaporware at this point.  This looks like a fairly isolated
> project someone could work on without spending considerable time on
> learning the internals of the dynamic loader.
> 
>>> -  We want to add an auditor to an application at link time, noted in the
>>>   DT_AUDIT entry of the dynamic section. Loading the DT_AUDIT entry as a
>>>   program is launched enables our profiler to be injected into an
>>>   application’s address space without a wrapper script that sets
>>>   LD_AUDIT and LD_PRELOAD.
>> 
>> So if I understood correctly you are asking something like a DT_FILTER
>> for DT_AUDIT?
> 
> Indeed, please let us know if the existing DT_AUDIT support does not
> work for you.

I’ve never used DT_FILTER before. I just skimmed the materials about DT_FILTER. 
I don’t believe that we need a DT_FILTER capability for LD_AUDIT.
> 
> Thanks,
> Florian
> 



More information about the Libc-alpha mailing list