[PATCH] elf: Do not signal LA_ACT_CONSISTENT for an empty namespace [BZ #26076]

Carlos O'Donell carlos@redhat.com
Tue Jul 7 00:19:18 GMT 2020


On 7/6/20 5:08 PM, H.J. Lu wrote:
> On Mon, Jul 6, 2020 at 1:53 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> On 7/6/20 4:47 PM, H.J. Lu wrote:
>>> On Mon, Jul 6, 2020 at 1:41 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>>>
>>>> On 7/6/20 4:40 PM, H.J. Lu via Libc-alpha wrote:
>>>>> On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
>>>>> <libc-alpha@sourceware.org> wrote:
>>>>>>
>>>>>> The auditing interface identifies namespaces by their first loaded
>>>>>> module.  Once the namespace is empty, it is no longer possible to signal
>>>>>> LA_ACT_CONSISTENT for it because the first loaded module is already gone
>>>>>> at that point.
>>>>>>
>>>>>> Tested on i686-linux-gnu and x86_64-linux-gnu.
>>>>>>
>>>>>> ---
>>>>>>  elf/dl-close.c | 10 ++++++++--
>>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>>>>>> index 73b2817bbf..8e146ecee1 100644
>>>>>> --- a/elf/dl-close.c
>>>>>> +++ b/elf/dl-close.c
>>>>>> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
>>>>>>    if (__glibc_unlikely (do_audit))
>>>>>>      {
>>>>>>        struct link_map *head = ns->_ns_loaded;
>>>>>> -      /* Do not call the functions for any auditing object.  */
>>>>>> -      if (head->l_auditing == 0)
>>>>>
>>>>> I assume that "head" can be NULL.  Do you have a testcase?
>>>>
>>>> Yes, it's tst-auditmany in trunk right now. It fails because of this issue.
>>>
>>> I have seen tst-auditmany failure,  but not always.  The test was added
>>> more than 6 months ago. Why does it start failing now?
>>
>> A combination of, IMO, Szabolcs surplus tls fixes, rseq increasing static
>> TLS usage, and Florian's recent (months ago) fixes to make unloading reliable.
>>
>> The 9th audit module can fail to load if it runs out of static tls surplus
>> to use or if the audit module load fails (not enough namespaces).
>>
>> I see this failing with Szabolcs recent patches to adjust tls surplus.
>>
>> I have debugged the failure and it's like this:
>>
>> (a) We try to load the auditor.
>> (b) The auditor load fails and we unwind all the loads, leaving an
>>     empty link namespace.
>> (c) We try to indicate consistent namespace for an empty namespace
>>     and crash.
>>
>> With Florian's recent work to make unloading reliable we now end up with
>> empty link namespaces for failed to load audit modules. As they should be.
>>
>> However this above code expects there to be *something* left in the link
>> namespace and there isn't.
>>
>> We *could* arrange to call LA_ACT_CONSISTENT with a NULL cookie in this
>> case for all observing auditors. I'm not opposed to that.
> 
> There are
> 
>   bool do_audit = GLRO(dl_naudit) > 0 && !ns->_ns_loaded->l_auditing;
> ...
> 
>   if (__glibc_unlikely (do_audit))
>     {
>       struct link_map *head = ns->_ns_loaded;
>       struct audit_ifaces *afct = GLRO(dl_audit);
>       /* Do not call the functions for any auditing object.  */
>       if (head->l_auditing == 0)
> 
> It sounds like ns->_ns_loaded is changed between setting and using of
> do_audit.  Shouldn't do_audit be set to false when ns->_ns_loaded becomes
> NULL?

I don't think so. In fact we should still be auditing IMO. I think that
do_audit should remain true, but *what* we use as the cookie should have
been a more stable choice, *or* the location of the LA_ACT_CONSISTENT
is too late.

When the first loaded object fails to be loaded, as is the case here,
we end up with an empty link namespace and thus no link map to be able
to use as the cookie. It's not clear why we are using ns->_ns_loaded
rather than something else as the cookie.

A more stable cookie would be to rewrite the code to use the address
of a cookie that is attached to the link namespace and removed from there
*after* we issue the final LA_ACT_CONSISTENT that uses the cookie to track
the objects removal.

Such a change is more complex than we need today. For now I think it is
sufficient to say that la_activity LA_ACT_CONSISTENT will not be called if
the last object in the link namespace is removed.

Alternatively we can avoid deleting the link map until *after* the la_activity
audit point. That requires refactoring.

Note that in Solaris it too up to LAV_VERSION5 before they got all the
semantics to this sorted out :-)

-- 
Cheers,
Carlos.



More information about the Libc-alpha mailing list