This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] elf: Fix LD_AUDIT for modules with invalid version (BZ#24122)
On 1/24/19 9:16 AM, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> On 24/01/2019 11:20, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> On 24/01/2019 10:33, Florian Weimer wrote:
>>>>> * Siddhesh Poyarekar:
>>>>>
>>>>>> On 24/01/19 2:14 AM, Carlos O'Donell wrote:
>>>>>>>> Siddhesh,
>>>>>>>>
>>>>>>>> It is acceptable for 2.29?
>>>>>>>
>>>>>>> It's OK with me. Siddhesh gets to make the call.
>>>>>>> This is only a bug fix.
>>>>>>>
>>>>>>> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
>>>>>>
>>>>>> OK
>>>>>
>>>>> This broke the build with at least GCC 8 and the GCC 9 pre-release on
>>>>> x86-64 and ppc64le:
>>>>>
>>>>> rtld.c: In function ‘dl_main’:
>>>>> rtld.c:1556:8: error: ‘lav’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>>>> _dl_debug_printf (
>>>>> ^~~~~~~~~~~~~~~~~~
>>>>> " auditor disabled since expected version %d is greater than "
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> "supported version %d.\n",
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> lav, LAV_CURRENT);
>>>>> ~~~~~~~~~~~~~~~~~
>>>>>
>>>>> Looks like the control flow is too complicated. I think it's a false
>>>>> positive.
>>>>>
>>>>> How should we fix this? Inhibit the warning (making the sources even
>>>>> more convoluted than they are)? Revert the patch? Factor out the audit
>>>>> module loading into a separate function and simplify the control flow?
>>>>
>>>> It does seem a false positive, since lav will always be set if laversion
>>>> is not NULL and the code explicitly checks for it. Let me try if
>>>> refactoring this can help gcc handle it correctly.
>>>
>>> I'm testing this patch. It changes the error messages slightly.
>>>
>>> It also calls _dl_close if we cannot find the laversion symbol,
>>> something the previous code did not do.
>>
>> The current code does calls _dl_close in this case, it does not if
>> _dl_catch_error returns an err_str (in this case the object is not
>> loaded).
>
> I meant this part:
>
> /* Check whether the interface version matches. */
> (void) _dl_catch_error (&objname, &err_str, &malloced,
> lookup_doit, &largs);
>
> unsigned int (*laversion) (unsigned int);
> unsigned int lav;
> if (err_str != NULL)
> goto not_loaded;
>
> The not_loaded label is in the true branch of the surrounding if, and
> the unloading on error only happens at the end of the else branch.
>
>>> +/* Called the audit DSO cannot be used, if it does not have the
>>> + appropriate interfaces or it expects something more recent. */
>>> +static void
>>> +unload_audit_modile (struct link_map *map, int original_tls_idx)
>>
>> s/modile/module
>
> Fixed.
>
>>> + (void) _dl_catch_error (&objname, &err_str, &malloced,
>>> + lookup_doit, &largs);
>>
>> Do we need this (void) cast?
>
> No, I can remove it, it came from the original code. I removed it.
>
>>
>>> + if (__glibc_likely (err_str != NULL))
>>> + {
>>> + unload_audit_modile (dlmargs.map, original_tls_idx);
>>> + report_audit_module_load_error (name, err_str, malloced);
>>> + return;
>>> + }
>>> +
>>> + unsigned int (*laversion) (unsigned int) = largs.result;
>>> + if (laversion == NULL)
>>> + {
>>> + _dl_error_printf ("\
>>> +ERROR: ld.so: audit interface '%s' has NULL laversion symbol; ignored.\n",
>>> + name);
>>
>> Maybe "ERROR: ld.so: audit interface does not contain a laversion symbol; ignored.\n" ?
>
> No, the missing symbol case is covered earlier, where lookup_doit will
> fail. This is really about a NULL symbol, a different error case.
> Maybe this is so bizarre that we should drop this check? (You need an
> SHN_ABS symbol or an IFUNC resolver that returns NULL to get this.)
>
>>> + unload_audit_modile (dlmargs.map, original_tls_idx);
>>> + return;
>>> + }
>>> +
>>> + unsigned lav = laversion (LAV_CURRENT);
>>> + if (lav == 0)
>>> + {
>>> + /* Only print an error message if debugging because this can
>>> + happen deliberately. */
>>> + if (GLRO(dl_debug_mask) & DL_DEBUG_FILES)
>>> + _dl_debug_printf ("\
>>> +audit interface '%s' requested to be ignored (returned version of 0).\n",
>>> + name);
>>
>> I used the '\nfile=%s...' to follow the messages already printed for DL_DEBUG_FILES.
>> Maybe 'file=%s requested to be ignored (returned version of 0).\n".
>
> This doesn't say that it's ignored as an audit module.
>
>>> + unload_audit_modile (dlmargs.map, original_tls_idx);
>>> + return;
>>> + }
>>> +
>>> + if (lav > LAV_CURRENT)
>>> + {
>>> + _dl_debug_printf ("\
>>> +ERROR: audit interface '%s' requires version %d (maximum supported version %d); ignored.\n",
>>> + name, lav, LAV_CURRENT);
>>
>> My understanding is Carlos asked to this scenario not be reported as
>> an error.
>
> I think Carlos meant the lav == 0 case, where I agree that warning
> should be suppressed by default. The lav > LAV_CURRENT suggests a
> broken audit module because it did not properly check the required
> version against LAV_CURRENT. (If it does not want ld.so to log an
> error, the audit module can always return 0 instead.)
>
> New patch below.
>
> Thanks,
> Florian
>
> elf: Refactor audit module loading out of dl_main
>
> This avoids a compiler warning about an uninitialized lav variable
> (seen with GCC 8 and 9 on x86-64).
>
> 2019-01-24 Florian Weimer <fweimer@redhat.com>
>
> * elf/rtld.c (unload_audit_module): New function.
> (report_audit_module_load_error): Likewise.
> (load_audit_module): Likewise. Extracted from dl_main. Call
> _dl_close if the laversion symbol cannot be found. Use early
> returns for error handling. Remove spurious comment about static
> TLS initialization. Remove (void) casts of function results.
> (load_audit_modules): New function. Extracted from dl_main.
> (dl_main): Call load_audit_modules.
This work looks great.
However, this has gotten too big.
Please revert the fix, and put this back in when 2.30 opens.
I have been convinced of the following:
* No error or warning when lav == 0.
* A warning when lav > LAV_CURRENT, because such modules can see
LAV_CURRENT as input and should disable themselves by returning
0, rather than returning a really high lav. This allows users
to catch this mistake (Florian convinced me of this on IRC).
Lastly, we should not look for a la_version == NULL, which really
means you had a (null) symbol (hard to create), but should instead
just assert(la_version != NULL) in this case I think.
--
Cheers,
Carlos.