Fwd: [PATCH v5 00/22] Some rtld-audit fixes

Adhemerval Zanella adhemerval.zanella@linaro.org
Fri Nov 19 19:56:25 GMT 2021



On 19/11/2021 16:18, Florian Weimer via Libc-alpha wrote:
> * Jonathon Anderson:
> 
>>>> Right now, we
>>>> only require the program headers which we can obtain from
>>>> getauxval(AT_PHDR), however this technique has questionable
>>>> portability and robustness (getauxval returns an unsigned long, not a
>>>> pointer).
> 
>>> A glibc port to an architecture where a long value cannot hold all
>>> pointer values will have to provide an alternative interface similar to
>>> getauxval, but that returns pointer values.
> 
>> I would go one step farther and say getauxval is already broken for
>> any 64-bit architecture, unsigned long is only required to support 32
>> bits as per the C standards. One of my greater fears is that some
>> exotic compiler will cleverly allocate only 4 bytes of stack space for
>> the return value, and we wouldn't know except for a subtle bug
>> (dependent on optimization flags!) that crashes our entire tool with
>> SEGVs in the auditor (where GDB doesn't give properly unwound call
>> stacks).
> 
> If ported to such an architecture, glibc would need several changes to
> accomodate this.  Newer architectures take this into account and do not
> do funny things.  But Morello, as a capabilities-based architecture,
> does not have this luxury, so they have to do something about this
> interface.  But that is (still) an outlier.
> 
> I think the important point is that glibc interfaces do not need to be
> fully API-compatible with future architecture requirements.  We can
> change APIs for future ports.
> 
>>> Of course that's not the
>>> only interface with this problem (ElfW(Addr) is an integer as well).
> 
>> AFAICT ElfW(Addr) is fine, it should always be an integer large enough
>> to store a pointer on the host architecture (i.e. a uintptr_t). Unless
>> I missed some specific arch where this doesn't work out to be the
>> case?
> 
> Morello and other capabilities-based architectures.  Pointers need to
> pointers there.  Weird architectures do not have uintptr_t.
> 
>>> makes the Morello glibc port quite interesting.  So I think *something*
>>> like getauxval (AT_PHDR) will always be available, with pretty much
>>> identical semantics.
>>>
>>>>  From an outside perspective the current l_addr semantic is fairly
>>>> undocumented, the dladdr and dlinfo man pages define it vaguely as
>>>> the "difference between the address in the ELF file and the address
>>>> in memory." That sounds (to me at least) like l_addr should point to
>>>> byte 0 in the file (the ELF header), and that seems to be correct in
>>>> all but the non-PIE case.
>>> I have struggled with this in the past.  I agree that it is confusing.
>>> l_addr is the offset between virtual addresses in the program header of
>>> the ELF object and the actual addresses in the process image.  This
>>> offset happens to be 0 for ET_EXEC objects, and only there.
> 
>> This is a much clearer description of the semantic, it would be very
>> helpful the man pages used that sentence (or one like it) wherever the 
>> l_addr value is exposed in the API (link_map->l_addr and
>> dl_phdr_info->dlpi_addr). It would also be very helpful if 
>> Dl_info.dli_fbase was clearly documented as *not* l_addr but instead
>> byte 0/ELF header in the process image.
> 
> I've made a note to update the manual pages.
> 
>> That does sound like the "correct" way out, but dl_iterate_phdr
>> operates on the caller's namespace so one would need to inject a shim
>> library to do the actual call.
> 
> Ugh, you are right.  It means we currently can't unwind across dlmopen
> boundaries.
> 
> So please use getauxval (AT_PHDR) for now.  It is fully portable across
> all present glibc targets.
> 
>>>> dladdr gets its value from link_map->l_map_start instead of l_addr,
>>>> so the semantic we want is already present in a private field. It
>>>> seems to me these two fields could be swapped with little issue, if
>>>> altering the public semantic is not acceptable we could also be sated
>>>> if l_map_start was made public.
>>> Applications which know about the current semantics of l_addr will
>>> break, though.  l_addr is also exposed to debuggers via the _r_debug
>>> interface.  I really do not think we can make changes to l_addr.
>>> We have a similar issue around l_name being "" for the main program, and
>>> unfortuantely I will have to argue quite strongly against changing that.
> 
>> Is adding new public fields completely off the table?
> 
> To struct link_map?  We could probably pull it off, but it would be
> years until such a change will be in the hands of the users.  There is
> an internal structure that overlaps with the public struct link_map,
> and some applications poke at the private bits at fixed offsets.
> 
> We've started not to strip ld.so downstream, so that these applications
> can switch to DWARF data to avoid dependencies on fixed offsets, but
> that has been a very recent change.
> 
>> If I can humor the impossible for a few moments longer, I personally
>> have a difficult time believing that anyone actually uses 
>> link_map->l_addr or link_map->l_name in a way that would break by
>> changing their semantics for the main executable:
> 
>>  - The documentation hasn't improved for years so there can't be many
>> users that care about (or even noticed) this case in particular.
> 
> "" for the main executable is widely known.  Usually code uses it to
> implement a fallback on argv[0] or /proc/self/exe, though.

There are still the issue where audit interface does not have direct
access to argv[0] from the audited process and '/proc' might also not
be accessible.  I am still not convinced that provided argv[0] for
l_name for main executable is worse than "", specially because the
fallback might not work.

> 
> Changing l_addr will break the libgcc unwinder.  It uses l_addr to
> relocate the program header (see the code I quoted previously).  Not
> everyone uses the platform unwinder, and the libgcc unwinder is
> sometimes linked statically.  This is different from the l_name change:
> The l_addr would definitely cause widespread breakage.
> 
>>  - Every use case I can think of for obtaining a link_map from the dl*
>> functions (dlinfo and dladdr1) will either already have the special 
>> handling, or won't operate on the main executable, or likely won't opt
>> to use l_addr (vs. dlsym or dli_fbase) or l_name (vs. dli_fname).
> 
> Some special-case the main executable based on l_name, I expect, which
> is why I'm so reluctant to change l_name.  The GDB comment is actually
> hinting strongly towards a "" convention (that Solaris broke).

So I take that Solaris does provide the application name to l_name? And
what kind of breakage it has done on gdb?


More information about the Libc-alpha mailing list