A collection of LD_AUDIT bugs that are important for tools (with better formatting for this list)

Szabolcs Nagy szabolcs.nagy@arm.com
Fri Aug 6 09:04:19 GMT 2021


The 08/05/2021 12:36, Ben Woodard wrote:
> > On Aug 5, 2021, at 3:32 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > variant pcs does not mean 'sve call' it means 'arbitrary pcs'.
> > so we have to save all registers.
> > 
> > and a base pcs call does not have to preserve sve state so
> > we don't need to save the z regs even if sve is present.
> > 
> 
> Yeah honestly this is why I really believe that we need to define STO_AARCH64_VARIANT_SVE (or something like that) and go through the pain of changing the compilers and binutils. I do not believe that my users really will care about the full breadth of STO_AARCH64_VARIANT_PCS and all the potential strange ABI variants where all the registers must be saved. I also know that they will be willing to recompile any code which uses inter-object calls that pass parameters using SVE registers in the case where they want or need to use performance tools which will go through the audit interface.
> 
> SVE parameter passing is defined in the AAPCS https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules while all the other PCS variants don’t obey the AAPCS. I would argue that using STO_AARCH64_VARIANT_PCS for inter-object SVE calls was an expedient kludge that needs to be backed out.
> 
> Having a design that can ultimately accommodate all the variants, including ones that do not obey the AAPCS, is great. However, the only problem we need to solve in the reasonably near future is being able to audit SVE calls. If and when there are changes to the AAPCS, an extensible design such as yours will let us handle those too. (For example: I cannot tell yet if SME https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/scalable-matrix-extension-armv9-a-architecture will ultimately lead to changes to the AAPCS, but if so, the changes do not appear to have landed yet.)

we thought about marking the exact pcs separately, but

1) we did not want to reserve the available STO_ bits
   for pcs variants (there are only 6 bits and many
   potential future extensions and pcs variants)

2) we did not think that plt hooks will want to know
   the pcs in more detail than just marking non-base
   calls. plt hooks seemed to me unreliable and very
   rarely used.

if we add STO_AARCH64_SVE_PCS then we need
_ADVSIMD_VECTOR_PCS too and possibly SME variants soon.
and each variant requires its own PLT0 entry: the
STO_ flag can be checked at load time but not easily
at plt entry time, so separate plt entries are needed.
and each new variant requires a glibc update. this
approach was on the table, but i considered it overkill
when we had no usecases.
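
as a rough illustration of the load time check mentioned above, a
minimal sketch in c. the two helper names are hypothetical; the
0x80 value for STO_AARCH64_VARIANT_PCS and the 2 bit visibility
field follow the published elf definitions, and are redefined
locally only so the sketch does not depend on <elf.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* value as published in the aarch64 elf abi and glibc's <elf.h>;
   redefined here only so the sketch builds without that header. */
#ifndef STO_AARCH64_VARIANT_PCS
# define STO_AARCH64_VARIANT_PCS 0x80
#endif

/* the low 2 bits of st_other hold the symbol visibility
   (ELF64_ST_VISIBILITY), which leaves 6 bits for everything else. */
#define ST_OTHER_VISIBILITY_MASK 0x03

/* hypothetical helper: true if st_other marks a non-base pcs symbol.
   the flag only says "not base pcs", not which variant. */
bool sym_is_variant_pcs(uint8_t st_other)
{
    return (st_other & STO_AARCH64_VARIANT_PCS) != 0;
}

/* hypothetical helper: the 6 non-visibility bits of st_other. */
uint8_t st_other_arch_bits(uint8_t st_other)
{
    return (uint8_t)(st_other & (uint8_t)~ST_OTHER_VISIBILITY_MASK);
}
```

a loader can run such a check per symbol while processing the
dynamic symbol table, but the plt entry itself has no symbol at
hand, which is why a per variant marking would need separate
plt entries.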

> > alternatively use _dl_runtime_profile_vpcs entrypoint for
> > an elf module that has DT_AARCH64_VARIANT_PCS set and then
> > always save all registers present. then the default entry
> > point does not need to deal with extensions. this may be
> > slow for some hpc usecases.
> 
> uuugh — my users are all HPC users.

i think we need to do some measurements of how much overhead
it is to save/restore all regs for all calls.

and estimate how common it is to have elf modules with
sve calls in an hpc setting.
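
before measuring, the amount of state involved can at least be
counted. the sketch below only computes byte counts, it says
nothing about actual time spent. the function names are
hypothetical, and the base pcs figure assumes that a hook only
needs the argument registers x0-x7 and q0-q7.

```c
#include <stddef.h>

/* architectural register counts for sve */
enum { SVE_NUM_ZREGS = 32, SVE_NUM_PREGS = 16 };

/* bytes of z, p and ffr state for a vector length of vl_bytes
   (16 to 256, in multiples of 16). each p register and the ffr
   hold one bit per z register byte, i.e. vl_bytes / 8 bytes. */
size_t sve_state_bytes(size_t vl_bytes)
{
    size_t z = SVE_NUM_ZREGS * vl_bytes;
    size_t p = SVE_NUM_PREGS * (vl_bytes / 8);
    size_t ffr = vl_bytes / 8;
    return z + p + ffr;
}

/* assumption: the base pcs path saves only the argument registers,
   x0-x7 at 8 bytes each and q0-q7 at 16 bytes each. */
size_t base_arg_state_bytes(void)
{
    return 8 * 8 + 8 * 16;
}
```

for a 512 bit vector length (vl_bytes = 64) this gives 2184 bytes
of sve state against 192 bytes for the assumed base pcs set, so
the save area alone grows by roughly an order of magnitude.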

> Really, it comes down to I’m not a fan of DT_AARCH64_VARIANT_PCS for SVE inter-object procedure calls. I would sort of like to leave it unauditable and just move SVE into its own variant. However, other than SVE I do not know of any other uses of DT_AARCH64_VARIANT_PCS. Writing assembly where all the registers must be preserved is such a pain.

i only want to do the SVE marking if we have no solution
with existing elf abi.

> > yeah, i think we need to understand how the plt hooks are
> > used: do they actually look at these registers? or do they only
> > need the registers to be preserved? we may not need easy
> > access to the z reg contents.
> > 
> 
> Tools that actually use the PLT hooks are very rare — even more rare than tools that use LD_AUDIT. The ones that I have seen use that interface to inspect and modify parameters and return values. So I would argue that providing RW access to the Z registers is required. 
> 
> The two broad categories that I have seen were either trivial and used for basic curiosity and debugging (kind of like a targeted ltrace) and ones that worked around bugs in the libraries or did additional security checks not implemented in the library. The latter seems to be what the PLT portion of the LD_AUDIT interface was designed to do. Quickly coding up security bandaids to detect and prevent exploitation until the library could be properly fixed has been really handy a couple of times. For example, I wrote one a few years ago that detected and prevented a buffer overflow in a library where fixing the underlying problem in the library required an API change that would have required broader refactoring of the application. (IIRC that was libpng or some graphics format handling library - the effectiveness of this solution was limited by the fact that glibc and binutils didn’t yet implement DEPAUDIT.)

i see, i think these can work with variant pcs,
the main question is the overhead.
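
to make the "inspect and modify parameters" use concrete, here is
a minimal sketch of the logic such a hook might contain. it is
not a real audit module: struct demo_regs is a hypothetical
stand-in for the x register array of La_aarch64_regs (lr_xreg in
glibc's <link.h>), and the choice of x2 as a length argument and
the 4096 limit are made up for the example. in an actual module
this logic would sit inside la_aarch64_gnu_pltenter.

```c
#include <stdint.h>

/* hypothetical stand-in for the x register part of La_aarch64_regs */
struct demo_regs {
    uint64_t xreg[8];   /* x0-x7: integer argument registers in the base pcs */
};

/* hypothetical limit, chosen only for this example */
#define DEMO_MAX_LEN 4096u

/* treat the third integer argument (x2) as a length and clamp it
   before the call proceeds. returns 1 if the register was modified,
   0 if it was left alone. */
int demo_clamp_len(struct demo_regs *regs)
{
    if (regs->xreg[2] > DEMO_MAX_LEN) {
        regs->xreg[2] = DEMO_MAX_LEN;
        return 1;
    }
    return 0;
}
```

a hook like this needs write access to the saved registers, and
for an sve argument the same would apply to the saved z regs.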

