A collection of LD_AUDIT bugs that are important for tools (with better formatting for this list)

Szabolcs Nagy szabolcs.nagy@arm.com
Thu Aug 5 10:32:45 GMT 2021


The 08/04/2021 15:11, Adhemerval Zanella wrote:
> I updated my branch with a POC patch to support SVE for rtld-audit [1].
> In the end the layout I ended up using is:
> 
>   typedef union
>   {
>     float s;
>     double d;
>     long double q;
>     long double *z;
>   } La_aarch64_vector;
> 
>   /* Registers for entry into PLT on AArch64.  */
>   typedef struct La_aarch64_regs
>   {
>     uint64_t          lr_xreg[9];
>     La_aarch64_vector lr_vreg[8];
>     uint64_t          lr_sp;
>     uint64_t          lr_lr;
>     uint8_t           lr_sve;
>     uint16_t          *lr_sve_pregs[4];
>   } La_aarch64_regs;
> 
>   /* Return values for calls from PLT on AArch64.  */
>   typedef struct La_aarch64_retval
>   {
>     /* Up to eight integer registers can be used for a return value.  */
>     uint64_t          lrv_xreg[8];
>     /* Up to eight V registers can be used for a return value.  */
>     La_aarch64_vector lrv_vreg[8];
>     uint8_t           lrv_sve;
>   } La_aarch64_retval;
> 
> It means that if 'lr_sve' is 0 in either La_aarch64_regs or La_aarch64_retval
> the La_aarch64_vector contains the floating-point registers that can be
> accessed directly.  Otherwise, 'La_aarch64_vector.z' points to a memory area
> that holds up to 'lr_sve' bytes from the Z registers, which can be loaded
> with the svld1 intrinsic for instance (as tst-audit28.c does). The P registers
> follow the same logic, with each La_aarch64_regs.lr_sve_pregs pointing
> to an area of memory 'lr_sve/8' in size.

i'd try a more generic extension mechanism like in
the linux sigcontext struct. then it's less likely
that existing plt hook code needs to change when
new register state is present and used.

and i think we need to handle variant pcs in a
generic way: we don't know if that's sve pcs or not.

for example

struct {
  uint64_t lr_x[9];
  uint64_t lr_lr;
  uint64_t lr_sp;
  uint64_t lr_flags; // e.g. is this a variant PCS call?
  vreg_t lr_v[8];
  struct extension *lr_ext;
};

struct extension {
  struct extension *next;
  uint32_t type; // e.g. sve extension
  uint32_t len; // can copy the contents even for unknown type
};

struct xreg_extension {
  struct extension header;
  uint64_t x[30];
};

struct vreg_extension {
  struct extension header;
  vreg_t v[24];
};

struct sve_extension {
  struct extension header;
  uint16_t vl;
  zreg_t *z[32];
  preg_t *p[16];
  char data[];
};
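
a minimal sketch of how a plt hook might consume such a
list, assuming 'len' is the size of the whole record in
bytes (header included). the EXT_TYPE_* values and both
helper names are hypothetical, nothing here is existing
glibc api. a hook that only knows some types can skip the
rest, and one that must save everything can size a buffer
from 'len' alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical type tags: the proposal does not assign any values */
enum { EXT_TYPE_XREG = 1, EXT_TYPE_VREG = 2, EXT_TYPE_SVE = 3 };

struct extension {
  struct extension *next;
  uint32_t type;
  uint32_t len; /* assumed: size of the whole record in bytes, header included */
};

/* return the first record with the given type, or NULL if there is none.
   records of unknown type are simply stepped over.  */
static const struct extension *
find_extension (const struct extension *ext, uint32_t type)
{
  for (; ext != NULL; ext = ext->next)
    if (ext->type == type)
      return ext;
  return NULL;
}

/* number of bytes needed to copy every record in the chain, including
   records whose type the caller does not understand.  */
static size_t
extensions_total_len (const struct extension *ext)
{
  size_t total = 0;
  for (; ext != NULL; ext = ext->next)
    {
      /* a record can never be smaller than its own header */
      assert (ext->len >= sizeof *ext);
      total += ext->len;
    }
  return total;
}
```

a hook that only cares about sve state would call
find_extension (regs->lr_ext, EXT_TYPE_SVE) and treat a
NULL result as 'no sve state was saved for this call'.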

> 
> So, to access the FP register as float you can use:
> 
>   static inline float regs_vec_to_float (const La_aarch64_regs *regs, int i)
>   {
>     float r;
>     if (regs->lr_sve == 0)
>       r = regs->lr_vreg[i].s;
>     else
>       memcpy (&r, &regs->lr_vreg[i].z[0], sizeof (r));
>     return r;
>   }
> 
> To implement it I had to set up lazy binding when profiling or auditing is
> used, even when STO_AARCH64_VARIANT_PCS is being set.  Also, to avoid
> performance penalties on the default non-SVE configuration, the patch uses
> a new PLT entrypoint, _dl_runtime_profile_sve, which is used iff 'hwcap'
> has the HWCAP_SVE bit set.

variant pcs does not mean 'sve call', it means 'arbitrary pcs'.
so we have to save all registers.

and a base pcs call does not have to preserve sve state so
we don't need to save the z regs even if sve is present.

the main difficulty i see is that we cannot easily tell in
a plt entry if it is for a variant pcs symbol: you have to
look at the symbol table entry using the symbol index from
the relocation. usually such code is in c, but c code does
not preserve all registers, so here it has to be in asm.
the clean way would be a different entrypoint for variant
pcs calls, but that requires linker changes (another PLT0
like construct where variant pcs PLT can go).
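
to make the lookup concrete, here is the check written in
c, using the Elf64_Rela/Elf64_Sym types and ELF64_R_SYM
from <elf.h>. it only illustrates the logic: as said above,
the real thing would have to be asm because c code may
clobber registers. the function name is hypothetical, and
the fallback define assumes the 0x80 value from the aarch64
elf abi in case the header is too old to provide it.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef STO_AARCH64_VARIANT_PCS
# define STO_AARCH64_VARIANT_PCS 0x80 /* assumed value, per the aarch64 elf abi */
#endif

/* true if the symbol referenced by a plt relocation is marked as
   following a variant pcs.  'symtab' is the module's dynamic symbol
   table and 'reloc' is one entry of its plt relocation table.  */
static bool
reloc_is_variant_pcs (const Elf64_Rela *reloc, const Elf64_Sym *symtab)
{
  assert (reloc != NULL && symtab != NULL);
  /* the symbol index lives in the upper 32 bits of r_info */
  const Elf64_Sym *sym = &symtab[ELF64_R_SYM (reloc->r_info)];
  return (sym->st_other & STO_AARCH64_VARIANT_PCS) != 0;
}
```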

alternatively use _dl_runtime_profile_vpcs entrypoint for
an elf module that has DT_AARCH64_VARIANT_PCS set and then
always save all registers present. then the default entry
point does not need to deal with extensions. this may be
slow for some hpc usecases.
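
a sketch of that per-module choice, assuming the decision
is made once when the module's plt is set up. the enum and
the function name are hypothetical, and the fallback define
assumes DT_LOPROC + 5 from the aarch64 elf abi in case
<elf.h> does not provide the tag.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#ifndef DT_AARCH64_VARIANT_PCS
# define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5) /* assumed value, per the aarch64 elf abi */
#endif

/* hypothetical names for the two entrypoints discussed above */
enum profile_entry
{
  PROFILE_ENTRY_DEFAULT, /* base pcs only: no extension state saved */
  PROFILE_ENTRY_VPCS     /* save all registers present */
};

/* scan a DT_NULL-terminated dynamic section and pick the entrypoint
   for the whole module.  */
static enum profile_entry
select_profile_entry (const Elf64_Dyn *dyn)
{
  assert (dyn != NULL);
  for (; dyn->d_tag != DT_NULL; dyn++)
    if (dyn->d_tag == DT_AARCH64_VARIANT_PCS)
      return PROFILE_ENTRY_VPCS;
  return PROFILE_ENTRY_DEFAULT;
}
```

the cost is per module, not per symbol: one variant pcs
symbol makes every plt call from that module take the
slower path.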

> 
> I think this is a fair assumption since SVE has a defined set of registers
> for argument passing and return values.  A new ABI with either different
> argument passing or different registers would require a different PLT
> entry, but I assume this would require another symbol flag anyway (or
> at least a different ELF mark to indicate so).
> 
> For this POC, the profile '_dl_runtime_profile_sve' entrypoint assumes
> the largest SVE register size possible (2048 bits) and thus it requires
> a quite large stack (8976 bytes).  I think it would be possible to make the
> stack requirement dynamic depending on the vector length, but it would
> make the PLT audit function way more complex.

yeah, i think we need to understand how the plt hooks are
used: do they actually look at these registers? or do they
only need the registers to be preserved? we may not need
easy access to the z reg contents.

> 
> This patch is not complete yet: the tst-audit28 does not check if the compiler
> supports SVE (we would need a configure check to disable it in that case),
> I need to add a proper comment for the _dl_runtime_profile_sve
> stack layout, and the test needs to check for P register state clobbering.
> 
> I also haven't checked the performance penalties with this approach, and
> maybe the way I am saving/restoring the SVE registers might be optimized.
> 
> In any case, I checked on an SVE machine and at least the testcase works
> as expected without any regressions.  I also did a sniff test on a non-SVE
> machine.
> 
> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/ld-audit-fixes

