Summary: | register x8 and quad sized NEON registers are not properly saved when using ld_audit on aarch64 | ||
---|---|---|---|
Product: | glibc | Reporter: | Ben Woodard <woodard> |
Component: | dynamic-link | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | nnye, nsz |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | 2.35 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Bug Depends on: | |||
Bug Blocks: | 28366 | ||
Attachments: |
reproducer that demonstrates that x8 is not being saved.
Patch to fix the x8 problem and the NEON quad sized vector register problems V2 of the patch Second V2 version of the patch |
Created attachment 12854 [details]
Patch to fix the x8 problem and the NEON quad sized vector register problems
his patch was tested to work on the x8 reproducer attached to this bug.
$ LD_AUDIT=../break-glibc/arm-audit-bug/simple/libauditor.so ../break-glibc/arm-audit-bug/simple/main
Segmentation fault (core dumped)
$ builddir=`pwd` LD_AUDIT=../break-glibc/arm-audit-bug/simple/libauditor.so CONV_PATH="${builddir}"/iconvdata LOCPATH="${builddir}"/localedata LC_ALL=C "${builddir}"/elf/ld-linux-aarch64.so.1 --library-path "${builddir}":"${builddir}"/math:"${builddir}"/elf:"${builddir}"/dlfcn:"${builddir}"/nss:"${builddir}"/nis:"${builddir}"/rt:"${builddir}"/resolv:"${builddir}"/mathvec:"${builddir}"/support:"${builddir}"/crypt:"${builddir}"/nptl ../break-glibc/arm-audit-bug/simple/main
$
The NEON Vector register size problem is also fixed but a reproducer has not been developed. The analogous SVE problem is not yet solved. I do not think that we want to be pushing 7*2048 bits of z registers and 3*256 bits of p registers into the La_aarch64_regs structure each and every call. So I do not know exactly how that should be addressed.
Created attachment 12856 [details]
V2 of the patch
Fixes added to the La_aarch64_retval structure.
Created attachment 12859 [details]
Second V2 version of the patch
Patch numbering system kind of got jumbled this is the 2nd version of the patch that I sent upstream. It incorporates feedback from Szabolcs Nagy
This looks related to https://sourceware.org/bugzilla/show_bug.cgi?id=28366 Proposed patch: https://patchwork.sourceware.org/project/glibc/patch/20210730194715.881900-21-adhemerval.zanella@linaro.org/ fixed for 2.35 in commit ce9a68c57c260c8417afc93972849ac9ad243ec4 https://sourceware.org/git/?p=glibc.git;a=commit;h=ce9a68c57c260c8417afc93972849ac9ad243ec4 elf: Fix runtime linker auditing on aarch64 (BZ #26643) The rtld audit support show two problems on aarch64: 1. _dl_runtime_resolve does not preserve x8, the indirect result location register, which might generate wrong result calls depending of the function signature. 2. The NEON Q registers pushed onto the stack by _dl_runtime_resolve were twice the size of D registers extracted from the stack frame by _dl_runtime_profile. While 2. might result in wrong information passed on the PLT tracing, 1. generates wrong runtime behaviour. The aarch64 rtld audit support is changed to: * Both La_aarch64_regs and La_aarch64_retval are expanded to include both x8 and the full sized NEON V registers, as defined by the ABI. * dl_runtime_profile needed to extract registers saved by _dl_runtime_resolve and put them into the new correctly sized La_aarch64_regs structure. * The LAV_CURRENT check is change to only accept new audit modules to avoid the undefined behavior of not save/restore x8. * Different than other architectures, audit modules older than LAV_CURRENT are rejected (both La_aarch64_regs and La_aarch64_retval changed their layout and there are no requirements to support multiple audit interface with the inherent aarch64 issues). * A new field is also reserved on both La_aarch64_regs and La_aarch64_retval to support variant pcs symbols. Similar to x86, a new La_aarch64_vector type to represent the NEON register is added on the La_aarch64_regs (so each type can be accessed directly). Since LAV_CURRENT was already bumped to support bind-now, there is no need to increase it again. Checked on aarch64-linux-gnu. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com> |
Created attachment 12853 [details] reproducer that demonstrates that x8 is not being saved. Description of problem: According to the Arm Cortex-A Series Programmers Guide for ARMv8-A It is part of the ABI for the architecture. See Section 9.1.1 Parameters in general-purpose registers on page 9-3 where it states: Registers with a special purpose (X8, X16-X18, X29, X30) • X8 is the indirect result register. This is used to pass the address location of an indirect result, for example, where a function returns a large structure.gister should be preserved. You can see that this register is saved in the normal code path which is what is normally run. See sysdeps/aarch64/dl-trampoline.S:48 : /* Save arguments. */ stp x8, x9, [sp, #-(80+8*16)]! cfi_adjust_cfa_offset (80+8*16) cfi_rel_offset (x8, 0) cfi_rel_offset (x9, 8) However, when you use LD_AUDIT the rtld shifts into what it calls profiling mode. In that profiling mode x8 is not properly saved. See there is no x8 being saved around line 185 in _dl_runtime_profile: stp x0, x1, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*0] cfi_rel_offset (x0, OFFSET_RG + DL_OFFSET_RG_X0 + 16*0 + 0) cfi_rel_offset (x1, OFFSET_RG + DL_OFFSET_RG_X0 + 16*0 + 8) stp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1] cfi_rel_offset (x2, OFFSET_RG + DL_OFFSET_RG_X0 + 16*1 + 0) cfi_rel_offset (x3, OFFSET_RG + DL_OFFSET_RG_X0 + 16*1 + 8) stp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2] cfi_rel_offset (x4, OFFSET_RG + DL_OFFSET_RG_X0 + 16*2 + 0) cfi_rel_offset (x5, OFFSET_RG + DL_OFFSET_RG_X0 + 16*2 + 8) stp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3] cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0) cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8) Yeah you got the Parameter and result registers but where is the Indirect result location register? After that block of code, you immediately start saving the d registers. When you start looking at the code in _dl_runtime_profile it looks like it has been copied from a previous version of the processor where there were no Q registers in the V register group. In other words the V registers were only uint64_t's not uint128_t's. x29 ends up being loaded with a pointer to a struct of type la_aarch64_regs /* Registers for entry into PLT on AArch64. */ typedef struct La_aarch64_regs { uint64_t lr_xreg[8]; uint64_t lr_dreg[8]; uint64_t lr_sp; uint64_t lr_lr; } La_aarch64_regs; This doesn't include space for x8 the indirect result register. Nor does it have space for the full sized v registers. Furthermore when you look at line 136 where it describes the stack frame layout Stack frame layout: [sp, #...] lr [sp, #...] &PLTGOT[n] [sp, #96] La_aarch64_regs [sp, #48] La_aarch64_retval [sp, #40] frame size return from pltenter [sp, #32] dl_profile_call saved x1 [sp, #24] dl_profile_call saved x0 [sp, #16] t1 [sp, #0] x29, lr <- x29 what _dl_runtime_resolve actually pushes on the stack the La_aarch64_regs part doesn't match the what _dl_runtime_profile is expecting. The size of the structure is quite different. To be able to support those the structure would have to be: /* Registers for entry into PLT on AArch64. */ typedef struct La_aarch64_regs { uint64_t lr_xreg[10]; __uint128_t lr_vreg[8]; /* or uint64_t lr_vreg[16]; */ uint64_t lr_sp; uint64_t lr_lr; } La_aarch64_regs; Version-Release number of selected component (if applicable): All - this bug has been there since the beginning of the aarch64 port How reproducible: Always Steps to Reproduce: There are two reproducers in the uploaded tar.gz which I got for the original reporter. One is very simple and should work but variations in the compiler can change how the structure is returned and cause that reproducer not to work. The other reproducer is a much bigger bit of code which always forces the use of x8 but it is much more code. Additional info: This bug report was provided by developers at Rice University Jonathon Anderson, John Mellor-Crummey, Xiaozhu Meng, Mark Krentel working on behalf of LLNL