[RFC PATCH 9/9] s390: Initial support to generate .sframe from CFI directives in assembler
Jens Remus
jremus@linux.ibm.com
Tue Apr 9 15:07:08 GMT 2024
Hello Indu,
thank you for reviewing my series and for your valuable feedback! Sorry
for the long delay in getting back to you.
I'll separate the non-s390 and s390-specific patches as you suggested
and will send a V3 shortly.
On 29.02.2024 at 08:01, Indu Bhagat wrote:
> On 2/23/24 09:08, Jens Remus wrote:
>> This introduces initial support to generate .sframe from CFI directives
>> in assembler on s390x. Due to the following SFrame limitations it is
>> incomplete and does not work for most real-world applications (i.e.
>> generation of SFrame FDE is skipped):
>>
>
> Hi Jens,
>
> I am curious to know more about the use-case that triggered the interest
> in exploring SFrame for s390x. It will be helpful if you can elaborate
> a bit.
We are evaluating whether SFrame could be used on s390x for Linux Kernel
perf unwinding, similar to what is being worked on for x86-64.
> My comments are scattered inline, but let me state some high-level
> comments here:
>
> - As I previously mentioned, we cannot go down the route of tracking all
> callee-saved registers in SFrame. This will significantly bloat up the
> information. SFrame (like ORC) is designed to provide the exact offsets
> for stacktracing, which is what enables it to support fast stacktracing.
>
> - In general, SFrame format relies heavily on ABI and is meant to cater
> to needs of ABI compliant code. If some information is recoverable
> unambiguously from the ABI, SFrame does not encode it. This manifests
> itself in what you already observed:
>
> + SFrame assumes RA is either on stack or a fixed/single designated
> register.
> + SFrame assumes FP is either on stack or a fixed/single designated
> register.
> + SFrame assumes CFA tracking is SP or FP based.
>
> Since the s390x ABI is flexible on each one of the above aspects,
> SFrame cannot be used easily.
>
>> - SFrame FP/RA tracking assumes the register contents to be saved on
>> the stack (i.e. .cfi_offset). It does not support FP/RA register
>> contents being saved in other registers (i.e. .cfi_register). GCC
>> on s390x can be observed to save the FP/RA register contents in
>> other registers (e.g. floating-point registers).
>>
>
> Is it possible to limit the choice of registers which the compiler uses
> to save the FP ? If there is one more register to track (apart from
> CFA, RA, FP), we can see if an additional column can be accommodated in
> the format representation.
The s390x ABI does not specify any set of register number(s) to be used
to save the FP/RA. Therefore this could not be reduced down to a single
additional register.
Meanwhile we have determined that GCC on s390x only does so in leaf
functions, that is, functions that do not call any other functions.
Such functions can only ever occur as the topmost frame during
unwinding.
Therefore it would be sufficient if SFrame could represent FP/RA being
saved in another register on s390x. It would not need to track the
contents of those other registers at all, because an unwinder has
access to all of the registers of the topmost frame.
I will send an experimental patch for that in my split-off s390x patch
series.
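To illustrate the topmost-frame argument, here is a minimal sketch of
how an unwinder might recover the RA. It is not taken from any existing
unwinder; RaRule, recover_ra, read_u64 and the rule kinds are
hypothetical names chosen for this example only.

```python
from dataclasses import dataclass


@dataclass
class RaRule:
    """Hypothetical description of where the return address lives."""
    kind: str   # "cfa_offset" (saved on stack) or "register" (held in a register)
    value: int  # offset from CFA, or DWARF register number


def recover_ra(rule, cfa, read_u64, regs):
    """Return the RA for one frame.

    regs is a dict of live register values and is only available for the
    topmost frame; pass None for every other frame.
    """
    if rule.kind == "cfa_offset":
        # RA was saved on the stack: works for any frame.
        return read_u64(cfa + rule.value)
    if rule.kind == "register":
        # RA is held in some register: only resolvable in the topmost frame.
        if regs is None:
            raise LookupError("register-held RA in a non-topmost frame")
        return regs[rule.value]
    raise ValueError("unknown rule kind: " + rule.kind)
```

A register rule in a frame that is not the topmost one is reported as an
error, which mirrors the assumption that compiler-generated code only
produces this case in leaf functions.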
> Is it possible to maintain the role "return address" for the register
> r14 at all times ?
Similarly, if GCC on s390x restores the RA from the stack into a
register other than the RA register r14 in order to return to the
caller, it does not perform any further function calls afterwards. That
is, this too can only ever occur in the topmost frame, so an unwinder
would have all registers available.
> What I am driving to is: If there is some degree of freedom here, is it
> possible to create a compiler option of, say -msframe, so that the
> generated code is more amenable to easier backtracing using SFrame format.
We need to review the performance impact of any optimization
restrictions such an SFrame-specific compiler option would have and how
that would compare to the alternative of using -mbackchain for unwinding
in Linux Kernel perf.
In general it would be preferable not to impose additional restrictions
wherever possible.
>> - SFrame assumes a static RA register number. While the s390x ELF ABI
>> [1] specifies register 14 to contain the return address on entry,
>> GCC can be observed to copy the return address to another register
>> and may even use that to return. Additionally glibc on s390x has
>> two functions (mcount and fentry) that specify register 0 to hold
>> the return address. This would either require a change to glibc (if
>> possible) or SFrame support to track the RA register number specified
>> in CFI directive .cfi_return_column.
>>
>> - SFrame assumes a static FP register number. The s390x ELF ABI [1]
>> does not specify any FP register number. GCC and clang on s390x
>> usually use register 11 as frame pointer. GCC on s390x can also be
>> observed to use register 14 (e.g. binutils and glibc builds). glibc
>> on s390x contains code that uses register 12 for the frame pointer.
>> This would require SFrame support to track the SP/FP register number
>> specified in CFI directives .cfi_def_cfa and .cfi_def_cfa_register.
>>
>> - SFrame does not support CFI directive .cfi_val_offset. glibc on s390x
>> has two functions (mcount and fentry), that make use of that CFI
>> directive. This would either require a change in glibc (if possible)
>> or SFrame to support CFI directive .cfi_val_offset.
>>
>> This s390x support is largely based on the AArch64 support from commit
>> b52c4ee46657 ("gas: generate .sframe from CFI directives").
>>
>> SFrame ABI/arch identifier SFRAME_ABI_S390_ENDIAN_BIG is introduced
>> for s390 and added to the SFrame specification.
>>
>> According to the s390x ELF ABI [1] the following calling conventions
>> are used on s390x architecture:
>> - Register 15 is used as stack pointer (SP). The CFA is initially
>> located at offset +160 from the SP.
>> - Register 14 is used as return address (RA).
>> - There is no dedicated frame pointer. GCC and LLVM currently use
>> register 11 as frame pointer.
>
> Can FP be pinned to r11 in the hypothetical option of -msframe ?
We determined that GCC only chooses a register other than r11 (e.g.
RA register r14) as temporary frame pointer in the stack protector in
the function prologue. Again, this is a case that can only occur in the
topmost stack frame.
It should be possible to eliminate this and use the FP register r11 instead.
Another option would be to extend SFrame to track the CFA base register
number on s390x without tracking that register's contents. Since this
special case would only occur in the topmost frame, an unwinder would
have access to all registers.
>> The s390x ELF ABI [1] standard stack frame layout has the following
>> conventions:
>> - The return address (RA, r14) may be saved at offset -48 from the
>> CFA. Unlike x86 AMD64 architecture it is not necessarily saved on
>> the stack. Also compilers may use a non-standard stack frame layout,
>> such as but not limited to GCC with option -mpacked-stack.
>> Therefore SFrame RA tracking is used.
>
> If SFrame RA tracking is enabled, this allows:
> - the return address to be in register (RA, r14), or
> - at an offset from CFA
>
> Does the non-standard stack layout affect the identification of RA as
> CFA+offset ? If not, the remaining thing to worry about here (wrt SFrame
> usage) is if RA is saved in another register.
No, that is not affected. I just wanted to explain that we cannot assume
a fixed offset on the stack, if it is saved on the stack. But that is
already taken care of by SFrame.
> I see that the s390x does define r14 as return register. But adds
> "Except at function entry, no special role
> is assigned to r14.", which bring the complication for SFrame.
The RA is restored from the stack into a register other than r14 only to
perform the return.
Therefore we are wondering whether SFrame could be extended on s390x to
track the FP/RA being saved in another register in leaf functions and
the RA being saved in another register when being used to return to the
caller. Both cases only occur when in the topmost stack frame.
An idea would be to use one of the unused bits of the FP/RA offset from
CFA to identify whether it is actually an offset or a DWARF register
number. On s390x the CFA offsets are always a multiple of 8. Unless
SFrame were to make use of the DWARF data alignment factor to encode
the offsets in the future, this would make it possible to use the
least-significant bit (LSB) for that purpose. DWARF register numbers
could then be encoded as ((regno << 1) | 1) and offsets as (offset).
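The encoding idea above can be sketched as follows. This is only an
illustration of the bit trick, not code from binutils or the SFrame
specification; encode_offset, encode_regno and decode are hypothetical
helper names.

```python
def encode_offset(offset):
    """Encode a CFA-relative offset; multiples of 8 always have LSB 0."""
    if offset % 8 != 0:
        raise ValueError("offset must be a multiple of 8")
    return offset


def encode_regno(regno):
    """Encode a DWARF register number, tagged by setting the LSB."""
    if regno < 0:
        raise ValueError("register number must not be negative")
    return (regno << 1) | 1


def decode(value):
    """Return ("register", regno) if the LSB is set, else ("offset", offset)."""
    if value & 1:
        return ("register", value >> 1)
    return ("offset", value)
```

For example, decode(encode_regno(16)) yields ("register", 16) and
decode(encode_offset(-72)) yields ("offset", -72). Negative offsets work
because an arithmetic multiple of 8 has a clear LSB in two's complement
as well.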
One hopefully minor issue with that approach is that it would probably
be impossible to distinguish between our compiler-generated cases and
handwritten assembler code with CFI that cannot be represented in
SFrame. That is, the assembler might not be able to warn while
generating SFrame information that the result may be insufficient for
unwinding. Only the unwinder would be able to detect this at unwind
time and report an error.
> But at least for the packages that you have tested, we do not see
> occurrences of "skipping SFrame FDE due to .cfi_register specifying RA
> register (r14)", which is good to see.
According to my colleagues, GCC can be observed using a register other
than r14 to restore the return address from the stack and return to the
caller. Unless there is a .cfi_register, that should not bother us, as
the RA would then still be on the stack. I still do not have an example
for this to experiment with.
>> - The potential frame pointer (FP, r11) may be saved at offset -72
>> from the CFA. It is not necessarily saved on the stack and compilers
>> may use a non-standard stack frame layout (see above).
>> Therefore SFrame FP tracking is used.
>>
>> Support for SFrame is only enabled for z/Architecture (s390x) with
>> 64-bit architecture mode. It is disabled for 32-bit architecture mode
>> and ESA/390 (s390).
>>
>> Add s390-specific SFrame test cases. As for the error test cases add
>> ones that use a non-default frame pointer (FP) register number and ones
>> that save the return address (RA) in a non-default RA register number.
>>
>> [1] ELF ABI s390x Supplement:
>> https://github.com/IBM/s390x-abi/releases
>> [2] ELF ABI s390 Supplement:
>> https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_s390.html
>> https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_s390.pdf
>>
>> include/
>> * sframe.h (SFRAME_ABI_S390_ENDIAN_BIG): Define SFrame ABI/arch
>> identifier for s390x. Reference s390x architecture in comments.
>>
>> libsframe/
>> * doc/sframe-spec.texi: Document SFrame ABI/arch identifier for
>> s390x and reference s390x architecture.
>>
>> gas/
>> * config/tc-s390.h: s390x support to generate .sframe from CFI
>> directives in assembler.
>> * config/tc-s390.c: Likewise.
>> * gen-sframe.c: Reference s390x architecture in comments.
>>
>> gas/testsuite/
>> * gas/cfi-sframe/cfi-sframe.exp: Enable common SFrame test cases
>> for s390x. Add s390-specific SFrame test cases.
>> * gas/cfi-sframe/cfi-sframe-s390-1.s: New s390-specific SFrame
>> test case.
>> * gas/cfi-sframe/cfi-sframe-s390-1.d: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-2.s: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-2.d: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-err-1.s: New s390-specific
>> SFrame error test case that uses a non-default register number
>> for the frame pointer.
>> * gas/cfi-sframe/cfi-sframe-s390-err-1.l: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-err-2.s: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-err-2.l: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-err-3.s: New s390-specific
>> SFrame error test case that uses a non-default register number
>> to save the return address.
>> * gas/cfi-sframe/cfi-sframe-s390-err-3.l: Likewise.
>> * gas/cfi-sframe/cfi-sframe-s390-err-4.s: New s390-specific
>> SFrame error test case that used a non-default return column
>> (return address) register number.
>> * gas/cfi-sframe/cfi-sframe-s390-err-4.l: Likewise.
>>
>> Reviewed-by: Andreas Krebbel <krebbel@linux.ibm.com>
>> Signed-off-by: Jens Remus <jremus@linux.ibm.com>
>> ---
>>
>> Notes (jremus):
>> The SFrame support for s390x provided by this patch still has some
>> open issues, which need to be addressed. Any ideas or assistance to
>> overcome the SFrame limitations listed in the commit description
>> are very welcome.
>> Note that unlike the AArch64 and x86 AMD64 implementation this s390x
>> implementation does statically initialize the architecture-dependent
>> variables s390_sframe_cfa_{sp|fp|ra}_reg, which are referenced by
>> the SFrame interface macros SFRAME_CFA_{SP|FP|RA}_REG. The other
>> implementations do initialize them in md_begin. I verified that all
>> occurrences of the macros SFRAME_CFA_{SP|FP|RA}_REG are in context
>> of test/read and never write.
>> Is there a reason to define the SFrame interface macros
>> SFRAME_CFA_{SP|FP|RA}_REG to variables? For instance should the
>> SFrame common code be able to modify these? Currently I do not see
>> how it would work well, if the architecture-specific code would
>> change these at run time. Similar for the predicate
>> sframe_ra_tracking_p.
>
> No, there is no scenario so far where these variables will be modified.
> So the code can be simplified, yes.
Great!
> IOW, you are right about that : SFrame format, in its current V2
> representation, does not support the flexibility of changing
> SFRAME_CFA_{SP|FP|RA}_REG dynamically across FDEs.
>
>> Example #1: Warnings when compiling binutils with SFrame
>> $ ../configure CC="gcc -B$HOME/temp/binutils-gcc/bin -Wa,--gsframe" \
>> CXX="g++ -B$HOME/temp/binutils-gcc/bin -Wa,--gsframe" \
>> --prefix=$HOME/temp/binutils-sframe
>> $ make -j $(nproc) 2>&1 | tee make.log
>> $ grep --only-matching "Warning:.*" make.log | sort | uniq -c
>> 1 Warning: skipping SFrame FDE due to .cfi_def_cfa specifying CFA base register other than SP or FP (14 instead of 15 or 11)
>> 205 Warning: skipping SFrame FDE due to .cfi_register specifying FP register (11)
>> 56 Warning: skipping SFrame FDE due to .cfi_register specifying SP register (15)
>> Example #2: Warnings when compiling glibc with SFrame
>> $ ../configure CC="gcc -B$HOME/temp/binutils-sframe/bin -Wa,--gsframe" \
>> CXX="g++ -B$HOME/temp/binutils-sframe/bin -Wa,--gsframe" \
>> --prefix=$HOME/temp/glibc-sframe
>> $ make -j $(nproc) 2>&1 | tee make.log
>> $ grep --only-matching "Warning:.*" make.log | sort | uniq -c
>> 2 Warning: skipping SFrame FDE due to .cfi_def_cfa_register specifying CFA base register other than SP or FP (12 instead of 15 or 11)
>> 7 Warning: skipping SFrame FDE due to .cfi_def_cfa specifying CFA base register other than SP or FP (14 instead of 15 or 11)
>> 225 Warning: skipping SFrame FDE due to .cfi_register specifying FP register (11)
>> 187 Warning: skipping SFrame FDE due to .cfi_register specifying SP register (15)
>> 2 Warning: skipping SFrame FDE due to DWARF CFI op CFI_escape (0x103)
>> 2 Warning: skipping SFrame FDE due to non-default DWARF return column (0 instead of 14)
>> .cfi_def_cfa 14, ... originates from GCC generated code, which uses
>> register %r14 as frame pointer (FP).
>> .cfi_def_cfa_register 12 originates from hand-written assembler code
>> sysdeps/s390/s390-64/dl-trampoline.S, which uses register %r12 as
>> frame pointer (FP).
We can possibly change the handwritten assembler code to use the
FP-register r11 as frame pointer instead of r12.
>> .cfi_return_column 0 and .cfi_escape both originate from hand-written
>> assembler code sysdeps/s390/s390-64/s390x-mcount.S, which is compiled
>> twice (with and without -DSHARED).
It does not look like we can remove the use of r0 as return column in
this special case. But since this only affects mcount/fentry, which are
used for profiling, we can potentially live with that, as long as SFrame
is only used for Linux Kernel perf unwinding.
>> - .cfi_escape 0x14, 0x0f, 0x14: Is used to code
>> ".cfi_val_offset r15, -160", which would require binutils 2.28+
>> (glibc currently supports a minimum binutils 2.25). This CFI
>> directive states that the contents of register %r15 are CFA-160
>> (not to be confused with saved at CFA-160).
>
> This indirection property is not representable in SFrame format
> currently. BTW, x86, AArch64 also suffer with the negative consequences
> of this limitation, but in practice I have run into very limited
> occurrences of this.
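For reference, here is a small sketch of how the three escape bytes from
the glibc excerpt (0x14, 0x0f, 0x14) decode to ".cfi_val_offset r15,
-160". The helper names read_uleb128 and decode_val_offset are made up
for this example, and the data alignment factor of -8 is an assumption
about the s390x CIE rather than something stated in this thread.

```python
DW_CFA_VAL_OFFSET = 0x14  # DWARF opcode: ULEB128 register, ULEB128 factored offset


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_val_offset(data, data_align):
    """Return (regno, offset) such that the register's value is CFA + offset."""
    if data[0] != DW_CFA_VAL_OFFSET:
        raise ValueError("not a DW_CFA_val_offset instruction")
    regno, pos = read_uleb128(data, 1)
    factored, pos = read_uleb128(data, pos)
    # The operand is stored divided by the data alignment factor.
    return regno, factored * data_align


# Assumed data alignment factor of -8: 0x14 * -8 = -160.
print(decode_val_offset(bytes([0x14, 0x0F, 0x14]), -8))
```

With the assumed factor, the scaled offset 0x14 (20) corresponds to
-160, matching the comment in the glibc source.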
Meanwhile I believe that SFrame does not need to care about
.cfi_register, .cfi_offset, and .cfi_val_offset definitions involving
the SP register. The reason is that SFrame does not track the SP
register contents. It only tracks the CFA offset from the SP/FP
register. As a result the SP contents are restored using the CFA offset
from SP/FP of the current FRE and of the first FRE of an FDE, if I am
not mistaken.
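A minimal sketch of that SP recovery, as I understand it: the CFA is
computed from the current FRE, and the caller's SP follows by
subtracting the CFA offset of the first FRE (the offset at function
entry, +160 on s390x). The function name caller_sp and its parameters
are hypothetical.

```python
def caller_sp(base_value, cfa_offset_current, cfa_offset_entry):
    """Recover the caller's SP without tracking the SP register itself.

    base_value:         current value of the CFA base register (SP or FP)
    cfa_offset_current: CFA offset from that register in the current FRE
    cfa_offset_entry:   CFA offset from SP in the first FRE of the FDE
    """
    cfa = base_value + cfa_offset_current
    return cfa - cfa_offset_entry


# Example: SP was 0x7000 on entry (CFA = SP + 160). After
# "aghi %r15,-224" the CFA offset from SP becomes 160 + 224 = 384.
sp_in_function = 0x7000 - 224
print(hex(caller_sp(sp_in_function, 384, 160)))
```

The example reproduces the entry SP of 0x7000 from the adjusted SP
alone, which is why SP-related .cfi_register, .cfi_offset and
.cfi_val_offset rules would not be needed for this purpose.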
>> - .cfi_return_column 0: The following comment explains why register
>> %r0 contains the return address:
>> "The _mcount implementation now has to call __mcount_internal with
>> the address of .LP0 as first parameter and the return address as
>> second parameter. &.LP0 was loaded to %r1 and the return address is
>> in %r14. _mcount may not modify any register.
>> Alternatively, at the start of each function __fentry__ is called
>> using a single
>> brasl 0,__fentry__
>> instruction. In this case %r0 points to the callee, and %r14 points
>> to the caller. These values need to be passed to __mcount_internal
>> using the same sequence as for _mcount, so the code below is shared
>> between both functions.
>> The only major difference is that __fentry__ cannot return through
>> %r0, in which the return address is located, because br instruction
>> is a no-op with this register. Therefore %r1, which is clobbered by
>> the PLT anyway, is used."
>> Excerpt of the relevant glibc source code:
>> .globl C_SYMBOL_NAME(MCOUNT_SYMBOL)
>> .type C_SYMBOL_NAME(MCOUNT_SYMBOL), @function
>> cfi_startproc
>> .align ALIGNARG(4)
>> C_LABEL(MCOUNT_SYMBOL)
>> cfi_return_column (glue(r, MCOUNT_CALLEE_REG))
>> /* Save the caller-clobbered registers. */
>> aghi %r15,-224
>> cfi_adjust_cfa_offset (224)
>> /* binutils 2.28+: .cfi_val_offset r15, -160 */
>> .cfi_escape \
>> /* DW_CFA_val_offset */ 0x14, \
>> /* r15 */ 0x0f, \
>> /* scaled offset */ 0x14
>> stmg %r14,%r5,160(%r15)
>> cfi_offset (r14, -224)
>> cfi_offset (r0, -224+16)
>> ...
>>
>
> Thanks for your clear notes. They were very helpful.
>
> Indu
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303) and z/VSE Support
+49-7031-16-1128 Office
jremus@de.ibm.com
IBM