Bug 29738 - Arm M-profile dwarf2 unwinder performance suffers from exponential growth
Summary: Arm M-profile dwarf2 unwinder performance suffers from exponential growth
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 13.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-31 18:18 UTC by tomas.vanek
Modified: 2023-02-02 03:43 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2022-11-02 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description tomas.vanek 2022-10-31 18:18:15 UTC
Commit a6e4a48c02acf29d6bec2ff63fc909b57cf4bc78
introduced synchronization sp with other xsp_x registers.
Unfortunately the code reads all four xsp_x registers by get_frame_register_unsigned(). dwarf2 unwinder just passes this
request to the next inner frame. If the next inner frame is also
serviced by dwarf2, each of 4 xsp_x requests generates another 4 requests.

So e.g. 'backtrace' command on the chain of dwarf2 frames (normal debugging)
needs
  4 ^ number_of_frame
calls of arm_dwarf2_prev_register() just for stack unwinding!!!
The number of calls gets to very insane high values very soon,
8th and 9th frame unwinding is noticeable slow and 10 and more frames
looks as if gdb hangs.

Empirically tested with a counter added to the beginning of
arm_dwarf2_prev_register():

'bt' frames   total arm_dwarf2_prev_register() calls
	2	42
	3	181
	4	828
	5	3909
	6	18652
	7	89271
	8	427608
	9	2048659
Comment 1 Luis Machado 2022-11-02 08:16:17 UTC
My understanding is that there is only one active SP at any point, and that is:

SP if there are not MSP*/PSP*
PSP or MSP if there are no MSP_*/PSP_*
PSP_S or PSP_NS or MSP_S or MSP_NS.

We should probably have code to figure out what is the expected stack pointer register before we go about fetching values. That should bring the number of requests down. Does that sound reasonable?
Comment 2 Luis Machado 2022-11-02 16:26:32 UTC
I'm marking this as require in GDB 13, as it seems important to fix.
Comment 3 Luis Machado 2022-11-03 08:20:11 UTC
I suppose we'd have the same issue with non-secure m-profiles, as they'd need to do 3 ^ number_of_frame calls as well, based on this code:

      else if (tdep->is_m)
        {
          CORE_ADDR sp
            = get_frame_register_unsigned (this_frame, ARM_SP_REGNUM);
          CORE_ADDR msp
            = get_frame_register_unsigned (this_frame,
                                           tdep->m_profile_msp_regnum);
          CORE_ADDR psp
            = get_frame_register_unsigned (this_frame,
                                           tdep->m_profile_psp_regnum);
Comment 4 tomas.vanek 2022-11-03 08:43:39 UTC
(In reply to Luis Machado from comment #3)
> I suppose we'd have the same issue with non-secure m-profiles, as they'd
> need to do 3 ^ number_of_frame calls as well, based on this code:
> 
>       else if (tdep->is_m)
>         {
>           CORE_ADDR sp
>             = get_frame_register_unsigned (this_frame, ARM_SP_REGNUM);
>           CORE_ADDR msp
>             = get_frame_register_unsigned (this_frame,
>                                            tdep->m_profile_msp_regnum);
>           CORE_ADDR psp
>             = get_frame_register_unsigned (this_frame,
>                                            tdep->m_profile_psp_regnum);

Yes, the problem exists on all m-profiles. I edited the bug summary.

As I wrote in the reply to Torbjorn email from 31th Oct (you should have received a CC, didn't you?), the exponential growth has the base "only" 2 for m-profile w/o the security extension, because ARM_SP_REGNUM is read from CFA in dwarf2 cache and does not generate new get_frame_register_unsigned() request.

Unfortunately each exponential base greater than 1 is strong enough to spoil the performance. If the debugged app has 16 consecutive dwarf2 frames the 'bt' command is for very very patient users only.
Comment 5 tomas.vanek 2022-11-04 11:04:29 UTC
(In reply to Luis Machado from comment #1)
> My understanding is that there is only one active SP at any point, ...
> 
> We should probably have code to figure out what is the expected stack
> pointer register before we go about fetching values. That should bring the
> number of requests down. Does that sound reasonable?

Let's have M-profile without sec ext.
To figure out the active stack pointer we need xPSR and CONTROL registers.
So we need to fetch 2 registers and the 3rd fetch will get the active stack
pointer. Provided that fetching xPSR and CONTROL does not generate additional requests we need approx n^2 + 2n calls get_frame_register_unsigned() where n is frame level. It's much better than exponential but not a great performance... And the code would be kind of balancing on the edge: if somebody adds fetching one more register we get into an exponential again.

Why not use a cache to ensure linear dependency to the frame level?

The prologue cache is occupied by dwarf2 unwinder.
We need small changes in dwarf2/frame.c code to allow arm_dwarf2_prev_register()
to allocate struct arm_prologue_cache and store the this_cache pointer to struct dwarf2_frame_cache as a secondary, cascaded cache. This would allow re-using the stack pointer related code in arm_make_prologue_cache(), arm_cache_switch_prev_sp() and arm_cache_set_active_sp_value()

Or alternatively modify dwarf2/frame.c to allow changing struct dwarf2_frame_state_reg on the fly from arm_dwarf2_prev_register().
arm_dwarf2_frame_init_reg() initializes
  reg->how = DWARF2_FRAME_REG_FN;
  reg->loc.fn = arm_dwarf2_prev_register;
The first arm_dwarf2_prev_register() call makes the decision which sp is active and sets
  reg->how = DWARF2_FRAME_REG_CFA;
for the active sp and
  reg->how = DWARF2_FRAME_REG_SAME_VALUE;
for an inactive sp.
All following get_frame_register_unsigned() will reuse the decision cached in reg->how and fetch the selected register directly.

What do you thing?
Comment 6 Torbjörn SVENSSON 2022-11-04 15:27:38 UTC
I think this series addresses the issues raised here...
https://sourceware.org/pipermail/gdb-patches/2022-November/193391.html

@Tomas, can you take a look at them and see if it resolves the issues you've seen?
Comment 7 Joel Brobecker 2023-02-02 03:43:28 UTC
Just to clarify the impact of this issue (prior to fixing), Luis kindly explained on gdb-patches:

> It affects 32-bit m-profile Arm targets that report the additional stack pointers.  This is either
> the org.gnu.gdb.arm.m-system feature or the org.gnu.gdb.arm.secext feature.
>
> I'm fairly sure these features are only reported by emulators and bare-metal targets.
>
> Now, a 64-bit gdb can debug 32-bit Arm as well, so you could have a 64-bit gdb running into this too
> if the target is a 32-bit m-profile Arm. But the target is still 32-bit Arm.
>
> This doesn't affect AArch64 at all.

The patches have now been pushed to both master:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d72ba177c85f2ad18d0dcabdd8844532c9acb819
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=2d36c9404e360126551fef20f3f79f5a56f6ad8b

... and the gdb-13-branch:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=2d36c9404e360126551fef20f3f79f5a56f6ad8b
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7944d457903af4ba966df2a48aa6a2582b823b55

Closing.