9711 – Quadratic slowdown in backtrace for deep stack traces

Bug 9711 - Quadratic slowdown in backtrace for deep stack traces

Summary: Quadratic slowdown in backtrace for deep stack traces

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	backtrace (show other bugs)
Version:	6.8

Importance:	P2 normal
Target Milestone:	6.8
Assignee:	Joel Brobecker

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-01-05 18:08 UTC by Paul Pluzhnikov
Modified:	2009-09-10 04:51 UTC (History)
CC List:	2 users (show)

See Also:
Host:	x86_64-unknown-linux-gnu
Target:	x86_64-unknown-linux-gnu
Build:	x86_64-unknown-linux-gnu
Last reconfirmed:

Attachments
debug trace referenced above (3.50 KB, text/plain) 2009-01-05 18:10 UTC, Paul Pluzhnikov	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Paul Pluzhnikov 2009-01-05 18:08:12 UTC

Issue originally reported to gdb mailing list:
http://sourceware.org/ml/gdb/2008-11/msg00030.html

Here is the conversation so far:

----------
From: Paul Pluzhnikov <ppluzhnikov@google.com>
Date: Tue, Nov 4, 2008 at 3:57 PM
To: gdb@sourceware.org
Cc: ppluzhnikov@google.com, dan@codesourcery.com

Greetings,

While debugging something else, I noticed extreme slowdown of
'where' command for a deeply recursive stack trace, and was able
to reproduce it on a small example (below). AFAICT, this is related
to this change:

 2008-04-30  Daniel Jacobowitz  <dan@codesourcery.com>

       * valops.c (value_fetch_lazy): Handle both memory and register
       lvals.

Here is the timing info (doubling size quadruples time):

$ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex
quit --args ./deep  2500  > /dev/null

real    0m0.617s
user    0m0.575s
sys     0m0.034s

$ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex
quit --args ./deep  5000  > /dev/null

real    0m2.121s
user    0m2.049s
sys     0m0.048s

$ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex
quit --args ./deep  10000  > /dev/null

real    0m9.140s
user    0m8.991s
sys     0m0.077s

$ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex
quit --args ./deep  20000  > /dev/null

real    0m47.940s
user    0m47.543s
sys     0m0.182s

I used current CVS head: GNU gdb (GDB) 6.8.50.20081104-cvs ...

The results are also rather interesting: running 'where' by hand
shows that GDB prints all levels except the last one rather fast,
but then 'freezes' for a long time printing the last level.

This can be seen in dramatic difference between these timings:

$ time gdb64-cvs -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex 'where
10004' -ex quit --args ./deep  10000  > /dev/null

real    0m0.438s
user    0m0.353s
sys     0m0.073s

$ time gdb64-cvs -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex 'where
10005' -ex quit --args ./deep  10000  > /dev/null

real    0m9.083s
user    0m8.953s
sys     0m0.067s

I looked at "what's different WRT the last frame" in valops.c,
but couldn't spot anything obvious :-(

Finally, here is the test case:

--- cut --- deep.c ---
// Compile with GCC-4.3.1: "gcc -g deep.c -o deep -O2".
// Note: -O2 is required to see the bad behavior.
//
// The test case itself is nonsense, crafted to prevent
// GCC from optimizing too much.

#include <stdlib.h>

struct Node
{
 struct Node *l;
 struct Node *r;
 int x;
};

int limit;
int visit (struct Node *n)
{
 int ll, rr;
 if (!n) return 0;
 if (limit < n->x)
   abort ();
 n->x += 1;
 ll = visit (n->l);
 rr = visit (n->r);
 return ll + rr + n->x;
}

int main (int argc, char *argv[])
{
 struct Node n;

 limit = 1000;
 if (1 < argc)
   limit = atoi (argv[1]);

 n.x = 0;
 n.l = &n;
 n.r = &n;
 return visit (&n);
}
--- cut --- deep.c ---

--
Paul Pluzhnikov

----------
From: Daniel Jacobowitz <dan@codesourcery.com>
Date: Mon, Nov 10, 2008 at 6:36 AM
To: Paul Pluzhnikov <ppluzhnikov@google.com>

Hi Paul,

I wanted to let you know that I did see this.  I'm just ages behind on
email again :-(
Does "set debug frame 1" show anything interesting?

--
Daniel Jacobowitz
CodeSourcery

----------
From: Paul Pluzhnikov <ppluzhnikov@google.com>
Date: Mon, Nov 10, 2008 at 9:33 AM
To: Daniel Jacobowitz <dan@codesourcery.com>

It does show something (which we already know): that unwinding the
last frame takes more work than all previous frames combined.

Attached is the gdb.txt from:

set logging on
run 10
set debug frame 1
where

--
Paul Pluzhnikov

----------
From: Daniel Jacobowitz <dan@codesourcery.com>
Date: Mon, Nov 10, 2008 at 10:03 AM
To: Paul Pluzhnikov <ppluzhnikov@google.com>

{ get_prev_frame_1 (this_frame=8) ->
{level=9,type=NORMAL_FRAME,unwind=

argv=0x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a0,pc=0x400341,id={stack=0x7fffffffe5f0,code=0x400310,!special},func=0x400310}
   // cached

Hum... I guess stdout and stderr are mixed, but I still don't
understand that.

It looks like because of where the arguments to main live, we are
trying to unwind %rsi, without realizing that it is actually not
call-saved (I think?  Check me on that, I don't know the x86-64 ABI).
So we're walking all the way up the call stack trying to locate it.
We determine that in frame 14 it comes from frame 13, in frame 13 it
comes from frame 12, and so on.

Maybe having frame_find_by_id (or whatever is calling that long series
of frame_id_inner) do something smarter for searching already cached
frames would fix this?  It's clearly quadratic as-is.

--
Daniel Jacobowitz
CodeSourcery

----------
From: Tom Tromey <tromey@redhat.com>
Date: Sat, Jan 3, 2009 at 11:20 AM
To: Paul Pluzhnikov <ppluzhnikov@google.com>
Cc: gdb@sourceware.org

>>>>> "Paul" == Paul Pluzhnikov <ppluzhnikov@google.com> writes:

Paul> While debugging something else, I noticed extreme slowdown of
Paul> 'where' command for a deeply recursive stack trace, and was able
Paul> to reproduce it on a small example (below).

FWIW, I'm able to reproduce this as well.

Paul> The results are also rather interesting: running 'where' by hand
Paul> shows that GDB prints all levels except the last one rather fast,
Paul> but then 'freezes' for a long time printing the last level.

I wasn't able to reproduce this.  That is, I ran the test a number of
times, and sometimes I saw this effect, but sometimes I did not.

Could you file this in bugzilla?

Tom
----------

Comment 1 Paul Pluzhnikov 2009-01-05 18:10:01 UTC

Created attachment 3636 [details]
debug trace referenced above

Comment 2 Joel Brobecker 2009-01-16 02:59:28 UTC

Just for the record:

I can also reproduce on my x86-linux laptop. I started at 1000, and each
thousand I added almost multiplied the time by 2.

I then tried with a debugger based on 6.8, just to see what type of behavior we
get, and it seems linear: Each thousand I added added an extra ~120ms (I tried
from 1000 to 10_000).

Comment 3 Joel Brobecker 2009-08-31 23:13:50 UTC

The slowdown comes from printing the frame corresponding to "main", because argv
is stored in %rsi. The following transcript shows a more isolated way of
reproducing the issue:

(gdb) set print frame-arguments none
(gdb) run 5000
Starting program: /[...]/deep 5000

Program received signal SIGABRT, Aborted.
0x00007ffff7a9efb5 in raise () from /lib/libc.so.6
(gdb) frame 5004
#5004 0x00000000004005a2 in main (argc=..., argv=...) at deep.c:34
34       return visit (&n);
(gdb) p /x $rsi
$1 = 0x748b

The "p /x $rsi" take a long time, while the rest is pretty much instantaneous.
That's as much as I can do for today, but look at it some more tomorrow.

Comment 4 Joel Brobecker 2009-09-10 04:51:33 UTC

Fixed by the following patch:
http://www.sourceware.org/ml/gdb-patches/2009-09/msg00251.html.