Issue originally reported to gdb mailing list: http://sourceware.org/ml/gdb/2008-11/msg00030.html Here is the conversation so far: ---------- From: Paul Pluzhnikov <ppluzhnikov@google.com> Date: Tue, Nov 4, 2008 at 3:57 PM To: gdb@sourceware.org Cc: ppluzhnikov@google.com, dan@codesourcery.com Greetings, While debugging something else, I noticed extreme slowdown of 'where' command for a deeply recursive stack trace, and was able to reproduce it on a small example (below). AFAICT, this is related to this change: 2008-04-30 Daniel Jacobowitz <dan@codesourcery.com> * valops.c (value_fetch_lazy): Handle both memory and register lvals. Here is the timing info (doubling size quadruples time): $ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex quit --args ./deep 2500 > /dev/null real 0m0.617s user 0m0.575s sys 0m0.034s $ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex quit --args ./deep 5000 > /dev/null real 0m2.121s user 0m2.049s sys 0m0.048s $ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex quit --args ./deep 10000 > /dev/null real 0m9.140s user 0m8.991s sys 0m0.077s $ time gdb -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex where -ex quit --args ./deep 20000 > /dev/null real 0m47.940s user 0m47.543s sys 0m0.182s I used current CVS head: GNU gdb (GDB) 6.8.50.20081104-cvs ... The results are also rather interesting: running 'where' by hand shows that GDB prints all levels except the last one rather fast, but then 'freezes' for a long time printing the last level. This can be seen in dramatic difference between these timings: $ time gdb64-cvs -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex 'where 10004' -ex quit --args ./deep 10000 > /dev/null real 0m0.438s user 0m0.353s sys 0m0.073s $ time gdb64-cvs -nx -ex 'set height 0' -ex 'set confirm off' -ex run -ex 'where 10005' -ex quit --args ./deep 10000 > /dev/null real 0m9.083s user 0m8.953s sys 0m0.067s I looked at "what's different WRT the last frame" in valops.c, but couldn't spot anything obvious :-( Finally, here is the test case: --- cut --- deep.c --- // Compile with GCC-4.3.1: "gcc -g deep.c -o deep -O2". // Note: -O2 is required to see the bad behavior. // // The test case itself is nonsense, crafted to prevent // GCC from optimizing too much. #include <stdlib.h> struct Node { struct Node *l; struct Node *r; int x; }; int limit; int visit (struct Node *n) { int ll, rr; if (!n) return 0; if (limit < n->x) abort (); n->x += 1; ll = visit (n->l); rr = visit (n->r); return ll + rr + n->x; } int main (int argc, char *argv[]) { struct Node n; limit = 1000; if (1 < argc) limit = atoi (argv[1]); n.x = 0; n.l = &n; n.r = &n; return visit (&n); } --- cut --- deep.c --- -- Paul Pluzhnikov ---------- From: Daniel Jacobowitz <dan@codesourcery.com> Date: Mon, Nov 10, 2008 at 6:36 AM To: Paul Pluzhnikov <ppluzhnikov@google.com> Hi Paul, I wanted to let you know that I did see this. I'm just ages behind on email again :-( Does "set debug frame 1" show anything interesting? -- Daniel Jacobowitz CodeSourcery ---------- From: Paul Pluzhnikov <ppluzhnikov@google.com> Date: Mon, Nov 10, 2008 at 9:33 AM To: Daniel Jacobowitz <dan@codesourcery.com> It does show something (which we already know): that unwinding the last frame takes more work than all previous frames combined. Attached is the gdb.txt from: set logging on run 10 set debug frame 1 where -- Paul Pluzhnikov ---------- From: Daniel Jacobowitz <dan@codesourcery.com> Date: Mon, Nov 10, 2008 at 10:03 AM To: Paul Pluzhnikov <ppluzhnikov@google.com> { get_prev_frame_1 (this_frame=8) -> {level=9,type=NORMAL_FRAME,unwind= argv=0x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a00x69d5a0,pc=0x400341,id={stack=0x7fffffffe5f0,code=0x400310,!special},func=0x400310} // cached Hum... I guess stdout and stderr are mixed, but I still don't understand that. It looks like because of where the arguments to main live, we are trying to unwind %rsi, without realizing that it is actually not call-saved (I think? Check me on that, I don't know the x86-64 ABI). So we're walking all the way up the call stack trying to locate it. We determine that in frame 14 it comes from frame 13, in frame 13 it comes from frame 12, and so on. Maybe having frame_find_by_id (or whatever is calling that long series of frame_id_inner) do something smarter for searching already cached frames would fix this? It's clearly quadratic as-is. -- Daniel Jacobowitz CodeSourcery ---------- From: Tom Tromey <tromey@redhat.com> Date: Sat, Jan 3, 2009 at 11:20 AM To: Paul Pluzhnikov <ppluzhnikov@google.com> Cc: gdb@sourceware.org >>>>> "Paul" == Paul Pluzhnikov <ppluzhnikov@google.com> writes: Paul> While debugging something else, I noticed extreme slowdown of Paul> 'where' command for a deeply recursive stack trace, and was able Paul> to reproduce it on a small example (below). FWIW, I'm able to reproduce this as well. Paul> The results are also rather interesting: running 'where' by hand Paul> shows that GDB prints all levels except the last one rather fast, Paul> but then 'freezes' for a long time printing the last level. I wasn't able to reproduce this. That is, I ran the test a number of times, and sometimes I saw this effect, but sometimes I did not. Could you file this in bugzilla? Tom ----------
Created attachment 3636 [details] debug trace referenced above
Just for the record: I can also reproduce on my x86-linux laptop. I started at 1000, and each thousand I added almost multiplied the time by 2. I then tried with a debugger based on 6.8, just to see what type of behavior we get, and it seems linear: Each thousand I added added an extra ~120ms (I tried from 1000 to 10_000).
The slowdown comes from printing the frame corresponding to "main", because argv is stored in %rsi. The following transcript shows a more isolated way of reproducing the issue: (gdb) set print frame-arguments none (gdb) run 5000 Starting program: /[...]/deep 5000 Program received signal SIGABRT, Aborted. 0x00007ffff7a9efb5 in raise () from /lib/libc.so.6 (gdb) frame 5004 #5004 0x00000000004005a2 in main (argc=..., argv=...) at deep.c:34 34 return visit (&n); (gdb) p /x $rsi $1 = 0x748b The "p /x $rsi" take a long time, while the rest is pretty much instantaneous. That's as much as I can do for today, but look at it some more tomorrow.
Fixed by the following patch: http://www.sourceware.org/ml/gdb-patches/2009-09/msg00251.html.