[RFA] bug in symtab.c:lookup_block_symbol()'s search method

Sat Sep 15 00:54:00 GMT 2001

On Fri, Sep 14, 2001 at 08:49:28PM +0300, Eli Zaretskii wrote:

> Even if the performance hit is significant, I fail to understand how
> can someone say the entire program is broken, or that it cannot be
> released.  Can we please get things back into their proportion?

I'll stand what I said earlier.

> > Why, MacOS X just happens to be one.  In some of our
> > libraries, there are a lot of these opaque struct pointers in use.
> > Looking up one of those variables requires gdb to sit-and-spin for
> > 5-10 seconds, before realizing that it has no definition.
> 
> On what machine?  Timing data is useless without saying on what kind
> of CPU and with what clock speed did you see that.

I re-ran the timings to verify on a 733 MHz G4 PPC Mac OS X machine.
My test program is called "SimpleText", it is 80k of C source, it is
a little notepad GUI program that is included with the Apple development
tools.  It links in two of our fundamental shared libraries.

As I mentioned earlier, our Carbon library makes heavy use of these
"opaque struct pointers" - in the Carbon header files you'll see
things like "typedef struct OpaqueWindowPtr *WindowPtr", and people
will pass variables of type "WindowPtr" around to various system
calls.

If you have a variable of this type, every time you examine it,
gdb will try to find a definition of struct OpaqueWindowPtr.

With the gdb 5.1 symtab.c, typing "p aWindowPtr" in SimpleText will
take 3.5 seconds to complete (as reported by 'maint time 1').

With the patches (or with gdb 5.0), typing "p aWindowPtr" in
SimpleText takes between 0.01 and 0.00 seconds (as reported by
'maint time 1' -- it's at the limits of maint time's granularity.
Hence my earlier characterization as "unmeasurable".)

I've already written this more than once, but I'll cover the
technical details again.  lookup_block_symbol is given a symbol
name to find, and a pointer to a block to search--the block has a
list of symbol names present there.  l_b_s() does a binary search
to get to the bottom range of possible matches of the symbol, then
uses a simple linear search to step over a few symbols.  Once it
is beyond the range of plausible matches, it exits out of the linear
search and returns a no-match result.

Dan's patch dropped the "exits out of linear search" -- now it
binary searches to the beginning of plausible ranges, and steps
through to the end of the block.  It's now an O(1/2*N) algorithm
for worst-case, i.e. non-matches.

More concretely.  With the current gdb 5.1 sources, in SimpleText,
running "p aWindowPtr" will cause gdb to loop 6,785,175 times in
that linear-search after the binary search.  With the patch I
submitted, or the gdb 5.0 sources, running "p aWindowPtr" will 
run this loop 4,390 times.

> Anyway, I don't consider 5-10 seconds such a long time.  We still have
> in GDB operations that take much more, and we don't consider it
> ``broken'' because of that.

It used to run in a hundredth of a second.  Now it's taking over
three seconds, on one of our fastest machines.  When you hit the
'next' button in a GUI and it seizes up for four - ten seconds,
and it didn't used to do that.  How else can it be described?  This
behavior is not an act of god, an inevitable consequence of making
gdb better, or an unavoidable tradeoff.  It's a mistake.

The argument against this this change is that there _might_ be a
language which has space chars at the beginning of a demangled
string.  lookup_block_symbol() is already very, very broken if
that's the case.  The initial binary search uses a combination of
first-letter comparisons and strcmp() -- the latter of which is
clearly out of the question for making unmangled comparisons.

> > If you're using GDB in under an IDE and you have a Locals window
> > open, and one of those locals is an opaque structure, whenever you
> > step into our out of that frame, you'll have this 5-10 second delay.
> 
> So display the hourglass for 10 seconds and be done with that.  No one
> will really notice, except you and me.  The world is full with good
> software that sometimes has 10-sec delays, to say nothing of bad
> software.

I strongly disagree.  A word processor that delays for two seconds
every time you add a new character is not acceptable.  A debugger
that can take ten seconds to complete a "next" is not acceptable.
If we're paging in a big shared library, there's nothing we can do
about it.  If we're doing a linear search instead of a binary seach,
there _is_ something we can do about it.

I'm not just making things up here.  Our IDE at Apple currently
has "next" performance of around 0.5 - 0.7 seconds -- and we hear
no end of complaints about how it is too slow.  There is a very
tangible speed at which users want a GUI debugger to respond to a 
simple Next, and we're not meeting that yet.  If 0.7 seconds is
too slow, you can sure bet that 5 seconds will gain their ire.

Jason