This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [RFC] Move the frame zero PC check earlier
I won't reply paragraph by paragraph; there are getting to be a lot of
paragraphs. Please let me know if I've cut something that I shouldn't
have.
On Sat, May 13, 2006 at 06:42:44PM +0200, Mark Kettenis wrote:
> > Explanatory output ("why did that backtrace stop?") is available in
> > "set debug frame 1". If you think it's routinely useful, then we can
> > make it available in some prettier form, perhaps in "info frame" for
> > the outermost frame.
>
> If we can reliably tell that a frame is the outermost frame, we might
> indeed print that as part of "info frame".
That's just about the opposite of what I'm suggesting. I think that
"the stack ended jaggedly" might be useful in "info frame".
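To make the "info frame" idea concrete, here is a minimal C sketch. It assumes a hypothetical set of stop reasons; none of these names come from GDB. A conventional clean end, such as a zero PC, gets no comment. A jagged end gets a one-line explanation that only the outermost frame's "info frame" output would show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reasons an unwinder might record for stopping.
   Illustrative only; these are not GDB's definitions. */
enum stop_reason {
    STOP_ZERO_PC,         /* caller's PC is zero: conventional end of stack */
    STOP_ZERO_FP,         /* saved frame pointer is zero: same convention */
    STOP_NO_UNWIND_INFO,  /* ran out of information: stack ended jaggedly */
    STOP_SAME_FRAME,      /* unwound to an identical frame */
    STOP_REASON_COUNT
};

static const char *stop_reason_string(enum stop_reason reason)
{
    static const char *const strings[STOP_REASON_COUNT] = {
        "zero PC (conventional end of stack)",
        "zero frame pointer (conventional end of stack)",
        "no unwind information (stack ended jaggedly)",
        "unwound to an identical frame (corrupt stack?)",
    };
    assert((unsigned) reason < STOP_REASON_COUNT);
    return strings[reason];
}

/* Nonzero if REASON is one of the conventional end-of-stack markers. */
static int ended_cleanly(enum stop_reason reason)
{
    return reason == STOP_ZERO_PC || reason == STOP_ZERO_FP;
}

/* Fill BUF with the note an "info frame"-style command could append for
   the outermost frame.  Clean ends need no explanation, so BUF is left
   empty and 0 is returned; otherwise returns what snprintf returns. */
static int outermost_frame_note(char *buf, size_t len, enum stop_reason reason)
{
    assert(buf != NULL && len > 0);
    if (ended_cleanly(reason)) {
        buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "Backtrace stopped: %s",
                    stop_reason_string(reason));
}
```

The backtrace itself just stops in every case. The explanation is kept out of the way unless someone asks for it.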
> > Also, I don't think that "gdb is confused" errors are as desirable as
> > you think they are. This extra frame has been reported to me as a bug
> > at least three times that I can think of (twice for RTOSes and once for
> > Linux KGDB).
>
> I can imagine you'd like to get these people off your back. And
> perhaps they're right that the extra frame is caused by a bug in GDB.
> But that bug is not the printing of the extra frame itself. The bug
> is GDB not being able to determine that it is at the end of the stack,
> which might actually be a bug in the compiler or system libraries
> they're using.
No! No no no.
First of all, I'm not just trying to get them off my back. I think
they're right and it shouldn't be displayed. Second, this _is_ GDB
being able to determine that it's at the end of the stack.
A return address of zero is a fairly common convention for this.
It's natural, if you think about it. On architectures with a
well-entrenched frame pointer (exhibit A: our earlier conversation
about cache->base and %ebp on x86), the frame pointer can be initialized
to zero either before calling a generic higher-level function or by
handwritten startup code. The same is true for the return address; it
can be set to return to nowhere. Neither value "makes sense", so both are
useful markers. About the only other option is the stack pointer, and
you can't use that unless you're calling handwritten startup code that
also knows where the stack is supposed to go. That is pretty rare in
modern systems, unless that code is being called directly from a reset
vector.
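Here is a minimal C sketch of the convention described above. It assumes a simplified frame record: the caller's saved frame pointer followed by the return address, roughly the classic %ebp chain layout. The struct and function names are hypothetical, and the "stack" is simulated data rather than real memory. The walk treats either marker that startup code can plant, a zero return address or a zero saved frame pointer, as the end of the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record, loosely modelled on the x86 %ebp chain:
   the frame pointer addresses the caller's saved frame pointer,
   followed by the return address. */
struct frame_record {
    const struct frame_record *saved_fp;  /* caller's frame, or NULL (zero) */
    uintptr_t return_addr;                /* 0 means "return to nowhere" */
};

/* Count the callers reachable from FP, stopping at either end-of-stack
   marker.  MAX_DEPTH only guards against a corrupted, cyclic chain. */
static int count_callers(const struct frame_record *fp, int max_depth)
{
    int callers = 0;

    assert(max_depth >= 0);
    while (fp != NULL && callers < max_depth) {
        if (fp->return_addr == 0)
            break;            /* zero return address: end of stack */
        callers++;            /* return_addr identifies a real caller */
        fp = fp->saved_fp;    /* NULL here is the zero-frame-pointer marker */
    }
    return callers;
}
```

Suppose the startup code zeroed both fields in the outermost record. Walking from the innermost of three chained records then finds exactly two callers, and no "extra" frame at address zero is produced.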
> Then we should improve the unwinder. If we didn't error out with that
> error, the backtrace would never end.
As you well know, in many cases it is either impractical or downright
impossible to improve the prologue unwinder, e.g. when OS vendors ship
system libraries with neither unwind information nor symbols.
> > And Joel recently reported that Ada tasking generates this message
> > on at least one platform, and users are unhappy about that, too.
>
> IIRC this is a case where the outermost frame wasn't marked properly,
> or at least not detected as such by GDB. That's the problem that
> needs to be fixed.
I guess that depends what you mean by "marked properly" and what you
mean by "fixed".
The problem was that a routine in the system libraries was called
directly from new threads. The name of the routine is OS-dependent
and maybe even OS-version-dependent. Changing the system libraries is
out of the question here; not all the world is free software, and I
believe the system in question was either HP-UX or OSF. Recognizing the
routine by name would, I suppose, work, but you'd have to be careful of
the user reusing a fairly generic function name! Or else you could limit
the check to a specific shared object and stop caring about static linking.
Seems kind of sketchy to me.
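For comparison, the name-based alternative might look like this C sketch. The library and routine names are placeholders, not the actual HP-UX or OSF symbols, and the table and function are hypothetical. Requiring a matching shared object is what protects against a user function with the same name. It is also what loses the static-linking case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of routines that new threads start in. */
struct entry_point {
    const char *objfile;   /* base name of the shared object defining it */
    const char *function;  /* routine called directly by new threads */
};

static const struct entry_point thread_entry_points[] = {
    { "libexample_threads.so", "thread_start" },  /* placeholder names */
};

/* Nonzero if FUNCTION, found in OBJFILE, should be treated as an
   outermost frame.  A user function named "thread_start" in the main
   executable does not match, but neither does the real routine in a
   statically linked program. */
static int is_thread_entry(const char *objfile, const char *function)
{
    size_t i;

    assert(function != NULL);
    if (objfile == NULL)
        return 0;
    for (i = 0; i < sizeof thread_entry_points / sizeof thread_entry_points[0]; i++)
        if (strcmp(objfile, thread_entry_points[i].objfile) == 0
            && strcmp(function, thread_entry_points[i].function) == 0)
            return 1;
    return 0;
}
```

Every supported OS and OS version would need its own table entries, which is part of why this approach seems fragile.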
> > And when we've run out of useful information, the stack appears to
> > end, and we're quite justified in reporting that the stack ended.
> > It's quite complex enough already without reporting "but the end of
> > the stack looks a little funny to me...".
>
> No, if a stack doesn't end properly on a platform where it should end
> properly, that's useful information that should be reported to the
> user.
I answered this one in another message, and it's basically the same as
my first point above. I think it's entirely proper to report this as an
end-of-stack condition.
We're reading the stack frame from memory, so we're automatically
vulnerable to displaying corrupted information if the user has
scribbled on the stack. But I don't think that merits displaying
something that we know is garbage. A non-zero unknown PC might be
garbage or it might be code we don't have symbols for; we display it as
if it were code we didn't have a symbol for. I posit that there's a
difference in kind between that and a zero PC, which might be garbage
or might be the end of the stack; and by analogy, I'm suggesting we
display it as if it were the end of the stack.
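The proposed display policy can be sketched in C as a small classifier for an unwound caller PC. The enum, the classifier, and the lookup callback type are hypothetical. toy_lookup is a stand-in for a real symbol table, not a GDB interface. Both a zero PC and a symbol-less PC might be garbage. The sketch shows the first as the end of the stack and the second as unknown code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pc_kind {
    PC_END_OF_STACK,  /* zero: stop the backtrace, print no frame */
    PC_NO_SYMBOL,     /* non-zero, no symbol: print as "??" */
    PC_SYMBOL         /* non-zero with a symbol: an ordinary frame */
};

/* Stand-in for a symbol-table query; returns NULL if no symbol covers PC. */
typedef const char *(*symbol_lookup_fn)(uintptr_t pc);

/* Toy symbol table for the sketch: one function covering [0x1000, 0x1100). */
static const char *toy_lookup(uintptr_t pc)
{
    return (pc >= 0x1000 && pc < 0x1100) ? "worker" : NULL;
}

/* Classify PC and set *NAME to what a backtrace would print (NULL for
   the end of the stack). */
static enum pc_kind classify_caller_pc(uintptr_t pc, symbol_lookup_fn lookup,
                                       const char **name)
{
    assert(lookup != NULL && name != NULL);
    *name = NULL;
    if (pc == 0)
        return PC_END_OF_STACK;
    *name = lookup(pc);
    if (*name == NULL) {
        *name = "??";     /* display as code we have no symbol for */
        return PC_NO_SYMBOL;
    }
    return PC_SYMBOL;
}
```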
Anyway, that's my opinion. I have no idea how to proceed on this;
I don't really expect any of that to change your mind, and you probably
don't use any system where this extra frame is a serious annoyance.
--
Daniel Jacobowitz
CodeSourcery