Wishlist: declarations suitable for post mortem debugging

David Kastrup dak@gnu.org
Thu Apr 26 14:12:00 GMT 2012


Sorry for the top post here: it's one of those few cases where I think
that adding the context below makes sense.

Paul Pluzhnikov <ppluzhnikov@google.com> writes:

> David,
>
> This is probably best discussed on libc-alpha (CC'd).
>
> Providing a link to previous discussion, bugzilla PR, etc. might help.

<URL:http://sourceware.org/bugzilla/show_bug.cgi?id=6522>

> Providing an actual example where you wasted days on "fake" stack
> trace may also help.

Here is actually another reason:
<URL:http://lists.gnu.org/archive/html/emacs-devel/2005-03/msg00048.html>

If you have a failed assertion, _fixing_ some condition and/or
tentatively returning in the debugger can be quite helpful for figuring
out more things.  This reason, of course, is softer than the need for a
useful core dump since it could be claimed, with varying degrees of
persuasiveness, for pretty much any function.
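A minimal C sketch of the "tentatively returning" scenario follows. It assumes a hypothetical failure handler (`report_failure`) and a hypothetical flag (`debug_continue_after_failure`) that a debugger session could set, e.g. with gdb's `set var`, before continuing. The handler is deliberately not declared noreturn, so the code after the call in the caller has to be compiled and stays reachable if the handler returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag a debugger could set before continuing.  volatile so
   the compiler re-reads it rather than assuming its value. */
volatile int debug_continue_after_failure = 0;

/* Hypothetical failure handler, deliberately NOT declared noreturn: if the
   flag is set it returns, so the caller's fallback code must exist.  With
   a noreturn abort() the compiler may emit nothing after the call, which
   leaves nothing to return into. */
void report_failure(const char *what)
{
    fprintf(stderr, "failure: %s\n", what);
    if (!debug_continue_after_failure)
        abort();
}

/* Caller: if report_failure returns, fall back to a safe value. */
int checked_divide(int a, int b)
{
    if (b == 0) {
        report_failure("division by zero");
        return 0;
    }
    return a / b;
}
```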

Here are some other links about this:

<URL:http://permalink.gmane.org/gmane.emacs.devel/80050>

<URL:http://thread.gmane.org/gmane.emacs.devel/33962>

<URL:http://lists.gnu.org/archive/html/emacs-devel/2005-02/msg01226.html>

<URL:http://lists.gnu.org/archive/html/emacs-devel/2005-02/msg01268.html>

<URL:http://lists.gnu.org/archive/html/emacs-devel/2005-03/msg00001.html>

There are probably a few more mails in February where I tried diagnosing
that particular bug based on the debug backtrace at the time the
assertion triggers.

>
> On Thu, Apr 26, 2012 at 2:49 AM, David Kastrup <dak@gnu.org> wrote:
>>
>> We have attributes like
>>
>> /* This prints an "Assertion failed" message and aborts.  */
>> extern void __assert_fail (__const char *__assertion, __const char *__file,
>>                           unsigned int __line, __const char *__function)
>>     __THROW __attribute__ ((__noreturn__));
>>
>> in assert.h and
>>
>> extern void abort (void) __THROW __attribute__ ((__noreturn__));
>>
>> in stdlib.h.  These functions, in contrast to exit, have the side effect
>> of dumping core as a regular part of their execution.  The purpose is
>> to enable post-mortem debugging.
>>
>> The attribute __noreturn__ directly conflicts with that purpose since it
>> tells the compiler it may trash the stack when calling the function, not
>> requiring any useful information to be retained on the stack.  In
>> particular, those functions may be _jumped_ to instead of called, or an
>> existing call to these functions in an unrelated part of source may get
>> recycled by jumping to it.
>>
>> As a result, backtraces from the core dump are quite unreliable.  I
>> have, on several occasions, spent days of futile debugging on backtraces
>> that did not correspond with reality.
>>
>> So I would strongly suggest that functions that are _explicitly_
>> intended to dump core don't get marked as "__noreturn__".  This seems
>> like a rather straightforward way to stop the compiler from making those
>> core dumps much less useful than they should be.  While it might be
>> conceivable to invent a special __coredump__ flag to make sure that the
>> generated code around such a call (including local variables) fully and
>> uniquely corresponds with the available debug information, that seems
>> like a considerably more complex endeavor.
>>
>> I did suggest removing __noreturn__ attributes on core-dumping functions
>> some years ago, but was chased away in no uncertain terms since my
>> request was considered incompatible with the holy grail of optimization,
>> and talking about debugging should only be allowed on entirely
>> unoptimized code.  The kind of arguments and name-calling used for
>> putting a stop to my request were nothing I would have considered
>> compelling on technical grounds, so I am re-raising that request in light
>> of assurances of a changed overall climate in glibc development.  I am
>> quoting a passage from Emacs "DEBUG" file that resulted from the
>> non-acceptance of my proposal.  I may add that _several_ Emacs
>> developers were afflicted by this problem and wasted several days on
>> different bugs each, so it is not academic.
>>
>>    ** When you are trying to analyze failed assertions, it will be
>>    essential to compile Emacs either completely without optimizations or
>>    at least (when using GCC) with the -fno-crossjumping option.  Failure
>>    to do so may make the compiler recycle the same abort call for all
>>    assertions in a given function, rendering the stack backtrace useless
>>    for identifying the specific failed assertion.
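A minimal sketch of the code shape the DEBUG note describes, using a hypothetical `struct buffer` and `buffer_room` function for illustration. Every check ends in an identical, argument-less abort() call; whether a particular GCC version and optimization level actually merges them is compiler-dependent.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure, for illustration only. */
struct buffer {
    char *data;
    size_t len;
    size_t cap;
};

/* Each failed check ends in the same argument-less call to abort().
   Since abort() is declared noreturn, an optimizing compiler may
   cross-jump these into a single call site; a backtrace from the core
   dump then shows that shared site, not the check that failed. */
size_t buffer_room(const struct buffer *b)
{
    if (b == NULL)
        abort();
    if (b->data == NULL)
        abort();
    if (b->len > b->cap)
        abort();
    return b->cap - b->len;
}
```

Building with `-O2 -g -fno-crossjumping`, as the note suggests, turns off GCC's cross-jumping pass specifically; it does not necessarily rule out every other code-merging optimization.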
>
> Yes, but assertions *also* print file/line info, and it appears that
> file/line should be quite sufficient for identifying the specific
> assertion that failed.
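The point about file/line info can be sketched in C. The location travels as call arguments, as in the `__assert_fail` declaration quoted below, rather than being recovered from the backtrace. All names here (`CHECK`, `check_fail`, `format_check_failure`) are hypothetical, and the message format is only loosely modeled on assert's output, not glibc's exact text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a failure report from call-site data.
   Returns what snprintf returns (length of the full message). */
int format_check_failure(char *buf, size_t size, const char *expr,
                         const char *file, unsigned int line,
                         const char *func)
{
    return snprintf(buf, size, "%s:%u: %s: Check `%s' failed.",
                    file, line, func, expr);
}

/* Hypothetical reporting function shaped like __assert_fail: print the
   location passed in by the caller, then abort to get a core dump. */
void check_fail(const char *expr, const char *file, unsigned int line,
                const char *func)
{
    char msg[512];
    format_check_failure(msg, sizeof msg, expr, file, line, func);
    fprintf(stderr, "%s\n", msg);
    abort();
}

/* Hypothetical CHECK macro: like assert(), it passes the expression text
   and location as arguments, so the printed message stays accurate even
   if the compiler merges the calls to check_fail. */
#define CHECK(expr) \
    ((expr) ? (void) 0 : check_fail(#expr, __FILE__, __LINE__, __func__))
```

This identifies the failing check from the message, but it does not restore the caller's local variables in a core dump, which is the part of the problem the message alone cannot cover.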
>
>>
>> Of course, it is not just glibc that is concerned here since GCC itself
>> has built-in definitions for some of those functions.  But those are
>> intended to follow glibc in spirit, I should think, so I think this is
>> the place to start on this issue.
>>
>> Thanks for caring
>>
>> --
>> David Kastrup

-- 
David Kastrup