stack_info::walk and alloca don't mix

Ryan Johnson ryan.johnson@cs.utoronto.ca
Tue May 3 15:13:00 GMT 2011


Hi all,

FYI in case anyone else has been seeing strange crashes inside calls to 
api_fatal():

It seems that functions which use alloca() set up a non-standard stack 
frame which confuses both stack_info::walk and windbg. The former tends 
to either enter an infinite loop or end up executing code in la-la land; 
the latter crashes instantly. Worse, if an exception handler is active, 
it will detect the crash and attempt to generate a (second) stack dump, 
which crashes in turn, and so on until the stack space is exhausted and 
the process terminates.

The above seems to be the reason why fork failures often emit the "died 
waiting for longjmp" message instead of (or in addition to) "resource 
temporarily unavailable" -- the failed child enters an infinite loop 
trying to error-exit and the parent eventually times out.
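
The failure mode can be reproduced on a toy model. The sketch below is not 
stack_info::walk. It walks a simulated stack (a vector of words in which 
each frame is a saved frame pointer followed by a return address, with 
index 0 marking the end of the chain), and every name in it is 
hypothetical. With a consistent chain the walk ends at the sentinel. If one 
saved frame pointer is wrong and points back at an earlier frame, only the 
step cap stops it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of walking a simulated frame-pointer chain (hypothetical type).
struct WalkResult {
    std::vector<std::uint32_t> return_addrs;  // one entry per frame visited
    bool reached_end;                         // true only if the sentinel was hit
};

// Follow saved-frame-pointer links through a simulated stack.
// Layout assumed for illustration: stack[fp] holds the caller's frame
// pointer, stack[fp + 1] holds the return address, and fp == 0 ends the chain.
WalkResult walk_frames(const std::vector<std::uint32_t> &stack,
                       std::uint32_t fp, std::size_t max_steps)
{
    assert(max_steps > 0);
    WalkResult r;
    r.reached_end = false;
    for (std::size_t i = 0; i < max_steps; ++i) {
        if (fp == 0) {                 // sentinel: clean end of the chain
            r.reached_end = true;
            break;
        }
        if (fp >= stack.size() || stack.size() - fp < 2)
            break;                     // frame pointer leaves the stack: give up
        r.return_addrs.push_back(stack[fp + 1]);
        fp = stack[fp];                // trust the saved frame pointer
    }
    return r;
}
```

Without the max_steps cap, a chain that points back into itself would 
never terminate, which matches the infinite-loop symptom described above.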

The pernicious part is, gcc converts even normal stack allocations into 
alloca calls if they are "large," so just eliminating direct calls to 
alloca isn't enough. For example, dll_list::alloc declares a "WCHAR 
name[NT_MAX_PATH]" which gcc turns into a call to alloca() under the 
hood. The relevant assembler output is:

mov    $0x1002c,%eax
call   0x6115cde0 <_alloca>
mov    %edi,0x10024(%esp)
mov    0x10034(%esp),%edi
mov    %ebp,0x10028(%esp)
lea    0x1c(%esp),%ebp
mov    %ebx,0x1001c(%esp)
mov    %esi,0x10020(%esp)
movl   $0x10000,0x8(%esp)
mov    %ebp,0x4(%esp)
mov    %edi,(%esp)

mov    0x1001c(%esp),%ebx
mov    0x10020(%esp),%esi
mov    0x10024(%esp),%edi
mov    0x10028(%esp),%ebp
add    $0x1002c,%esp
ret
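
For reference, a function of roughly the following shape is enough to get 
a frame of this size. This is a minimal sketch, not the actual 
dll_list::alloc: copy_name and kNameChars are made-up names, and the 
buffer size is only inferred from the 0x10000 in the listing above 
(0x8000 two-byte characters). Whether a given compiler and target emit a 
call to _alloca for it has to be checked in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 0x8000 two-byte characters, i.e. the 0x10000-byte buffer
// seen in the listing. The real NT_MAX_PATH is not restated here.
static const std::size_t kNameChars = 0x8000;

// Hypothetical stand-in for a function with a large local array.
// Note that there is no explicit alloca() call anywhere in the source.
std::size_t copy_name(const std::uint16_t *src)
{
    assert(src != 0);
    std::uint16_t name[kNameChars];   // 64 KiB of ordinary stack storage
    std::size_t n = 0;
    while (n + 1 < kNameChars && src[n] != 0) {
        name[n] = src[n];
        ++n;
    }
    name[n] = 0;                      // always NUL-terminate the copy
    return n;                         // number of characters copied
}
```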

As nearly as I can tell, debug info would be required to recover the 
caller's saved %ebp from such a stack frame. Problem is, I don't know any 
way to identify such a stack frame short of using debug info, either.

An alternative might be to enable exceptions: even if no code actually 
throws exceptions, gcc emits unwind information which can be accessed 
quite easily using the definitions in libgcc's <unwind.h> (example 
below). The good thing is this info will be accurate unless the stack 
has been corrupted, but it's still far from ideal because it comes with 
all the space overheads that accompany exception handling. It also 
doesn't allow recovering function args, though I'm not sure that 
actually works in the presence of optimized code even today. That said, 
the unwind logic exits cleanly if it can't find any unwind info to use, 
so it might make a good debug-build option.

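#include <stdio.h>
#include <unwind.h>

/* Called once per frame by _Unwind_Backtrace; d points to the depth counter. */
extern "C" _Unwind_Reason_Code trace_fcn(_Unwind_Context *ctx, void *d)
{
     int *depth = (int*)d;
     /* _Unwind_GetIP returns an integer type, so cast it before using %p. */
     printf("\t%p\n", (void*)_Unwind_GetIP(ctx));
     (*depth)++;
     return _URC_NO_REASON;   /* keep walking */
}

void print_backtrace_here()
{
     int depth = 0;
     printf("Stack trace:\n");
     _Unwind_Backtrace(&trace_fcn, &depth);
     fflush(stdout);
}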
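
A variation on the example above, as a rough sketch: instead of printing 
from the callback, it stores the addresses in a caller-supplied buffer and 
stops once the buffer is full, which keeps the work bounded when it runs 
from a crash path. TraceBuf, collect_fcn and capture_backtrace are made-up 
names. It assumes that libgcc ends the walk as soon as the callback returns 
anything other than _URC_NO_REASON, which is not something stated in this 
message.

```cpp
#include <unwind.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Caller-supplied storage for the collected addresses (hypothetical type).
struct TraceBuf {
    std::uintptr_t *ips;   // where to store instruction pointers
    std::size_t cap;       // capacity of ips
    std::size_t count;     // number of entries filled so far
};

// Per-frame callback: record the IP, or stop the walk when the buffer is full.
extern "C" _Unwind_Reason_Code collect_fcn(_Unwind_Context *ctx, void *d)
{
    TraceBuf *buf = static_cast<TraceBuf *>(d);
    if (buf->count >= buf->cap)
        return _URC_END_OF_STACK;   // assumption: non-_URC_NO_REASON ends the walk
    buf->ips[buf->count++] = static_cast<std::uintptr_t>(_Unwind_GetIP(ctx));
    return _URC_NO_REASON;
}

// Capture at most cap frames starting at the caller; returns how many were stored.
std::size_t capture_backtrace(std::uintptr_t *ips, std::size_t cap)
{
    assert(ips != 0 || cap == 0);
    TraceBuf buf = { ips, cap, 0 };
    _Unwind_Backtrace(&collect_fcn, &buf);
    return buf.count;
}
```

The stored addresses can then be printed or logged after the walk has 
finished, outside the callback.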


Thoughts?
Ryan




