This is the mail archive of the
mailing list for the glibc project.
Re: Notes on a frame_unwind_address_in_block problem
- From: Mark Kettenis <mark dot kettenis at xs4all dot nl>
- To: drow at false dot org
- Cc: gdb at sourceware dot org, libc-alpha at sourceware dot org
- Date: Thu, 13 Jul 2006 22:20:12 +0200 (CEST)
- Subject: Re: Notes on a frame_unwind_address_in_block problem
- References: <20060706222157.GA1377@nevyn.them.org>
> Date: Thu, 6 Jul 2006 18:21:57 -0400
> From: Daniel Jacobowitz <email@example.com>
> I. 64-bit
> On my AMD64 GNU/Linux system, sigaltstack.exp is currently failing both
> 32-bit and 64-bit with similar symptoms. Saying "finish" from a signal
> handler fails to stop in the trampoline. We correctly insert the
> breakpoint, but the frame ID doesn't match: get_frame_id (get_prev_frame
> (get_current_frame ())) when in the signal handler does not equal
> get_frame_id (get_current_frame ()) once we've returned to the trampoline.
> The causes are quite different despite the similar symptoms. On 64-bit,
> it appears to be partly related to bogus call frame information in glibc's
> __restore_rt - there's dwarf2 info but (A) it doesn't start until the first
> instruction of the trampoline, when for signal trampolines it ought to start
> one byte earlier, and (B) it doesn't describe the signal frame at all.
It depends a bit on how the trampoline is implemented. If the
trampoline itself calls the signal handler, there's no problem, but if
the kernel calls the signal handler directly and sets up the stack so
that the handler returns to the trampoline, then yes, the unwind info
should really start before the "entry point" of the trampoline.
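To make the "one byte earlier" requirement concrete, here is a minimal
sketch in C. It is not GDB's or libgcc's code: struct fde_range,
fde_covers() and lookup_pc_for_caller() are hypothetical names, and the
addresses in the note below are made up. It assumes an unwinder that,
for any frame other than the innermost one, looks up unwind info at the
return address minus one, because a return address normally follows a
call instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the address range covered by one FDE:
   [start, end).  Not a GDB or libgcc data structure.  */
struct fde_range
{
  uint64_t start;
  uint64_t end;
};

/* Return true if PC lies inside the half-open range of FDE.  */
static bool
fde_covers (const struct fde_range *fde, uint64_t pc)
{
  assert (fde->start < fde->end);
  return pc >= fde->start && pc < fde->end;
}

/* Address an unwinder would use to find unwind info for a caller
   frame: one byte before the return address, so that the lookup
   normally lands inside the call instruction.  */
static uint64_t
lookup_pc_for_caller (uint64_t return_address)
{
  return return_address - 1;
}
```

With a made-up trampoline entry point at 0x1000, an FDE covering
[0x1000, 0x1009) does not cover lookup_pc_for_caller (0x1000), which is
0xfff, while an FDE covering [0xfff, 0x1009) does. Nothing ever called
the trampoline, so the "return address" is its first instruction, and
only the second FDE is found.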
Looking at sysdeps/unix/sysv/linux/x86_64/sigaction.c in the glibc sources:
".align 16\n" \
CFI_STARTPROC "\n" \
"__" #name ":\n" \
" movq $" #syscall ", %rax\n" \
" syscall\n" \
CFI_ENDPROC "\n" \
Someone should either add the proper unwind info or remove the unwind
info.
> Another problem is that sometimes we get the amd64-specific frame unwinder
> for this code and sometimes we get the dwarf2 unwinder. When $rip is in
> __restore_rt, the dwarf2 frame sniffer successfully attaches to it. At that
> point, because of the bogus CFI, we can't backtrace. But when it's up the
> frame, the dwarf2 sniffer doesn't get it (because the unwind information
> doesn't start one byte too early, as we seem to have concluded that it ought
> to). So instead the amd64 fallback sniffer gets control, identifies it as a
> sigtramp, and sets up for real backtraces.
The proper way to fix this is to fix glibc.
> I think that correct CFI in glibc for the signal restore trampolines will
> sort out the 64-bit case. Or no CFI, but glibc would probably want correct
> CFI for other reasons.
I think the ABI calls for CFI, but really, no CFI would be much better
than the bogus CFI.
> II. 32-bit.
> For 32-bit, though, it gets even more interesting. Here's where $SUBJECT
> comes into play. I have a loaded vDSO (virtual shared object), which
> exports __kernel_sigreturn. This points at the first instruction of the
> trampoline, i.e. one byte after the start of the dwarf2 FDE. When I am
> stopped in the signal handler, frame_unwind_address_in_block decides to
> subtract one. That points before the symbol, so the frame ID's function
> ends up being NULL; there's no symbol covering that address. Then when we
> arrive at the signal trampoline during "finish", we no longer subtract one
> - since we're at an executable instruction in the topmost frame - and thus
> we do find the symbol. The two frame IDs don't compare equal.
Ouch. So in the end having the signal trampoline call the signal
handler *is* a better design.
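The mismatch described above can be reproduced with a small model. This
is only a sketch under simplifying assumptions: the model_* names are
hypothetical, and a frame ID is reduced to a stack address plus the
start of the containing function, which is less than what GDB's real
frame IDs carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a frame ID: a stack address plus the
   start of the function containing the frame's code address
   (0 if no symbol covers it).  */
struct model_frame_id
{
  uint64_t stack_addr;
  uint64_t func_start;
};

/* Hypothetical symbol covering [start, end).  */
struct model_symbol
{
  uint64_t start;
  uint64_t end;
};

/* Start of SYM if it covers ADDR, otherwise 0 ("no function").  */
static uint64_t
model_function_start (const struct model_symbol *sym, uint64_t addr)
{
  assert (sym->start < sym->end);
  return (addr >= sym->start && addr < sym->end) ? sym->start : 0;
}

/* Address used for lookups: PC itself for the innermost frame,
   PC - 1 for a frame that is being unwound into.  */
static uint64_t
model_address_in_block (uint64_t pc, bool innermost)
{
  return innermost ? pc : pc - 1;
}

/* Build the simplified frame ID for a frame at PC.  */
static struct model_frame_id
model_frame_id_for (const struct model_symbol *sym, uint64_t stack_addr,
                    uint64_t pc, bool innermost)
{
  struct model_frame_id id;

  id.stack_addr = stack_addr;
  id.func_start
    = model_function_start (sym, model_address_in_block (pc, innermost));
  return id;
}

/* Two IDs match only if both fields match.  */
static bool
model_frame_id_eq (struct model_frame_id a, struct model_frame_id b)
{
  return a.stack_addr == b.stack_addr && a.func_start == b.func_start;
}
```

Take a made-up symbol covering [0x400, 0x408) and a trampoline PC of
0x400. Computed from the signal handler (innermost == false), the lookup
address is 0x3ff and func_start is 0. Computed once stopped in the
trampoline (innermost == true), func_start is 0x400. The stack address
is the same in both cases, yet model_frame_id_eq () returns false.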
> One possible solution is the nasty patch I have in my working directory,
> which boils down to this:
> if (next_frame->level >= 0
> - && get_frame_type (next_frame) == NORMAL_FRAME)
> + && get_frame_type (next_frame) == NORMAL_FRAME
> + && (next_frame->prev == NULL
> + || next_frame->prev->unwind == NULL
> + || get_frame_type (next_frame->prev) == NORMAL_FRAME))
> But this makes frame_unwind_address_in_block change its behavior over time
> for the same frame, which is awful.
I'm always very suspicious of such multi-condition if statements.
> Another solution would be to use the FDE start address as the code address
> for dwarf2 signal frame IDs, instead of the function. This would work on
> the assumption that a single FDE would generally cover the entire trampoline
> - a reasonable assumption, I think, and the consequences for the frame ID
> changing while single-stepping are less disruptive here than the
> Mark, what do you think of that idea? It seems to work. It looks like the
> patch at the end of this message.
In general, I think it's a bad idea to do this, but for the special
case of a signal frame, especially in the presence of the 'S'
augmentation, that might be a reasonable thing to do. However, I
think we can do better than that. What about checking whether the
address returned by frame_unwind_address_in_block() is equal to the
FDE start address and adding one byte if that's the case before looking
up the function corresponding to that address?
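A rough sketch of that check, in C. This is not a patch against GDB:
the sketch_* names are hypothetical, the symbol lookup is reduced to a
single range test, and the caller is assumed to pass in the address
that frame_unwind_address_in_block() returned together with the start
address of the FDE it found.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symbol covering [start, end).  */
struct sketch_symbol
{
  uint64_t start;
  uint64_t end;
};

/* Start of SYM if it covers ADDR, otherwise 0 ("no function").  */
static uint64_t
sketch_function_start (const struct sketch_symbol *sym, uint64_t addr)
{
  assert (sym->start < sym->end);
  return (addr >= sym->start && addr < sym->end) ? sym->start : 0;
}

/* If BLOCK_ADDR is exactly the FDE start, step one byte forward so
   the lookup lands on the first real instruction of the trampoline;
   otherwise leave it alone.  */
static uint64_t
sketch_adjust_for_fde_start (uint64_t block_addr, uint64_t fde_start)
{
  return block_addr == fde_start ? block_addr + 1 : block_addr;
}

/* Function start to record in the ID of a signal frame.  */
static uint64_t
sketch_signal_frame_function (const struct sketch_symbol *sym,
                              uint64_t block_addr, uint64_t fde_start)
{
  return sketch_function_start
    (sym, sketch_adjust_for_fde_start (block_addr, fde_start));
}
```

With a made-up FDE starting at 0x3ff and a symbol covering
[0x400, 0x408), a block address of 0x3ff (seen from the handler) and a
block address of 0x400 (stopped in the trampoline) both resolve to the
function start 0x400, so the function part of the two frame IDs would
agree in this model.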