Bug 3529 - libunwind temporarily drops a stack frame when stepping between functions
Summary: libunwind temporarily drops a stack frame when stepping between functions
Status: RESOLVED FIXED
Alias: None
Product: frysk
Classification: Unclassified
Component: general
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Jan Kratochvil
URL:
Keywords:
Depends on:
Blocks: 1839 2936 3076
Reported: 2006-11-16 18:12 UTC by Mike Cvet
Modified: 2006-11-24 21:26 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
test program (427 bytes, text/plain)
2006-11-16 18:13 UTC, Mike Cvet
Fix requiring to disable internal caching (included) (1.63 KB, patch)
2006-11-21 17:34 UTC, Jan Kratochvil
libunwind-local testcase for debugging (1.43 KB, patch)
2006-11-21 17:37 UTC, Jan Kratochvil
Fixed patch, handles signal frames, caching is fixed/unaffected (1.68 KB, patch)
2006-11-23 17:12 UTC, Jan Kratochvil
Testcase including signal frame (1.57 KB, patch)
2006-11-23 17:16 UTC, Jan Kratochvil

Description Mike Cvet 2006-11-16 18:12:29 UTC
Not apparent on FC5, but reproducible on FC6.

Attached is a test program. The two functions foo() and jump() perform the 
mundane tasks of incrementing and calculating integer values in a while(1) loop 
controlled by foo(). There is a dummy function at the top of the file, 
shazaam(), which does nothing.

At the end of this loop, a call to jump() is made. However, shazaam() ends up 
being highlighted instead, and foo() is missing from the middle of the stack. 
According to the jump() frame, its address is 0x400528, but when we examine 
this in gdb:

(gdb) x/i 0x400528
0x400528 <shazaam>:     push   %rbp

It's actually pointing to shazaam(). Subsequent stepping resumes correctly 
after this initial incorrect address.

I placed print statements in the constructors of each of our StackFrame 
objects, printing out the name of the frame, its CFA, and address. Here are the 
outputs for the last three steps:

foo 140737031278864 59 0x4005fa
main 140737031278912 71 0x400641
__libc_start_main 140737031278960 0 0x2aaaaaefda44
_start 140737031279152 0 0x400499

foo 140737031278864 59 0x4005ff
main 140737031278912 71 0x400641
__libc_start_main 140737031278960 0 0x2aaaaaefda44
_start 140737031279152 0 0x400499

jump 140737031278856 13 0x400528
main 140737031278912 71 0x400641
__libc_start_main 140737031278960 0 0x2aaaaaefda44
_start 140737031279152 0 0x400499

The frame for foo() is missing, and jump()'s address points to shazaam, which 
has never been called.

Apparently libunwind is returning garbage for the trace at this point?
Comment 1 Mike Cvet 2006-11-16 18:13:45 UTC
Created attachment 1422 [details]
test program
Comment 2 Jan Kratochvil 2006-11-21 17:34:15 UTC
Created attachment 1430 [details]
Fix requiring to disable internal caching (included)

000000000804848c jump				  (sp=00000000bff7bfbc)
	proc=000000000804848c-00000000080484ba
	handler=0 lsda=0
000000000804856e foo+0xa2			  (sp=00000000bff7bfc0)
	proc=00000000080484cc-0000000008048570
	handler=0 lsda=0
00000000080485b3 main+0x43			  (sp=00000000bff7bfe0)
	proc=0000000008048570-00000000080485c1
	handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc 	  (sp=00000000bff7c010)
	proc=000000000082fd10-000000000082fded
	handler=0 lsda=0
0000000008048401 _start+0x21			  (sp=00000000bff7c080)
	proc=00000000080483e0-0000000008048402
	handler=0 lsda=0
================

000000000804848d jump+0x1			  (sp=00000000bff7bfb8)
	proc=000000000804848c-00000000080484ba
	handler=0 lsda=0
000000000804856e foo+0xa2			  (sp=00000000bff7bfc0)
	proc=00000000080484cc-0000000008048570
	handler=0 lsda=0
00000000080485b3 main+0x43			  (sp=00000000bff7bfe0)
	proc=0000000008048570-00000000080485c1
	handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc 	  (sp=00000000bff7c010)
	proc=000000000082fd10-000000000082fded
	handler=0 lsda=0
0000000008048401 _start+0x21			  (sp=00000000bff7c080)
	proc=00000000080483e0-0000000008048402
	handler=0 lsda=0
================

000000000804848f jump+0x3			  (sp=00000000bff7bfb8)
	proc=000000000804848c-00000000080484ba
	handler=0 lsda=0
000000000804856e foo+0xa2			  (sp=00000000bff7bfc0)
	proc=00000000080484cc-0000000008048570
	handler=0 lsda=0
00000000080485b3 main+0x43			  (sp=00000000bff7bfe0)
	proc=0000000008048570-00000000080485c1
	handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc 	  (sp=00000000bff7c010)
	proc=000000000082fd10-000000000082fded
	handler=0 lsda=0
0000000008048401 _start+0x21			  (sp=00000000bff7c080)
	proc=00000000080483e0-0000000008048402
	handler=0 lsda=0
================
Comment 3 Jan Kratochvil 2006-11-21 17:37:03 UTC
Created attachment 1432 [details]
libunwind-local testcase for debugging
Comment 4 Jan Kratochvil 2006-11-21 17:38:30 UTC
The internal caching still needs to be made compatible with the patch.
The functionality itself should be final, I hope.
Comment 5 Andrew Cagney 2006-11-21 18:30:16 UTC
Jan FYI, nice catch, but there's more ...

-  --ip;
+  /* In the current (lowest) frame we must not touch `ip' as the current
+     address is where we stand.  On the other hand any upper frames will stand
+     on the next instruction behind our call which may have a different stack
+     DWARF information (for `stdcall' called functions) or the next instruction
+     even may belong already to a different continuing function.  */
+  if (!c->first_step)
+    --ip;

This can also occur when a function was interrupted by a signal at its first
instruction, giving the layout:

   inner-most
   <signal-trampoline>
   foo-at-first-instruction

More generally, there are two cases: frames where the function was interrupted
(the innermost frame and the caller of the signal trampoline), and all others,
which made a normal call.
Comment 6 Jan Kratochvil 2006-11-23 17:12:40 UTC
Created attachment 1435 [details]
Fixed patch, handles signal frames, caching is fixed/unaffected

Final patch.
Comment 7 Jan Kratochvil 2006-11-23 17:16:29 UTC
Created attachment 1436 [details]
Testcase including signal frame

The testcase still needs to be properly integrated into the libunwind testsuite.

Expected output:
00000000080484ec jump				  (sp=00000000bf90062c)
	proc=00000000080484ec-000000000804851a
	handler=0 lsda=0
000000000804855d foo+0x31			  (sp=00000000bf900630)
	proc=000000000804852c-00000000080485d5
	handler=0 lsda=0
0000000000eba420 __kernel_sigreturn		  (sp=00000000bf900650)
	proc=0000000000eba41f-0000000000eba428
	handler=0 lsda=0
00000000080485d5 lockup 			  (sp=00000000bf90092c)
	proc=00000000080485d5-00000000080485da
	handler=0 lsda=0
0000000008048632 prefoo+0x58			  (sp=00000000bf900930)
	proc=00000000080485da-0000000008048634
	handler=0 lsda=0
0000000008048697 main+0x63			  (sp=00000000bf900960)
	proc=0000000008048634-00000000080486a5
	handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc 	  (sp=00000000bf900990)
	proc=000000000082fd10-000000000082fded
	handler=0 lsda=0
0000000008048461 _start+0x21			  (sp=00000000bf900a00)
	proc=0000000008048440-0000000008048462
	handler=0 lsda=0
================
Comment 8 Jan Kratochvil 2006-11-23 23:11:01 UTC
Still not committed: x86_64 signal frames are affected by a glibc issue:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217087

Is it enough to get the unwinding fixed in the Rawhide glibc, or should libunwind
provide a workaround for legacy glibc releases? I believe fixing it in Rawhide is enough.
Comment 9 Jan Kratochvil 2006-11-24 21:26:52 UTC
x86_64 signal frame functionality still depends on resolving the glibc issue:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217087