This is the mail archive of the frysk@sources.redhat.com mailing list for the frysk project.
Re: remote unwinding of libunwind
- From: Mike Cvet <mcvet at redhat dot com>
- To: Wu Zhou <woodzltc at cn dot ibm dot com>
- Cc: frysk at sources dot redhat dot com, Alexandre Oliva <aoliva at redhat dot com>
- Date: Tue, 19 Sep 2006 10:58:58 -0400
- Subject: Re: remote unwinding of libunwind
- References: <450FD0D2.5000703@cn.ibm.com>
On Tue, 2006-09-19 at 19:13 +0800, Wu Zhou wrote:
> Noticing that quite a lot of stack-unwind code has been checked into CVS, I spent some time playing around with it.
> The test results are quite satisfactory. It can now get function names in
> dynamically loaded libraries and extract source and line information if available. It has also
> started to support multi-threaded unwinding.
>
> But I also noticed a few small problems. The first one showed up while I was playing with Kyle's code. It
> can step / unwind both threads now, but the unwinder seems to swallow some frames for its own
> consumption. :-) In the unwind session below, you will notice that there are four levels of
> frames in both threads. But in fact, there are six frames in each, as you can see from the pstack
> output.
>
> $ ./unwinddebug
> Enter the PID of the main therad: 8297
> Assuming second thread is pid 8298
> Tracing main thread!
> Frames of pid 8297:
>
> found frame 0
> 0000000000bfb402 (sp=00000000bfe87ba4)
> found frame 1
> 0000000008048893 main+0x10e (sp=00000000bfe87d70)
> found frame 2
> 0000000000c2e724 __libc_start_main+0xdc (sp=00000000bfe87dd0)
> found frame 3
> 0000000008048521 _start+0x21 (sp=00000000bfe87e40)
>
> Trace Depth = 4
>
> Tracing second thread!
> Frames of pid 8298:
>
> found frame 0
> 0000000000bfb402 +0x21 (sp=00000000b7eef264)
> found frame 1
> 00000000080486b6 thread1+0x77 (sp=00000000b7eef430)
> found frame 2
> 0000000000db440b start_thread+0xa9 (sp=00000000b7eef460)
> found frame 3
> 0000000000ce1b7e __clone+0x5e (sp=00000000b7eef4d0)
>
> Trace Depth = 4
>
> $ pstack 8297
> Thread 2 (Thread -1209074784 (LWP 8298)):
> #0 0x00bfb402 in __kernel_vsyscall ()
> #1 0x00ca3f16 in __nanosleep_nocancel () from /lib/libc.so.6
> #2 0x00ca3d3b in sleep () from /lib/libc.so.6
> #3 0x080486b6 in thread1 ()
> #4 0x00db440b in start_thread () from /lib/libpthread.so.0
> #5 0x00ce1b7e in clone () from /lib/libc.so.6
> Thread 1 (Thread -1209071296 (LWP 8297)):
> #0 0x00bfb402 in __kernel_vsyscall ()
> #1 0x00ca3f16 in __nanosleep_nocancel () from /lib/libc.so.6
> #2 0x00ca3d3b in sleep () from /lib/libc.so.6
> #3 0x08048893 in main ()
>
>
> The second one showed up while I was playing with Tromey's fdtrace:
>
> # ./frysk/bindir/fdtrace /home/woodzltc/fdtrace/Closer2
> bad close() call at:
> val = 0; in function: null (<Unknown file> at line 0)
> val = 134513583; in function: doit2 (/home/woodzltc/AboutFrame/libunwind/fdtrace/Closer2.c at line 9)
> val = 134513607; in function: main (/home/woodzltc/AboutFrame/libunwind/fdtrace/Closer2.c at line 13)
> val = 12773156; in function: __libc_start_main (Unknown file at line 0)
> val = 134513409; in function: _start (Unknown file at line 0)
> bad close() call at:
> val = 0; in function: null (<Unknown file> at line 0)
> val = 134513583; in function: doit2 (/home/woodzltc/AboutFrame/libunwind/fdtrace/Closer2.c at line 9)
> val = 134513607; in function: main (/home/woodzltc/AboutFrame/libunwind/fdtrace/Closer2.c at line 13)
> val = 12773156; in function: __libc_start_main (Unknown file at line 0)
> val = 134513409; in function: _start (Unknown file at line 0)
>
> The address of the first frame seems to be 0, and "doit()" and "close()" were swallowed as well.
>
> Has anyone noticed these problems before? Is anyone working on improving this?
>
Yup, I noticed it yesterday as well. However, Alex still has some
pending patches to go into libunwind. When those get in we'll take a
closer look at this... need to fix one problem at a time.
>
> BTW, I also noticed that libunwind has only two test cases for remote unwinding. That
> is far from enough, IMO. Stack unwinding has quite a few different scenarios, especially remote
> unwinding, and we have no way to be sure how it behaves in those scenarios if we have not tested them.
> So I predict there are still other problems somewhere that we haven't noticed yet.
>
> My two cents: we need to write many more test cases to evaluate how libunwind works in various
> scenarios: single-threaded and multi-threaded, normal and abnormal operation (signal frames,
> exception handlers, non-local jumps)... It would be even better if we could also extract backtrace
> information from core dumps.
You're absolutely right - the testcases are lacking. I'll beef this up
when I get some time this week!
- Mike