Bug 3863 - libunwind occasionally does not return frame function names.
Summary: libunwind occasionally does not return frame function names.
Status: RESOLVED FIXED
Alias: None
Product: frysk
Classification: Unclassified
Component: general
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
Duplicates: 3241 3691
Blocks: 3076
Reported: 2007-01-11 21:57 UTC by Nurdin Premji
Modified: 2007-10-10 11:57 UTC
CC List: 2 users


Description Nurdin Premji 2007-01-11 21:57:05 UTC
Running 
./frysk-core/TestRunner -r 1000 
'testStressMultiThreadedDetach(frysk.util.StressTestFStack)'

eventually gives output of the form 
Task #5174
#0 0xe99402 in __kernel_vsyscall ()
#1 0x1392b7 in sigsuspend ()
#2 0x804913a in server ()
#3 0x29b3db in start_thread ()
#4 0x1dd26e in clone ()
Task #5175
#0 0x9305faa in [unknown]
#1 0x104b5ce2 in [unknown]
#2 0x8707f83 in [unknown]
#3 0x8649e16 in [unknown]

when in fact every task's backtrace should be identical (except for the main task).
Comment 1 Nurdin Premji 2007-01-11 22:02:32 UTC
*** Bug 3691 has been marked as a duplicate of this bug. ***
Comment 2 Mike Cvet 2007-01-11 22:11:03 UTC
Note that this is similar, but not identical, to bug #3728.

In that case, stepping through J.T.R. will accumulate unknown function names in 
various frames, seemingly at random.
Comment 3 Jan Kratochvil 2007-01-12 11:21:12 UTC
While it is not a direct dependency, the fix for Bug 3791 should be applied
first; otherwise the addresses in the failing case (where symbols are no longer
resolved) get corrupted, which makes debugging harder.
Comment 4 Jan Kratochvil 2007-01-12 11:26:38 UTC
Disowning this bug, as it is not libunwind-related.

libunwind fails to resolve the symbols only after the process's file
descriptor (fd) table fills up (more than 1024 fds).

You can debug it by running:
valgrind --track-fds=yes ./frysk-core/TestRunner -r 1000
'testStressMultiThreadedDetach(frysk.util.StressTestFStack)'

which shows libunwind's failures:
Warning: invalid file descriptor 1019 in syscall open()

and the leaked fds reported at exit:
Open file descriptor 178: /lib/libc-2.5.so
   at 0x40515C2: open64 (open64.c:45)
   by 0x85A5E5A: dwfl_linux_proc_find_elf (linux-proc-maps.c:297)
   by 0x85A407D: find_file (dwfl_module_getdwarf.c:103)
   by 0x85A4B71: find_dw (dwfl_module_getdwarf.c:395)
   by 0x85A4CC6: dwfl_module_getdwarf (dwfl_module_getdwarf.c:462)
   by 0x85AADB2: dwfl_module_getsrc (dwfl_module_getsrc.c:57)
   by 0x85A6CA3: dwfl_getsrc (dwfl_getsrc.c:55)
   by 0x8161192: _ZN3lib2dw4Dwfl11dwfl_getsrcEJxx (Dwfl.cxx:138)
   by 0x81583B6: _ZN3lib2dw4Dwfl13getSourceLineEJPNS0_8DwflLineEx (Dwfl.java:110)
   by 0x8133C36: frysk::rt::StackFrame::StackFrame(lib::unwind::FrameCursor*, frysk::proc::Task*, frysk::rt::StackFrame*) (StackFrame.java:127)
   by 0x8133952: _ZN5frysk2rt12StackFactory16createStackFrameEJPNS0_10StackFrameEPNS_4proc4TaskEi (StackFactory.java:79)
   by 0x8133AB4: _ZN5frysk2rt12StackFactory16createStackFrameEJPNS0_10StackFrameEPNS_4proc4TaskE (StackFactory.java:112)

Open file descriptor 177: /usr/lib/debug/lib/ld-2.5.so.debug
   at 0x40515C2: open64 (open64.c:45)
   by 0x85A501E: try_open (find-debuginfo.c:79)
   by 0x85A53D4: dwfl_standard_find_debuginfo (find-debuginfo.c:178)
   by 0x85A45AF: find_debuginfo (dwfl_module_getdwarf.c:178)
   by 0x85A4BEA: find_dw (dwfl_module_getdwarf.c:417)
   by 0x85A4CC6: dwfl_module_getdwarf (dwfl_module_getdwarf.c:462)
   by 0x85AADB2: dwfl_module_getsrc (dwfl_module_getsrc.c:57)
   by 0x85A6CA3: dwfl_getsrc (dwfl_getsrc.c:55)
   by 0x8161192: _ZN3lib2dw4Dwfl11dwfl_getsrcEJxx (Dwfl.cxx:138)
   by 0x81583B6: _ZN3lib2dw4Dwfl13getSourceLineEJPNS0_8DwflLineEx (Dwfl.java:110)
   by 0x8133C08: frysk::rt::StackFrame::StackFrame(lib::unwind::FrameCursor*, frysk::proc::Task*, frysk::rt::StackFrame*) (StackFrame.java:125)
   by 0x8133B12: frysk::rt::StackFrame::StackFrame(lib::unwind::FrameCursor*, frysk::proc::Task*) (StackFrame.java:92)


There is a leak: libdwfl's dwfl_end() is not being called when it should be.
Since it is called from the Dwfl binding's finalize(), I assume some Java object
references are leaked, so those objects are never finalized.
I did not analyse it further, as the bug appears to lie on the Java side.
Running the testcase above also consumes about 0.5GB of memory, which further
suggests a leak there.
Comment 5 Mark Wielaard 2007-01-15 18:43:13 UTC
*** Bug 3241 has been marked as a duplicate of this bug. ***
Comment 6 Jan Kratochvil 2007-03-02 21:49:57 UTC
Found that it is not (or may not be) a problem in the Java code (failing to call
dwfl_end() from its finalizers).
The leak is present in elfutils even when dwfl_end() is called:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=230793
Comment 7 Mark Wielaard 2007-10-10 11:57:45 UTC
elfutils 0.127 fixes this.