Bug 6621 - frysk command line tools failing with an NPE
Summary: frysk command line tools failing with an NPE
Status: NEW
Alias: None
Product: frysk
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-10 21:52 UTC by Ray Ruvinskiy
Modified: 2008-10-21 21:40 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ray Ruvinskiy 2008-06-10 21:52:36 UTC
I'm trying to run the frysk command line tools (fstack in particular, but they all seem to fail similarly) on 
an x86_64 Fedora 8 VM (running on VMWare Server). (The same behaviour was observed on a Ubuntu 
8.04 i386 box.)

It works when run against a small number of processes (e.g., bash) but fails when run against almost 
anything else (e.g., gnome-terminal, gconf2d, etc.).

I've tried the pre-built binaries from the 0.4 RPM, as well as building the binaries from the source in 
the head branch.

The debug trace output is as follows:

00:00:00.018 27105.27105 frysk.util.CommandlineParser: checking for pid
00:00:00.022 27105.27105 frysk.sys.Wait: adding 29 (I/O possible)
00:00:00.022 27105.27105 [Thread[Thread-1,5,main]]: new
00:00:00.023 27105.27105 [{frysk.proc.live.LinuxPtraceHost@2f8b1875}]: Host
00:00:00.025 27105.27105 [Thread[Thread-1,5,main]]: add WaitBuilder 
<<frysk.proc.live.LinuxWaitBuilder@2f8b5e6d>>
00:00:00.025 27105.27105 [{frysk.proc.live.LinuxPtraceHost@2f8b1875}]: requestProc 2875
00:00:00.026 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<frysk.proc.live.LinuxPtraceHost$3@2f8c13f5>>
00:00:00.026 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.027 27105.27105 [Thread[Thread-1,5,main]]: run
00:00:00.027 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop pendingOnly false
00:00:00.028 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.028 27105.27105 [Thread[Thread-1,5,main]]: remove ... return 
<<frysk.proc.live.LinuxPtraceHost$3@2f8c13f5>>
00:00:00.029 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<frysk.proc.live.LinuxPtraceHost$3@2f8c13f5>>
00:00:00.029 27105.27105 [{frysk.proc.live.LinuxPtraceHost@2f8b1875}]: execute - requestProc 
<<2875>>
00:00:00.030 27105.27105 [frysk.sys.proc.Stat@2f7bd185]: scan <<2875>>
00:00:00.031 27105.27105 [/proc/2875/stat]: pid <<2875>>
00:00:00.031 27105.27105 [/proc/2875/stat]: state S
00:00:00.032 27105.27105 [/proc/2875/stat]: ppid <<1>>
00:00:00.032 27105.27105 [frysk.sys.proc.Stat@2f7bd04d]: scan <<1>>
00:00:00.033 27105.27105 [/proc/1/stat]: pid <<1>>
00:00:00.033 27105.27105 [/proc/1/stat]: state S
00:00:00.034 27105.27105 [/proc/1/stat]: ppid <<0>>
00:00:00.035 27105.27105 [{frysk.proc.live.LinuxPtraceProc@2f8c7455,pid=1,state=<null>}]: new - 
create unattached running proc
00:00:00.036 27105.27105 frysk.proc.live.LinuxPtraceProcState: initial false
00:00:00.036 27105.27105 [{frysk.proc.live.LinuxPtraceProc@2f8c7455,pid=1,state=detached}]: 
add(Proc) -- a child process
00:00:00.037 27105.27105 [{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=<null>}]: new 
- create unattached running proc
00:00:00.038 27105.27105 frysk.proc.live.LinuxPtraceProcState: initial false
00:00:00.039 27105.27105 frysk.proc.Task: new -- create unattached
00:00:00.040 27105.27105 [Thread[Thread-1,5,main]]: requestStop
00:00:00.041 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.041 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.041 27105.27105 [frysk.util.ProcStopUtil$1@2f8543c5]: parse pids 
[<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=detached}>>]
00:00:00.042 27105.27105 [frysk.proc.ProcBlockAction@2f8c5b55]: new
00:00:00.043 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<frysk.proc.ProcBlockAction$3@2f8de515>>
00:00:00.043 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.044 27105.27105 [Thread[Thread-1,5,main]]: run
00:00:00.044 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop pendingOnly false
00:00:00.045 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.045 27105.27105 [Thread[Thread-1,5,main]]: remove ... return 
<<frysk.proc.ProcBlockAction$3@2f8de515>>
00:00:00.045 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<frysk.proc.ProcBlockAction$3@2f8de515>>
00:00:00.046 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=detached}]: 
requestAddInstructionObserver
00:00:00.046 27105.27105 [{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=detached}]: 
requestAddInstructionObserver
00:00:00.047 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d0215observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.047 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.048 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=detached}]: 
requestAddTerminatedObserver
00:00:00.048 27105.27105 [{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=detached}]: 
requestAddObservation
00:00:00.048 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d02a5observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.049 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.049 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.050 27105.27105 [Thread[Thread-1,5,main]]: remove ... return 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d0215observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.050 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d0215observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.051 27105.27105 frysk.proc.live.LinuxPtraceProcState: handleAddObserver 
<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=detached}>>
00:00:00.051 27105.27105 frysk.proc.live.LinuxPtraceProcState: state 
<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=detached}>>
00:00:00.051 27105.27105 frysk.proc.live.LinuxPtraceTaskState: handleAttach 
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=detached}>>
00:00:00.052 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=detached}]: sendAttach
00:00:00.052 27105.27105 frysk.sys.ptrace.Ptrace: attach <<2875>>
00:00:00.053 27105.27105 [frysk.sys.proc.Status@2f8e5155]: stan <<2875>>
00:00:00.053 27105.27105 frysk.sys.proc.Status: state 'T'
00:00:00.054 27105.27105 frysk.sys.proc.Status: stopped false
00:00:00.054 27105.27105 frysk.sys.proc.Status: uid 500
00:00:00.054 27105.27105 frysk.sys.proc.Status: gid 500
00:00:00.055 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.055 27105.27105 [Thread[Thread-1,5,main]]: remove ... return 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d02a5observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.056 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d02a5observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.056 27105.27105 frysk.proc.live.LinuxPtraceProcState: handleAddObservation 
<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=Attaching.ToMainTask}>>
00:00:00.057 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.057 27105.27105 [Thread[Thread-1,5,main]]: block timeout 0xffffffffffffffff
00:00:00.057 27105.27105 frysk.sys.Wait: waitpid 2875 -> status 0x137f WIFSTOPPED 19 (Stopped 
(signal))
00:00:00.058 27105.27105 frysk.sys.Wait: waitpid 0 -> errno 0 (Success)
00:00:00.058 27105.27105 [frysk.proc.live.LinuxWaitBuilder@2f8b5e6d]: stopped <<2875>>
00:00:00.059 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=attaching}]: stoppedEvent 
<<SIGSTOP(19)>>
00:00:00.059 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=attaching}]: 
initializeAttachedState
00:00:00.060 27105.27105 frysk.sys.ptrace.Ptrace: setOptions <<2875>> options 0x5a
00:00:00.061 27105.27105 
[{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=Attaching.ToMainTask}]: 
performTaskAttachCompleted
00:00:00.062 27105.27105 [Thread[Thread-1,5,main]]: add Event <<[ProcEvent,procnull]>>
00:00:00.062 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.063 27105.27105 [Thread[Thread-1,5,main]]: checkForTimerEvents
00:00:00.063 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.063 27105.27105 [Thread[Thread-1,5,main]]: remove ... return <<[ProcEvent,procnull]>>
00:00:00.063 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<[ProcEvent,procnull]>>
00:00:00.063 27105.27105 frysk.proc.live.LinuxPtraceProcState: handleTaskAttachCompleted 
<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=Attaching.ToMainTask}>>
00:00:00.064 27105.27105 frysk.proc.live.LinuxPtraceProcState: allAttached 
<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=Attaching.ToMainTask}>>
00:00:00.064 27105.27105 frysk.proc.live.LinuxPtraceTaskState: handleAddObservation 
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForContinueOr
Unblock}>>
00:00:00.064 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<frysk.proc.ProcBlockAction$1@2f8ed25d>>
00:00:00.064 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.065 27105.27105 frysk.proc.live.LinuxPtraceTaskState: handleAddObservation 
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForContinueOr
Unblock}>>
00:00:00.065 27105.27105 frysk.proc.live.LinuxPtraceTaskState: handleContinue 
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForContinueOr
Unblock}>>
00:00:00.065 27105.27105 [frysk.proc.Proc$ObservableXXX@2f8cb0d5]: notify -- all observers
 <<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=Attaching.ToMainTask}>>
00:00:00.066 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.066 27105.27105 [Thread[Thread-1,5,main]]: remove ... return 
<<frysk.proc.ProcBlockAction$1@2f8ed25d>>
00:00:00.066 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<frysk.proc.ProcBlockAction$1@2f8ed25d>>
00:00:00.066 27105.27105 [frysk.proc.ProcBlockAction@2f8c5b55]: this taskList <<[]>> proc.taskList 
<<[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,sig
nal=SIGNONE(0)}]>>
00:00:00.067 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<frysk.util.ProcStopUtil$UtilEvent@2f8cff8d>>
00:00:00.067 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.067 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,signal
=SIGNONE(0)}]: requestDeleteInstructionObserver
00:00:00.067 27105.27105 [{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=running}]: 
requestDeleteInstructionObserver
00:00:00.068 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d0215observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.068 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.068 27105.27105 
[{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,signal
=SIGNONE(0)}]: requestDeleteTerminatedObserver
00:00:00.068 27105.27105 [Thread[Thread-1,5,main]]: add Event 
<<Observation[observable:frysk.proc.live.TaskObservable@2f8d02a5observer:frysk.proc.ProcBlockActi
on$ProcBlockTaskObserver@2f8de535]>>
00:00:00.068 27105.27105 [Thread[Thread-1,5,main]]: wakeupIfBlocked
00:00:00.069 27105.27105 [Thread[Thread-1,5,main]]: remove ...
00:00:00.069 27105.27105 [Thread[Thread-1,5,main]]: remove ... return 
<<frysk.util.ProcStopUtil$UtilEvent@2f8cff8d>>
00:00:00.069 27105.27105 [Thread[Thread-1,5,main]]: runEventLoop executing 
<<frysk.util.ProcStopUtil$UtilEvent@2f8cff8d>>
00:00:00.069 27105.27105 frysk.util.ProcStopUtil: execute live 
<<{frysk.proc.live.LinuxPtraceProc@2f8c7395,pid=2875,state=running}>>
00:00:00.069 27105.27105 [frysk.bindir.fstack$StackerEvent@2f574fe5]: printTasks
00:00:00.079 27105.27105 [frysk.stack.LibunwindAddressSpace@2f8b966d]: getReg 
<<UnwindRegistersX8664_RIP>> register <<[Register: rip type: 
{{frysk.value.PointerType@2f921485,name=VOIDPTR64L_T,size=8}}]>>
00:00:00.081 27105.27105 [frysk.sys.ptrace.RegisterSet@2f9692d5]: transfer <<2875>> read ...
00:00:00.082 27105.27105 [frysk.stack.LibunwindAddressSpace@2f8b966d]: read value 0x35c8acbd2f
00:00:00.082 27105.27105 [frysk.stack.LibunwindAddressSpace@2f8b966d]: getReg 
<<UnwindRegistersX8664_RSP>> register <<[Register: rsp type: 
{{frysk.value.PointerType@2f921485,name=VOIDPTR64L_T,size=8}}]>>
00:00:00.082 27105.27105 [frysk.sys.ptrace.RegisterSet@2f9692d5]: transfer <<2875>> read ...
00:00:00.082 27105.27105 [frysk.stack.LibunwindAddressSpace@2f8b966d]: read value 
0x7fff505d9cd0
00:00:00.083 27105.27105 frysk.stack.Frame: Getting task:  
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,sig
nal=SIGNONE(0)}>>
00:00:00.084 27105.27105 frysk.stack.Frame: Getting task:  
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,sig
nal=SIGNONE(0)}>>
00:00:00.084 27105.27105 frysk.dwfl.DwflCache: entering createDwfl, task 
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,sig
nal=SIGNONE(0)}>>
00:00:00.085 27105.27105 frysk.dwfl.DwflCache: creating new dwfl for task 
<<{frysk.proc.live.LinuxPtraceTask@2f8d7455,pid=2875,tid=2875,state=Attached.WaitForUnblock,sig
nal=SIGNONE(0)}>>
00:00:00.085 27105.27105 [frysk.util.ProcStopUtil$1@2f8543c5]: parse failed <<exception 
java.lang.NullPointerException>>
Error: java.lang.NullPointerException
Comment 1 Andrew Cagney 2008-06-10 22:53:19 UTC
Since I'm doing other fixes to fstack, I'll look at this one.
Comment 2 Ray Ruvinskiy 2008-06-11 02:56:02 UTC
I've dug into this. I hope you don't mind.

This actually seems to be a bug in the readlink function rather than in frysk. The call to readlink in frysk::sys::proc::Exe::get returns a much higher byte count as a return value than the actual length of 
the path it fills in the buffer. The buffer also seems to be partly filled with garbage:

(gdb) p file
$9 = "/proc/14211/exe", '\0' <repeats 258 times>, "?\224??*\000\000\000qV_?\177\000\000\000?
\224??*\000\000U\204??5", '\0' <repeats 1716 times>, "?\224??*\000\000?
\000\000\000\000\000\000\000*\000\000\000\000\000\000\000?\204??5", '\0' <repeats 1723 
times>, "8M??5\000\000\000\b\000\000\000\000\000\000\000\000K??5\000\000\000@U??
5\000\000\000?C??5", '\0' <repeats 11 times>, "8M??
5\000\000\000\b\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000?\230??
5\000\000\000?\006@?5\000\000\000@\205?\000\000\000\000\000@\210?\000"...
(gdb) p len
$10 = 48
(gdb) p link
$11 = "/usr/libexec/gconfd-2\000\000attachment-remi (deleted)", '\0' <repeats 849 times>, "?\224?
?*\000\000\000qV_?\177\000\000\000?\224??*\000\000U\204??5", '\0' <repeats 1716 times>, "?
\224??*\000\000\000qV_?\177\000\000\000?\224??*\000\000U\204??5", '\0' <repeats 1438 
times>

It looks like readlink fills in garbage (?) after the first NULL, and the byte count it returns includes this 
garbage. This confuses frysk, causing the NPE because we a null is returned from 
LinuxPtraceProc.getExeFile, at line 238.

I was able to reproduce this problem in a sample program outside of frysk. I have glibc 2.7-2, which 
appears to be the latest Fedora 8 package.

Also, it appears I was mistaken in saying that this problem exists on Ubuntu. I tried fstack again today 
on Ubuntu 8.04 (also glibc 2.7), and this did not reproduce.

Comment 3 Phil Muldoon 2008-06-11 07:50:46 UTC
Just for extra information, in Fedora 8, x86-64 I cannot reproduce on
gnome-termnal. Could be a Fedora 8 specific issue? 

Without debug-info installed:

[pmuldoon@localhost frysk_bin]$ ps -ax | grep gnome-terminal
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
 3686 ?        Sl     0:00 gnome-terminal
 3820 pts/0    R+     0:00 grep gnome-terminal
[pmuldoon@localhost frysk_bin]$ ./frysk-core/frysk/bindir/fstack 3686
Task #3686
#0 0x000000332f4da926 in __poll ()
#1 0x000000329603ab18 in [unknown]
#2 0x000000329603b1ad in g_main_loop_run ()
#3 0x00000035edb83a98 in gtk_main ()
#4 0x000000000041959e in [unknown]
#5 0x000000332f41e32a in __libc_start_main ()
#6 0x000000000040ef39 in [unknown]

Task #3693
#0 0x000000333000db8b in __read ()
#1 0x000000329603852b in [unknown]
#2 0x0000003296060434 in [unknown]
#3 0x000000333000729a in start_thread ()
#4 0x000000332f4e42cd in __clone ()


With debug-info installed.

[pmuldoon@localhost frysk_bin]$ ./frysk-core/frysk/bindir/fstack 3686
Task #3686
#0 0x000000332f4da926 in __poll ()
#1 0x000000329603ab18 in g_main_context_iterate ()
#2 0x000000329603b1ad in g_main_loop_run ()
#3 0x00000035edb83a98 in gtk_main ()
#4 0x000000000041959e in [unknown]
#5 0x000000332f41e32a in __libc_start_main ()
#6 0x000000000040ef39 in [unknown]

Task #3693
#0 0x000000333000db8b in __read ()
#1 0x000000329603852b in child_watch_helper_thread ()
#2 0x0000003296060434 in g_thread_create_proxy ()
#3 0x000000333000729a in start_thread ()
#4 0x000000332f4e42cd in __clone ()
Comment 4 Andrew Cagney 2008-06-11 18:15:16 UTC
(In reply to comment #2)
> I've dug into this. I hope you don't mind.

not at all!

> This actually seems to be a bug in the readlink function rather than in frysk.
The call to readlink in frysk::sys::proc::Exe::get returns a much higher byte
count as a return value than the actual length of 
> the path it fills in the buffer. The buffer also seems to be partly filled
with garbage:
> 
> (gdb) p file
> $9 = "/proc/14211/exe", '\0' <repeats 258 times>,
"?\224??*\000\000\000qV_?\177\000\000\000?
> \224??*\000\000U\204??5", '\0' <repeats 1716 times>, "?\224??*\000\000?
> \000\000\000\000\000\000\000*\000\000\000\000\000\000\000?\204??5", '\0'
<repeats 1723 
> times>, "8M??5\000\000\000\b\000\000\000\000\000\000\000\000K??5\000\000\000@U??
> 5\000\000\000?C??5", '\0' <repeats 11 times>, "8M??
> 5\000\000\000\b\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000?\230??
> 5\000\000\000?\006@?5\000\000\000@\205?\000\000\000\000\000@\210?\000"...

ah!

this is sounding like the kernel; you could try "fexe" on a process showing the
problem or just ls -l /proc/PID/exe?

If the last one prints garbage then I think it is the kernel and we should push
it to fedora.

For a workaround, I guess we could sanity check using strlen and then log the
problem.
Comment 5 Andrew Cagney 2008-06-12 12:57:19 UTC
Here's a repo:

cagney@rat$ cp /bin/sleep /tmp
cagney@rat$ /tmp/sleep 1000 &
[1] 28790
cagney@rat$ 
cagney@rat$ 
cagney@rat$ fexe 28790
/tmp/sleep
cagney@rat$ 
cagney@rat$ rm /tmp/sleep 
cagney@rat$ fexe 28790
Error: java.lang.NullPointerException
cagney@rat$ 
cagney@rat$ ls -l /proc/28790/exe
lrwxrwxrwx 1 cagney cagney 0 2008-06-12 08:51 /proc/28790/exe -> /tmp/sleep
(deleted)

Hmm, I suspect this was an "it was working"; we've a relatively straight forward
test case implement.
Comment 6 Andrew Cagney 2008-06-12 20:34:42 UTC
This prints an error explainng what's gone wrong, rather than NPEing.

commit e7fe8b0edbf5bc538a3e7fbcd37997bc800d27a5
Author: Andrew Cagney <cagney@toil.yyz.redhat.com>
Date:   Thu Jun 12 16:31:00 2008 -0400

    When /proc/PID/exe is invalid throw an exception.
    
    frysk-core/frysk/bindir/ChangeLog
    2008-06-12  Andrew Cagney  <cagney@redhat.com>
    
        * TestFexe.java (testExeOfDeletedFile()): Enable.  Match expected
        error.
        * fexe.java: Use Exe.getName.
    
    frysk-core/frysk/proc/live/ChangeLog
    2008-06-12  Andrew Cagney  <cagney@redhat.com>
    
        * LinuxPtraceProc.java (getExe()): Delete.
    
    frysk-sys/frysk/sys/ChangeLog
    2008-06-12  Andrew Cagney  <cagney@redhat.com>
    
        * cni/Errno.cxx (throwUserException): New.
        * cni/Errno.hxx (throwUserException): New.
    
    frysk-sys/frysk/sys/proc/ChangeLog
    2008-06-12  Andrew Cagney  <cagney@redhat.com>
    
        * Exe.java (getName): Rename get.
        (openReadOnly): New.
        * jni/Exe.cxx (Exe::getName): Update.
        * cni/Exe.cxx (Exe::getName): Update.
    
    frysk-sys/jnixx/ChangeLog
    2008-06-12  Andrew Cagney  <cagney@redhat.com>
    
        * exceptions.cxx (userException): Implement.
        * exceptions.hxx (userException): Declare.

cagney@toil$ cp /bin/sleep .
cagney@toil$ ./sleep 1000 &
[2] 14879
cagney@toil$ ./frysk/bindir/fexe $!
/notnfs/cagney/git/native/frysk-core/sleep
cagney@toil$ rm sleep 
cagney@toil$ ./frysk/bindir/fexe $!
Error: The link /proc/14879/exe points to the deleted file
/notnfs/cagney/git/native/frysk-core/sleep
Comment 7 Ray Ruvinskiy 2008-06-12 21:17:17 UTC
In my case, the binary was not deleted. If you look at the gdb output, you will see that the buffer contains 
the valid path, followed by a NULL, followed by garbage which happens to include "(deleted)". The byte 
count went to the end of "(deleted)". Of course, printing out the buffer will result in the right output, due 
to the NULL at the end of the correct path.

There's definitely procfs or glibc weirdness here, but it has nothing to do with frysk.
Comment 8 Ray Ruvinskiy 2008-06-13 04:28:48 UTC
It appears that as of commit 531115f2541df6ed1e8a8f16e24c519fb858b209, Errno.h includes 
frysk/UserException.h, but the latter does not seem to have been checked in, causing a build failure.
Comment 9 Andrew Cagney 2008-06-13 18:41:52 UTC
(In reply to comment #7)
> In my case, the binary was not deleted. If you look at the gdb output, you
will see that the buffer contains 
> the valid path, followed by a NULL, followed by garbage which happens to
include "(deleted)".

Ah, yes, changing the code, I found an old comment (now living in
frysk/sys/proc/{cni,jni}/Exe.cxx:

  if ((int) strlen(link) != len) {
    // Assume that an EXE that has somehow ended up with an embedded
    // NUL character is invalid.  This happens when the kernel screws
    // up "mv a-really-long-file $exe" leaving the updated EXE string
    // with something like "$exe<NUL>ally-long-file (deleted)".
    userException(env, "The link %s is corrupt", file);
  }

here's the actual sequence, I think I might tweak fexe a little so it prints
this out:

cagney@rat$ /tmp/sleep 1000 &
[1] 31378
cagney@rat$ 
cagney@rat$ mv /tmp/a-really-long-file /tmp/sleep
cagney@rat$ ls -l /proc/31378/exe
lrwxrwxrwx 1 cagney cagney 0 2008-06-13 14:21 /proc/31378/exe -> /tmp/sleep
cagney@rat$ 

(the string actually contains "/tmp/sleep<NUL>.... (deleted)"), and as you can
see the exe, while looking ok, is pointing to the wrong file.  Pre-link, and
possibly update, can do this.

(i've re-opened the bug) Can you check for this.  I'm also going to tweak fexe
so its easier to see this.
Comment 10 Ray Ruvinskiy 2008-06-23 12:16:38 UTC
Ah, this makes sense and explains what I've been seeing.

I originally built frysk on Ubuntu. Prior to doing that, I updated quite a few packages and did not reboot 
or restart X. When I tried fstack, I would get the NPE.

I then decided to try frysk on Fedora but wasn't able to get to it for several weeks. When I had some 
free time, I installed a Fedora 8 VM and proceeded to update all my packages (which was a fun project 
in itself...). I then built frysk and saw the same NPE. However, in those two weeks, I had rebooted my 
Ubuntu machine, and so when I went back to try frysk on it again, I could not reproduce the problem.

Tricky little problem.