Bug 4313

Summary: fcore on 'sleep' process doesn't work on x86_64
Product: frysk Reporter: Mark Wielaard <mark>
Component: general Assignee: Unassigned <frysk-bugzilla>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on: 3388    
Bug Blocks:    
Attachments: Debug statements patch

Description Mark Wielaard 2007-04-03 14:45:28 UTC
As reported at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234879

The following fcore example doesn't work on x86_64:

sleep 1h & pid=$! ;sleep 1;fcore -o /tmp/sleep.core $pid

fcore (from CVS) takes a very long time (about 7 minutes) on the above example
and then generates the following stacktrace:

Exception in thread "main" inua.eio.BufferUnderflowException
   at inua.eio.ByteBuffer.get(fcore)
   at frysk.util.CoredumpAction$CoreMapsBuilder.buildMap(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.util.CoredumpAction.write_elf_file(fcore)
   at frysk.util.CoredumpAction.allExistingTasksCompleted(fcore)
   at frysk.proc.ProcBlockAction.checkFinish(fcore)
   at frysk.proc.ProcBlockAction$ProcBlockTaskObserver$1.execute(fcore)
   at frysk.event.EventLoop.runEventLoop(fcore)
   at frysk.event.EventLoop.run(fcore)
   at fcore.main(fcore)


Comment #3 From Phil Muldoon (pmuldoon@redhat.com) 	on 2007-04-03 08:56 EST 

Hi Mark,

Locally (from CVS) I cannot reproduce as I get:

[pmuldoon@localhost filesystems]$ sleep 1h & pid=$! ;sleep 1;fcore -o /tmp/sleep.core $pid
[1] 8533
[pmuldoon@localhost filesystems]$ 

[pmuldoon@localhost filesystems]$ eu-readelf  -h /tmp/sleep.core.8533 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)

Additionally it completes in < 30 seconds. Is there access to your machine so I
can test?


Comment #4 From Mark Wielaard (mwielaard@redhat.com) 	on 2007-04-03 09:11 EST 

(In reply to comment #3)
> Locally (from CVS) I cannot reproduce as I get:
> [...]
> [pmuldoon@localhost filesystems]$ eu-readelf  -h /tmp/sleep.core.8533 
> [...]
> Additionally it completes in < 30 seconds. Is there access to your machine so I
> can test?

Cool! That looks promising. I am using an x86_64 machine for my tests (it looks
like you are using something 32-bit based). I'll try to find you on irc.gimp.org
in #frysk to coordinate a debugging session.

Thanks,

Mark
Comment 1 Phil Muldoon 2007-04-03 15:05:24 UTC
Created attachment 1663
Debug statements patch
Comment 2 Phil Muldoon 2007-04-03 15:06:11 UTC
Mark, can you apply attachment 1663 as a patch to CVS HEAD and post the output here?
Comment 3 Mark Wielaard 2007-04-03 15:24:01 UTC
proc.getMainTask().getMemory().get(0x604000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x605000, memory, 0, 0x21000);
proc.getMainTask().getMemory().get(0x3347c19000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x3347c1a000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x3348144000, memory, 0, 0x4000);
proc.getMainTask().getMemory().get(0x3348148000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x3348149000, memory, 0, 0x5000);
proc.getMainTask().getMemory().get(0x2aaaaaaab000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x2aaaaaac8000, memory, 0, 0x2000);
proc.getMainTask().getMemory().get(0x2aaaaaaca000, memory, 0, 0x34ff000);
proc.getMainTask().getMemory().get(0x7fffbd139000, memory, 0, 0x15000);
proc.getMainTask().getMemory().get(0xffffffffff600000, memory, 0, 0x1000);
Exception in thread "main" inua.eio.BufferUnderflowException
   at inua.eio.ByteBuffer.get(fcore)
   at frysk.util.CoredumpAction$CoreMapsBuilder.buildMap(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.util.CoredumpAction.write_elf_file(fcore)
   at frysk.util.CoredumpAction.allExistingTasksCompleted(fcore)
   at frysk.proc.ProcBlockAction.checkFinish(fcore)
   at frysk.proc.ProcBlockAction$ProcBlockTaskObserver$1.execute(fcore)
   at frysk.event.PollEventLoop.runEventLoop(fcore)
   at frysk.event.EventLoop.run(fcore)
   at fcore.main(fcore)

According to /proc/<pid>/maps, this last region is:
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vdso]
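
For reference, here is a minimal sketch of how a maps line like the one above can be read into 64-bit values in Java. It is not frysk's MapsBuilder code; the class MapsLineSketch and its Region type are hypothetical names used only for illustration. It shows that the [vdso] start address only fits in a Java long as a negative (signed) value, so it has to be parsed and compared as unsigned.

```java
public class MapsLineSketch {
    /** One mapping from a maps file; addresses are kept as raw 64-bit patterns. */
    static final class Region {
        final long start;
        final long end;
        final String name;

        Region(long start, long end, String name) {
            this.start = start;
            this.end = end;
            this.name = name;
        }

        /** Size in bytes; two's complement subtraction works for unsigned values too. */
        long length() {
            return end - start;
        }
    }

    /** Parses "start-end perms offset dev inode [name]" (hypothetical helper). */
    static Region parse(String line) {
        String[] fields = line.trim().split("\\s+");
        String[] range = fields[0].split("-");
        // parseUnsignedLong accepts values above Long.MAX_VALUE, unlike parseLong.
        long start = Long.parseUnsignedLong(range[0], 16);
        long end = Long.parseUnsignedLong(range[1], 16);
        String name = fields.length > 5 ? fields[5] : "";
        return new Region(start, end, name);
    }

    public static void main(String[] args) {
        Region r = parse(
            "ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vdso]");
        System.out.println(r.name + " start=0x" + Long.toHexString(r.start)
            + " length=0x" + Long.toHexString(r.length())
            + " negative-as-signed=" + (r.start < 0));
    }
}
```

Note that Long.parseLong would throw NumberFormatException on "ffffffffff600000", which is why the sketch uses the unsigned variant.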
Comment 4 Phil Muldoon 2007-04-03 16:28:43 UTC
Looking in LinuxPtraceTask fillMemory, the memory is mapped via a PtraceByteBuffer:

      if (getIsa().getWordSize() == 8)
          memory = new PtraceByteBuffer(id.id, PtraceByteBuffer.Area.DATA,
                                        0x7fffffffffffffffl);
      // For 32-bit address space.
      else
          memory = new PtraceByteBuffer(id.id, PtraceByteBuffer.Area.DATA,
                                        0xffffffffl);

Is this bug happening because the last map lies beyond the end of the mapped space?
Comment 5 Mark Wielaard 2007-04-04 16:05:52 UTC
Fixed by:

2007-04-04  Tim Moore  <timoore@redhat.com>

        * LinuxPtraceTask.java (fillMemory): Allocate PtraceByteBuffer with
        'unsigned' long (-1) length.
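
A minimal sketch of why the old bound fails for the [vdso] region and why -1 works, assuming the third PtraceByteBuffer argument acts as the highest readable address (that reading, the class UnsignedLimitSketch, and the helper fits are assumptions made for illustration, not frysk code). As a Java long, 0x7fffffffffffffff is the largest signed value, while -1 has all 64 bits set and is therefore the largest value when treated as unsigned.

```java
public class UnsignedLimitSketch {
    /**
     * Returns true if the region [address, address + length) ends at or
     * below limit, treating all three values as unsigned 64-bit numbers.
     * Hypothetical helper, assumes length > 0 and no wrap-around.
     */
    static boolean fits(long address, long length, long limit) {
        long last = address + length - 1; // address of the last byte
        return Long.compareUnsigned(last, limit) <= 0;
    }

    public static void main(String[] args) {
        long oldLimit = 0x7fffffffffffffffL; // bound used before the fix
        long newLimit = -1L;                 // 0xffffffffffffffff as unsigned

        long stack = 0x7fffbd139000L;        // second-to-last region in comment 3
        long vdso = 0xffffffffff600000L;     // last region in comment 3

        System.out.println("stack, old limit: " + fits(stack, 0x15000L, oldLimit));
        System.out.println("vdso,  old limit: " + fits(vdso, 0x1000L, oldLimit));
        System.out.println("vdso,  new limit: " + fits(vdso, 0x1000L, newLimit));
        System.out.println("-1 as unsigned:   " + Long.toUnsignedString(newLimit));
    }
}
```

Under these assumptions every region up to the stack fits below the old bound, and only the [vdso] page at 0xffffffffff600000 falls outside it, which matches where the exception appears in comment 3.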