4313 – fcore on 'sleep' process doesn't work on x86_64

Summary: fcore on 'sleep' process doesn't work on x86_64

Status:	RESOLVED FIXED

Alias:	None

Product:	frysk
Classification:	Unclassified
Component:	general
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Unassigned

URL:
Keywords:

Depends on:	3388
Blocks:

Reported:	2007-04-03 14:45 UTC by Mark Wielaard
Modified:	2007-04-04 15:05 UTC
CC List:	0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Debug statements patch (369 bytes, text/plain) 2007-04-03 15:05 UTC, Phil Muldoon

Description Mark Wielaard 2007-04-03 14:45:28 UTC

As reported at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234879

The following fcore example doesn't work on x86_64:

sleep 1h & pid=$! ;sleep 1;fcore -o /tmp/sleep.core $pid

fcore (from CVS) takes a very long time (about 7 minutes) on the above example
and then generates the following stacktrace:

Exception in thread "main" inua.eio.BufferUnderflowException
   at inua.eio.ByteBuffer.get(fcore)
   at frysk.util.CoredumpAction$CoreMapsBuilder.buildMap(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.util.CoredumpAction.write_elf_file(fcore)
   at frysk.util.CoredumpAction.allExistingTasksCompleted(fcore)
   at frysk.proc.ProcBlockAction.checkFinish(fcore)
   at frysk.proc.ProcBlockAction$ProcBlockTaskObserver$1.execute(fcore)
   at frysk.event.EventLoop.runEventLoop(fcore)
   at frysk.event.EventLoop.run(fcore)
   at fcore.main(fcore)


Comment #3 From Phil Muldoon (pmuldoon@redhat.com) 	on 2007-04-03 08:56 EST 

Hi Mark,

Locally (from CVS) I cannot reproduce as I get:

[pmuldoon@localhost filesystems]$ sleep 1h & pid=$! ;sleep 1;fcore -o
/tmp/sleep.core $pid
[1] 8533
[pmuldoon@localhost filesystems]$ 

[pmuldoon@localhost filesystems]$ eu-readelf  -h /tmp/sleep.core.8533 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)

Additionally it completes in < 30 seconds. Is there access to your machine so I
can test?


Comment #4 From Mark Wielaard (mwielaard@redhat.com) 	on 2007-04-03 09:11 EST 

(In reply to comment #3)
> Locally (from CVS) I cannot reproduce as I get:
> [...]
> [pmuldoon@localhost filesystems]$ eu-readelf  -h /tmp/sleep.core.8533 
> [...]
> Additionally it completes in < 30 seconds. Is there access to your machine so I
> can test?

Cool! That looks promising. I am using an x86_64 machine for my tests (it looks
like you are using something 32-bit based). I'll try to find you on
irc.gimp.org in #frysk to coordinate a debugging session.

Thanks,

Mark

Comment 1 Phil Muldoon 2007-04-03 15:05:24 UTC

Created attachment 1663
Debug statements patch

Comment 2 Phil Muldoon 2007-04-03 15:06:11 UTC

Mark, can you apply attachment 1663 as a patch to CVS HEAD and post the output here?

Comment 3 Mark Wielaard 2007-04-03 15:24:01 UTC

proc.getMainTask().getMemory().get(0x604000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x605000, memory, 0, 0x21000);
proc.getMainTask().getMemory().get(0x3347c19000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x3347c1a000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x3348144000, memory, 0, 0x4000);
proc.getMainTask().getMemory().get(0x3348148000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x3348149000, memory, 0, 0x5000);
proc.getMainTask().getMemory().get(0x2aaaaaaab000, memory, 0, 0x1000);
proc.getMainTask().getMemory().get(0x2aaaaaac8000, memory, 0, 0x2000);
proc.getMainTask().getMemory().get(0x2aaaaaaca000, memory, 0, 0x34ff000);
proc.getMainTask().getMemory().get(0x7fffbd139000, memory, 0, 0x15000);
proc.getMainTask().getMemory().get(0xffffffffff600000, memory, 0, 0x1000);
Exception in thread "main" inua.eio.BufferUnderflowException
   at inua.eio.ByteBuffer.get(fcore)
   at frysk.util.CoredumpAction$CoreMapsBuilder.buildMap(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.sys.proc.MapsBuilder.construct(fcore)
   at frysk.util.CoredumpAction.write_elf_file(fcore)
   at frysk.util.CoredumpAction.allExistingTasksCompleted(fcore)
   at frysk.proc.ProcBlockAction.checkFinish(fcore)
   at frysk.proc.ProcBlockAction$ProcBlockTaskObserver$1.execute(fcore)
   at frysk.event.PollEventLoop.runEventLoop(fcore)
   at frysk.event.EventLoop.run(fcore)
   at fcore.main(fcore)

According to /proc/<pid>/maps this is:
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vdso]
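
For reference, a minimal sketch of reading the address range from a maps line like the one above. MapsRange is a hypothetical helper, not frysk's MapsBuilder; it only illustrates that the [vdso] start address has its top bit set, so it has to be parsed and compared as an unsigned 64-bit value.

```java
// Hypothetical helper, not part of frysk: extracts the address range
// from one line of /proc/<pid>/maps.
final class MapsRange {
    final long start; // 64-bit pattern, to be interpreted as unsigned
    final long end;   // exclusive end, same interpretation

    MapsRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    static MapsRange parse(String line) {
        // The first whitespace-separated field is "<start>-<end>" in hex.
        String range = line.trim().split("\\s+")[0];
        int dash = range.indexOf('-');
        // Long.parseLong would reject values above 0x7fffffffffffffff,
        // so the unsigned variant is needed for addresses like the vdso.
        long start = Long.parseUnsignedLong(range.substring(0, dash), 16);
        long end = Long.parseUnsignedLong(range.substring(dash + 1), 16);
        return new MapsRange(start, end);
    }

    long length() {
        // Two's complement subtraction gives the right size even when
        // both values are negative as signed longs.
        return end - start;
    }

    public static void main(String[] args) {
        MapsRange r = parse(
            "ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vdso]");
        System.out.println("start  = 0x" + Long.toHexString(r.start));
        System.out.println("length = 0x" + Long.toHexString(r.length()));
        System.out.println("negative as signed long: " + (r.start < 0));
    }
}
```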

Comment 4 Phil Muldoon 2007-04-03 16:28:43 UTC

Looking at fillMemory in LinuxPtraceTask, the memory is mapped via a PtraceByteBuffer:

      if (getIsa().getWordSize() == 8)
          memory = new PtraceByteBuffer(id.id, PtraceByteBuffer.Area.DATA,
                                        0x7fffffffffffffffl);
      // For 32-bit address space.
      else
          memory = new PtraceByteBuffer(id.id, PtraceByteBuffer.Area.DATA,
                                        0xffffffffl);

Is this bug happening because the map overflows the mapped space?
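
The question can be illustrated with a small sketch. It assumes, without having checked the PtraceByteBuffer source, that the buffer refuses any read ending past its limit; LimitCheck and fits are hypothetical names, not frysk API. Under that assumption the stack mapping from comment 3 fits below 0x7fffffffffffffff, while the [vdso] page at 0xffffffffff600000 does not.

```java
// Hypothetical bounds check, not frysk code: does the byte range
// [address, address + length) lie within [0, limit], with every value
// treated as an unsigned 64-bit quantity? length must be greater than 0.
final class LimitCheck {
    static boolean fits(long address, long length, long limit) {
        long last = address + length - 1; // last byte of the range
        return Long.compareUnsigned(address, last) <= 0  // no wraparound
            && Long.compareUnsigned(last, limit) <= 0;   // ends within limit
    }

    public static void main(String[] args) {
        long limit = 0x7fffffffffffffffL; // 64-bit limit from the snippet above

        // Stack mapping reported in comment 3.
        System.out.println("stack fits: " + fits(0x7fffbd139000L, 0x15000L, limit));
        // [vdso] mapping reported in comment 3.
        System.out.println("vdso fits:  " + fits(0xffffffffff600000L, 0x1000L, limit));
    }
}
```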

Comment 5 Mark Wielaard 2007-04-04 16:05:52 UTC

Fixed by:

2007-04-04  Tim Moore  <timoore@redhat.com>

        * LinuxPtraceTask.java (fillMemory): Allocate PtraceByteBuffer with
        'unsigned' long (-1) length.
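
As a note on what "'unsigned' long (-1)" means in Java, which has no unsigned long type: the value -1L has all 64 bits set, so code that interprets it as unsigned sees the largest possible address. The sketch below uses only java.lang.Long; UnsignedLimit and covers are hypothetical names for illustration, and how PtraceByteBuffer itself interprets the length is not shown in this report.

```java
// Illustration only, not frysk code: -1L viewed as an unsigned 64-bit limit.
final class UnsignedLimit {
    static final long OLD_LIMIT = 0x7fffffffffffffffL; // largest signed long
    static final long NEW_LIMIT = -1L;                 // all 64 bits set

    // True when address is at or below limit, both compared as unsigned.
    static boolean covers(long address, long limit) {
        return Long.compareUnsigned(address, limit) <= 0;
    }

    public static void main(String[] args) {
        // Same bit pattern, two spellings.
        System.out.println(Long.toHexString(NEW_LIMIT));      // ffffffffffffffff
        System.out.println(Long.toUnsignedString(NEW_LIMIT)); // 18446744073709551615

        long vdsoLastByte = 0xffffffffff600fffL; // last byte of the [vdso] page
        System.out.println("old limit covers vdso: " + covers(vdsoLastByte, OLD_LIMIT));
        System.out.println("new limit covers vdso: " + covers(vdsoLastByte, NEW_LIMIT));
    }
}
```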