This is the mail archive of the mailing list for the systemtap project.


Re: Script to measure resource usage based on process arguments

mmlnx wrote:

> Forgot to mention one important thing... I've seen this script crash
> the system a few times.  The sequence of events that causes the crash
> is a bit strange.  I start the script, then start a kernel build.
> The script and build both run to completion.  Then I start the script
> again and it crashes. [...]

Interesting.  It's a little reminiscent of an old bug that did not
correctly unregister all kprobes under some failure exit conditions.

> Unable to handle kernel paging request at ffffffff8832e1dd RIP:
> [<ffffffff8832e1dd>]
> PGD 203027 PUD 205027 PMD 396e6067 PTE 0
> Oops: 0010 [1] SMP last sysfs file: /module/scsi_mod/sections/.text
> CPU 0 Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth
> sunrpc dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac
> lp parport_pc parport snd_hda_intel snd_hda_codec snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss sg
> e1000 snd_mixer_oss snd_pcm serio_raw snd_timer ehci_hcd snd uhci_hcd
> ide_cd i2c_i801 shpchp soundcore snd_page_alloc cdrom i2c_core pcspkr
> ext3 jbd ahci libata sd_mod scsi_mod

Note that no stap_* module is listed here as loaded.  This could mean
that the problem occurred during initialization of the new copy of the
module. From the "bad RIP value" / fault traceback, it looks as if
a prior systemtap run left some kind of timer task running in the
system.  That module was then unloaded, which unmapped the module's
executable region, so the timer later fired into an address that no
longer exists.  It would help if you followed the steps in the
HowToReportProblems wiki page, specifically to identify probable stap
module loading addresses.
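
Once a load address is known, checking it against the faulting RIP is
simple arithmetic.  The following is only a rough userspace sketch, not
anything in systemtap itself: it assumes you saved a /proc/modules-style
line ("name size refcount deps state address") while the stap module
was loaded.  The names mod_range, parse_modules_line and range_contains
are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Address range of one module, taken from a /proc/modules-style line. */
struct mod_range {
    char name[64];
    unsigned long long base;   /* load address */
    unsigned long long size;   /* size in bytes */
};

/*
 * Parse one line of the form
 *   "name size refcount deps state 0xaddress"
 * Returns 1 if all six fields were read, 0 otherwise.
 */
int parse_modules_line(const char *line, struct mod_range *out)
{
    char deps[256], state[32];
    int refs;
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line, "%63s %llu %d %255s %31s %llx",
               out->name, &out->size, &refs, deps, state, &out->base);
    return n == 6;
}

/* Return nonzero if addr lies in [base, base + size). */
int range_contains(const struct mod_range *m, unsigned long long addr)
{
    return addr >= m->base && addr - m->base < m->size;
}
```

For example, given a hypothetical saved line such as
"stap_example 8192 0 - Live 0xffffffff8832d000" (invented values, not
taken from the report), range_contains() would report that the RIP
ffffffff8832e1dd from the oops falls inside that module's old text
region, which would support the stale-timer theory.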

We use timer-type tasks in two contexts: timer-based probes and the like,
and an I/O related widget in the runtime.  We should review both bits
of code to ensure that we always start up and clean up carefully.  It
is probably also helpful to have systemtap emit a few DEBUG-level
printk's during setup to note vital statistics about systemtap
modules: base addresses, number of probes, memory used, that sort of
thing.  Any volunteers?

- FChE
