Bug 4066 - hist_linear() with large H value crashes system
Summary: hist_linear() with large H value crashes system
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 critical
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-16 21:39 UTC by Mike Mason
Modified: 2007-02-17 12:28 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Mason 2007-02-16 21:39:03 UTC
I was writing a simple script that uses hist_linear() and decided to pass it a
ridiculously large H value to see what happens.  To my surprise, it crashed the
system.  

Here's the script:

global reads
probe netdev.receive { reads <<< length }
probe end { print (@hist_linear(reads, 0, 300000000000, 50)) }

I tried the script on ppc64 and x86 running SLES10 SP1 and x86_64 running RHEL5
RC1 and the latest FC5 kernel.  I tried the SLES10 SP1, RHEL5 RC1 and CVS
versions of systemtap.  All crashed.  I suspect there's a problem in
_stp_stat_init(). More details to follow.
Comment 1 Mike Mason 2007-02-16 22:13:15 UTC
Console output from crash on an x86_64 system running 2.6.19-1.2288.fc5 and
systemtap CVS from Feb 15th: 

Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
 [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /module/scsi_mod/sections/.text
CPU 0 
Modules linked in: stap_d084cedcd06497638f61939c59dd9ce0_807(U) ipv6 autofs4
hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_mod video sbs i2c_ec button
battery asus_acpi ac lp parport_pc parport snd_hda_intel snd_hda_codec
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
ehci_hcd uhci_hcd snd_mixer_oss sg ata_piix snd_pcm e1000 ide_cd serio_raw
snd_timer i2c_i801 snd soundcore cdrom snd_page_alloc i2c_core pcspkr shpchp
ext3 jbd ahci libata sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.19-1.2288.fc5 #1
RIP: 0010:[<ffffffff8833c2f0>]  [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
RSP: 0018:ffffffff806bae70  EFLAGS: 00010093
RAX: 0000000000000002 RBX: ffff810025d1e000 RCX: ffff810025d1e048
RDX: 0000000000000063 RSI: 0000000000000063 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff88345b60 R09: 0000000000000000
R10: ffff81003fc8fc80 R11: 0000000000000000 R12: ffffffff8022056b
R13: ffffffff806baf58 R14: ffffffff806a9800 R15: ffff81003fc7e580
FS:  0000000000000000(0000) GS:ffffffff805ff000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000037947000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80658000, task ffffffff80565640)
Stack:  ffffffff883406e4 0000000000000000 0000000000000082 ffff8100020445a0
 ffffffff883402b1 0000000000000000 ffff810025d1e000 ffffffff88345b60
 ffffffff80264376 0000000000000000 ffffffff806baf18 0000000000000002
Call Trace:
 [<ffffffff883406e4>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:probe_1495+0x128/0x1c4
 [<ffffffff883402b1>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:enter_kprobe_probe+0xf3/0x18d
 [<ffffffff80264376>] kprobe_handler+0x18f/0x1bf
 [<ffffffff802643e1>] kprobe_exceptions_notify+0x3b/0x72
 [<ffffffff80265094>] notifier_call_chain+0x20/0x32
 [<ffffffff80263c67>] do_int3+0x42/0x83
 [<ffffffff802633c3>] int3+0x93/0xb0
 [<ffffffff8022056c>] netif_receive_skb+0x1/0x3da
 [<ffffffff8810e361>] :e1000:e1000_clean_rx_irq+0x470/0x52f
 [<ffffffff8810d264>] :e1000:e1000_clean+0x8c/0x159
 [<ffffffff8020c37c>] net_rx_action+0xa4/0x1a7
 [<ffffffff80211ee5>] __do_softirq+0x55/0xc4
 [<ffffffff8025d24c>] call_softirq+0x1c/0x30
 [<ffffffff8026aa5a>] do_softirq+0x2c/0x97
 [<ffffffff8026abf5>] do_IRQ+0x130/0x151
 [<ffffffff8025c641>] ret_from_intr+0x0/0xa
 [<ffffffff8026911d>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff80255eee>] mwait_idle+0xc/0x20
 [<ffffffff80247ec6>] cpu_idle+0x8b/0xae
 [<ffffffff806627a0>] start_kernel+0x240/0x245
 [<ffffffff8066215a>] _sinittext+0x15a/0x15e


Code: 48 8b 47 18 65 8b 14 25 24 00 00 00 48 63 d2 48 f7 d0 4c 8b 
RIP  [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
 RSP <ffffffff806bae70>
CR2: 0000000000000018
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():1

Call Trace:
 [<ffffffff802699c5>] show_trace+0x34/0x47
 [<ffffffff802699ea>] dump_stack+0x12/0x17
 [<ffffffff8029cc94>] down_read+0x15/0x23
 [<ffffffff80294cb3>] blocking_notifier_call_chain+0x13/0x36
 [<ffffffff8021505d>] do_exit+0x20/0x97d
 [<ffffffff80264ff9>] do_page_fault+0x7a1/0x81c
 [<ffffffff8026307d>] error_exit+0x0/0x84
 [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
 [<ffffffff883406e4>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:probe_1495+0x128/0x1c4
 [<ffffffff883402b1>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:enter_kprobe_probe+0xf3/0x18d
 [<ffffffff80264376>] kprobe_handler+0x18f/0x1bf
 [<ffffffff802643e1>] kprobe_exceptions_notify+0x3b/0x72
 [<ffffffff80265094>] notifier_call_chain+0x20/0x32
 [<ffffffff80263c67>] do_int3+0x42/0x83
 [<ffffffff802633c3>] int3+0x93/0xb0
 [<ffffffff8022056c>] netif_receive_skb+0x1/0x3da
 [<ffffffff8810e361>] :e1000:e1000_clean_rx_irq+0x470/0x52f
 [<ffffffff8810d264>] :e1000:e1000_clean+0x8c/0x159
 [<ffffffff8020c37c>] net_rx_action+0xa4/0x1a7
 [<ffffffff80211ee5>] __do_softirq+0x55/0xc4
 [<ffffffff8025d24c>] call_softirq+0x1c/0x30
 [<ffffffff8026aa5a>] do_softirq+0x2c/0x97
 [<ffffffff8026abf5>] do_IRQ+0x130/0x151
 [<ffffffff8025c641>] ret_from_intr+0x0/0xa
 [<ffffffff8026911d>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff80255eee>] mwait_idle+0xc/0x20
 [<ffffffff80247ec6>] cpu_idle+0x8b/0xae
 [<ffffffff806627a0>] start_kernel+0x240/0x245
 [<ffffffff8066215a>] _sinittext+0x15a/0x15e

BUG: scheduling while atomic: swapper/0x10000100/0

Call Trace:
 [<ffffffff802699c5>] show_trace+0x34/0x47
 [<ffffffff802699ea>] dump_stack+0x12/0x17
 [<ffffffff802604ae>] __sched_text_start+0x5e/0xadc
 [<ffffffff802889fa>] __cond_resched+0x2d/0x55
 [<ffffffff8026104c>] cond_resched+0x2e/0x39
 [<ffffffff8029cc99>] down_read+0x1a/0x23
 [<ffffffff80294cb3>] blocking_notifier_call_chain+0x13/0x36
 [<ffffffff8021505d>] do_exit+0x20/0x97d
 [<ffffffff80264ff9>] do_page_fault+0x7a1/0x81c
 [<ffffffff8026307d>] error_exit+0x0/0x84
 [<ffffffff8833c2f0>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:_stp_stat_add+0x0/0x155
 [<ffffffff883406e4>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:probe_1495+0x128/0x1c4
 [<ffffffff883402b1>]
:stap_d084cedcd06497638f61939c59dd9ce0_807:enter_kprobe_probe+0xf3/0x18d
 [<ffffffff80264376>] kprobe_handler+0x18f/0x1bf
 [<ffffffff802643e1>] kprobe_exceptions_notify+0x3b/0x72
 [<ffffffff80265094>] notifier_call_chain+0x20/0x32
 [<ffffffff80263c67>] do_int3+0x42/0x83
 [<ffffffff802633c3>] int3+0x93/0xb0
 [<ffffffff8022056c>] netif_receive_skb+0x1/0x3da
 [<ffffffff8810e361>] :e1000:e1000_clean_rx_irq+0x470/0x52f
 [<ffffffff8810d264>] :e1000:e1000_clean+0x8c/0x159
 [<ffffffff8020c37c>] net_rx_action+0xa4/0x1a7
 [<ffffffff80211ee5>] __do_softirq+0x55/0xc4
 [<ffffffff8025d24c>] call_softirq+0x1c/0x30
 [<ffffffff8026aa5a>] do_softirq+0x2c/0x97
 [<ffffffff8026abf5>] do_IRQ+0x130/0x151
 [<ffffffff8025c641>] ret_from_intr+0x0/0xa
 [<ffffffff8026911d>] mwait_idle_with_hints+0x44/0x45
 [<ffffffff80255eee>] mwait_idle+0xc/0x20
 [<ffffffff80247ec6>] cpu_idle+0x8b/0xae
 [<ffffffff806627a0>] start_kernel+0x240/0x245
 [<ffffffff8066215a>] _sinittext+0x15a/0x15e

Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 2 Josh Stone 2007-02-16 22:41:22 UTC
(In reply to comment #0)
> I suspect there's a problem in _stp_stat_init().

The problem is we're not checking the return value.  The initialization code for
"reads" goes:

  global_reads = _stp_stat_init (HIST_LINEAR, 0, 300000000000, 50);
  if (rc) {
    _stp_error ("global variable reads allocation failed");
    goto out;
  }
  rwlock_init (& global_reads_lock);

_stp_stat_init returns NULL if a problem occurs, but the caller is only checking
'rc', which is untouched.

Comment 3 Frank Ch. Eigler 2007-02-17 12:28:46 UTC
Patch committed.  Thanks to Josh for the quick analysis.
This would be an appropriate sort of test to add to the suite coming for bug #3591.