Need to investigate further.

rmmod a systemtap module on an SMP system. There is a good chance this will cause a crash.

Sep 14 09:52:23 dragon kernel: <c044a98a> softlockup_tick+0xad/0xc4 <c042d858> update_process_times+0x39/0x5c
Sep 14 09:52:23 dragon kernel: <c0418af3> smp_apic_timer_interrupt+0x5a/0x63 <c040490f> apic_timer_interrupt+0x1f/0x24
Sep 14 09:52:23 dragon kernel: <c042d550> lock_timer_base+0x27/0x2f <c042d569> try_to_del_timer_sync+0x11/0x4a
Sep 14 09:52:23 dragon kernel: <c042d5ac> del_timer_sync+0xa/0x10 <f8e62ff7> _stp_kill_time+0x21/0x41 [stap_2872]
Sep 14 09:52:23 dragon kernel: <f8e63065> _stp_cleanup_and_exit+0x4e/0x62 [stap_2872] <c04465b4> stop_machine_run+0x2e/0x34
Sep 14 09:52:23 dragon kernel: <f8e63086> _stp_transport_close+0xd/0x5f [stap_2872] <c043eb8b> sys_delete_module+0x192/0x1bb
Sep 14 09:52:23 dragon kernel: <c045be81> do_munmap+0x196/0x1af <c0403e3f> syscall_call+0x7/0xb

The culprit is in runtime/time.c (_stp_kill_time). I've been successfully running this rewrite of that function, but it is an ugly hack.

    for_each_online_cpu(cpu) {
        stp_time_t *time = &per_cpu(stp_time, cpu);
        int retries = 0;
        while (!del_timer(&time->timer)) {
            retries++;
            if (retries > 1024) {
                printk("Exceeded retry count in _stp_kill_time\n");
                break;
            }
        }
    }

Need to clean up and understand this better.

See also possibly related bug http://sources.redhat.com/bugzilla/show_bug.cgi?id=2989

I've checked in some changes to how the timers are initialized and deleted. I also changed the percpu allocations to dynamic so we can run multiple systemtap modules safely. This seems to have fixed all the timer-related problems I was seeing, including this one.
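
For anyone reading this later without the checked-in diff at hand, here is a minimal userspace sketch of why dynamic per-CPU allocation lets several modules coexist. It is only a model under stated assumptions, not the actual runtime/time.c change: `model_time_t`, `model_module_t`, `model_init_time` and `model_kill_time` are hypothetical names, the "timer" is just a flag, and `calloc`/`free` stand in for what in the kernel would presumably be `alloc_percpu()`/`free_percpu()`.

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's per-CPU time state. */
typedef struct {
    int64_t base_ns;     /* illustrative field, not taken from runtime/time.c */
    int timer_active;    /* 1 while this CPU's simulated timer is armed */
} model_time_t;

/* One instance per loaded module. Each instance owns its own per-CPU array
 * instead of sharing a single statically defined per-CPU symbol. */
typedef struct {
    int ncpus;
    model_time_t *per_cpu;   /* userspace analogue of an alloc_percpu() pointer */
} model_module_t;

/* Allocate this module's per-CPU state and arm one simulated timer per CPU.
 * Returns 0 on success, -1 on bad input or allocation failure. */
int model_init_time(model_module_t *m, int ncpus)
{
    if (ncpus <= 0)
        return -1;
    m->per_cpu = calloc((size_t)ncpus, sizeof(*m->per_cpu));
    if (!m->per_cpu)
        return -1;
    m->ncpus = ncpus;
    for (int cpu = 0; cpu < ncpus; cpu++)
        m->per_cpu[cpu].timer_active = 1;
    return 0;
}

/* Disarm every simulated timer, then release the per-CPU state.
 * Returns the number of timers that were still armed. Calling it again
 * on an already cleaned-up instance is a no-op that returns 0. */
int model_kill_time(model_module_t *m)
{
    int stopped = 0;

    if (!m->per_cpu)
        return 0;
    for (int cpu = 0; cpu < m->ncpus; cpu++) {
        if (m->per_cpu[cpu].timer_active) {
            m->per_cpu[cpu].timer_active = 0;
            stopped++;
        }
    }
    free(m->per_cpu);
    m->per_cpu = NULL;
    m->ncpus = 0;
    return stopped;
}
```

In this model, initializing two `model_module_t` instances gives each its own array, so tearing one down leaves the other's timers untouched. The sketch deliberately does not model the timer synchronization that the backtrace above is about.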