Bug 1813 - crash due to __kmalloc probe / kprobe-registration reentrancy
Summary: crash due to __kmalloc probe / kprobe-registration reentrancy
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: kprobes
Version: unspecified
Importance: P1 normal
Target Milestone: ---
Assignee: Jim Keniston
Reported: 2005-11-04 22:59 UTC by James Dickens
Modified: 2006-04-10 14:15 UTC
CC List: 2 users

Last reconfirmed: 2005-11-08 21:26:28


Attachments
Simple module to probe __kmalloc (420 bytes, text/plain)
2005-11-05 01:24 UTC, Jim Keniston
partial panic screenshot (13.82 KB, image/png)
2005-11-08 21:27 UTC, Frank Ch. Eigler

Description James Dickens 2005-11-04 22:59:04 UTC
Updated the kernel to the latest Fedora FC4 release:

Linux localhost.localdomain 2.6.13-1.1532_FC4 #1 Thu Oct 20 01:30:08 EDT 2005
i686 athlon i386 GNU/Linux

This kernel should have support for multiple kprobes at an address:
http://lwn.net/Articles/132787/

Running two copies of this SystemTap script oopses the system.

global called

probe kernel.function("__kmalloc")
{
       called++;
}

stap version:
[jamesd@localhost ~]$ stap -V
SystemTap translator/driver (version 0.4.2 built 2005-10-31)
Copyright (C) 2005 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
[jamesd@localhost ~]$
Comment 1 Jim Keniston 2005-11-05 01:24:44 UTC
Created attachment 745
Simple module to probe __kmalloc

I don't see the problem on my system (i386 SMP), using either stap or raw
kprobes.

Take the attached probe2x.c, make a copy (call it probe1x.c), and in the copy
change all "2x" to "1x".  Compile both and insmod them both.  Do stuff.  rmmod
both.  Works for me.
Comment 2 James Dickens 2005-11-06 17:46:12 UTC
Subject: Fwd:  Multiple kprobes at an address, Doesn't work

---------- Forwarded message ----------
From: James Dickens <jamesd.wi@gmail.com>
Date: Nov 6, 2005 11:03 AM
Subject: Re: [Bug kprobes/1813] Multiple kprobes at an address, Doesn't work
To: sourceware-bugzilla@sourceware.org



On 5 Nov 2005 01:24:44 -0000, jkenisto at us dot ibm dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> ------- Additional Comments From jkenisto at us dot ibm dot com  2005-11-05 01:24 -------
> Created an attachment (id=745)
>  --> (http://sourceware.org/bugzilla/attachment.cgi?id=745&action=view)
> Simple module to probe __kmalloc
>
> I don't see the problem on my system (i386 SMP), using either stap or raw
> kprobes.
>
> Take the attached probe2x.c, make a copy (call it probe1x.c), and in the copy
> change all "2x" to "1x".  Compile both and insmod them both.  Do stuff.  rmmod
> both.  Works for me.
>
> --
Okay, I can't find the magic arguments to make it compile at the
command line.  A friend and I both see this on our systems.  What
kernel version are you using?  Which gcc version?  I'm using Fedora FC4
with the latest elfutils and the latest kernel installed; the rest is
bone stock.




Comment 3 Frank Ch. Eigler 2005-11-08 21:26:28 UTC
I have reproduced this problem on the RHEL4U2 kernel (22.EL).  The problem
appears to be that __kmalloc is invoked from the registration function during
startup of a subsequent kprobe session.  This trips the int3 placed by the
first probe.  Indirectly, this appears to lead to a kprobe-reentrancy-based
panic.  It would require analysis or experiments to determine whether the RCU
lockless code fares any better.

Unfortunately one can't reasonably kludge around this defect by using the new
translator blacklist to enumerate every area of the kernel possibly used during
a registration.
Comment 4 Frank Ch. Eigler 2005-11-08 21:27:11 UTC
Created attachment 751
partial panic screenshot
Comment 5 Prasanna S Panchamukhi 2005-11-11 15:25:47 UTC
I don't see a crash or panic on either my i386 SMP box or my uniprocessor box
running vmlinuz-2.6.13-1.1532_FC4.  Could you please check whether the problem
exists with this kernel?  I will check with the RHEL4U2 kernel.

-Prasanna
Comment 6 Frank Ch. Eigler 2005-11-12 12:48:05 UTC
The conceptual problem remains, even if one happens to be unable to reproduce
some particular test case.  If any of the kernel services transitively involved
in performing kprobe administration (registration, unregistration, probe
triggering, etc.) are possibly probed by another kprobes/systemtap session, we
get an instant reentrancy situation.

The RCU kprobes may or may not handle this better, but it needs analysis, not
experimentation, to ascertain.
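
To make "transitively involved" concrete, here is a minimal userspace C sketch.  It assumes only the call chain reported in Comment 7 (register_kprobe -> register_aggr_kprobe -> kcalloc -> __kmalloc); a real kernel call graph is far larger.  The names call_graph, reaches and unsafe_to_probe are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One "caller invokes callee" edge in a toy call graph. */
struct call_edge { const char *caller; const char *callee; };

/* Edges taken from the call chain described in this thread only. */
const struct call_edge call_graph[] = {
    { "register_kprobe",      "register_aggr_kprobe" },
    { "register_aggr_kprobe", "kcalloc" },
    { "kcalloc",              "__kmalloc" },
};
#define N_EDGES (sizeof call_graph / sizeof call_graph[0])

/* Depth-first search: can `from` reach `target` through any chain of
 * calls?  The depth cap guards against cycles in larger graphs. */
bool reaches(const char *from, const char *target, int depth)
{
    size_t i;

    assert(from != NULL && target != NULL);
    if (strcmp(from, target) == 0)
        return true;
    if (depth <= 0)
        return false;
    for (i = 0; i < N_EDGES; i++)
        if (strcmp(call_graph[i].caller, from) == 0 &&
            reaches(call_graph[i].callee, target, depth - 1))
            return true;
    return false;
}

/* In this model a probe point is risky if registration can reach it. */
bool unsafe_to_probe(const char *function)
{
    return reaches("register_kprobe", function, 16);
}
```

In this toy graph, unsafe_to_probe("__kmalloc") is true, while a function that does not appear in the graph is reported as safe.  That second answer is only as good as the graph, which is why enumerating every such function by hand is impractical.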
Comment 7 Jim Keniston 2005-11-19 01:11:22 UTC
Frank's analysis in Comment #3 is correct.  register_kprobe() grabs the
kprobe_lock and then, if there's already a probe at that address, calls
register_aggr_kprobe(), which may call __kmalloc() (via kcalloc()).  This is OK
because it's a GFP_ATOMIC allocation.

In pre-RCU versions of Kprobes, Kprobes runs handlers while holding the
kprobe_lock.  Thus, if there's a probe on __kmalloc(), we deadlock if
register_aggr_kprobe() is called.  So the failure we see here is due to the
probe on __kmalloc() combined with registering two probes at the same address
(ANY address).

This is not a problem in the RCU version of Kprobes (e.g., RHEL4 U3 in recent
days), because Kprobes holds no locks while running handlers.  So the specific
problem in question is fixed, and it's safe to probe *alloc().

If RH is concerned about the general problem of Kprobes behaving badly when you
ask it to probe itself (which the Kprobes documentation specifically advises
against), then the first step would be to pick up Prasanna's
__kprobes-declaration patches from the mainline kernel.
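
The locking behaviour described above can be modelled in userspace.  This is a hedged sketch, not kernel code: lock_held stands in for the kprobe_lock, and model_lock, hit_probe, model_kmalloc and model_register_second_probe are hypothetical names.  Where a real spinlock would spin forever, the model returns WOULD_DEADLOCK instead.

```c
#include <assert.h>
#include <stdbool.h>

enum outcome { OK = 0, WOULD_DEADLOCK = 1 };

bool lock_held;       /* models the non-recursive kprobe_lock */
bool kmalloc_probed;  /* is a probe installed on the allocator? */
int  handler_hits;    /* counts handler runs, like "called++" */

/* Try to take the model lock; fail where a real lock would never return. */
bool model_lock(void)
{
    if (lock_held)
        return false;
    lock_held = true;
    return true;
}

void model_unlock(void)
{
    assert(lock_held);
    lock_held = false;
}

/* A probe fires.  Pre-RCU model: the handler runs under the lock.
 * RCU model: the handler runs with no lock held. */
enum outcome hit_probe(bool rcu)
{
    if (!rcu && !model_lock())
        return WOULD_DEADLOCK;
    handler_hits++;
    if (!rcu)
        model_unlock();
    return OK;
}

/* The allocator trips the probe if one is installed on it. */
enum outcome model_kmalloc(bool rcu)
{
    return kmalloc_probed ? hit_probe(rcu) : OK;
}

/* Registering a second probe at an occupied address: take the lock,
 * then allocate the aggregate probe while still holding it. */
enum outcome model_register_second_probe(bool rcu)
{
    enum outcome result;

    if (!model_lock())
        return WOULD_DEADLOCK;
    result = model_kmalloc(rcu);
    model_unlock();
    return result;
}
```

With kmalloc_probed set, model_register_second_probe(false) reports WOULD_DEADLOCK, while model_register_second_probe(true) returns OK and runs the handler once.  With no probe on the allocator, the pre-RCU path also returns OK, matching the observation that the failure needs both the __kmalloc probe and a second registration at an occupied address.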
Comment 8 Frank Ch. Eigler 2005-11-21 18:31:05 UTC
Would it be possible to have you (RCU) folks write up a systemtap test case that
attempts to break on several worst-case probe points that are transitively
reachable from kprobes registration/execution/unregistration routines? 
Something more aggressive than just __kmalloc?  While this test would crash at
the moment, it would be a good stress test for the new RCU baseline.
Comment 9 Ananth Mavinakayanahalli 2006-04-10 14:15:26 UTC
This problem no longer exists with RCU.  The regression test at
http://sources.redhat.com/cgi-bin/cvsweb.cgi/tests/kernel/kzalloc_crash_bz1813/?cvsroot=systemtap
confirms this.  We use kzalloc now, and hence the probes are on __kzalloc.
Changing them to __kmalloc doesn't produce any failures either.