This is the mail archive of the mailing list for the systemtap project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Multiple kprobes at an address redux (take3)


This patch builds on Suparna's idea of a layered approach to providing
multiple kprobe support at an address.

Salient features:

- The interfaces to register/unregister kprobes remains the same.
- The kprobes infrastructure takes care if a kprobe already exists at
  the requested address.
- No new structures are added.
- Current kprobes code not modified (except to renaming of routines).
- Architecture agnostic approach.
- We now track the currently executing kprobe (in case of multiple
  kprobes at the address). In the case of a fault during handler
  execution, only the current kprobe's fault handler is invoked.
  This insulates other kprobes at the location from the fault.

Details of the design are in the accompanying document.

Patch is against 2.6.12-rc2-mm2.

Draft of 11 April 2005.

1	Multiple kprobes at an address

1.1	Overview

One of the requirements of SystemTAP is the capability of defining 
multiple kprobes per probe address. Current design of kprobes does 
not allow the registration of more than one kprobe at an address.

This is an attempt to come up with a design that will enable a user
to register multiple kprobes at a given address. 

2	Design

2.1	struct kprobe

Struct kprobe will now have an additional "list" field to 
facilitate list.h usage.

struct kprobe {
	struct hlist_head hlist;
	struct list_head list;
	kprobe_opcode_t *addr;
	kprobe_pre_handler_t pre_handler;
	kprobe_post_handler_t post_handler;
	kprobe_fault_handler_t fault_handler;
	kprobe_break_handler_t break_handler;
	kprobe_opcode_t opcode;
	struct arch_specific_insn ainsn;

NOTE: It is always a good idea to set the handler fields not being 
used explicitly to NULL. The design determines if a probe is a 
jprobe by virtue of the fact that only jprobes define break_handlers. 

2.2	Base kprobes infrastructure

With this design, the "current" register/unregister_kprobe() 
routines have been renamed register/unregister_kprobe_single() 
and a wrapper takes care of invoking the _single() versions when 
necessary. The _single() versions are NOT exported. 

With this change:

- The current interfaces (register/unregister_kprobes()) continue
  to be the only ones available
- The interfaces can be used without bothering if a kprobe already 
  existed at the required address. The wrappers take care of that
- The current base kprobes code does not change, except of course,
  the renaming.

2.2.1	Registering a kprobe

1. Kprobes are registered with a call to register_kprobe().
2. The kprobe infrastructure determines if a probe already exists at
   the requested location. 
3. If this is a new kprobe, registration proceeds as it done
4. If a kprobe already exists at the requested address:
   a. If the call is to register a jprobe, the new handler 
      registration fails since we can't have a jprobe and a kprobe 
      at the same address. Whether a registration is for a jprobe
      or not is determined by the presence of a break_handler.
   b. If the call is to register a kprobe and if the pre_handler
      for the existing kprobe is not aggr_pre_handler, we then:
      	1. Allocate a new struct kprobe (manager kprobe).
	2. Populate the manager kprobe with appropriate values for
	   the addr, opcode, ainsn fields
	3. Set the manager kprobe's handlers to custom handlers 
	   with list walk logic
	4. Add the existing kprobe and the new kprobe to be 
	   manager kprobe's "list"
   c. If the call is to register a kprobe and if the pre_handler
      for the existing kprobe is aggr_pre_handler, it means that
      we already have an manager kprobe defined at this address.
      Unless the new probe to register is not a jprobe, it is 
      initialized and added to the manager kprobe's list.
      The registration fails if the new probe is a jprobe.
2.2.2	Unregistering a kprobe

1. Kprobes are unregistered by making a call to unregister_kprobe()
2. If a kprobe exists and its pre_handler is aggr_pre_handler(),
   then, this is one of the possibly many kprobes at the address:
   a. Delete the kprobe from the manager kprobe's list.
   b. If the manager kprobe's "list" is empty, then free the
      manager kprobe.
3. If the kprobe to be unregistered is the only one at the address,
   remove it as is done currently.

2.3	Handler invocation

Multiple kprobes at an address are managed by a "manager kprobe"
allocated and freed by the kernel. All kprobes at the address are
added to the "list" of the manager kprobe. The handlers of the
manager kprobe are aliased to aggregate handlers that have in
them the required list walk logic to invoke the registered
kprobes' handlers.

This is accomplished by aliasing the aggr_probe's kprobe handlers
to be:

        ap->kp.pre_handler = aggr_pre_handler;
        ap->kp.post_handler = aggr_post_handler;
        ap->kp.fault_handler = aggr_fault_handler;

Please refer to the accompanying patch for details about these 

2.4	More points on handler invocation

2.4.1	Pre handlers

We care about the return code for a pre_handler only in case 
of a jprobe. Return code from all other (read kprobe) handlers 
are ignored. But, it is advised that all pre_handlers return 0 
for compatibility sake with future enhancements.

For a multiple kprobes at an address, all pre handlers will be 

2.4.2	Post handlers

All post handlers will be called.

2.4.3	Fault handlers

We now track, using a "curr_kprobe", which kprobe handler 
(pre/post) is currently executing. When a fault occurs and the 
curr_kprobe is not NULL, it means that the fault occured while 
executing the user defined handler. In that case, the 
aggr_fault_handler invokes just the fault_handler that is 
defined for the curr_kprobe

No other fault handlers are called.


a. Fault handlers are meant not just to handle faults during
   the execution of the handlers, but also in cases when we fault
   while single-stepping out of line. This is the most common case
   with user-space probes, but, the design has to keep this in 
   mind so as to accomodate addition of user-space probes at a 
   later date.

2.5	Arch specific implementations

This design does not require any changes to the arch specific 
kprobe implementations.

3	Caveats

- It is assumed that the handlers written behave well. :-)
- Multiple kprobes cannot be registered at a location that already
  has a jprobe registered. In other words, to register a jprobe,
  no other kprobes must be registered at that address and 
- The current design builds on the existing kprobe locking 
  infrastructure. It is hoped that this design (or a variant)
  can be the base for the scalability changes envisaged. 
  Suggestions are welcome.
- There is a small race window while registering a kprobe wherein,
  the registration may fail with an -EEXIST. This limitation can
  be resolved if/when the x86_64 "wart" is removed. It can also
  be tackled when we work on the kprobes scalability issues.


a. Indicate that a kprobe is disabled when disarm_kprobe() is 
b. jprobe/kprobe coexistance.

5	Questions

- Is the design implementable on all architectures kprobes is 
  currently available?
  A: Already done! :-)
     With the new design, no changes are required for the arch
     specific kprobe implementations.

diff -Naurp temp/linux-2.6.12-rc2/include/linux/kprobes.h linux-2.6.12-rc2/include/linux/kprobes.h
--- temp/linux-2.6.12-rc2/include/linux/kprobes.h	2005-04-04 12:38:05.000000000 -0400
+++ linux-2.6.12-rc2/include/linux/kprobes.h	2005-04-11 13:37:10.000000000 -0400
@@ -43,6 +43,9 @@ typedef int (*kprobe_fault_handler_t) (s
 struct kprobe {
 	struct hlist_node hlist;
+	/* list of kprobes for multi-handler support */
+	struct list_head list;
 	/* location of the probe point */
 	kprobe_opcode_t *addr;
diff -Naurp temp/linux-2.6.12-rc2/kernel/kprobes.c linux-2.6.12-rc2/kernel/kprobes.c
--- temp/linux-2.6.12-rc2/kernel/kprobes.c	2005-04-04 12:40:16.000000000 -0400
+++ linux-2.6.12-rc2/kernel/kprobes.c	2005-04-11 14:05:02.000000000 -0400
@@ -44,6 +44,7 @@ static struct hlist_head kprobe_table[KP
 unsigned int kprobe_cpu = NR_CPUS;
 static DEFINE_SPINLOCK(kprobe_lock);
+static struct kprobe *curr_kprobe;
 /* Locks kprobe: irqs must be disabled */
 void lock_kprobes(void)
@@ -73,7 +74,50 @@ struct kprobe *get_kprobe(void *addr)
 	return NULL;
-int register_kprobe(struct kprobe *p)
+int aggr_pre_handler(struct kprobe *p, struct pt_regs *regs)
+	struct kprobe *kp;
+	list_for_each_entry(kp, &p->list, list) {
+		if (kp->pre_handler) {
+			curr_kprobe = kp;
+			kp->pre_handler(kp, regs);
+			curr_kprobe = NULL;
+		}
+	}
+	return 0;
+void aggr_post_handler(struct kprobe *p, struct pt_regs *regs, 
+		unsigned long flags)
+	struct kprobe *kp;
+	list_for_each_entry(kp, &p->list, list) {
+		if (kp->post_handler) {
+			curr_kprobe = kp;
+			kp->post_handler(kp, regs, flags);
+			curr_kprobe = NULL;
+		}
+	}
+	return;
+int aggr_fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr)
+	/* 
+	 * if we faulted "during" the execution of a user specified
+	 * probe handler, invoke just that probe's fault handler
+	 * to see if it can handle the fault.
+	 */ 
+	if (curr_kprobe && curr_kprobe->fault_handler) {
+		if (curr_kprobe->fault_handler(curr_kprobe, regs, trapnr))
+			return 1;
+	}
+	return 0;
+static int register_kprobe_single(struct kprobe *p)
 	int ret = 0;
 	unsigned long flags = 0;
@@ -104,7 +148,7 @@ rm_kprobe:
 	return ret;
-void unregister_kprobe(struct kprobe *p)
+static void unregister_kprobe_single(struct kprobe *p)
 	unsigned long flags;
@@ -116,6 +160,120 @@ void unregister_kprobe(struct kprobe *p)
 	spin_unlock_irqrestore(&kprobe_lock, flags);
+static inline void add_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
+	ap->addr = p->addr;
+	ap->opcode = p->opcode;
+	memcpy(&ap->ainsn, &p->ainsn, 
+			sizeof(struct arch_specific_insn));
+	ap->pre_handler = aggr_pre_handler;
+	ap->post_handler = aggr_post_handler;
+	ap->fault_handler = aggr_fault_handler;
+	INIT_LIST_HEAD(&ap->list);
+	list_add(&p->list, &ap->list);
+	INIT_HLIST_NODE(&ap->hlist);
+	hlist_del(&p->hlist);
+	hlist_add_head(&ap->hlist,
+		&kprobe_table[hash_ptr(ap->addr,
+static int register_aggr_kprobe(struct kprobe *p)
+	int ret = 0;
+	unsigned long flags = 0;
+	struct kprobe *old_p, *ap;
+	ap = kcalloc(1, sizeof(struct kprobe), GFP_KERNEL);
+	if (!ap) 
+		return -ENOMEM;
+	spin_lock_irqsave(&kprobe_lock, flags);
+	old_p = get_kprobe(p->addr);
+	if (old_p) {
+		if (old_p->break_handler || p->break_handler) {
+			ret = -EEXIST;
+			goto free_ap;
+		} else if (old_p->pre_handler == aggr_pre_handler) {
+			list_add(&p->list, &old_p->list);
+			goto free_ap;
+		} else {
+			add_aggr_kprobe(ap, old_p);
+			list_add(&p->list, &ap->list);
+			goto out;
+		}
+	} else {
+		/* 
+		 * kprobe at this addr was deleted between the
+		 * check in register_kprobe() and the get_kprobe()
+		 * call above
+		 */
+		spin_unlock_irqrestore(&kprobe_lock, flags);
+		ret = register_kprobe_single(p);
+		if (ret == -EEXIST) {
+			spin_lock_irqsave(&kprobe_lock, flags);
+			old_p = get_kprobe(p->addr);
+			if (old_p) {
+				add_aggr_kprobe(ap, old_p);
+				list_add(&p->list, &ap->list);
+			}
+			goto out;
+		} else
+			kfree(ap);
+		return ret;
+	}
+	kfree(ap);
+	spin_unlock_irqrestore(&kprobe_lock, flags);
+	return ret;
+static void unregister_aggr_kprobe(struct kprobe *old_p, 
+		struct kprobe *p, unsigned long flags)
+	/* we have taken the spinlock in unregister_kprobe() */
+	list_del(&p->list);
+	if (list_empty(&old_p->list)) {
+		spin_unlock_irqrestore(&kprobe_lock, flags);
+		unregister_kprobe_single(old_p);
+		kfree(old_p);
+		return;
+	} 
+	spin_unlock_irqrestore(&kprobe_lock, flags);
+int register_kprobe(struct kprobe *p)
+	int ret = 0;
+	ret = register_kprobe_single(p);
+	if (ret == -EEXIST) 
+		ret = register_aggr_kprobe(p);
+	return ret;
+void unregister_kprobe(struct kprobe *p)
+	unsigned long flags;
+	struct kprobe *old_p;
+	spin_lock_irqsave(&kprobe_lock, flags);
+	old_p = get_kprobe(p->addr);
+	if (old_p && (old_p->pre_handler == aggr_pre_handler))
+		unregister_aggr_kprobe(old_p, p, flags);
+	else if (old_p == p) {
+		spin_unlock_irqrestore(&kprobe_lock, flags);
+		unregister_kprobe_single(p);
+	}
 static struct notifier_block kprobe_exceptions_nb = {
 	.notifier_call = kprobe_exceptions_notify,
 	.priority = 0x7fffffff /* we need to notified first */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]