This is the mail archive of the
mailing list for the systemtap project.
[Bug uprobes/5509] New: uprobe booster thoughts
- From: "jkenisto at us dot ibm dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sources dot redhat dot com
- Date: 18 Dec 2007 17:29:28 -0000
- Subject: [Bug uprobes/5509] New: uprobe booster thoughts
- Reply-to: sourceware-bugzilla at sourceware dot org
For consideration down the road... The basic idea is to "boost"
uprobes and uretprobes in the same way that we boost kprobes and
kretprobes (currently for i386 and x86_64 only).
Reviewing Masami's x86_64 k[ret]probe booster patches made me think
about this some more. Masami and I talked about this briefly at
OLS this year, but we didn't discuss details.
Boosting uprobes was not feasible at that time because the current
slot-allocation scheme employs only "public" (stealable) slots and
therefore requires a return to kernel space after the single-step to
do an up_read() on the instruction slot's rwsem.
On the other hand, the scheme proposed in SystemTap bz5275 employs
mostly private slots, which don't need to be locked and so could
conceivably be boosted.
I don't think we can boost uretprobes the way we do kretprobes.
The kretprobe booster involves replacing the int3 at the kretprobe
trampoline with code that saves regs, calls the trampoline handler,
restores regs, and returns to the probed function. But for uretprobes,
we need the int3 to get us into kernel mode.
The idea of a uprobe booster is the same as for a kprobe booster:
in the SSOL slot, append a jump instruction after the copy of the
probed instruction. The jump is to the instruction following the
probed instruction. This allows us to avoid single-stepping the
instruction copy, which should save nearly 50% of the overhead of
a uprobe hit.
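
To make the slot layout concrete, here is a rough user-space sketch of the i386-style case, where a 5-byte "jmp rel32" reaches the target. The function name, SLOT_BYTES, and the parameters are made up for illustration and are not part of uprobes. The sketch also ignores instructions that need post-single-step fixups, which presumably could not be boosted this way.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_BYTES    16  /* hypothetical slot size for this sketch */
#define JMP_REL32_LEN 5   /* opcode e9 plus a 4-byte displacement */

/*
 * Append "jmp rel32" after the instruction copy in slot[0..insn_len).
 *   slot_vaddr: address of the slot in the probed process
 *   next_vaddr: address of the instruction following the probed one
 * Returns 0 on success, -1 if the jump does not fit in the slot or
 * the target is out of rel32 range.
 */
int boost_slot_rel32(uint8_t *slot, uint64_t slot_vaddr,
                     size_t insn_len, uint64_t next_vaddr)
{
    int64_t disp;
    uint32_t d32;
    int i;

    if (insn_len + JMP_REL32_LEN > SLOT_BYTES)
        return -1;

    /* The displacement is relative to the end of the jmp instruction. */
    disp = (int64_t)(next_vaddr - (slot_vaddr + insn_len + JMP_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    d32 = (uint32_t)(int32_t)disp;
    slot[insn_len] = 0xe9;
    for (i = 0; i < 4; i++)          /* little-endian displacement */
        slot[insn_len + 1 + i] = (uint8_t)(d32 >> (8 * i));
    return 0;
}
```

With the jump in place, the thread would run the copy and then land back in the probed text without a single-step trap.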
We currently hold the uprobe_process read-locked while processing
the probepoint, and don't unlock it 'til after we've single-stepped
the instruction copy and called uprobe_post_ssout() to run any fixups.
Seems like we could just unlock it before returning control to the
instruction copy+jump in the SSOL slot.
Boosting (adding the jump instruction) is done in uprobe_post_ssout().
Serialize this operation with existing ppt->slot_mutex. Need memory
barriers here, since private slots are unlocked?
What happens if the probepoint is unregistered while one or more
threads are executing the instructions in the SSOL slot? We can't free
up the SSOL slot while it's still in use. The uprobe_process->rwsem
no longer explicitly protects us there. But we can take advantage of
the fact that all threads in the probed process are quiesced when we
remove a probepoint. We can detect whether a thread is currently
in the SSOL slot by checking the ip. It could conceivably be stopped
at the instruction-copy or at the jump. If it's stopped at the
instruction-copy, we adjust the ip to point to the (now restored)
original instruction. If it's stopped at the jump, we point the
ip at the next instruction (whose address we know from boosting).
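
The ip check described above might look something like the following sketch. The struct and function names are hypothetical stand-ins, not the actual uprobes data structures, and the sketch assumes the thread can only be stopped at one of the two instruction boundaries in the slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a boosted SSOL slot. */
struct boosted_slot {
    uint64_t slot_vaddr;  /* start of the instruction copy in the SSOL area */
    size_t   insn_len;    /* length of the instruction copy */
    uint64_t orig_vaddr;  /* address of the probed (now restored) instruction */
    uint64_t next_vaddr;  /* address of the instruction after the probed one */
};

/*
 * Return the ip at which a quiesced thread should resume once the
 * slot is freed. An ip outside the slot is returned unchanged.
 */
uint64_t fixup_ip_on_unregister(const struct boosted_slot *s, uint64_t ip)
{
    if (ip == s->slot_vaddr)
        return s->orig_vaddr;   /* copy not yet executed: rerun the original */
    if (ip == s->slot_vaddr + s->insn_len)
        return s->next_vaddr;   /* stopped at the jump: skip straight ahead */
    return ip;
}
```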
What are the implications for running utask_fake_quiesce() and
uprobe_run_def_regs(), which are currently called (if needed) after
the instruction copy has been single-stepped and the uprobe_process
has been unlocked?
uprobe booster for x86_64
For x86_64, the user address space is very large. In particular, a
jump instruction with a 32-bit offset from the SSOL area won't reach all
(or even most) of the probed process's text areas. However, we can
do an indirect jump of the following form:

	jmpq *0(%rip)
	.quad next_insn

where next_insn is the address of the instruction to which we want to
jump. (This is an indirect jump to the address stored in the 8 bytes
following the jmpq instruction.)
The above instruction sequence takes 14 bytes: 6 bytes for the jmpq
(always ff 25 00 00 00 00) and 8 bytes for the address. For x86_64,
MAX_UINSN_BYTES=16, which doesn't leave much room for the actual
instruction copy. We seem to have the following choices:
a) Boost only 1-byte and 2-byte instructions. (Ick)
b) Make MAX_UINSN_BYTES larger.
c) Allocate 2 SSOL slots for a boostable instruction.
d) Allocate some big (boostable) slots and some little ones.
I prefer (b). (c) and (d) complicate the slot allocation algorithm,
which so far is architecture-independent. Note that there's no
particular reason we can't allocate more than one 4096-byte page to
the SSOL area.
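
As a rough illustration of the byte budget, the sketch below writes the 14-byte indirect jump after an instruction copy and fails when the slot is too small. The function name and the slot_bytes parameter (standing in for MAX_UINSN_BYTES) are assumptions for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define JMPQ_RIP_LEN   6                          /* ff 25 00 00 00 00 */
#define ADDR_LEN       8                          /* 64-bit target address */
#define BOOST_TAIL_LEN (JMPQ_RIP_LEN + ADDR_LEN)  /* 14 bytes in total */

/*
 * Write "jmpq *0(%rip)" followed by the 8-byte target address at
 * slot[insn_len...]. slot_bytes is the size of the slot.
 * Returns 0 on success, -1 if the copy plus the jump does not fit.
 */
int boost_slot_indirect(uint8_t *slot, size_t slot_bytes,
                        size_t insn_len, uint64_t next_vaddr)
{
    static const uint8_t jmpq[JMPQ_RIP_LEN] = { 0xff, 0x25, 0, 0, 0, 0 };
    size_t i;

    if (insn_len + BOOST_TAIL_LEN > slot_bytes)
        return -1;

    for (i = 0; i < JMPQ_RIP_LEN; i++)
        slot[insn_len + i] = jmpq[i];
    for (i = 0; i < ADDR_LEN; i++)   /* little-endian address */
        slot[insn_len + JMPQ_RIP_LEN + i] = (uint8_t)(next_vaddr >> (8 * i));
    return 0;
}
```

With slot_bytes = 16, only copies of up to 2 bytes fit, which is choice (a). Since an x86 instruction is at most 15 bytes long, a 32-byte slot (15 + 14 = 29) would hold any instruction plus the jump, which is one way choice (b) could be sized.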
Summary: uprobe booster thoughts
AssignedTo: systemtap at sources dot redhat dot com
ReportedBy: jkenisto at us dot ibm dot com