Bug 5275 - uprobes: smarter insn slot allocation for SSOL
Status: RESOLVED WONTFIX
Alias: None
Product: systemtap
Classification: Unclassified
Component: uprobes
Version: unspecified
Importance: P2 minor
Target Milestone: ---
Assignee: Unassigned
Reported: 2007-11-06 00:47 UTC by Jim Keniston
Modified: 2011-06-10 19:48 UTC

Description Jim Keniston 2007-11-06 00:47:07 UTC
Currently, uprobes allocates one page (the SSOL vma) in each probed
process's virtual address space and divides it up into "instruction
slots" for use in single-stepping out of line (SSOL).  Slots are
allocated per-probepoint, and if a probepoint is never hit, it never
gets a slot.  If the number of probepoints ever hit (and currently
registered) exceeds the number of slots in one page, probepoints steal
slots from each other on an LRU basis.  To manage slot stealing, uprobes
allocates a struct uprobe_ssol_slot for each slot in the SSOL vma.
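
The allocate-on-first-hit and LRU-stealing behavior described above can be sketched as a small user-space toy model. This is not the uprobes code: the class SsolArea and its method slot_for are hypothetical names, probepoints are stood in for by arbitrary hashable keys, and unregistration is left out for brevity.

```python
from collections import OrderedDict


class SsolArea:
    """Toy model of one process's SSOL area: a fixed number of slots
    handed out on first hit and stolen on an LRU basis when full.
    (Hypothetical names; not the kernel data structures.)"""

    def __init__(self, nslots):
        if nslots < 1:
            raise ValueError("need at least one slot")
        self.nslots = nslots
        # probepoint -> slot index, ordered least to most recently used
        self.owner_to_slot = OrderedDict()

    def slot_for(self, probepoint):
        """Return the slot to use for a probepoint that was just hit."""
        if probepoint in self.owner_to_slot:
            # Already owns a slot: just mark it most recently used.
            self.owner_to_slot.move_to_end(probepoint)
            return self.owner_to_slot[probepoint]
        if len(self.owner_to_slot) < self.nslots:
            # A never-used slot is still available (no unregistration
            # in this model, so slots 0..len-1 are exactly the used ones).
            slot = len(self.owner_to_slot)
        else:
            # Every slot is owned: steal from the least recently used owner.
            _victim, slot = self.owner_to_slot.popitem(last=False)
        self.owner_to_slot[probepoint] = slot
        return slot
```

With two slots, hitting probepoints a, b, a, c in that order leaves a in slot 0 and gives c the slot stolen from b, since b was the least recently hit.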

For the typical i386 system (4K page / 16-byte slots = 256 slots),
this works pretty well.  For powerpc (64K page / 4-byte slots =
16K slots), the resulting array of uprobe_ssol_slot objects is huge
(too big for kmalloc), and the likelihood that slot stealing will
be needed is tiny.  A workaround currently limits the number of SSOL
slots to 1024, but that wastes 15K slots on powerpc.
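
The slot arithmetic above can be checked with a few lines of Python. The helper name ssol_slots is made up for this illustration; the page sizes, slot sizes, and the 1024-slot cap are the figures quoted in this report.

```python
def ssol_slots(page_size, slot_size, cap=None):
    """Return (total, usable, wasted) slot counts for one SSOL page.

    total  = how many slots fit in the page
    usable = total, limited by the optional cap
    wasted = slots that fit in the page but are never handed out
    """
    total = page_size // slot_size
    usable = total if cap is None else min(total, cap)
    return total, usable, total - usable


if __name__ == "__main__":
    for arch, page, slot in (("i386", 4096, 16), ("powerpc", 65536, 4)):
        total, usable, wasted = ssol_slots(page, slot, cap=1024)
        print(f"{arch}: {total} slots fit, {usable} usable, {wasted} wasted")
```

For i386 the cap has no effect (256 slots fit, all usable); for powerpc 16384 slots fit, 1024 are usable, and 15360 (15K) go unused.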

Potentially simpler approaches:
1) Forget about slot stealing and just refuse to do SSOL for probepoints
that we can't allocate slots for.  Ugly.
2) Forget about slot stealing and just keep growing the SSOL vma
as more slots are needed.  This is perhaps less ugly, but also more
complicated, and runs the risk of bumping up against another
vma, bringing us back to (1).

A somewhat more complicated approach: Divide the SSOL vma into private
slots (the majority) and public slots.  A private slot, once allocated
to a probepoint, is owned by that probepoint until the probepoint
goes away.  Public slots are the equivalent of the slots we have now:
each has an associated uprobe_ssol_slot object and so can be stolen.
This prevents us from ever running out of slots.  Public slots are
put into use only when all private slots are used.  So a powerpc
process might have 64 public slots and (16K-64 = 16320) private slots;
an i386 process might have 64 public slots and (256-64 = 192) private
slots.
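
A rough user-space sketch of the private/public split, under stated assumptions: the class SplitSsolArea, its methods, and the choice to keep a probepoint in its public slot even after a private slot frees up are all inventions of this example, not part of any uprobes implementation. Only the public slots carry LRU bookkeeping, which is the point of the proposal.

```python
from collections import OrderedDict


class SplitSsolArea:
    """Toy model: private slots are owned until unregistration;
    public slots are shared and stolen on an LRU basis."""

    def __init__(self, nslots, npublic=64):
        if not 0 < npublic <= nslots:
            raise ValueError("need 1..nslots public slots")
        self.nprivate = nslots - npublic
        self.private = {}            # probepoint -> private slot
        self.public = OrderedDict()  # probepoint -> public slot, LRU order
        # Free lists are reversed so that pop() yields the lowest index first.
        self.free_private = list(range(self.nprivate - 1, -1, -1))
        self.free_public = list(range(nslots - 1, self.nprivate - 1, -1))

    def slot_for(self, probepoint):
        """Return the slot to use for a probepoint that was just hit."""
        if probepoint in self.private:
            return self.private[probepoint]   # no LRU bookkeeping needed
        if probepoint in self.public:
            self.public.move_to_end(probepoint)
            return self.public[probepoint]
        if self.free_private:
            slot = self.free_private.pop()
            self.private[probepoint] = slot
            return slot
        # Private slots exhausted: fall back to the stealable public pool.
        if self.free_public:
            slot = self.free_public.pop()
        else:
            _victim, slot = self.public.popitem(last=False)
        self.public[probepoint] = slot
        return slot

    def unregister(self, probepoint):
        """Release whatever slot the probepoint holds, if any."""
        if probepoint in self.private:
            self.free_private.append(self.private.pop(probepoint))
        elif probepoint in self.public:
            self.free_public.append(self.public.pop(probepoint))
```

With the sizes from this report, SplitSsolArea(16384).nprivate is 16320 and SplitSsolArea(256).nprivate is 192, so the per-slot tracking state shrinks to 64 entries on both architectures.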
Comment 1 Ananth Mavinakayanahalli 2007-11-06 11:09:06 UTC
Would it make sense to bypass struct uprobe_ssol_slot entirely for powerpc (and
similar archs) that may never need the LRU to kick in?
Comment 2 Jim Keniston 2007-11-06 21:29:43 UTC
(In reply to comment #1)
> Would it make sense to bypass struct uprobe_ssol_slot entirely for powerpc (and
> similar archs) that may never need the LRU to kick in?

There's definitely sense in that.  It would impose a theoretical limit on the
number of probepoints a process could hit; but registering 16K uprobes would
probably stretch some practical limits -- e.g., related to the size of struct
uprobe_probept or the lengths of the hash lists in uproc->uprobe_table[].

Unfortunately, making all the slots private on 1-2 architectures doesn't reduce
overall code complexity.
Comment 3 Frank Ch. Eigler 2011-06-10 19:48:17 UTC
Anything like this is unlikely to be changed in stap classic uprobes;
and as to lkml-track uprobes, that's an issue to raise there.