This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: converting audit subsystem to markers for systemtap access


On Sunday 21 October 2007 13:25:38 Frank Ch. Eigler wrote:
> Hi -
>
> With kernel markers now in the Linus tree, we would like to
> investigate using them more broadly.  They could replace and
> generalize existing special-purpose hooks, with two benefits: they can
> reduce average overhead for their current users (for mostly-dormant
> instrumentation); and they can expose the events to new consumers such
> as systemtap.  This can make systemtap probes on such events faster,
> more robust, and more maintainable.  It may not be quite a win-win
> from the point of view of the current instrumentation maintainers, but
> should be one to a user who gains more robust visibility into the
> kernel.
>
> Here is one way marker conversion could be done for the audit
> subsystem.  It aims to retain current audit behavior, and just
> interject a hook for others to use too.  This is based on a brief source
> code scan, so it's only a rough outline.  Among the main data for
> audit are the system call entry and exit events (audit_syscall_entry
> and _exit).  These are called near the low-level ptrace-related code
> dealing with syscall dispatching, and look e.g. like this for x86.
>
> arch/x86/kernel/ptrace_32.c:
> __attribute__((regparm(3)))
> int do_syscall_trace(struct pt_regs *regs, int entryexit)
> {
> [...]
>         if (unlikely(current->audit_context) && !entryexit)
>                 audit_syscall_entry(AUDIT_ARCH_I386, regs->orig_eax,
>                                     regs->ebx, regs->ecx, regs->edx,
>                                     regs->esi);
> [...]
>
> Note that auditing is conditional on a context per-task struct created
> at fork time (audit_alloc), which is done only if an auditing daemon
> is attached to the kernel via netlink.  One could convert this call to
> markers in at least two ways:
>
> (a) within the conditional
>
>         if (unlikely(current->audit_context) && !entryexit)
>            trace_mark (audit_syscall_entry, "%d %d %d %d %d %d",
>               AUDIT_ARCH_I386, regs->orig_eax, regs->ebx,
>               regs->ecx, regs->edx, regs->esi);
>
>     The audit code would use marker_probe_register/marker_arm to
>     wrap its existing audit_syscall_entry() function; the systemtap
>     user would 'probe kernel.mark("audit_syscall_entry") { $arg1
>     ... $arg6 }'.
>
>     This would be a net performance loss to the audit side if auditd
>     was running; a performance tie without auditd; and would allow
>     systemtap to only see already audit-marked processes.
>
> (b) outside the conditional
>
>     if (!entryexit) /* entry as opposed to exit */
>        trace_mark (audit_syscall_entry, "%d %d %d %d %d %d",
>               AUDIT_ARCH_I386, regs->orig_eax, regs->ebx,
>               regs->ecx, regs->edx, regs->esi);
>
>     The audit-side marker backend would then contain the
>         if (unlikely(current->audit_context))
>            audit_syscall_entry(... incoming marker params ...)
>     test and call.
>
>     This could make a performance gain for the kernel if auditd is not
>     running, since a single systemwide dormant marker should be
>     cheaper to bypass than a per-task field fetch!  It also lets
>     systemtap users of the marker see all processes' syscalls, even if
>     auditd is not running, i.e. even if the audit context is not set.
>
>     If auditd is running (and it attaches to the marker), it would
>     suffer an additional indignity, er, indirection, but run otherwise
>     unaffected.
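
The control flow of option (b) can be sketched in user space. This is only a minimal illustration under stated assumptions, not the kernel markers API: struct task, SIM_ARCH, syscall_entry_probe, audit_marker_backend, syscall_trace_sim and run_demo are all hypothetical names, and a single function pointer stands in for the systemwide marker (NULL meaning dormant).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SIM_ARCH 1              /* hypothetical stand-in for AUDIT_ARCH_I386 */

/* Hypothetical stand-in for a task; only the audit context matters here. */
struct task {
    void *audit_context;        /* non-NULL when the task is audited */
};

/* Hypothetical probe signature: task, arch, syscall number, four args. */
typedef void (*marker_probe_fn)(const struct task *t, int arch, long nr,
                                long a0, long a1, long a2, long a3);

/* One systemwide "marker": NULL means dormant, non-NULL means armed. */
static marker_probe_fn syscall_entry_probe;

static int marker_hits;         /* times the armed marker fired */
static int audit_calls;         /* times the audit entry hook ran */

/* Stand-in for the existing audit_syscall_entry() work. */
static void audit_syscall_entry_sim(int arch, long nr, long a0, long a1,
                                    long a2, long a3)
{
    (void)arch; (void)a0; (void)a1; (void)a2; (void)a3;
    audit_calls++;
    printf("audit: syscall %ld\n", nr);
}

/* Audit-side backend: the per-task test moves here from the call site. */
static void audit_marker_backend(const struct task *t, int arch, long nr,
                                 long a0, long a1, long a2, long a3)
{
    marker_hits++;
    if (t->audit_context)
        audit_syscall_entry_sim(arch, nr, a0, a1, a2, a3);
}

/* Call site for option (b): no per-task check, only the marker test. */
static void syscall_trace_sim(const struct task *t, int entryexit, long nr,
                              long a0, long a1, long a2, long a3)
{
    if (!entryexit && syscall_entry_probe)
        syscall_entry_probe(t, SIM_ARCH, nr, a0, a1, a2, a3);
}

/* Exercises the dormant and armed cases; returns the audit call count. */
static int run_demo(void)
{
    int ctx = 0;
    struct task audited = { &ctx };
    struct task plain = { NULL };

    marker_hits = 0;
    audit_calls = 0;

    syscall_entry_probe = NULL;                  /* dormant marker */
    syscall_trace_sim(&audited, 0, 4, 1, 2, 3, 4);
    assert(marker_hits == 0);

    syscall_entry_probe = audit_marker_backend;  /* auditd attaches */
    syscall_trace_sim(&audited, 0, 4, 1, 2, 3, 4);  /* fires, audited */
    syscall_trace_sim(&plain, 0, 4, 1, 2, 3, 4);    /* fires, not audited */
    syscall_trace_sim(&audited, 1, 4, 1, 2, 3, 4);  /* exit path: skipped */

    syscall_entry_probe = NULL;
    return audit_calls;
}
```

Calling run_demo() from a main() shows the marker firing for both tasks once armed, while only the task with an audit context reaches the audit hook. Any other consumer attached to the same marker would likewise see every task, which is the visibility gain described for option (b).
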
>
>
> Note that this is a low-level hook, in that the system call arguments
> are passed onward as simple integers/pointers.  A separate level in
> the audit code (auditsc.c) performs semantic decoding and trace record
> formatting of syscall arguments/results.  It would be nice to somehow
> share some of this code with systemtap, since its result is similar to
> the current tapset argstr computations.  Let's leave this aspect to
> followup work.  In the meantime, the systemtap tapset code can do
> exactly the same decoding as it does now, but based on marker $arg1
> context variables instead of dwarf-level ones.
>
>
> David/Steve, does this sound interesting enough to explore in code?

Not sure - I've never used systemtap, so I don't know exactly what it does.
I'd suggest discussing this on the linux-audit mailing list, since this can
impact our next CC eval and I'm not the one it could impact the most. Please
CC Al Viro, since he's doing the audit kernel work these days.

In general, the audit system is not something I'd like to see messed with. We 
are getting pretty close to having most of the needed features complete. I'd 
like to get done with it and not have to worry about it breaking during the 
next CC eval.

-Steve

