Bug 5660

Summary: uprobed multithreaded app serializes in signal-handling code
Product: systemtap
Component: uprobes
Version: unspecified
Severity: normal
Priority: P1
Target Milestone: ---
Reporter: Jim Keniston <jkenisto>
Assignee: Unassigned <systemtap>
CC: fche

Description Jim Keniston 2008-01-23 00:49:06 UTC
Uprobing a multithreaded app on an x86_64 SMP system shows serious
serialization of the threads in the kernel's signal-handling code.
In the app in question, the child threads just call a dummy function
repeatedly; the uprobes module probes the dummy function's entry point.

Here's a summary of data reported by oprofile.  It shows
that with more than one thread running, utrace_get_signal(),
get_signal_to_deliver(), and force_sig_info() are the top three
consumers of CPU time.  I'm guessing that the threads are serializing
on task_struct->sighand->siglock (which is shared among tasks of the
same process).

#CPUs: 4
                      utrace_get_signal   get_signal_to_deliver   force_sig_info
threads   usec/iter** pct (rank)          pct (rank)              pct (rank)
1*         4.4        12.2% (1)            2.4% (13)              < 1%
1          4.0        12.0% (1)            3.5% (7)               < 1%
2          9.2        21.4% (1)           13.2% (2)                5.7% (3)
3         19.0        30.9% (1)           24.4% (2)               13.5% (3)
4         29.7        36.7% (1)           25.6% (2)               14.4% (3)

*  Single-thread program: no parent thread.
** Divide by the number of threads to get usec per probe hit.
Percentages are of total kernel+user time; (rank) is the function's rank
among all functions in the oprofile report.

I have no particular reason to think that this problem is specific
to x86_64.  I've observed poor scaling on multithreaded apps before,
but never got around to pointing oprofile at it.  I was hoping it was
something we could fix in uprobes. :-|
Comment 1 Frank Ch. Eigler 2011-06-10 19:54:25 UTC
Due at least to utrace's internal signal-related locking, this problem
may not be correctable. The uprobes implementation being proposed on lkml
should be evaluated with multithreaded programs to see whether it is
affected as well.