This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: Making the transport layer more robust
Hi,
On Fri, Aug 12, 2011 at 07:43:24PM +0200, Mark Wielaard wrote:
> commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> Author: Mark Wielaard <mjw@redhat.com>
> Date: Fri Aug 12 19:34:20 2011 +0200
>
> Remove _stp_ctl_work_timer from module transport layer.
>
> The _stp_ctl_work_timer would trigger every 20ms to check whether
> there were cmd messages queued, but not announced yet and to
> check the _stp_exit_flag was set.
>
> This commit makes all control messages announce themselves and
> check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
> is still possibly delayed since the messages are just pushed on
> a wait queue).
And with the timer out of the way it wasn't too hard to add poll
support to the command channel so that we can use a sleeping select
on the channel instead of busy-polling in stapio.
commit a9e19b380f9814630018e79b8cafa3c675dd182c
Author: Mark Wielaard <mjw@redhat.com>
Date: Sun Aug 14 23:07:46 2011 +0200
Implement and use select to wait for cmd channel data.
Add a poll implementation to runtime/transport/control.c
(_stp_ctl_poll_cmd) based on the _stp_ctl_ready_q wait queue.
Check whether select is supported in runtime/staprun/mainloop.c
(stp_main_loop) and use pselect with a sigmask that includes
SIGURG to get EINTR notifications whenever an interruptible
event occurred.
I am not seeing any regressions with this, but the signal code
in runtime/staprun/mainloop.c is pretty, uhm, creative, so some
extra review and testing would certainly be appreciated.
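For reviewers who want something small to experiment with, here is a minimal userspace sketch of the general pselect pattern: keep SIGURG blocked during normal execution and unblock it only for the duration of the pselect call, so that a signal interrupts the sleep with EINTR instead of being lost in a race. This is not the code from runtime/staprun/mainloop.c. The names wait_for_cmd, install_sigurg_handler and on_sigurg are hypothetical, and the exact mask handling in stp_main_loop may well differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical no-op handler: SIGURG is ignored by default, so a handler
 * must be installed for the signal to interrupt pselect with EINTR. */
static void on_sigurg(int sig) { (void)sig; }

/* Hypothetical helper: install the no-op SIGURG handler (no SA_RESTART). */
int install_sigurg_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigurg;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGURG, &sa, NULL);
}

/* Hypothetical helper: sleep until fd is readable or SIGURG arrives.
 * Returns 1 if fd is readable, 0 if interrupted by a signal, -1 on error.
 * SIGURG is left blocked on return, so it is only ever delivered while
 * the process is sleeping inside pselect. */
int wait_for_cmd(int fd)
{
    sigset_t blocked, during_wait;
    fd_set rfds;
    int rc;

    /* Block SIGURG and remember the previous mask. */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGURG);
    if (sigprocmask(SIG_BLOCK, &blocked, &during_wait) < 0)
        return -1;

    /* The mask used while sleeping must let SIGURG through. */
    sigdelset(&during_wait, SIGURG);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* No timeout: sleep until data arrives or a signal is delivered. */
    rc = pselect(fd + 1, &rfds, NULL, NULL, NULL, &during_wait);
    if (rc < 0)
        return errno == EINTR ? 0 : -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

A caller would install the handler once, then loop on wait_for_cmd(fd): a return of 1 means a read on the channel should not block, and a return of 0 means a signal arrived and any pending flags should be checked before sleeping again.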
This has a nice effect on the stapio impact during probing.
With stap 1.6:
$ stap -e 'global scs;
probe syscall.* { if (execname() == "stapio") scs[name]++ }' -c 'sleep 10'
scs["read"]=0x5b
scs["fcntl"]=0x52
scs["ppoll"]=0x32
scs["nanosleep"]=0x28
scs["execve"]=0x5
scs["kill"]=0x1
scs["sigreturn"]=0x1
scs["rt_sigaction"]=0x1
scs["rt_sigprocmask"]=0x1
scs["wait4"]=0x1
scs["write"]=0x1
With stap from git trunk:
$ stap -e 'global scs;
probe syscall.* { if (execname() == "stapio") scs[name]++ }' -c 'sleep 10'
scs["read"]=0x34
scs["ppoll"]=0x32
scs["execve"]=0x5
scs["fcntl"]=0x4
scs["kill"]=0x1
scs["pselect6"]=0x1
scs["sigreturn"]=0x1
scs["rt_sigaction"]=0x1
scs["rt_sigprocmask"]=0x1
scs["wait4"]=0x1
scs["write"]=0x1
So in this example one pselect6 replaces ~38 reads, ~80 fcntls and
~40 nanosleeps. The remaining reads and (timing out) ppolls come
from the relay channel. I haven't investigated yet whether those
can be eliminated too.
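To make the arithmetic explicit, here is a small sketch that converts the hexadecimal counts from the two runs above and computes the per-syscall difference. The struct and function names are made up for illustration; only the counts come from the output above. The exact differences (39 reads, 78 fcntls, 40 nanosleeps) are in line with the rounded figures quoted.

```c
#include <stdio.h>

/* Hypothetical record: syscall counts for stapio in the two runs above. */
struct sc_count {
    const char *name;
    unsigned before;   /* stap 1.6 */
    unsigned after;    /* git trunk */
};

static const struct sc_count counts[] = {
    { "read",      0x5b, 0x34 },
    { "fcntl",     0x52, 0x04 },
    { "nanosleep", 0x28, 0x00 },
    { "pselect6",  0x00, 0x01 },
};

/* Number of calls saved (negative if the new code makes more calls). */
int saved_calls(const struct sc_count *c)
{
    return (int)c->before - (int)c->after;
}

/* Print one line per syscall with decimal counts and the difference. */
void print_savings(void)
{
    size_t i;

    for (i = 0; i < sizeof counts / sizeof counts[0]; i++)
        printf("%-10s %3u -> %3u  (saved %d)\n",
               counts[i].name, counts[i].before, counts[i].after,
               saved_calls(&counts[i]));
}
```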
Cheers,
Mark