Bug 10189 - STP_START gets lost in a warning flood
Summary: STP_START gets lost in a warning flood
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Frank Ch. Eigler
URL:
Keywords:
Depends on: 12960
Blocks:
  Show dependency treegraph
 
Reported: 2009-05-22 20:45 UTC by Josh Stone
Modified: 2011-07-20 21:01 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Josh Stone 2009-05-22 20:45:05 UTC
This command will stall:
  stap -e 'probe begin { while (++i<100) warn("something is wrong!") }' -c true

Such a warning flood can happen, for example, when a lot of kernel.function or
kprobe.function probes fail to register.  The script is supposed to continue
anyway, but in this case the child process is waiting for a SIGUSR1 before it
begins.  That signal should be sent on an STP_START control message, but
debugging reveals that STP_START is never seen by stapio.

I suspect that the control channel is overflowing with STP_OOB_DATA for the
warnings, and so the STP_START is dropped.
Comment 1 Wenji Huang 2009-11-16 09:47:40 UTC
Dmesg said:
stap_d5ab9c05965c68b7bcab4fa28635aa95_744: systemtap: 1.0/0.143, base: d1ab5000,
memory: 10488+11284+1704+13600 data+text+ctx+net, probes: 1
ctl_send (type=0 len=8) failed: -12

That means -ENOMEM. To increase STP_DEFAULT_BUFFERS can avoid failure.
Not sure the root cause is memory leaking or too small 
allocated memory pool.
Comment 2 Wenji Huang 2009-11-17 08:06:01 UTC
Current stap will keep allocating buffer for _stp_warn in probe until
out of memory. As a result, _stp_ctl_send(STP_START,...) will be failed
due to exhausted memory. So the child process couldn't catch the signal
and remain waiting.

To increase STP_DEFAULT_BUFFERS is not good way, there always be a limit
for it. Maybe it's better to make _stp_warn utilize _stp_print instead of
_stp_ctl_write.

diff --git a/runtime/io.c b/runtime/io.c
index 0136aae..10b6c8a 100644
--- a/runtime/io.c
+++ b/runtime/io.c
@@ -55,7 +55,7 @@ static void _stp_vlog (enum code type, const char *func, int
line, const char *f
                 else if (type == ERROR) printk (KERN_ERR "%s", buf);
                 else printk (KERN_INFO "%s", buf);
 #else
-               if (type != DBUG)
+               if (type != DBUG && type != WARN)
                        _stp_ctl_write(STP_OOB_DATA, buf, start + num + 1);
                else {
                        _stp_print(buf);
Comment 3 Frank Ch. Eigler 2009-11-17 17:31:20 UTC
(In reply to comment #2)
> Current stap will keep allocating buffer for _stp_warn in probe until
> out of memory.

That's wrong; script-accessible constructs like error()/warn() should
not be able to deplete a dynamically allocated resource.  OTOH in this
case it looks like the transport is just using its own _stp_mempool_alloc,
which at least is a private resource.

> Maybe it's better to make _stp_warn utilize _stp_print instead of
> _stp_ctl_write.

Then we'd have the same problem with error().  The OOB messages do serve a
useful purpose at staprun time, so let's find a way of preserving OOB while
ensuring that the basic protocol can proceed.

For example, we can know that some of these kernel->staprun messages only
occur singly (i.e., are not queued en masse).  So for them, we could
preallocate a mem_buffer element, to avoid having them contend.  One way
would be to have separate mempools for each message type, and ensuring
that the singleton messages only allocate one or two elements total.
Comment 4 Mark Wielaard 2011-07-15 22:16:55 UTC
This should be fixed by the fix I am working on for PR12960.