This command will stall: stap -e 'probe begin { while (++i<100) warn("something is wrong!") }' -c true Such a warning flood can happen, for example, when a lot of kernel.function or kprobe.function probes fail to register. The script is supposed to continue anyway, but in this case the child process is waiting for a SIGUSR1 before it begins. That signal should be sent on an STP_START control message, but debugging reveals that STP_START is never seen by stapio. I suspect that the control channel is overflowing with STP_OOB_DATA for the warnings, and so the STP_START is dropped.
Dmesg said: stap_d5ab9c05965c68b7bcab4fa28635aa95_744: systemtap: 1.0/0.143, base: d1ab5000, memory: 10488+11284+1704+13600 data+text+ctx+net, probes: 1 ctl_send (type=0 len=8) failed: -12 That means -ENOMEM. To increase STP_DEFAULT_BUFFERS can avoid failure. Not sure the root cause is memory leaking or too small allocated memory pool.
Current stap will keep allocating buffer for _stp_warn in probe until out of memory. As a result, _stp_ctl_send(STP_START,...) will be failed due to exhausted memory. So the child process couldn't catch the signal and remain waiting. To increase STP_DEFAULT_BUFFERS is not good way, there always be a limit for it. Maybe it's better to make _stp_warn utilize _stp_print instead of _stp_ctl_write. diff --git a/runtime/io.c b/runtime/io.c index 0136aae..10b6c8a 100644 --- a/runtime/io.c +++ b/runtime/io.c @@ -55,7 +55,7 @@ static void _stp_vlog (enum code type, const char *func, int line, const char *f else if (type == ERROR) printk (KERN_ERR "%s", buf); else printk (KERN_INFO "%s", buf); #else - if (type != DBUG) + if (type != DBUG && type != WARN) _stp_ctl_write(STP_OOB_DATA, buf, start + num + 1); else { _stp_print(buf);
(In reply to comment #2) > Current stap will keep allocating buffer for _stp_warn in probe until > out of memory. That's wrong; script-accessible constructs like error()/warn() should not be able to deplete a dynamically allocated resource. OTOH in this case it looks like the transport is just using its own _stp_mempool_alloc, which at least is a private resource. > Maybe it's better to make _stp_warn utilize _stp_print instead of > _stp_ctl_write. Then we'd have the same problem with error(). The OOB messages do serve a useful purpose at staprun time, so let's find a way of preserving OOB while ensuring that the basic protocol can proceed. For example, we can know that some of these kernel->staprun messages only occur singly (i.e., are not queued en masse). So for them, we could preallocate a mem_buffer element, to avoid having them contend. One way would be to have separate mempools for each message type, and ensuring that the singleton messages only allocate one or two elements total.
This should be fixed by the fix I am working on for PR12960.
http://sourceware.org/ml/systemtap/2011-q3/msg00051.html