This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [TAPSETS] Linux Kernel Event Trace Tool


* Guanglei Li (guanglei@cn.ibm.com) wrote:
> 
> ----- Original Message ----- 
> From: "Guanglei Li" <guanglei@cn.ibm.com>
> To: "Mathieu Desnoyers" <compudj@krystal.dyndns.org>
> Cc: <systemtap@sourceware.org>; <jrs@us.ibm.com>
> Sent: Sunday, December 18, 2005 8:59 PM
> Subject: Re: [TAPSETS] Linux Kernel Event Trace Tool
> 
> 
> >> Mathieu:
> >>Just to throw some ideas :
> >>
> >>Which protection against concurrency between multiple data sources 
> >>do you use
> >>(SMP, process vs interrupt) ?
> >>
> >>I have looked at the CVS SystemTAP code : the _stp_printk function 
> >>is all but
> >>atomic.
> >
> >[...]
> >Anyway, I think systemtap itself should handle the concurrency 
> >issue. Am I right?
> >
> >
> after thinking Mathieu's idea again, the touble may happen if a probe 
> handler makes mutiple calls to _stp_printf and during its execution, 
> it is migrated to another CPU.
> _stp_printf will get the current CPU id every time it is called.
> 

This will happend on systems where preemption is enabled if you do not disable
preemption. You could save the cpu_id() once for the current logging, and then
keep the same until the end. The side-effect is that, sometimes, a CPU will
write in another cpu's buffer.

> If calling _stp_printf only one time in the probe handler, such issues 
> should be avoid. Am I right?
> 

If preemption is not disabled or interrupts disabled, you will have issues with
_stp_printf.

RelayFS does specify that it's up to the client to deal with locking. From the
_stp_printf function :

runtime/print.c:

#ifndef STP_RELAYFS
....
#else

#define _stp_printf(args...) _stp_sprintf(_stp_stdout,args)

#endif

runtime/string.c:_stp_sprintf()

Think that :

this function can be migrated between cpus (no preempt_disable)
this function can be reentered by a different thread (no preempt_disable)
this function can be reentered by a softirq handler (no bh_disable).
this function can be reentered by an IRQ handler, if they are probed (no irq
disable)
this function can be reentered by an NMI, if they are instrumented.

void _stp_sprintf (String str, const char *fmt, ...)
{
  int num;
  va_list args;
  if (str == _stp_stdout) {
    int cpu = smp_processor_id();
    char *buf = &_stp_pbuf[cpu][STP_PRINT_BUF_START] + _stp_pbuf_len[cpu];
    int size = STP_PRINT_BUF_LEN -_stp_pbuf_len[cpu] + 1;
                        /* This does not work with multiple writers :
                         * _stp_pbuf_len might have changed. */
    va_start(args, fmt);
    num = vsnprintf(buf, size, fmt, args);
            /* You write to a buffer position.. while doing that, you might be
             * interrupted by the sources listed above. If you didn't increment
             * the _stp_pbuf_len[cpu] before, then you will get corruption. */
              
    va_end(args);
    if (unlikely(num >= size)) {
      /* overflowed the buffer */
      if (_stp_pbuf_len[cpu] == 0) {   /* value could have changed... */
        _stp_pbuf_len[cpu] = STP_PRINT_BUF_LEN;
        _stp_print_flush();
      } else {
        *buf ='\0';
        _stp_print_flush();
        va_start(args, fmt);
        _stp_vsprintf(_stp_stdout, fmt, args);
        va_end(args);
      }
    } else {
      _stp_pbuf_len[cpu] += num;
            /* The position is only incremented after the write. Furthermore,
             * it's not an atomic operation : if it's interrupted between the
             * read and the write, you will lose an event. */
    }

  } else {
    va_start(args, fmt);
    num = vscnprintf(str->buf + str->len, STP_STRING_SIZE - str->len, fmt, args);
        /* once again, not reentrant for str. */
    va_end(args);
    if (likely(num > 0))
      str->len += num;
  }
}


This is just to show that _stp_sprintf is not reentrant as said. Or maybe did I
miss an interrupt disable taken before that ? (it couldn't ensure NMI
reentrancy, but it's better than nothing)

What LTTng actually does is to provide a secure way to log to RelayFS buffers
from _any_ context, without doing anything more than preemption disabling and
atomic cmpxchg buffer space reservation.


> and after looked into the generated c code (stap -p3), I saw that 
> _stp_print_flush() will be called at the end of every probe handler. 
> the comments of _stp_print_flush() says this is a must. But each probe 
> handler of my tapsets will print only a single row of data. calling 
> _stp_print_flush() for every single line of data will be expensive. 
> Any way to avoid this? such as not calling _stp_printf at end of probe 
> handlers. The print buffer of each cpu is 8K which could be able to 
> holds lots of lines of data.
> 
> 

What I see of the _stp_sprintf code shows me that it's flushed when the buffer
is full. I didn't look much further though, so the implementers should answer
better than me to this question.

Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]