This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Interoperability of LTTng and LTTV with SystemTAP


* Marcelo Tosatti (marcelo.tosatti@cyclades.com) wrote:
> 
> It might be interesting to measure the impact of tracing on the
> performance of synthetic workloads (preferably ones which are
> meaningful/mimic behaviour of real loads). 
> 
> Ie. the overhead of tracing under different loads.
> 
> No?
> 

Absolutely. That is what you will find in the subsequent emails of this
thread.

Usage scenario 2: 4-CPU x86_64 at a 48% workload. Used by Autodesk for
compiling and rendering.

Usage scenario 3: 1-CPU HT Pentium 4, running a ping flood on the loopback
pseudo-interface.


> > However, I would say that probe effect at the call site is much more important
> > than the effect of a much lower priority disk writer daemon. 
> 
> It depends really. The log disk writes could interfere badly with
> sequential disk read/write streams, and workloads dominated by such 
> kind of accesses would suffer more from that than from the CPU 
> effect.
> 

Well, I hope that people will use the right tool for the right job. As the
Autodesk trace demonstrates, it is only 621 MB for 9m10s of execution on 4
CPUs.

If disk I/O is really a problem, one can use bigger memory buffers. In this
situation, the tracer could be started with 512 MB of memory buffers dedicated
to tracing, and the disk writer daemon launched only once tracing is over.
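A quick back-of-the-envelope check, using only the figures quoted above (the 621 MB Autodesk trace over 9m10s, and the 512 MB buffer example), suggests how long such buffers would last before filling. The data rate is derived here, not measured:

```python
# Rough sizing for in-memory trace buffers, based on the Autodesk
# trace figures quoted above: 621 MB over 9 min 10 s on 4 CPUs.
trace_mb = 621
trace_seconds = 9 * 60 + 10              # 9m10s = 550 s

rate_mb_per_s = trace_mb / trace_seconds # ~1.13 MB/s aggregate

# The 512 MB buffer size is the example from the text.
buffer_mb = 512
seconds_before_full = buffer_mb / rate_mb_per_s

print(f"{rate_mb_per_s:.2f} MB/s; 512 MB of buffers last "
      f"~{seconds_before_full:.0f} s (~{seconds_before_full / 60:.1f} min)")
```

So at that workload, 512 MB of buffers would hold roughly seven and a half minutes of trace data before the disk writer is needed at all.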

LTTng also comes with a "flight recorder mode", which keeps the most recent
events of a running system in memory. It can be useful for investigating a
crash, or simply for preserving the information leading up to a specific
condition that triggers a trace stop.
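Conceptually, flight recorder mode behaves like a fixed-size ring buffer that overwrites its oldest entries. The sketch below illustrates only the idea; LTTng's actual per-CPU buffering is far more involved, and the class and names here are invented for illustration:

```python
from collections import deque

class FlightRecorder:
    """Toy model of flight-recorder tracing: a bounded buffer that
    always holds the most recent events, silently dropping the oldest."""

    def __init__(self, capacity):
        # deque with maxlen discards from the opposite end on overflow
        self.events = deque(maxlen=capacity)

    def log(self, event):
        self.events.append(event)

    def dump(self):
        # Called after a crash or a stop condition: returns the last
        # `capacity` events in chronological order.
        return list(self.events)

rec = FlightRecorder(capacity=3)
for i in range(10):
    rec.log(f"event {i}")

print(rec.dump())  # only the three most recent events survive
```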

Other solutions are conceivable too: installing disks dedicated to tracing,
or sending the data to a remote host over the network.

If someone really wants to investigate the impact of logging on disk I/O,
feel free to do so, but I think more important performance issues exist with
tracing, such as the probe effect.


> > For the probe effect, the microbenchmarks I made tells that logging an event takes about 
> > 220 ns.
> 
> And blows a few cachelines? 
> 

Tracing will always need a few cachelines: for the tracing control
information, for writing events to memory, and for the supplementary logging
instructions themselves.

> > Here are the results. Note that I have taken a short trace (15 seconds) just
> > because I was in a hurry before preparing a presentation at that moment.
> 
> Have you tried to use any sort of PMC hardware to measure the effects
> more precisely (including cache effects)?
> 

The only measurement I have done is to use the CPU TSC to know how much time
is spent where. I use the cpu_khz value to convert cycle counts into seconds.
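The conversion is simply elapsed cycles divided by the CPU frequency. In the sketch below, `cpu_khz` mirrors the Linux kernel variable of that name, but the 2 GHz value is an arbitrary placeholder; the 440-cycle delta is likewise chosen only so the result matches the ~220 ns per-event figure quoted earlier in this thread:

```python
def tsc_delta_to_seconds(delta_cycles, cpu_khz):
    """Convert an elapsed TSC cycle count to seconds.

    cpu_khz is kilocycles per second (as exported by the Linux
    kernel), so the frequency in Hz is cpu_khz * 1000.
    """
    return delta_cycles / (cpu_khz * 1000.0)

cpu_khz = 2_000_000   # assumed 2 GHz CPU (placeholder, not measured)
delta = 440           # hypothetical cycles spent logging one event

ns = tsc_delta_to_seconds(delta, cpu_khz) * 1e9
print(f"{ns:.0f} ns")  # 440 cycles at 2 GHz = 220 ns
```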

PAPI and perfctr look like an interesting field to investigate, especially
for cache effects. I will try them as soon as I find the time. It also makes
me think that periodically saving these hardware counters along with a trace
could enable very interesting analyses.

Thanks!


Mathieu



OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

