RE: Making the transport layer more robust - Power Management - follow-up


> I noticed these aren't documented anywhere. I propose to document them as follows:

Well written. Maybe we should avoid converting into ms in the documentation, because that is very x86-centric; HZ is 128 on ARM (well, at least on OMAP), giving slightly different results ;-)
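To illustrate my point (a rough sketch in plain C, not runtime code, and the helper name is made up): the same value in jiffies gives a different wall-clock time depending on HZ, so a fixed ms figure in the documentation is only exact for one HZ.

/* Illustration only: 1 jiffy is 10 ms at HZ=100, ~8 ms at HZ=128
 * (OMAP), 4 ms at HZ=250 and 1 ms at HZ=1000. */
static inline unsigned int jiffies_to_ms_approx(unsigned int jiffies,
                                                unsigned int hz)
{
        return (jiffies * 1000u + hz - 1) / hz; /* rounded up */
}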


> Would it help you if we made the pool of reserved memory buffers also tunable? Currently STP_DEFAULT_BUFFERS is defined statically in either runtime/transport/debugfs.c (40) or runtime/transport/procfs.c (256), depending on which backend we use for the control channel.

It would help, but currently my control buffers are flooded because the script produces warnings/errors that I didn't have in the past (some NULL pointers being backtraced). So my current solution is to correct the script. The control channel does not seem to need much bandwidth.
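If it ever becomes tunable, I imagine something as simple as guarding the existing default so it can be overridden at build time would be enough. Just a sketch from my side, not the actual runtime code:

/* Sketch only: keep the current default but allow an override,
 * e.g. -DSTP_DEFAULT_BUFFERS=128 when building the module.
 * The real values today are 40 (debugfs.c) and 256 (procfs.c). */
#ifndef STP_DEFAULT_BUFFERS
#define STP_DEFAULT_BUFFERS 40
#endif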


> That is probably because not all trace data backends really support poll/select. The ring_buffer one seems to, but the relay one doesn't.

I didn't know. I am using relay (without the RELAY kernel flag the module misses some functions), and when I set the timeout to 5s, I had the impression that I sometimes wake up before 5s when I have more traces. But I need to dig into it. And eventually choose ring_buffer.
I did my homework on this, increasing the trace bandwidth with more prints. Playing with STP_RELAY_TIMER_INTERVAL did not help that much; I still had around the same number of transport failures. Maybe the bottleneck was more in emptying the buffer than in checking whether there is a buffer to empty. I guess I shall couple further investigation with digging more into relay/ring_buffer.
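For the record, here is roughly how I picture the userspace side interacting with the timeout (a simplified sketch, not the actual staprun reader_thread()): when the backend supports poll we wake as soon as data is signalled; otherwise we only wake when the timeout expires and go look anyway.

#define _GNU_SOURCE
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Simplified sketch of a reader loop with a poll timeout. */
static void read_loop(int relay_fd)
{
        struct timespec tim = { .tv_sec = 0, .tv_nsec = 200000000 }; /* 200 ms */
        struct pollfd pfd = { .fd = relay_fd, .events = POLLIN };
        char buf[8192];

        for (;;) {
                /* Wakes early if the backend signals readable data,
                 * otherwise returns when the timeout expires. */
                ppoll(&pfd, 1, &tim, NULL);
                while (read(relay_fd, buf, sizeof(buf)) > 0)
                        ; /* write buf out to the per-cpu trace file */
        }
}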


> Could you file a bug report about the systemtap runtime not noticing new cores coming online for bulk mode?

Of course... I also did my homework there. If I force both CPUs online before the test starts, I get the second trace file. I can then force CPU1 offline and online again, and everything works fine. So this is really about a CPU coming online after the start of the test, not about going offline/online during the test.


> Thanks for the feedback. Please let us know how tuning things differently makes your life easier.

Currently, thanks to the tunables and previous changes, I only need to tune the reader_thread() timeout to make the Power Management team happy. I have coded "struct timespec tim = {.tv_sec=5, .tv_nsec=0}". If we could put x * 1000000000 in tv_nsec, that could be the tunable, with a default of 200000000 (200 ms).
I can also double-check that if needed.
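To be concrete, here is a minimal sketch of what I have in mind (the macro name is hypothetical, and since tv_nsec must stay below 10^9 the value has to be split into seconds plus nanoseconds):

#include <time.h>

/* Hypothetical tunable for the reader_thread() timeout, in nanoseconds.
 * The default mirrors the current 200 ms; my case would be built with
 * -DSTP_READER_TIMEOUT_NS=5000000000ULL for a 5 s timeout. */
#ifndef STP_READER_TIMEOUT_NS
#define STP_READER_TIMEOUT_NS 200000000ULL
#endif

/* tv_nsec must stay below 1000000000, so split the value. */
struct timespec tim = {
        .tv_sec  = STP_READER_TIMEOUT_NS / 1000000000ULL,
        .tv_nsec = STP_READER_TIMEOUT_NS % 1000000000ULL,
};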

Regards
Fred

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920


