Bug 3916 - improve timing monotony
Summary: improve timing monotony
Status: RESOLVED WORKSFORME
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Masami Hiramatsu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-24 16:09 UTC by Frank Ch. Eigler
Modified: 2010-06-09 13:25 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Initialize cpu_freq by using kernel variables (391 bytes, patch)
2007-08-14 22:51 UTC, Masami Hiramatsu
Details | Diff
improvement of basetime updating and frequency initialization (2.54 KB, patch)
2007-09-20 14:22 UTC, Masami Hiramatsu
Details | Diff
Add a testcase for the timestamp accuracy (906 bytes, patch)
2007-09-20 14:26 UTC, Masami Hiramatsu
Details | Diff
Add a testcase for the timestamp accuracy (988 bytes, patch)
2007-09-27 14:47 UTC, Masami Hiramatsu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2007-01-24 16:09:55 UTC
As per several reports on the mailing list, the newer
gettimeofday_* calls are sometimes indicates time going
backward.  Let's see if that can be made rare.
Comment 1 Josh Stone 2007-01-24 18:52:35 UTC
If I may link to the relevant threads...

The first report was here:
http://sourceware.org/ml/systemtap/2007-q1/msg00174.html

My reply tries to explain *why* the current implementation has this deficiency:
http://sourceware.org/ml/systemtap/2007-q1/msg00175.html

These further replies seem to confirm my hunch that the resync with kernel time
is causing the negative intervals, as the resync happens in a timer callback
(i.e. during softirq-timer).
http://sourceware.org/ml/systemtap/2007-q1/msg00183.html
http://sourceware.org/ml/systemtap/2007-q1/msg00184.html

Possible solutions:
* Find a better estimate of the cpu freq
* Use a clock that's independent of cpu freq (e.g., a motherboard/chipset timer)
* Get the kernel to provide a lockless gettimeofday
* Others?
Comment 2 Josh Stone 2007-01-24 19:13:09 UTC
Also worth reading is one of our own design documents, which was since removed
from CVS:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/src/tapsets/timestamp/Attic/timestamp_tapset.txt?rev=1.1&content-type=text/plain&cvsroot=systemtap
Comment 3 Frank Ch. Eigler 2007-01-24 19:32:23 UTC
(In reply to comment #1)
> Possible solutions:
> * Find a better estimate of the cpu freq
> * Use a clock that's independent of cpu freq (e.g., a motherboard/chipset timer)
> * Get the kernel to provide a lockless gettimeofday
> * Others?

One simple thing is to keep the per-cpu counts from ever decreasing during
the resynchronization phase.  This could let them drift ahead of real time,
but so be it.  If it gets too bad, the runtime could emit a warning.
Also, the resync phase may not be necessary so frequently on stable_tsc
processors.
Comment 4 Josh Stone 2007-06-07 02:15:47 UTC
It seems that I like links.  Here's another relevant mailing-list discussion:
http://sourceware.org/ml/systemtap/2007-q2/msg00505.html
Comment 5 Masami Hiramatsu 2007-08-14 22:49:52 UTC
(In reply to comment #4)
> It seems that I like links.  Here's another relevant mailing-list discussion:
> http://sourceware.org/ml/systemtap/2007-q2/msg00505.html

I updated my patch.
However, I think this still needs to be discussed (especially, 
about Core/Core2 architecture and newer AMD processors).

Thank you,
Comment 6 Masami Hiramatsu 2007-08-14 22:51:54 UTC
Created attachment 1968 [details]
Initialize cpu_freq by using kernel variables

This patch changes systemtap's timer initialization 
to use kernel variable instead of estimating it. 

Masami
Comment 7 Josh Stone 2007-08-16 16:17:49 UTC
(In reply to comment #5)
> However, I think this still needs to be discussed (especially, 
> about Core/Core2 architecture and newer AMD processors).

There is a synthesized cpu feature flag X86_FEATURE_CONSTANT_TSC, added in
2.6.13 for x86_64 and 2.6.16 for i386.  Further, in 2.6.18 there's a tsc_khz
variable as a companion to the cpu_khz, and a check_tsc_unstable().

So hopefully we can make use of these features, though we'll need to gate it
with autoconf to continue supporting older kernels.
Comment 8 Masami Hiramatsu 2007-09-20 14:22:17 UTC
Created attachment 2018 [details]
improvement of basetime updating and frequency initialization

I completely updated my patch.

New patch introduced some fixes below:
- Use tsc_khz instead of cpu_khz if it is available. tsc_khz means the
frequency of TSC(time stamp counter) on i386/x86-64, and this value is updated
automatically by the kernel when the CPU frequency is changed.
- Don't register the CPUFreq handler if the processor has constant frequency
time counter. For this purpose, this patch checks the CPU_FEATURE_CONSTANT_TSC
flag if it is available.
- Use ktime_get_real_ts() instead of do_gettimeofday() for base-time. Since the
do_gettimeofday() has the micro-second resolution, there is a time-difference
of hundreds of nano-seconds between getnstimeofday()(in the kernel) and
gettimeofday_ns()(in the systemtap script). Unfortunately, since the
getnstimeofday() is not exported(but "extern"ed) in older kernels, this patch
uses ktime_get_real_ts() instead of it.

I tested this patch on i386, x86_64 and ia64.
Comment 9 Masami Hiramatsu 2007-09-20 14:26:26 UTC
Created attachment 2019 [details]
Add a testcase for the timestamp accuracy

This patch adds a testcase to check the timestamp accuracy.
Basically, this new testcase compares between the value of gettimeofday 
in the systemtap and in the application.
This includes the short range, the middle range, and the long range tests.
In the long range test, the application sleeps enough long to calm down 
processors and the frequency of the processor will be changed if the 
CPUFreq is enabled.
Comment 10 Frank Ch. Eigler 2007-09-20 17:17:00 UTC
It is also important to test on older kernels.  At worst,
we may need KERNEL_VERSION- or runtime/autoconf-based
#ifdefs to activate this fine-sounding code.
Comment 11 Masami Hiramatsu 2007-09-20 21:00:56 UTC
(In reply to comment #10)
> It is also important to test on older kernels.  At worst,
> we may need KERNEL_VERSION- or runtime/autoconf-based
> #ifdefs to activate this fine-sounding code.

OK, I tested the patch on:
- i386, Fedora7, 2.6.22.5-76.fc7
- i386, RHEL4, 2.6.9-42.EL
- x86-64, Fedora7.91, 2.6.23-0.185.rc6.git7
- ia64, RHEL5, 2.6.22

on RHEL4 kernel, the script compilation was succeeded, 
but the test was failed, because we could not use new 
kernel APIs on old kernel.
It is a known-fail case.

Thanks,
Comment 12 Josh Stone 2007-09-20 23:16:54 UTC
(In reply to comment #8)
> Created an attachment (id=2018)
> improvement of basetime updating and frequency initialization

Thanks, this looks very good.  I only have a couple of concerns:

In __stp_get_freq():
  #ifdef STAPCONF_TSC_KHZ
      return tsc_khz;
  #else /*STAPCONF_TSC_KHZ*/
      return cpu_khz;
  #endif /*STAPCONF_TSC_KHZ*/

It's not correct to assume cpu_khz in the #else case, because you still need to
consider whether this is a fixed or variable TSC.

Also, in _stp_constant_freq(), it's not correct to return 0 when the
X86_FEATURE_CONSTANT_TSC flag isn't available.  We may still be on a processor
that has a constant TSC; the kernel is just not new enough to know.  In that
case we may need to manually check the CPU type, using the same checks that gate
the "set_bit(X86_FEATURE_CONSTANT_TSC, ...)" in the latest kernels.

I also just found that kernel commit 688340ea34c61ad12473ccd837325b59aada9a93
made cycles_2_ns() generally available.  This could be used to simplify our code
even further (i.e., no more manual frequency tracking on our part).  Although,
adding even more version-dependent code may not actually be simpler after all...

Once you're confident that we understand TSC freqs better, we can reduce how
often the timer fires to sync the base time.

I think you should go ahead and commit what you've done, as it's certainly an
improvement over what we have now.  We can continue tying up the loose ends later.
Comment 13 Josh Stone 2007-09-20 23:17:49 UTC
(In reply to comment #11)
> on RHEL4 kernel, the script compilation was succeeded, 
> but the test was failed, because we could not use new 
> kernel APIs on old kernel.
> It is a known-fail case.

Which kernel APIs are the problem?  Is there a bug # on this already?
Comment 14 Frank Ch. Eigler 2007-09-20 23:52:59 UTC
(In reply to comment #11)
> on RHEL4 kernel, the script compilation was succeeded, 
> but the test was failed, because we could not use new 
> kernel APIs on old kernel.
> It is a known-fail case.

OK, but we don't want to break timekeeping for scripts that
run on RHEL4.  So the old code would probably need to be retained
and #ifdef'd in.
Comment 15 Masami Hiramatsu 2007-09-21 15:46:36 UTC
(In reply to comment #13)
> Which kernel APIs are the problem?  Is there a bug # on this already?

Oh, I mean the "test" is the gettimeofday testcase that I developed.
And kernel APIs mean ktime_get_real_ts(), tsc_khz, and so on.

The scripts that run on older kernels(ex. RHEL4) use old code(do_gettimeofday,
cpu_khz, etc.).

IMHO, the main cause of time drifting is the use of do_gettimeofday() for
getting base time, so, on RHEL4, the scripts causes time drifting and the new 
testcase is failed.

Comment 16 Masami Hiramatsu 2007-09-21 17:33:24 UTC
Hi,

Thank you for reviewing and comments.

(In reply to comment #12)
> In __stp_get_freq():
>   #ifdef STAPCONF_TSC_KHZ
>       return tsc_khz;
>   #else /*STAPCONF_TSC_KHZ*/
>       return cpu_khz;
>   #endif /*STAPCONF_TSC_KHZ*/
> 
> It's not correct to assume cpu_khz in the #else case, because you still need to
> consider whether this is a fixed or variable TSC.

I agree that the fixed TSC will cause problem.
I think if we get incorrect frequency here, we can get correct frequency from
CPUFreq again. But the processor has fixed TSC, it is wrong to get frequency
from CPUFreq.

> Also, in _stp_constant_freq(), it's not correct to return 0 when the
> X86_FEATURE_CONSTANT_TSC flag isn't available.  We may still be on a processor
> that has a constant TSC; the kernel is just not new enough to know.  In that
> case we may need to manually check the CPU type, using the same checks that gate
> the "set_bit(X86_FEATURE_CONSTANT_TSC, ...)" in the latest kernels.

Sure, it's good idea.
Only my concern is that the old kernel itself could get incorrect time
with fixed TSC. If so, systemtap's gettimeofday_*() might seem drifting.

> I also just found that kernel commit 688340ea34c61ad12473ccd837325b59aada9a93
> made cycles_2_ns() generally available.  This could be used to simplify our code
> even further (i.e., no more manual frequency tracking on our part).  Although,
> adding even more version-dependent code may not actually be simpler after all...
> 
> Once you're confident that we understand TSC freqs better, we can reduce how
> often the timer fires to sync the base time.

I'm not sure whether we can do that. We might need to consider about ntp, 
adjtimex and so on.

> I think you should go ahead and commit what you've done, as it's certainly an
> improvement over what we have now.  We can continue tying up the loose ends later.

Thanks,

Masami
Comment 17 Josh Stone 2007-09-21 17:48:00 UTC
(In reply to comment #16)
> Only my concern is that the old kernel itself could get incorrect time
> with fixed TSC. If so, systemtap's gettimeofday_*() might seem drifting.

I don't think that's the case.  If older kernels were drifting time by that
much, then there would be many greater problems than just SystemTap.

>> Once you're confident that we understand TSC freqs better, we can reduce how
>> often the timer fires to sync the base time.
> 
> I'm not sure whether we can do that. We might need to consider about ntp, 
> adjtimex and so on.

Well, currently the sync timer fires on every jiffy, on every CPU.  This is
really bad for power.  I hope we can reduce this to maybe once or a few times a
second.

If we got really fancy, the sync could even be adaptive.  When our time is
drifting, we could tweak the 'freq' to compensate, and sync again soon after. 
But if our time is stable, then we can wait a while before the next sync is needed.
Comment 18 Masami Hiramatsu 2007-09-24 19:30:33 UTC
(In reply to comment #14)
> OK, but we don't want to break timekeeping for scripts that
> run on RHEL4.  So the old code would probably need to be retained
> and #ifdef'd in.

Sure, the old code is compiled when the script run on RHEL4 with my patch.
Comment 19 Masami Hiramatsu 2007-09-26 15:13:30 UTC
Hi,

I found that TSC might drift on some machines.
Currently, I found that behavior on Dell Latitude D820(Core2Duo T7200).

I wrote a script for checking this problem.
---
global count=0, lastcycle, lastns

function getnstimeofday:long () %{
        struct timespec ts;
        getnstimeofday(&ts);
        THIS->__retvalue = timespec_to_ns(&ts);
%}

probe timer.us(1000) {
        cycle = get_cycles();
        ns = getnstimeofday();
        if (count > 0) {
                printf("p%d: c:%u,\t ns:%u\t ratio:%d\n", cpu(),
                        cycle - lastcycle, ns - lastns,
                        (cycle - lastcycle)*100 / (ns - lastns));
        }
        lastcycle = cycle;
        lastns = ns
        if (count ++ > 100) exit();
}
---

This script gets TSC and time repeatly with 1ms interval time, and calculates
the elapse TSC, the elapse time and the ratio of it.
Normally, it shows constant(or nearly constant) ratio. But, the ratio and TSC
drifted on the D820.
----(results of D820)----
p0: c:1997340,   ns:999993       ratio:199
p0: c:2011296,   ns:1007047      ratio:199
p0: c:977136,    ns:1078425      ratio:90
p0: c:579372,    ns:1001180      ratio:57
p0: c:841872,    ns:1013193      ratio:83
p0: c:619824,    ns:999434       ratio:62
p0: c:522516,    ns:1000622      ratio:52
p0: c:521232,    ns:999155       ratio:52
p0: c:678576,    ns:1000272      ratio:67
p0: c:523152,    ns:998596       ratio:52
----

However, I also tested it on EPSON NA702(Core2Duo T7200) and Dell Dimension
E520(Core2Duo 6400), and I found TSC did NOT drift on both machines.
I'm not sure that is a rare case or not, and I think it might depend on the BIOS.

And, I think it may be slightly different from "unstable TSC" issue, because
I tested on Athlon64x2 on which the TSC marked unstable but TSC did not drift.

Anyway, there may be some machines whose TSC drifts and we can't get correct
time from TSC on those machines.
Thus, I think we should find other ways to get correct time when we can't use
TSC. For example, HPET timer, or ktime_get_real_ts().
Comment 20 Masami Hiramatsu 2007-09-27 14:47:11 UTC
Created attachment 2023 [details]
Add a testcase for the timestamp accuracy

I updated the testcase patch for gettimeofday accuracy.

This test can check whether the gettimeofday_* work correctly on the machine.
Comment 21 Frank Ch. Eigler 2010-06-09 13:25:21 UTC
this timekeeping code & test case was merged long ago.