As per several reports on the mailing list, the newer gettimeofday_* calls are sometimes indicates time going backward. Let's see if that can be made rare.
If I may link to the relevant threads... The first report was here: http://sourceware.org/ml/systemtap/2007-q1/msg00174.html My reply tries to explain *why* the current implementation has this deficiency: http://sourceware.org/ml/systemtap/2007-q1/msg00175.html These further replies seem to confirm my hunch that the resync with kernel time is causing the negative intervals, as the resync happens in a timer callback (i.e. during softirq-timer). http://sourceware.org/ml/systemtap/2007-q1/msg00183.html http://sourceware.org/ml/systemtap/2007-q1/msg00184.html Possible solutions: * Find a better estimate of the cpu freq * Use a clock that's independent of cpu freq (e.g., a motherboard/chipset timer) * Get the kernel to provide a lockless gettimeofday * Others?
Also worth reading is one of our own design documents, which was since removed from CVS: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/src/tapsets/timestamp/Attic/timestamp_tapset.txt?rev=1.1&content-type=text/plain&cvsroot=systemtap
(In reply to comment #1) > Possible solutions: > * Find a better estimate of the cpu freq > * Use a clock that's independent of cpu freq (e.g., a motherboard/chipset timer) > * Get the kernel to provide a lockless gettimeofday > * Others? One simple thing is to keep the per-cpu counts from ever decreasing during the resynchronization phase. This could let them drift ahead of real time, but so be it. If it gets too bad, the runtime could emit a warning. Also, the resync phase may not be necessary so frequently on stable_tsc processors.
It seems that I like links. Here's another relevant mailing-list discussion: http://sourceware.org/ml/systemtap/2007-q2/msg00505.html
(In reply to comment #4) > It seems that I like links. Here's another relevant mailing-list discussion: > http://sourceware.org/ml/systemtap/2007-q2/msg00505.html I updated my patch. However, I think this still needs to be discussed (especially, about Core/Core2 architecture and newer AMD processors). Thank you,
Created attachment 1968 [details] Initialize cpu_freq by using kernel variables This patch changes systemtap's timer initialization to use kernel variable instead of estimating it. Masami
(In reply to comment #5) > However, I think this still needs to be discussed (especially, > about Core/Core2 architecture and newer AMD processors). There is a synthesized cpu feature flag X86_FEATURE_CONSTANT_TSC, added in 2.6.13 for x86_64 and 2.6.16 for i386. Further, in 2.6.18 there's a tsc_khz variable as a companion to the cpu_khz, and a check_tsc_unstable(). So hopefully we can make use of these features, though we'll need to gate it with autoconf to continue supporting older kernels.
Created attachment 2018 [details] improvement of basetime updating and frequency initialization I completely updated my patch. New patch introduced some fixes below: - Use tsc_khz instead of cpu_khz if it is available. tsc_khz means the frequency of TSC(time stamp counter) on i386/x86-64, and this value is updated automatically by the kernel when the CPU frequency is changed. - Don't register the CPUFreq handler if the processor has constant frequency time counter. For this purpose, this patch checks the CPU_FEATURE_CONSTANT_TSC flag if it is available. - Use ktime_get_real_ts() instead of do_gettimeofday() for base-time. Since the do_gettimeofday() has the micro-second resolution, there is a time-difference of hundreds of nano-seconds between getnstimeofday()(in the kernel) and gettimeofday_ns()(in the systemtap script). Unfortunately, since the getnstimeofday() is not exported(but "extern"ed) in older kernels, this patch uses ktime_get_real_ts() instead of it. I tested this patch on i386, x86_64 and ia64.
Created attachment 2019 [details] Add a testcase for the timestamp accuracy This patch adds a testcase to check the timestamp accuracy. Basically, this new testcase compares between the value of gettimeofday in the systemtap and in the application. This includes the short range, the middle range, and the long range tests. In the long range test, the application sleeps enough long to calm down processors and the frequency of the processor will be changed if the CPUFreq is enabled.
It is also important to test on older kernels. At worst, we may need KERNEL_VERSION- or runtime/autoconf-based #ifdefs to activate this fine-sounding code.
(In reply to comment #10) > It is also important to test on older kernels. At worst, > we may need KERNEL_VERSION- or runtime/autoconf-based > #ifdefs to activate this fine-sounding code. OK, I tested the patch on: - i386, Fedora7, 2.6.22.5-76.fc7 - i386, RHEL4, 2.6.9-42.EL - x86-64, Fedora7.91, 2.6.23-0.185.rc6.git7 - ia64, RHEL5, 2.6.22 on RHEL4 kernel, the script compilation was succeeded, but the test was failed, because we could not use new kernel APIs on old kernel. It is a known-fail case. Thanks,
(In reply to comment #8) > Created an attachment (id=2018) > improvement of basetime updating and frequency initialization Thanks, this looks very good. I only have a couple of concerns: In __stp_get_freq(): #ifdef STAPCONF_TSC_KHZ return tsc_khz; #else /*STAPCONF_TSC_KHZ*/ return cpu_khz; #endif /*STAPCONF_TSC_KHZ*/ It's not correct to assume cpu_khz in the #else case, because you still need to consider whether this is a fixed or variable TSC. Also, in _stp_constant_freq(), it's not correct to return 0 when the X86_FEATURE_CONSTANT_TSC flag isn't available. We may still be on a processor that has a constant TSC; the kernel is just not new enough to know. In that case we may need to manually check the CPU type, using the same checks that gate the "set_bit(X86_FEATURE_CONSTANT_TSC, ...)" in the latest kernels. I also just found that kernel commit 688340ea34c61ad12473ccd837325b59aada9a93 made cycles_2_ns() generally available. This could be used to simplify our code even further (i.e., no more manual frequency tracking on our part). Although, adding even more version-dependent code may not actually be simpler after all... Once you're confident that we understand TSC freqs better, we can reduce how often the timer fires to sync the base time. I think you should go ahead and commit what you've done, as it's certainly an improvement over what we have now. We can continue tying up the loose ends later.
(In reply to comment #11) > on RHEL4 kernel, the script compilation was succeeded, > but the test was failed, because we could not use new > kernel APIs on old kernel. > It is a known-fail case. Which kernel APIs are the problem? Is there a bug # on this already?
(In reply to comment #11) > on RHEL4 kernel, the script compilation was succeeded, > but the test was failed, because we could not use new > kernel APIs on old kernel. > It is a known-fail case. OK, but we don't want to break timekeeping for scripts that run on RHEL4. So the old code would probably need to be retained and #ifdef'd in.
(In reply to comment #13) > Which kernel APIs are the problem? Is there a bug # on this already? Oh, I mean the "test" is the gettimeofday testcase that I developed. And kernel APIs mean ktime_get_real_ts(), tsc_khz, and so on. The scripts that run on older kernels(ex. RHEL4) use old code(do_gettimeofday, cpu_khz, etc.). IMHO, the main cause of time drifting is the use of do_gettimeofday() for getting base time, so, on RHEL4, the scripts causes time drifting and the new testcase is failed.
Hi, Thank you for reviewing and comments. (In reply to comment #12) > In __stp_get_freq(): > #ifdef STAPCONF_TSC_KHZ > return tsc_khz; > #else /*STAPCONF_TSC_KHZ*/ > return cpu_khz; > #endif /*STAPCONF_TSC_KHZ*/ > > It's not correct to assume cpu_khz in the #else case, because you still need to > consider whether this is a fixed or variable TSC. I agree that the fixed TSC will cause problem. I think if we get incorrect frequency here, we can get correct frequency from CPUFreq again. But the processor has fixed TSC, it is wrong to get frequency from CPUFreq. > Also, in _stp_constant_freq(), it's not correct to return 0 when the > X86_FEATURE_CONSTANT_TSC flag isn't available. We may still be on a processor > that has a constant TSC; the kernel is just not new enough to know. In that > case we may need to manually check the CPU type, using the same checks that gate > the "set_bit(X86_FEATURE_CONSTANT_TSC, ...)" in the latest kernels. Sure, it's good idea. Only my concern is that the old kernel itself could get incorrect time with fixed TSC. If so, systemtap's gettimeofday_*() might seem drifting. > I also just found that kernel commit 688340ea34c61ad12473ccd837325b59aada9a93 > made cycles_2_ns() generally available. This could be used to simplify our code > even further (i.e., no more manual frequency tracking on our part). Although, > adding even more version-dependent code may not actually be simpler after all... > > Once you're confident that we understand TSC freqs better, we can reduce how > often the timer fires to sync the base time. I'm not sure whether we can do that. We might need to consider about ntp, adjtimex and so on. > I think you should go ahead and commit what you've done, as it's certainly an > improvement over what we have now. We can continue tying up the loose ends later. Thanks, Masami
(In reply to comment #16) > Only my concern is that the old kernel itself could get incorrect time > with fixed TSC. If so, systemtap's gettimeofday_*() might seem drifting. I don't think that's the case. If older kernels were drifting time by that much, then there would be many greater problems than just SystemTap. >> Once you're confident that we understand TSC freqs better, we can reduce how >> often the timer fires to sync the base time. > > I'm not sure whether we can do that. We might need to consider about ntp, > adjtimex and so on. Well, currently the sync timer fires on every jiffy, on every CPU. This is really bad for power. I hope we can reduce this to maybe once or a few times a second. If we got really fancy, the sync could even be adaptive. When our time is drifting, we could tweak the 'freq' to compensate, and sync again soon after. But if our time is stable, then we can wait a while before the next sync is needed.
(In reply to comment #14) > OK, but we don't want to break timekeeping for scripts that > run on RHEL4. So the old code would probably need to be retained > and #ifdef'd in. Sure, the old code is compiled when the script run on RHEL4 with my patch.
Hi, I found that TSC might drift on some machines. Currently, I found that behavior on Dell Latitude D820(Core2Duo T7200). I wrote a script for checking this problem. --- global count=0, lastcycle, lastns function getnstimeofday:long () %{ struct timespec ts; getnstimeofday(&ts); THIS->__retvalue = timespec_to_ns(&ts); %} probe timer.us(1000) { cycle = get_cycles(); ns = getnstimeofday(); if (count > 0) { printf("p%d: c:%u,\t ns:%u\t ratio:%d\n", cpu(), cycle - lastcycle, ns - lastns, (cycle - lastcycle)*100 / (ns - lastns)); } lastcycle = cycle; lastns = ns if (count ++ > 100) exit(); } --- This script gets TSC and time repeatly with 1ms interval time, and calculates the elapse TSC, the elapse time and the ratio of it. Normally, it shows constant(or nearly constant) ratio. But, the ratio and TSC drifted on the D820. ----(results of D820)---- p0: c:1997340, ns:999993 ratio:199 p0: c:2011296, ns:1007047 ratio:199 p0: c:977136, ns:1078425 ratio:90 p0: c:579372, ns:1001180 ratio:57 p0: c:841872, ns:1013193 ratio:83 p0: c:619824, ns:999434 ratio:62 p0: c:522516, ns:1000622 ratio:52 p0: c:521232, ns:999155 ratio:52 p0: c:678576, ns:1000272 ratio:67 p0: c:523152, ns:998596 ratio:52 ---- However, I also tested it on EPSON NA702(Core2Duo T7200) and Dell Dimension E520(Core2Duo 6400), and I found TSC did NOT drift on both machines. I'm not sure that is a rare case or not, and I think it might depend on the BIOS. And, I think it may be slightly different from "unstable TSC" issue, because I tested on Athlon64x2 on which the TSC marked unstable but TSC did not drift. Anyway, there may be some machines whose TSC drifts and we can't get correct time from TSC on those machines. Thus, I think we should find other ways to get correct time when we can't use TSC. For example, HPET timer, or ktime_get_real_ts().
Created attachment 2023 [details] Add a testcase for the timestamp accuracy I updated the testcase patch for gettimeofday accuracy. This test can check whether the gettimeofday_* work correctly on the machine.
this timekeeping code & test case was merged long ago.