This one-liner:

  probe kernel.function("do_timer") { gettimeofday_us() }

causes an "NMI Watchdog detected LOCKUP" on 2.6.16-1.2080_FC5 x86_64. It also hangs RHEL4's 2.6.9-34.ELsmp, without any message on the console.
Created attachment 957 [details] Console dump of crash on 2.6.16-1.2080_FC5 x86_64
do_timer() is called from the timer_interrupt() function, which takes write_seqlock(&xtime_lock). Meanwhile, do_gettimeofday() contains the read loop do { seq = read_seqbegin(&xtime_lock); ... } while (read_seqretry(&xtime_lock, seq)). write_seqlock() increments xtime_lock.sequence by 1, making it odd, and interrupts are disabled while the write lock is held. So if do_gettimeofday() is called from a probe on do_timer(), the writer can never finish, and the read loop in do_gettimeofday() spins forever.
Sigh, another printk-like situation. This would only be a partial solution, but could the timer callback functions be made one more level indirect, by interposing a level of work queuing? Alternately, is there a lower-level approximate gettimeofday equivalent that does not putz with locks?
(In reply to comment #2)

Many thanks Bibo for the detailed analysis!

(In reply to comment #3)
> Sigh, another printk-like situation. This would only be a partial solution,
> but could the timer callback functions be made one more level indirect, by
> interposing a level of work queuing?

The problem was discovered with a kprobe, not a timer probe - are you just extrapolating? Indeed, this will likely be a problem for timer.profile as well, but I think timer.ms and timer.jiffies run in a different context. I will try each variant and make sure, though. Work queuing may be fine for ms and jiffies probes if needed, but I don't think it will work for timer.profile. Won't the trap frame be invalid if we delay the probe's execution?

> Alternately, is there a lower-level approximate gettimeofday equivalent
> that does not putz with locks?

I don't know of one - nothing in linux/time.h looks promising. The xtime_lock is exported, though, so we should be able to spin until read_seqretry returns 0, and we can limit the spin to MAXTRYLOCK iterations and throw an error, just as we do for our own locks. If someone else comes along and locks it after we determine we're clear, that's fine, because it won't be anyone in our call stack, so there won't be a deadlock. This should allow safely calling do_gettimeofday, and if we find a non-locking alternative we can use that as a fallback. I will implement this lock-test in timestamp.stp...
This may not be as simple as I'd hoped - xtime_lock is only exported on FC5 kernels if ARCH_HAVE_XTIME_LOCK is not defined. When that is defined, xtime_lock gets mapped to __xtime_lock, which is not exported.
Two other options come to mind:
- put any timestamp-related code into the kprobes blacklist (though that
  may not be enough if e.g. a probed NMI fires in that neighbourhood).
- extend the runtime with our own time-estimator function.
(In reply to comment #6)
> Two other options come to mind:
> - put any timestamp-related code into the kprobes blacklist (though
>   that may not be enough if e.g. a probed NMI fires in that
>   neighbourhood).

This seems like a painful option, since probes in general are fine in those locations. It's only when you call one of the gettimeofday_* functions that there's a problem.

> - extend the runtime with our own time-estimator function

Perhaps we can do something based on get_cycles, and relate it back to "real" time with cpufreq_get and cpufreq_register_notifier. Did you have any other ideas of how to approximate time?
(In reply to comment #7)
> > - extend the runtime with our own time-estimator function
>
> Perhaps we can do something based on get_cycles, and relate it back to
> "real" time with cpufreq_get and cpufreq_register_notifier. Did you have
> any other ideas of how to approximate time?

That's about it. Maybe lkml has better ideas.
Created attachment 1008 [details] Prototype time-estimation Here's a prototype to implement our own 'safe' gettimeofday. It uses get_cycles(), and syncs back to do_gettimeofday on every system tick. I don't like needing to sync back up so often, but the drift adds up pretty quickly otherwise. We may be able to tune the code to be more accurate, or just resign ourselves to having less accuracy in our measurement.
Created attachment 1010 [details] Comparison of new & old gettimeofday This script runs a comparison between the old and new methods of finding the time-of-day. It will print a histogram of the microsecond delta between the two calls. Note that this is really a worst-case comparison, because it probes on timer.profile -- this gets fired during the HW timer interrupt, and the sync happens during a timer softIRQ immediately after. Note also that calling the old function twice in a row results in ~4us difference, so that should give an idea for a reasonable noise threshold.
> This script runs a comparison between the old and new methods of finding
> the time-of-day. [...]

What results does this show on your test host? IMO, commit your code together with a pass-5 test case that exercises it from contexts that would crash the old version.
(In reply to comment #11)
> > This script runs a comparison between the old and new methods of finding
> > the time-of-day. [...]
>
> What results does this show on your test host?

On 2.6.16-1.2133_FC5 x86_64:

value |-------------------------------------------------- count
    0 |@@                                                    95
    1 |@@@@@@@@@@@@@@@@@@@@@@@                              762
    2 |@@@@@@@@@@@@@@@@@@@@@@                               704
    4 |                                                       0
    8 |                                                       0
   16 |                                                       0
   32 |                                                       1
   64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1559
  128 |                                                       1
  256 |                                                       0
  512 |                                                       0

On 2.6.9-39.ELsmp i686:

value |-------------------------------------------------- count
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    3327
    1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  5040
    2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     4697
    4 |@                                                    139
    8 |@                                                    103
   16 |                                                       3
   32 |                                                       1
   64 |                                                       0
  128 |                                                       0

> IMO commit your code together with a pass-5 test case that exercises it
> from contexts that would crash the old version.

OK, I will do so...
Committed:
  src/runtime/time.c r1.1
  src/runtime/transport/transport.c r1.25
  src/tapset/timestamp.stp r1.9
  tests/testsuite/systemtap.base/timeofday.exp r1.1