This one-liner:

  probe kernel.function("do_timer") { gettimeofday_us() }

causes an "NMI Watchdog detected LOCKUP" on 2.6.16-1.2080_FC5 x86_64. It also hangs RHEL4's 2.6.9-34.ELsmp, without any message on the console.
Created attachment 957 [details] Console dump of crash on 2.6.16-1.2080_FC5 x86_64
do_timer() is called from the timer_interrupt() function, which takes write_seqlock(&xtime_lock). Meanwhile, do_gettimeofday() contains the read loop do { seq = read_seqbegin(&xtime_lock); ... } while (read_seqretry(&xtime_lock, seq)). write_seqlock() increments xtime_lock.sequence by 1, making it odd, and interrupts are disabled while the write lock is held. So if do_gettimeofday() is called from a probe on do_timer(), the writer can never finish, and the read loop in do_gettimeofday() spins forever.
Sigh, another printk-like situation. This would only be a partial solution, but could the timer callback functions be made one more level indirect, by interposing a level of work queuing? Alternately, is there a lower-level approximate gettimeofday equivalent that does not putz with locks?
(In reply to comment #2)

Many thanks Bibo for the detailed analysis!

(In reply to comment #3)
> Sigh, another printk-like situation. This would only be a partial solution,
> but could the timer callback functions be made one more level indirect, by
> interposing a level of work queuing?

The problem was discovered with a kprobe, not a timer probe - are you just extrapolating? Indeed, this will likely be a problem for timer.profile as well, but I think timer.ms and timer.jiffies run in a different context. I will try each variant and make sure, though. Work queuing may be fine for ms and jiffies probes if needed, but I don't think it will work for timer.profile. Won't the trap frame be invalid if we delay the probe's execution?

> Alternately, is there a lower-level approximate gettimeofday equivalent
> that does not putz with locks?

I don't know of one - nothing in linux/time.h looks promising. The xtime_lock is exported, though, so we should be able to spin until read_seqretry returns 0, and we can limit the spin to MAXTRYLOCK iterations and throw an error, just as we do for our own locks. If someone else comes along and locks it after we determine we're clear, that's fine, because it won't be anyone in our call stack, so there won't be a deadlock. This should allow safely calling do_gettimeofday, and if we find a non-locking alternative we can use that as a fallback. I will implement this lock-test in timestamp.stp...
This may not be as simple as I'd hoped - xtime_lock is only exported on FC5 kernels if ARCH_HAVE_XTIME_LOCK is not defined. When that is defined, xtime_lock gets mapped to __xtime_lock, which is not exported.
Two other options come to mind:
- put any timestamp-related code into the kprobes blacklist (though that
  may not be enough if e.g. a probed NMI fires in that neighbourhood).
- extend the runtime with our own time-estimator function.
(In reply to comment #6)
> Two other options come to mind:
> - put any timestamp-related code into the kprobes blacklist (though
>   that may not be enough if e.g. a probed NMI fires in that
>   neighbourhood).

This seems like a painful option, since probes in general are fine in those locations. It's only when you call one of the gettimeofday_* functions that there's a problem.

> - extend the runtime with our own time-estimator function

Perhaps we can do something based on get_cycles, and relate it back to "real" time with cpufreq_get and cpufreq_register_notifier. Did you have any other ideas of how to approximate time?
(In reply to comment #7)
> > - extend the runtime with our own time-estimator function
>
> Perhaps we can do something based on get_cycles, and relate it back to
> "real" time with cpufreq_get and cpufreq_register_notifier. Did you have
> any other ideas of how to approximate time?

That's about it. Maybe lkml has better ideas.
Created attachment 1008 [details] Prototype time-estimation Here's a prototype to implement our own 'safe' gettimeofday. It uses get_cycles(), and syncs back to do_gettimeofday on every system tick. I don't like needing to sync back up so often, but the drift adds up pretty quickly otherwise. We may be able to tune the code to be more accurate, or just resign ourselves to having less accuracy in our measurement.
Created attachment 1010 [details] Comparison of new & old gettimeofday This script runs a comparison between the old and new methods of finding the time-of-day. It will print a histogram of the microsecond delta between the two calls. Note that this is really a worst-case comparison, because it probes on timer.profile -- this gets fired during the HW timer interrupt, and the sync happens during a timer softIRQ immediately after. Note also that calling the old function twice in a row results in ~4us difference, so that should give an idea for a reasonable noise threshold.
> This script runs a comparison between the old and new methods of finding
> the time-of-day. [...]

What results does this show on your test host? IMO, commit your code together with a pass-5 test case that exercises it from contexts that would crash the old version.
(In reply to comment #11)
> > This script runs a comparison between the old and new methods of finding
> > the time-of-day. [...]
>
> What results does this show on your test host?

On 2.6.16-1.2133_FC5 x86_64:

value |-------------------------------------------------- count
    0 |@@                                                    95
    1 |@@@@@@@@@@@@@@@@@@@@@@@                              762
    2 |@@@@@@@@@@@@@@@@@@@@@@                               704
    4 |                                                       0
    8 |                                                       0
   16 |                                                       0
   32 |                                                       1
   64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1559
  128 |                                                       1
  256 |                                                       0
  512 |                                                       0

On 2.6.9-39.ELsmp i686:

value |-------------------------------------------------- count
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    3327
    1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  5040
    2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     4697
    4 |@                                                    139
    8 |@                                                    103
   16 |                                                       3
   32 |                                                       1
   64 |                                                       0
  128 |                                                       0

> IMO commit your code together with a pass-5 test case that exercises it
> from contexts that would crash the old version.

OK, I will do so...
Committed:
  src/runtime/time.c r1.1
  src/runtime/transport/transport.c r1.25
  src/tapset/timestamp.stp r1.9
  tests/testsuite/systemtap.base/timeofday.exp r1.1