Bug 2989 - Systemtap generated code triggering locking correctness validator on x86_64
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Josh Stone
URL:
Keywords:
Duplicates: 5130
Depends on:
Blocks:
 
Reported: 2006-08-01 14:28 UTC by William Cohen
Modified: 2007-10-06 03:40 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
PoC: lockdep-annotated seqlock_init (532 bytes, patch)
2006-09-27 00:13 UTC, Josh Stone

Description William Cohen 2006-08-01 14:28:45 UTC
SystemTap-generated code on x86_64 is triggering the locking
correctness validator
(http://people.redhat.com/mingo/lockdep-patches/lockdep-design.txt).
On the rawhide machine I am getting the following messages in
/var/log/messages:

Aug  1 10:06:07 dhcp59-158 kernel: INFO: trying to register non-static key.
Aug  1 10:06:07 dhcp59-158 kernel: the code is fine but needs lockdep annotation.
Aug  1 10:06:07 dhcp59-158 kernel: turning off the locking correctness validator.
Aug  1 10:06:07 dhcp59-158 kernel: 
Aug  1 10:06:07 dhcp59-158 kernel: Call Trace:
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff8026e269>] show_trace+0xaa/0x23d
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff8026e411>] dump_stack+0x15/0x17
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff802a788e>] __lock_acquire+0x135/0xa54
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff802a874e>] lock_acquire+0x4b/0x69
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff8026723b>] _spin_lock+0x25/0x31
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff885d714b>] :stap_23370:_stp_init_time+0xaa/0xe3
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff885d80ed>] :stap_23370:_stp_handle_start+0x11/0x55
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff885d81b8>] :stap_23370:_stp_proc_write_cmd+0x87/0xd1
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff80217106>] vfs_write+0xcf/0x175
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff802179ee>] sys_write+0x47/0x70
Aug  1 10:06:07 dhcp59-158 kernel:  [<ffffffff8025ff0e>] system_call+0x7e/0x83

Looking through the nightly test logs, stap_23370.c ran much earlier (about
4:30am).  From the testsuite's systemtap.log:

Running ./systemtap.base/add.exp ...
Pass 1: parsed user script and 40 library script(s) in 240usr/10sys/258real ms.

Pass 2: analyzed script: 2 probe(s), 1 function(s), 3 global(s) in 10usr/10sys/4real ms.

Pass 3: translated to C into "/tmp/stapF3LstW/stap_23370.c" in 50usr/90sys/154real ms.

Pass 4: compiled C into "stap_23370.ko" in 1880usr/510sys/2424real ms.

Pass 5: starting run.

systemtap starting probe

PASS: ./systemtap.base/add.stp startup
PASS: ./systemtap.base/add.stp load generation
systemtap ending probe

systemtap test success

PASS: ./systemtap.base/add.stp shutdown and output
Pass 5: run completed in 0usr/60sys/443real ms.

metric:	./systemtap.base/add.stp 	240	10	258	10	10	4	50	90	154	1880	510	2424	0	60	443
testcase ./systemtap.base/add.exp completed in 4 seconds
Comment 1 William Cohen 2006-08-01 14:31:13 UTC
SystemTap was a CVS snapshot from 4:30am EDT 8/1/2006.

$ uname -a
Linux dhcp59-158.rdu.redhat.com 2.6.17-1.2462.fc6 #1 SMP Thu Jul 27 11:27:24 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
Comment 2 Josh Stone 2006-09-27 00:12:59 UTC
The problem is that seqlock_init, unlike the init functions for all of the
other lock types, doesn't apply lockdep annotations to dynamically initialized
locks.  A seqlock is really just a spinlock and a sequence number, so if the
spinlock were initialized using spin_lock_init then things would be fine.
However, this seems like something that should be fixed in the upstream kernel,
not in our runtime.
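
To make the difference concrete, here is a minimal user-space sketch of the
two initialization styles.  It is a toy model, not kernel code and not the
attached patch: every toy_* name is hypothetical, and the "validator check" is
reduced to asking whether the embedded spinlock carries a class key.  The
assumption being modelled is that a spin_lock_init-style macro hands each call
site its own static key, while copying a static "unlocked" initializer into a
dynamically allocated seqlock leaves the key unset.

```c
/* User-space toy model, NOT kernel code.  All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a lock-class key; only its address matters. */
struct toy_lock_class_key { char unused; };

struct toy_spinlock {
    int locked;
    const struct toy_lock_class_key *key;   /* NULL means "no annotation" */
};

/* A seqlock is just a sequence number plus a spinlock. */
struct toy_seqlock {
    unsigned sequence;
    struct toy_spinlock lock;
};

/* Static-style initializer: carries no class key. */
#define TOY_SEQLOCK_UNLOCKED { 0, { 0, NULL } }

/* Unannotated init: copies the static initializer, so the key stays NULL. */
static void toy_seqlock_init_unannotated(struct toy_seqlock *sl)
{
    static const struct toy_seqlock unlocked = TOY_SEQLOCK_UNLOCKED;

    assert(sl != NULL);
    *sl = unlocked;
}

/* Annotated spinlock init: each expansion site owns one static key. */
#define toy_spin_lock_init(l)                               \
    do {                                                    \
        static struct toy_lock_class_key toy_site_key;      \
        (l)->locked = 0;                                    \
        (l)->key = &toy_site_key;                           \
    } while (0)

/* Annotated seqlock init: reset the sequence, then init the spinlock. */
#define toy_seqlock_init_annotated(sl)                      \
    do {                                                    \
        (sl)->sequence = 0;                                 \
        toy_spin_lock_init(&(sl)->lock);                    \
    } while (0)

/*
 * Toy version of the check made on first acquire: a lock with no
 * static key is rejected.  This simplifies whatever the real
 * validator does to a NULL test.
 */
static bool toy_lockdep_accepts(const struct toy_spinlock *l)
{
    return l->key != NULL;
}
```

With two malloc'ed toy_seqlock objects, the one set up by
toy_seqlock_init_unannotated is rejected by toy_lockdep_accepts, and the one
set up by toy_seqlock_init_annotated is accepted.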

Another potential lockdep problem that I uncovered is that every SystemTap
script generates unique lock-classes.  The lockdep code has a fixed number of
classes it can track, and if you exceed that then lockdep will be disabled. 
There are other 'fixed' resources in lockdep as well, and they only get reset on
reboot.  So on a long-running machine with active SystemTap users, it is very
likely that we will exhaust those fixed resources.
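
The exhaustion scenario can be sketched the same way.  This is again a
user-space toy with hypothetical names and a made-up limit (TOY_MAX_CLASSES is
not the kernel's value): a validator with a fixed class table that is never
shrunk, fed by scripts that each bring their own unique class keys.

```c
/* User-space toy model, NOT kernel code.  All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CLASSES 8   /* made-up limit, for illustration only */

struct toy_validator {
    unsigned long classes[TOY_MAX_CLASSES]; /* entries are never released */
    size_t nr_classes;
    bool enabled;
};

/* "Reboot": the only point at which the table is reset. */
static void toy_validator_boot(struct toy_validator *v)
{
    v->nr_classes = 0;
    v->enabled = true;
}

/* Register one class key; returns false if the validator is (now) off. */
static bool toy_register_class(struct toy_validator *v, unsigned long key)
{
    size_t i;

    if (!v->enabled)
        return false;
    for (i = 0; i < v->nr_classes; i++)
        if (v->classes[i] == key)
            return true;            /* class already known */
    if (v->nr_classes == TOY_MAX_CLASSES) {
        v->enabled = false;         /* table full: turn the validator off */
        return false;
    }
    v->classes[v->nr_classes++] = key;
    return true;
}

/*
 * Run one script that owns `nlocks` locks.  Keys are derived from the
 * script id, so no two scripts ever share a class.  Returns whether the
 * validator is still enabled afterwards.
 */
static bool toy_run_script(struct toy_validator *v,
                           unsigned long script_id, unsigned long nlocks)
{
    unsigned long i;

    for (i = 0; i < nlocks; i++)
        toy_register_class(v, script_id * 1000UL + i);
    return v->enabled;
}
```

In this model, four scripts with two locks each fill the table; the fifth
script turns the validator off, and it stays off for every later script until
toy_validator_boot is called again.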
Comment 3 Josh Stone 2006-09-27 00:13:23 UTC
Created attachment 1328
PoC: lockdep-annotated seqlock_init

A bit of a hack to make our use of seqlock_init create the necessary lockdep
annotations.  This is just to demonstrate that the problem can be solved, as it
would probably be better for the upstream kernel to fix seqlock_init.
Comment 4 Josh Stone 2007-01-20 00:32:34 UTC
This patch fixed the problem in the kernel mainline:

  LKML: Ingo Molnar: [patch] lockdep: fix seqlock_init():
  http://lkml.org/lkml/2006/12/12/72

... and was committed as of 2.6.20-rc1.
Comment 5 Josh Stone 2007-10-06 03:40:03 UTC
*** Bug 5130 has been marked as a duplicate of this bug. ***