14463 – "scheduling while atomic" bug on rawhide

Bug 14463 - "scheduling while atomic" bug on rawhide

Summary: "scheduling while atomic" bug on rawhide

Status:	RESOLVED FIXED

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	runtime (show other bugs)
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	David Smith

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-08-13 16:35 UTC by David Smith
Modified:	2012-08-17 18:54 UTC (History)
CC List:	0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description David Smith 2012-08-13 16:35:32 UTC

On rawhide (3.6.0-0.rc1.git2.1.fc18.i686.PAE), I'm seeing the occasional scheduling while atomic bug while running the utrace_p5.exp testcase.  Unfortunately, this bug doesn't happen reliably.  I suspect this is related to the new task_work changes.

====
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.072070] BUG: scheduling while atomic: stapio/18575/0x00000002
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092802] INFO: lockdep is turned off.
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092805] Modules linked in: stap_5bf878fef954c812443ccb479a24218_18575(U) nfs3 nfs_acl nfs dns_resolver fscache lockd sunrpc bnep bluetooth rfkill ip6table_filter ip6_tables ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tpm_bios snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd virtio_net virtio_balloon soundcore i2c_piix4 uinput virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: stap_0e17f6a7ecf0e4370a154d9317d048e_18554]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092861] Pid: 18575, comm: stapio Tainted: G S      W    3.6.0-0.rc1.git2.1.fc18.i686.PAE #1
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092863] Call Trace:
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092873]  [<c0a125f0>] __schedule_bug+0x69/0x79
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092879]  [<c0a1b822>] __schedule+0x8c2/0x9a0
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092885]  [<c06e75ff>] ? debug_object_activate+0x5f/0x170
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092888]  [<c06e7692>] ? debug_object_activate+0xf2/0x170
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092893]  [<c04b728b>] ? trace_hardirqs_off+0xb/0x10
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092897]  [<c0a1d2e5>] ? _raw_spin_unlock_irqrestore+0x65/0x70
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092900]  [<c06e7692>] ? debug_object_activate+0xf2/0x170
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092904]  [<c0a1946c>] ? schedule_timeout+0x10c/0x2a0
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092907]  [<c0a1bba3>] schedule+0x23/0x60
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092910]  [<c0a19471>] schedule_timeout+0x111/0x2a0
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092915]  [<c0464e30>] ? __internal_add_timer+0xc0/0xc0
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092924]  [<f85eb6d5>] ? utrace_barrier+0x55/0x80 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092928]  [<c0a1965a>] schedule_timeout_interruptible+0x1a/0x20
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092933]  [<f85eb6df>] utrace_barrier+0x5f/0x80 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092939]  [<f85edd51>] __stp_utrace_attach+0x131/0x1f0 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092945]  [<f85f0881>] ? _stp_ctl_write_cmd+0xbe1/0xeb0 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092953]  [<f85f0908>] _stp_ctl_write_cmd+0xc68/0xeb0 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092958]  [<f85f0800>] ? _stp_ctl_write_cmd+0xb60/0xeb0 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092967]  [<f85efca0>] ? _stp_build_id_check.constprop.63+0xf0/0xf0 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092973]  [<c058968c>] vfs_write+0x8c/0x160
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092978]  [<f85efca0>] ? _stp_build_id_check.constprop.63+0xf0/0xf0 [stap_5bf878fef954c812443ccb479a24218_18575]
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092984]  [<c058996d>] sys_write+0x3d/0x70
Aug 13 11:29:41 kvm-rawhide-32-1 kernel: [10430.092989]  [<c0a24a9f>] sysenter_do_call+0x12/0x38
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.251082] BUG: scheduling while atomic: stapio/20296/0x00000002
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272046] INFO: lockdep is turned off.
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272050] Modules linked in: stap_0e17f6a7ecf0e4370a154d9317d048e_20296(U) nfs3 nfs_acl nfs dns_resolver fscache lockd sunrpc bnep bluetooth rfkill ip6table_filter ip6_tables ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tpm_bios snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd virtio_net virtio_balloon soundcore i2c_piix4 uinput virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: stap_f02064cb371081465dfe69106058bb8_20273]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272115] Pid: 20296, comm: stapio Tainted: G S      W    3.6.0-0.rc1.git2.1.fc18.i686.PAE #1
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272117] Call Trace:
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272128]  [<c0a125f0>] __schedule_bug+0x69/0x79
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272134]  [<c0a1b822>] __schedule+0x8c2/0x9a0
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272140]  [<c06e75ff>] ? debug_object_activate+0x5f/0x170
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272143]  [<c06e7692>] ? debug_object_activate+0xf2/0x170
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272148]  [<c04b728b>] ? trace_hardirqs_off+0xb/0x10
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272152]  [<c0a1d2e5>] ? _raw_spin_unlock_irqrestore+0x65/0x70
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272155]  [<c06e7692>] ? debug_object_activate+0xf2/0x170
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272159]  [<c0a1946c>] ? schedule_timeout+0x10c/0x2a0
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272162]  [<c0a1bba3>] schedule+0x23/0x60
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272166]  [<c0a19471>] schedule_timeout+0x111/0x2a0
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272171]  [<c0464e30>] ? __internal_add_timer+0xc0/0xc0
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272180]  [<f860df65>] ? utrace_barrier+0x55/0x80 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272184]  [<c0a1965a>] schedule_timeout_interruptible+0x1a/0x20
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272190]  [<f860df6f>] utrace_barrier+0x5f/0x80 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272196]  [<f8610ae1>] __stp_utrace_attach+0x131/0x1f0 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272202]  [<f8613af4>] ? _stp_ctl_write_cmd+0xce4/0xfc0 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272208]  [<f8613c0c>] _stp_ctl_write_cmd+0xdfc/0xfc0 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272216]  [<f8613a77>] ? _stp_ctl_write_cmd+0xc67/0xfc0 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272227]  [<f8612e10>] ? _stp_build_id_check.constprop.86+0xf0/0xf0 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272232]  [<c058968c>] vfs_write+0x8c/0x160
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272237]  [<f8612e10>] ? _stp_build_id_check.constprop.86+0xf0/0xf0 [stap_0e17f6a7ecf0e4370a154d9317d048e_20296]
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272243]  [<c058996d>] sys_write+0x3d/0x70
Aug 13 11:30:59 kvm-rawhide-32-1 kernel: [10508.272247]  [<c0a24a9f>] sysenter_do_call+0x12/0x38
====

Comment 1 David Smith 2012-08-13 19:45:20 UTC

I believe I know what's going on here.  When we find a new task, we call __stp_utrace_attach() to attach to the child. __stp_utrace_attach() calls utrace_set_events() to set the right events to look for.  However, if utrace_set_events() returns -EINPROGRESS we call utrace_barrier() to wait until a current callback is finished.

__stp_utrace_attach() isn't safe enough be called from a tracepoint handler since we're in atomic context there.  We'll need to use the task_work feature in a few more places to make it safe to call sleepy functions.

Comment 2 David Smith 2012-08-13 19:54:24 UTC

Note that this problem only happens when the target task we're attaching to isn't 'current' (since utrace_barrier() always succeeds when target == current).  I believe this only happens during task_finder2's clone event, when we're trying to attach to the child (the parent task is current).  We're ok during the exec event, since the target is 'current'.

Comment 3 David Smith 2012-08-14 15:42:00 UTC

(In reply to comment #2)
> Note that this problem only happens when the target task we're attaching to
> isn't 'current' (since utrace_barrier() always succeeds when target ==
> current).  I believe this only happens during task_finder2's clone event, when
> we're trying to attach to the child (the parent task is current).  We're ok
> during the exec event, since the target is 'current'.

Commit da35c58 fixes the clone handler to not sleep by using task_work_add() to do the "sleepy" operations later.

However, this doesn't fully fix this problem.  During a run of 58 iterations of the utrace_p5.exp testcase, I saw a different "scheduling while atomic" bug once - only once.  So, this flavor of this "scheduling while atomic" bug seems even more intermittent than the first.

====
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159288] BUG: scheduling while atomic: stapio/30419/0x00000002
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159803] INFO: lockdep is turned off.
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159807] Modules linked in: stap_96ab44954e013c8737876c23edda184_30419(U) nfs3 nfs_acl nfs dns_resolver fscache lockd sunrpc bnep bluetooth rfkill ip6table_filter ip6_tables ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tpm_bios snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd virtio_net virtio_balloon soundcore i2c_piix4 uinput virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: stap_628cd27b6341c3e1668f09f6ff23c83_30396]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159906] Pid: 30419, comm: stapio Tainted: G S      W    3.6.0-0.rc1.git2.1.fc18.i686.PAE #1
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159910] Call Trace:
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159925]  [<c0a125f0>] __schedule_bug+0x69/0x79
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159935]  [<c0a1b822>] __schedule+0x8c2/0x9a0
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159945]  [<c06e75ff>] ? debug_object_activate+0x5f/0x170
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159953]  [<c06e7692>] ? debug_object_activate+0xf2/0x170
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159961]  [<c04b728b>] ? trace_hardirqs_off+0xb/0x10
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159969]  [<c0a1d2e5>] ? _raw_spin_unlock_irqrestore+0x65/0x70
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159976]  [<c06e7692>] ? debug_object_activate+0xf2/0x170
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159984]  [<c0a1946c>] ? schedule_timeout+0x10c/0x2a0
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159992]  [<c0a1bba3>] schedule+0x23/0x60
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.159999]  [<c0a19471>] schedule_timeout+0x111/0x2a0
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160014]  [<c0464e30>] ? __internal_add_timer+0xc0/0xc0
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160081]  [<f85e7e45>] ? utrace_barrier+0x55/0x80 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160090]  [<c0a1965a>] schedule_timeout_interruptible+0x1a/0x20
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160102]  [<f85e7e4f>] utrace_barrier+0x5f/0x80 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160114]  [<f85ea9f1>] __stp_utrace_attach+0x131/0x1f0 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160126]  [<f85edae4>] ? _stp_ctl_write_cmd+0xce4/0xfc0 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160138]  [<f85edbfc>] _stp_ctl_write_cmd+0xdfc/0xfc0 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160178]  [<f85eda67>] ? _stp_ctl_write_cmd+0xc67/0xfc0 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160190]  [<f85ece00>] ? _stp_build_id_check.constprop.86+0xf0/0xf0 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160199]  [<c058968c>] vfs_write+0x8c/0x160
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160213]  [<f85ece00>] ? _stp_build_id_check.constprop.86+0xf0/0xf0 [stap_96ab44954e013c8737876c23edda184_30419]
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160242]  [<c058996d>] sys_write+0x3d/0x70
Aug 14 09:58:56 kvm-rawhide-32-1 kernel: [ 2971.160252]  [<c0a24a9f>] sysenter_do_call+0x12/0x38
====

Comment 4 David Smith 2012-08-16 16:30:45 UTC

(In reply to comment #3)
> (In reply to comment #2)
> However, this doesn't fully fix this problem.  During a run of 58 iterations of
> the utrace_p5.exp testcase, I saw a different "scheduling while atomic" bug
> once - only once.  So, this flavor of this "scheduling while atomic" bug seems
> even more intermittent than the first.

I believe I know what is going on here.  First, some history.  There have been 3 major versions of utrace:

- Original in-kernel utrace (RHEL5)
- Newer in-kernel utrace (> RHEL5)
- Current "fake" utrace (implemented via tracepoints)

The original version of utrace had a bug (the details of which I've forgotten), which caused an error when attaching utrace engines to a task.  To get around this, we stop the task, then attach the utrace engines.  Here's a related comment from task_finder.c:

/*
 * All "interesting" threads get an engine with
 * __STP_ATTACHED_TASK_EVENTS events attached to it.  After the thread
 * quiesces, we reset the events to __STP_ATTACHED_TASK_BASE_EVENTS
 * events.
 */
#define __STP_ATTACHED_TASK_EVENTS (UTRACE_EVENT(DEATH)		\
				    | UTRACE_EVENT(QUIESCE))

This extra stop fixed the problem for the original in-kernel utrace and worked fine for the newer in-kernel utrace.  (Note that there is actually some permutation of the task here - the "extra" stop can cause syscalls to be interrupted and return ERESTART.  A well written user program should handle that, but all programs aren't well written.)

So, the current "fake" utrace inherited this extra stop as well, even though it doesn't really need it.

This "scheduling while atomic" bug happens during the following scenario:

stap_start_task_finder() gets called.  At this point we're in a non-atomic context, with irqs enabled.  stap_start_task_finder() needs to loop over every 
running user task, so it does the following:

	rcu_read_lock();
	do_each_thread(grp, tsk) {
                // ... attach to each user task and stop it
		rc = __stp_utrace_attach(tsk, &tgt->ops, tgt,
					 __STP_ATTACHED_TASK_EVENTS,
					 UTRACE_STOP);

	} while_each_thread(grp, tsk);
	rcu_read_unlock();

The problem happens when it calls rcu_read_lock().  At that point stap_start_task_finder() gets changed to an atomic context (since we've grabbed the lock).  But, we need to call rcu_read_lock() to make sure the task list doesn't change while we're iterating over the running tasks.

During the attach and stop process, __stp_utrace_attach() tries to stop the user task.  If it has trouble, it calls utrace_barrier(), which can sleep.  Since you can't sleep in an atomic context, we get this bug.  (I'm surprised we don't hit this bug more often - we must be getting lucky that normally the stop process succeeds.)

I think the best solution here is to get rid of the extra stop for current "fake" utrace.

Comment 5 David Smith 2012-08-17 18:54:00 UTC

Fixed in commit 3fb24a3, although not like I had discussed in comment #4.

After studying/working on this a bit more, I discovered that that "extra" stop is actually needed with "fake" utrace.  The actual user-specified callbacks get called when an "interesting" task is discovered, and since the user's probe might read memory (which can sleep), a real stop is needed.

After looking at this a bit more, I discovered the real problem.  When the target task was requested to stop via a utrace_control() call, if the return value was EINPROGRESS the code used utrace_barrier(), which can sleep.  The original code looked like this:

    rc = utrace_control(tsk, engine, UTRACE_STOP);
    if (rc == -EINPROGRESS)
        /* EINPROGRESS means we must wait for
         * a callback, which is what we want. */
        do {
            rc = utrace_barrier(tsk, engine);
        } while (rc == -ERESTARTSYS);
        if (rc != 0)
            _stp_error("utrace_control returned error %d on pid %d",
                       rc, (int)tsk->pid);

When utrace_control() is called with UTRACE_STOP and it returns EINPROGRESS, that means that the task hasn't actually stopped yet.  Since we don't need to wait for the task to stop (it will call our handler whenever it does), there is no reason to call utrace_barrier() there.

So, the new code looks like this:

    rc = utrace_control(tsk, engine, UTRACE_STOP);
    /* If utrace_control(..., UTRACE_STOP)
     * returns EINPROGRESS, that means the
     * task hasn't stopped quite yet, but
     * will soon.  Ignore this error. */
    if (rc != 0 && rc != -EINPROGRESS) {
        _stp_error("utrace_control returned error %d on pid %d",
                   rc, (int)tsk->pid);
    }
    rc = 0;

I've run utrace_p5.exp over 150 times with "fake" utrace without seeing any "scheduling while atomic" bugs.  I also made this change for the original 2 in-kernel versions of utrace, where over 125 runs of utrace_p5.exp showed no regressions.