[Bug runtime/26846] New: task_finder2: kernel panics due to unreliable in_atomic() usage

Wed Nov 4 23:49:39 GMT 2020

https://sourceware.org/bugzilla/show_bug.cgi?id=26846

            Bug ID: 26846
           Summary: task_finder2: kernel panics due to unreliable
                    in_atomic() usage
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: agentzh at gmail dot com
  Target Milestone: ---

On non-PREEMPT kernels (i.e., kernels built with CONFIG_PREEMPT=n),
in_atomic() cannot detect that the current context holds a spin lock or
is inside an RCU read-side critical section. Since the syscall
tracepoints run inside an RCU read-side critical section (see
__DO_TRACE()), in_atomic() fails to report that the current context does
not allow sleeping. When this happens, we see kernel panics in stap's
registered tracepoints, like this one:

kernel tried to execute NX-protected page - exploit attempt? (uid: 99)
BUG: unable to handle kernel paging request at ffffffffc1ea7040
IP: [<ffffffffc1ea7040>] _stp_module_3+0x0/0xffffffffffed9fc0 [orxray_c_fgraph_XX_3673]
PGD 1c1814067 PUD 1c1816067 PMD 486e4067 PTE 8000000164606063
Oops: 0011 [#1] SMP
CPU: 39 PID: 6934 Comm: sh Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1062.4.2.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
task: ffff943dc3d5b150 ti: ffff943dc27d4000 task.ti: ffff943dc27d4000
RIP: 0010:[<ffffffffc1ea7040>]  [<ffffffffc1ea7040>] _stp_module_3+0x0/0xffffffffffed9fc0 [orxray_c_fgraph_XX_3673]
RSP: 0018:ffff943dc27d7ea8  EFLAGS: 00010282
RAX: ffffffffc1ea7040 RBX: ffff943dc3d5b150 RCX: ffff943d537f4300
RDX: 0000000000001b16 RSI: ffff943dc3d5b150 RDI: 0000000000000000
RBP: ffff943dc27d7f28 R08: 0000000000000000 R09: 0000000180490016
R10: ffff943d537f4300 R11: ffff943d5cd62930 R12: ffff943dc4e38000
R13: 0000000000001b16 R14: 0000000000001b16 R15: ffff943e519351d0
FS:  0000000000000000(0000) GS:ffff943f76fc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc1ea7040 CR3: 000000016d4b8000 CR4: 0000000000340fe0
Call Trace:
 [<ffffffffa6e52c64>] ? do_execve_common.isra.24+0x7e4/0x880
 [<ffffffffa6e52f99>] SyS_execve+0x29/0x30
 [<ffffffffa738d478>] stub_execve+0x48/0x80

Note that the panic occurs in the execve syscall path, where stap has a
tracepoint registered:
rc = STP_TRACE_REGISTER(sched_process_exec, utrace_report_exec);

Panics like this occur in all of stap's registered tracepoints.
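
To illustrate why in_atomic() misses this case, here is a minimal
user-space model. It is only a sketch, not kernel code: every model_*
name is hypothetical, and it assumes the simplification that the
RCU-sched read lock taken around tracepoint handlers reduces to a
preempt-count increment, which is compiled out when the kernel does not
maintain the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are NOT the kernel's definitions. */
struct model_ctx {
    bool preempt_count_enabled; /* models a kernel that maintains the count */
    int preempt_count;          /* the only state the modeled in_atomic() sees */
    int rcu_sched_depth;        /* ground truth, invisible to in_atomic() */
};

/* Entering the critical section bumps the count only if it is maintained. */
static void model_rcu_read_lock_sched(struct model_ctx *c)
{
    c->rcu_sched_depth++;
    if (c->preempt_count_enabled)
        c->preempt_count++;
}

static void model_rcu_read_unlock_sched(struct model_ctx *c)
{
    if (c->preempt_count_enabled)
        c->preempt_count--;
    c->rcu_sched_depth--;
}

/* Modeled check: reports "atomic" only when the count is nonzero. */
static bool model_in_atomic(const struct model_ctx *c)
{
    return c->preempt_count != 0;
}

/*
 * Returns true if a handler guarded only by the modeled in_atomic()
 * would take its sleeping path while inside the critical section.
 */
bool model_handler_would_sleep(bool preempt_count_enabled)
{
    struct model_ctx c = { preempt_count_enabled, 0, 0 };
    bool would_sleep;

    model_rcu_read_lock_sched(&c);
    assert(c.rcu_sched_depth == 1);       /* we really are in the section */
    would_sleep = !model_in_atomic(&c);   /* the handler's only guard */
    model_rcu_read_unlock_sched(&c);
    assert(c.preempt_count == 0 && c.rcu_sched_depth == 0);
    return would_sleep;
}
```

In this model, model_handler_would_sleep(false) returns true: the
handler is inside the critical section, yet the check says sleeping is
fine. With the count maintained, model_handler_would_sleep(true) returns
false.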

This bug is reproducible when running stap's own test suite in parallel (-j16).

Thanks to Sultan Alsawaf for the report and investigation.
