Bug 2453 - kernel panic when probe elv_dequeue_request
Summary: kernel panic when probe elv_dequeue_request
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: kprobes (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Ananth Mavinakayanahalli
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-13 08:34 UTC by Li Guanglei
Modified: 2011-03-16 21:19 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Single-step out of line even if probed insn is a trap variant (419 bytes, patch)
2006-03-16 08:26 UTC, Ananth Mavinakayanahalli
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Li Guanglei 2006-03-13 08:34:49 UTC
I am running ppc64/2.6.15.4, and the kernel will panic if I put probe on entry
of elv_dequeue_request

output of xmon:
4:mon> e
cpu 0x4: Vector: 700 (Program Check) at [c00000000ffdf660]
    pc: c0000000001f15bc: .elv_dequeue_request+0x10/0x58
    lr: c0000000002fe764: .scsi_request_fn+0x164/0x3b8
    sp: c00000000ffdf8e0
   msr: 8000000000021032
  current = 0xc00000005c33c040
  paca    = 0xc00000000056ac00
    pid   = 15159, comm = 05-wait_for_sys
kernel BUG in elv_dequeue_request at block/elevator.c:514!


the stap file:
probe kernel.function("elv_dequeue_request")
{
        log("here")
}

the generated c file shows that systemtap put the probe point at:

static struct kprobe dwarf_kprobe_0[1]= {
  {.addr= (void *) 0xc0000000001f15bc}
};


objdump of the assembly codes:

c0000000001f15ac <.elv_dequeue_request>:
c0000000001f15ac:       e8 04 00 00     ld      r0,0(r4)
c0000000001f15b0:       7c 00 22 78     xor     r0,r0,r4
c0000000001f15b4:       21 20 00 00     subfic  r9,r0,0
c0000000001f15b8:       7c 09 01 14     adde    r0,r9,r0
c0000000001f15bc:       0b 00 00 00     tdnei   r0,0
c0000000001f15c0:       e9 24 00 00     ld      r9,0(r4)
c0000000001f15c4:       e9 64 00 08     ld      r11,8(r4)
c0000000001f15c8:       f9 2b 00 00     std     r9,0(r11)
c0000000001f15cc:       f9 69 00 08     std     r11,8(r9)
c0000000001f15d0:       60 00 00 00     nop
c0000000001f15d4:       e8 04 00 10     ld      r0,16(r4)
c0000000001f15d8:       f8 84 00 00     std     r4,0(r4)
c0000000001f15dc:       f8 84 00 08     std     r4,8(r4)
Comment 1 Frank Ch. Eigler 2006-03-13 14:49:19 UTC
To my powerpc-inexperienced eyes, it looks like the kprobe breakpoint ended up
on top of a tdnei (conditional trap) instruction.
Comment 2 Ananth Mavinakayanahalli 2006-03-14 04:27:26 UTC
Subject: Re:  kernel panic when probe elv_dequeue_request

On Mon, Mar 13, 2006 at 02:49:19PM -0000, fche at redhat dot com wrote:
> 
> ------- Additional Comments From fche at redhat dot com  2006-03-13 14:49 -------
> To my powerpc-inexperienced eyes, it looks like the kprobe breakpoint ended up
> on top of a tdnei (conditional trap) instruction.

Well, that tdnei is what BUG() uses internally to cause a trap. Methinks,
we have hit a (possibly legitimate) BUG() while using the testcase.

Ananth
Comment 3 Jim Keniston 2006-03-16 00:30:35 UTC
Some questions come to mind:
a. Do we see the BUG only when there's a probepoint on elv_dequeue_request?
b. Is single-stepping the tdnei instruction somehow causing the BUG?
c. Should we be allowing probepoints on tdnei instructions?
d. Is the tdnei instruction really the first instruction after the function
prolog?  (This would make a difference if (b) or (c) comes into play.)
Comment 4 michael@ellerman.id.au 2006-03-16 03:18:52 UTC
Subject: Re:  kernel panic when probe elv_dequeue_request

On 16 Mar 2006 00:30:35 -0000, jkenisto at us dot ibm dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> ------- Additional Comments From jkenisto at us dot ibm dot com  2006-03-16 00:30 -------
> Some questions come to mind:
> a. Do we see the BUG only when there's a probepoint on elv_dequeue_request?
> b. Is single-stepping the tdnei instruction somehow causing the BUG?
> c. Should we be allowing probepoints on tdnei instructions?
> d. Is the tdnei instruction really the first instruction after the function
> prolog?  (This would make a difference if (b) or (c) comes into play.)
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=2453

Well I can answer d), that isn't the first instruction after the
prolog, there is no prolog. The probe should really be sitting on the
first ld instruction.

The first four instructions are evaluating
"list_empty(&rq->queuelist)", so the probe is in the wrong place
AFAICT.

cheers
Comment 5 Ananth Mavinakayanahalli 2006-03-16 05:07:47 UTC
Guanglei,

1. Where does a kallsyms_lookup_name(".elv_dequeue_request") resolve to?
2. Does a plain kprobe at elv_dequeue_request() work?
3. I'd like to see the xmon instruction disassembly when you hit this bug.
Kprobes uses a different trap variant (opcode 0x7fe00008) and not tdnei. That
can tell us if we are goofing up probing at locations with other trap variants.

Michael Ellerman has pointed to the possible cause. list_empty() has to be first
calculated and that is what objdump shows - r4 has struct request * upon entry:
                                                           |
0000000000000000 <.elv_dequeue_request>:                   |
       0:       e8 04 00 00     ld      r0,0(r4) <---------/
       4:       7c 00 22 78     xor     r0,r0,r4
       8:       21 20 00 00     subfic  r9,r0,0
       c:       7c 09 01 14     adde    r0,r9,r0
      10:       0b 00 00 00     tdnei   r0,0
      14:       e9 24 00 00     ld      r9,0(r4)

Frank, Roland,

Could this be an issue with dwarf not finding the correct entry location?
Comment 6 Li Guanglei 2006-03-16 07:33:23 UTC
Hi,
  I checked the register when kernel panic, the r0 register is still 0 when the
kernel is panic:
7:mon> r
R00 = 0000000000000000   R16 = 00000000ffffffff
R01 = c00000000ffc78e0   R17 = 00000000ffffffff
R02 = c000000000749c80   R18 = 0000000000000000

  The instruction dump around the halting instruction:
7:mon> di c0000000001f15ac
c0000000001f15ac  e8040000      ld      r0,0(r4)
c0000000001f15b0  7c002278      xor     r0,r0,r4
c0000000001f15b4  21200000      subfic  r9,r0,0
c0000000001f15b8  7c090114      adde    r0,r9,r0
c0000000001f15bc  7fe00008      trap
c0000000001f15c0  e9240000      ld      r9,0(r4)
c0000000001f15c4  e9640008      ld      r11,8(r4)
c0000000001f15c8  f92b0000      std     r9,0(r11)
c0000000001f15cc  f9690008      std     r11,8(r9)
c0000000001f15d0  60000000      nop

  I used a stp script to show the result of
kallsyms_lookup_name(".elv_dequeue_request")

function foo()
%{
        _stp_printf("%llx\n", kallsyms_lookup_name(".elv_dequeue_request"));
%}

probe begin 
{
        foo()
}

the result is:
c0000000001f15ac
 
  I used "probe kernel.statement(address)", and it works well if address !=
0xc0000000001f15bc.

  So maybe the cause is the wrong handling when a probe is put on tdnei.
Comment 7 Ananth Mavinakayanahalli 2006-03-16 07:56:09 UTC
Subject: Re:  kernel panic when probe elv_dequeue_request

On Thu, Mar 16, 2006 at 07:33:23AM -0000, guanglei at cn dot ibm dot com wrote:
> 
> ------- Additional Comments From guanglei at cn dot ibm dot com  2006-03-16 07:33 -------
> Hi,
>   I checked the register when kernel panic, the r0 register is still 0 when the
> kernel is panic:
> 7:mon> r
> R00 = 0000000000000000   R16 = 00000000ffffffff
> R01 = c00000000ffc78e0   R17 = 00000000ffffffff
> R02 = c000000000749c80   R18 = 0000000000000000
> 
>   The instruction dump around the halting instruction:
> 7:mon> di c0000000001f15ac
> c0000000001f15ac  e8040000      ld      r0,0(r4)
> c0000000001f15b0  7c002278      xor     r0,r0,r4
> c0000000001f15b4  21200000      subfic  r9,r0,0
> c0000000001f15b8  7c090114      adde    r0,r9,r0
> c0000000001f15bc  7fe00008      trap

So, we are replacing the tdnei correctly...

> c0000000001f15c0  e9240000      ld      r9,0(r4)
> c0000000001f15c4  e9640008      ld      r11,8(r4)
> c0000000001f15c8  f92b0000      std     r9,0(r11)
> c0000000001f15cc  f9690008      std     r11,8(r9)
> c0000000001f15d0  60000000      nop
> 
>   I used a stp script to show the result of
> kallsyms_lookup_name(".elv_dequeue_request")
> 
> function foo()
> %{
>         _stp_printf("%llx\n", kallsyms_lookup_name(".elv_dequeue_request"));
> %}
> 
> probe begin 
> {
>         foo()
> }
> 
> the result is:
> c0000000001f15ac

Which means we are getting a different function entry address when using
systemtap

>   I used "probe kernel.statement(address)", and it works well if address !=
> 0xc0000000001f15bc.
> 
>   So maybe the cause is the wrong handling when a probe is put on tdnei.

Yes.

Thinking of the original idea of single-stepping on a breakpoint
instruction, I now don't see how it'd work even if we single-stepped
inline. Will investigate.

Ananth
Comment 8 Ananth Mavinakayanahalli 2006-03-16 08:26:59 UTC
Created attachment 922 [details]
Single-step out of line even if probed insn is a trap variant

Guanglei,

Does this patch help? It is against 2.6.16-rc6, but should apply on other
kernels too.

Basically, we would single-step inline if the probed insn was a trap variant.
That would mean executing "trap" (0x7fe00008) instead of the variant. On
PowerPC, whether a trap is taken or not is dependent on the particular variant
as also the values in certain registers at the time of the trap (determined by
the bits set in the opcode). However, the variant we use is an unconditional
trap, which obviously is not correct.

Please let me know if this fixes the problem.

Also, I guess you'll need to open another bug to track why the systemtap
address resolution was incorrect.
Comment 9 Li Guanglei 2006-03-16 09:24:03 UTC
Hi, Ananth 
  Thanks. I manually applied the patch against 2.6.15.4 and it worked really
well for me. 
Comment 10 Ananth Mavinakayanahalli 2006-03-16 09:39:21 UTC
Guanglei, please open another bug for the address resolution issue.

I will be posting this patch for upstream inclusion
Comment 11 Ananth Mavinakayanahalli 2006-03-16 14:24:41 UTC
Patch posted to ppc64 list
(http://ozlabs.org/pipermail/linuxppc64-dev/2006-March/008466.html) for upstream
inclusion
Comment 12 Frank Ch. Eigler 2006-04-08 17:40:53 UTC
(In reply to comment #5)
> [...] Could this be an issue with dwarf not finding the correct entry location?

Would you mind trying the new prologue heuristic code in CVS?  Run it at -vvv
for more information.
Comment 13 Li Guanglei 2006-04-10 03:02:38 UTC
(In reply to comment #12)
> (In reply to comment #5)
> > [...] Could this be an issue with dwarf not finding the correct entry location?
> 
> Would you mind trying the new prologue heuristic code in CVS?  Run it at -vvv
> for more information.
> 

First I tested using an old copy of codes(about on 04/01) on 2.6.16.1/ppc64, and
got the same result as before, which meant that stap will still put probes on
"tdnei"

Then I tested using latest codes from CVS on 2.6.16.1/ppc64. This time it seemed
to work well:

objdump's output:
c0000000001f4f48 <.elv_dequeue_request>:
c0000000001f4f48:       e8 04 00 00     ld      r0,0(r4)
c0000000001f4f4c:       7c 00 22 78     xor     r0,r0,r4
c0000000001f4f50:       21 20 00 00     subfic  r9,r0,0
c0000000001f4f54:       7c 09 01 14     adde    r0,r9,r0
c0000000001f4f58:       0b 00 00 00     tdnei   r0,0
c0000000001f4f5c:       e9 24 00 00     ld      r9,0(r4)
c0000000001f4f60:       e9 64 00 08     ld      r11,8(r4)
c0000000001f4f64:       f9 2b 00 00     std     r9,0(r11)

the .c file generated by -p3:
...
static struct kprobe dwarf_kprobe_0[1]= {
  {.addr= (void *) 0xc0000000001f4f48}
};
...

The output of -vvv:

-bash-3.00# stap a.stp -p3  -vvv > a.c
Created temporary directory "/tmp/stapx5387z"
Searched '/usr/local/share/systemtap/tapset/2.6.16.1/ppc64/*.stp', match count 0
Searched '/usr/local/share/systemtap/tapset/2.6.16.1/*.stp', match count 0
Searched '/usr/local/share/systemtap/tapset/2.6.16/ppc64/*.stp', match count 0
Searched '/usr/local/share/systemtap/tapset/2.6.16/*.stp', match count 0
Searched '/usr/local/share/systemtap/tapset/2.6/ppc64/*.stp', match count 0
Searched '/usr/local/share/systemtap/tapset/2.6/*.stp', match count 0
Searched '/usr/local/share/systemtap/tapset/ppc64/*.stp', match count 1
Searched '/usr/local/share/systemtap/tapset/*.stp', match count 14
Pass 1: parsed user script and 15 library script(s) in 230usr/0sys/230real ms.
parsed 'elv_dequeue_request' -> func 'elv_dequeue_request'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c000000000000000-c0000000007b5548, bias 0]
pattern 'elv_dequeue_request' matches function 'elv_dequeue_request'
selected function elv_dequeue_request
prologue searching function 'elv_dequeue_request'
0xc0000000001f4f48@block/elevator.c:527
checking line record 0xc0000000001f4f48@include/linux/list.h:257
prologue found function 'elv_dequeue_request' (naked) = 0xc0000000001f4f48
probe elv_dequeue_request@block/elevator.c:527 pc=0xc0000000001f4f48
pattern 'kernel' matches module 'kernel'
Eliding unused function warn
Eliding unused function exit
Eliding unused function error
Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 global(s) in
450usr/80sys/524real ms.
Probe #0 locks nothing
Running grep " [tT] " /proc/kallsyms | sort -k 1,8 -s -o
/tmp/stapx5387z/symbols.sorted
Pass 3: translated to C into "/tmp/stapx5387z/stap_6175.c" in
130usr/130sys/271real ms.
Running rm -rf /tmp/stapx5387z