This is sources Bugzilla
Bugzilla Version 2.17.5
Bugzilla Bug 5483
  Putting probe on __init functions causes kernel crash on x86_64 Last modified: 2008-01-18 09:17:20
Bug#: 5483
Reporter: Srinivasa DS <srinivasa@in.ibm.com>
Status: RESOLVED
Resolution: FIXED
Assigned To: Srinivasa DS <srinivasa@in.ibm.com>

Attachment          Description                                Type    Created
kprobe_init.patch   Patch to avoid probing __init functions    patch   2007-12-13 15:24
section.patch       Patch to fix the problem.                  patch   2007-12-21 05:54
section1.patch      Patch for systemtap                        patch   2007-12-21 06:41

Description:   Opened: 2007-12-13 15:21
Environment: 2.6.24-rc4 kernel, elfutils-0.131, systemtap-20071208 snapshot.

Running the following command causes a kernel oops on x86_64:

% stap -e 'probe kernel.function("migration_init"){}'
===================================================
Unable to handle kernel paging request at ffffffff8086ccb3 RIP: 
 [<ffffffff804739c5>] arch_prepare_kprobe+0x22/0x217
PGD 203067 PUD 207063 PMD 7e0da163 PTE 86c000
Oops: 0000 [1] SMP 
last sysfs file: /sys/module/stap_35adaae6e718a71673316d7b16a93286_356228/sections/.bss
CPU 1 
Modules linked in: stap_35adaae6e718a71673316d7b16a93286_356228
systemtap_test_module1 systemtap_test_module2 ipv6 autofs4 hidp rfcomm l2cap
bluetooth sunrpc dm_multipath video output sbs sbshc battery acpi_memhotplug ac
power_supply lp sg tg3 ide_cd cdrom floppy serio_raw parport_pc button
e752x_edac parport edac_core i2c_i801 shpchp i2c_core pcspkr dm_snapshot dm_zero
dm_mirror dm_mod ata_piix libata aic79xx scsi_transport_spi sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 28478, comm: stapio Tainted: GF       2.6.24-rc4-mm1 #4
RIP: 0010:[<ffffffff804739c5>]  [<ffffffff804739c5>] arch_prepare_kprobe+0x22/0x217
RSP: 0018:ffff810067055e48  EFLAGS: 00010286
RAX: ffffffff8086ccb3 RBX: ffffffff88464130 RCX: ffffffff8842af30
RDX: 0000000000000f30 RSI: 6600000000000000 RDI: ffffffff88464130
RBP: ffffffff88464130 R08: ffff81000d4d6000 R09: ffff81007f834000
R10: ffffffff8024bf9c R11: 0000000000000000 R12: 00000000000036b0
R13: 0000000000000000 R14: ffffffff8843b3b2 R15: 0000000000000000
FS:  00002aebec1e2b00(0000) GS:ffff81007fbac840(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff8086ccb3 CR3: 0000000075c02000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process stapio (pid: 28478, threadinfo ffff810067054000, task ffff81007d5f4e30)
Stack:  0000000000000000 ffffffff80474c8b 0000000000000000 ffffffff88464130
 0000000000000000 00000000000036b0 00000000000001d2 ffffffff8843185b
 00000000000036b0 00000000000a7fc4 ffff810067055ee8 0000000000000008
Call Trace:
 [<ffffffff80474c8b>] __register_kprobe+0x1f0/0x2e8
 [<ffffffff8843185b>] :stap_35adaae6e718a71673316d7b16a93286_356228:systemtap_module_init+0x202/0x45f
 [<ffffffff88431ac1>] :stap_35adaae6e718a71673316d7b16a93286_356228:probe_start+0x9/0x12
 [<ffffffff88431aeb>] :stap_35adaae6e718a71673316d7b16a93286_356228:_stp_handle_start+0x21/0x7c
 [<ffffffff88431bb8>] :stap_35adaae6e718a71673316d7b16a93286_356228:_stp_ctl_write_cmd+0x72/0xc3
 [<ffffffff80265748>] audit_syscall_entry+0x141/0x174
 [<ffffffff80296349>] vfs_write+0xc6/0x14f
 [<ffffffff8029689f>] sys_write+0x45/0x6e
 [<ffffffff8020c0dc>] tracesys+0xdc/0xe1


Code: 48 8b 10 48 89 11 48 8b 40 08 48 89 41 08 48 8b 53 70 8a 02 
RIP  [<ffffffff804739c5>] arch_prepare_kprobe+0x22/0x217
 RSP <ffff810067055e48>
CR2: ffffffff8086ccb3
[root@llm42 ~]# w
 11:33:56 up 57 min,  2 users,  load average: 0.00, 0.02, 0.26
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    srinivasa-009124 10:40   52:57  23.82s  0.00s /bin/bash ./tes
root     pts/13   srinivasa.in.ibm 11:22    0.00s  0.02s  0.00s w
===================
[root@llm42 ~]# cat /proc/kallsyms | grep ffffffff8086ccb3
ffffffff8086ccb3 T migration_init
==================
[root@llm42 ~]# cat /root/linux-2.6.24-rc4/System.map | grep ffffffff8086ccb3
ffffffff8086ccb3 T migration_init
======================

------- Additional Comment #1 From Srinivasa DS 2007-12-13 15:24 -------
Created an attachment (id=2135)
Patch to avoid probing __init functions

Please let me know your comments.

Thanks
 Srinivasa DS

------- Additional Comment #2 From Frank Ch. Eigler 2007-12-13 15:45 -------
> Patch to avoid probing __init functions

The translator should already know not to allow probing FOO_init functions.
See tapsets.cxx:dwarf_query::blacklisted_p.  It would be good to find out
why it is not working here.

------- Additional Comment #3 From Srinivasa DS 2007-12-19 09:48 -------
(In reply to comment #2)
> > Patch to avoid probing __init functions
> 
> The translator should already know not to allow probing FOO_init functions.
> See tapsets.cxx:dwarf_query::blacklisted_p.  It would be good to find out
> why it is not working here.
> 

Frank,

Cause of the problem:
  The section information needed to skip probes in the __init section is
missing.

Solution:
  After discussing this with Roland, I figured out the problem. With
elfutils-0.131, libdwfl treats the kernel as always relocatable, so
dwfl_module_relocations returns 1 and dwfl_module_relocation_info gives the
relocation section name as "". SystemTap should take care of this case.

Specifically for this problem: if (reloc_section == "" &&
dwfl_module_relocations (dw.module) == 1), iterate through the section headers
and find the section that contains the address. That is how we can tell that
the address lies in the __init section and that the probe needs to be skipped.

Something like this:


--- tapsets.cxx.orig    2007-12-19 15:11:16.000000000 +0530
+++ tapsets.cxx 2007-12-19 15:10:34.000000000 +0530
@@ -2572,6 +2572,34 @@ dwarf_query::add_probe_point(const strin
       if (r_s)
         reloc_section = r_s;
       blacklist_section = reloc_section;
+
+     if(reloc_section == "" && dwfl_module_relocations (dw.module) == 1)
+     {
+
+       Dwarf_Addr baseaddr;
+       Elf* elf = dwfl_module_getelf (dw.module, & baseaddr);
+       Dwarf_Addr offset = addr - baseaddr;
+       if (elf)
+        {
+          Elf_Scn* scn = 0;
+          size_t shstrndx;
+          dw.dwfl_assert ("getshstrndx", elf_getshstrndx (elf, &shstrndx));
+          while ((scn = elf_nextscn (elf, scn)) != NULL)
+            {
+              GElf_Shdr shdr_mem;
+              GElf_Shdr *shdr = gelf_getshdr (scn, &shdr_mem);
+              if (! shdr) continue; // XXX error?
+
+              GElf_Addr start = shdr->sh_addr;
+              GElf_Addr end = start + shdr->sh_size;
+              if (! (offset >= start && offset < end))
+                continue;
+
+              blacklist_section =  elf_strptr (elf, shstrndx, shdr->sh_name);
+              break;
+            }
+         }
+      }
     }
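
The lookup in the patch above can be illustrated without libelf. The following
is a minimal sketch, not the committed code: the Section struct,
section_containing, in_init_section and self_check are hypothetical names, the
section bounds in self_check are invented for illustration, and the ".init."
prefix test is only an assumed blacklist policy. Only the probed address
ffffffff8086ccb3 comes from the oops report.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the ELF section header fields the lookup needs
// (section name, sh_addr, sh_size). This is not a libelf type.
struct Section {
  std::string name;
  std::uint64_t addr;
  std::uint64_t size;
};

// Return the name of the first section whose [addr, addr + size) range
// contains `offset`, or "" if no section contains it.
std::string section_containing(const std::vector<Section>& sections,
                               std::uint64_t offset) {
  for (const Section& s : sections) {
    // Subtract instead of computing addr + size, so the range check
    // cannot overflow for sections near the top of the address space.
    if (offset >= s.addr && offset - s.addr < s.size)
      return s.name;
  }
  return "";
}

// Assumed policy: any section whose name starts with ".init." is unsafe
// to probe, because its memory is released after boot.
bool in_init_section(const std::string& name) {
  return name.compare(0, 6, ".init.") == 0;
}

// Usage example with invented section bounds.
void self_check() {
  const std::vector<Section> kernel = {
      {".text", 0xffffffff80200000ULL, 0x400000},
      {".init.text", 0xffffffff80860000ULL, 0x20000},
  };
  // Address of migration_init taken from the oops report.
  const std::string name = section_containing(kernel, 0xffffffff8086ccb3ULL);
  assert(name == ".init.text");
  assert(in_init_section(name));
}
```

With a section name in hand, a blacklist check can reject the probe at
translation time instead of letting kprobe registration read freed memory.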



------- Additional Comment #4 From Srinivasa DS 2007-12-21 05:54 -------
Created an attachment (id=2153)
Patch to fix the problem.

I have polished the above code a little bit. Please let me know your comments.

Thanks
 Srinivasa DS

------- Additional Comment #5 From Srinivasa DS 2007-12-21 06:40 -------
Another issue with elfutils-0.131 in systemtap is the failure of buildok/seventeen.stp:
==========================================
[root@llm27lp1 obj]# ./stap -vvvv ../src/testsuite/buildok/seventeen.stp
SystemTap translator/driver (version 0.6/0.131 built 2007-12-21)
Copyright (C) 2005-2007 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Created temporary directory "/tmp/stap8fs5Mk"
Searched '/home/systemtap/tmp/stap_testing_200712210556/install/share/systemtap/tapset/ppc64/*.stp', found 1
Searched '/home/systemtap/tmp/stap_testing_200712210556/install/share/systemtap/tapset/*.stp', found 37
Pass 1: parsed user script and 38 library script(s) in 950usr/10sys/1069real ms.
control symbols: kts: 0xc0000000003924d0 kte: 0xc000000000395bc4 stext: 0xc000000000000000
parsed 'pipe_write' -> func 'pipe_write'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel = [0xc000000000000000-0xc00000000079f515, bias 0x0]
file /boot/vmlinux-2.6.24-rc5 ELF machine ppc64 (code 21)
pattern 'pipe_write' matches function 'pipe_write'
selected function pipe_write
probe pipe_write@fs/pipe.c:396 kernel pc=0xc0000000000fcddc
finding location for local 'write_fifo_fops' near address c0000000000fcddc, module bias 0
dwarf_builder releasing dwflpp
semantic error: libdwfl failure (dwfl_module_relocation_info): Operation not permitted: identifier '$write_fifo_fops' at ../src/testsuite/buildok/seventeen.stp:8:19
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 460usr/90sys/654real ms.
Pass 2: analysis failed.  Try again with more '-v' (verbose) options.
Running rm -rf /tmp/stap8fs5Mk
===========================


The cause of this problem is the same. I am attaching the patch here. Please
let me know your comments.

Thanks
 Srinivasa Ds


------- Additional Comment #6 From Srinivasa DS 2007-12-21 06:41 -------
Created an attachment (id=2154)
Patch for systemtap

------- Additional Comment #7 From Frank Ch. Eigler 2007-12-27 18:52 -------
Patch looks fine, thank you!

------- Additional Comment #8 From Srinivasa DS 2008-01-07 10:53 -------
(In reply to comment #7)
> Patch looks fine, thank you!

I didn't see this patch in the latest weekly drop. Shall I commit these patches?

Thanks
 Srinivasa DS

------- Additional Comment #9 From Srinivasa DS 2008-01-18 09:17 -------
Committed the patch.

Thanks
 Srinivasa DS
