When I run dbench and use systemtap to probe all syscalls on ppc64 with kernel 2.6.15.4, the system crashes shortly afterwards. The error given by xmon:

Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xd000000000270ee4
cpu 0x1: Vector: 300 (Data Access) at [c00000005d95a8c0]
pc: d000000000270ee4: ._stp_print_flush+0xb8/0x164 [stap_13972]
lr: d000000000272a94: .probe_1+0x374/0x400 [stap_13972]
sp: c00000005d95ab40
msr: 8000000000001032
dar: 10
dsisr: 40000000
current = 0xc000000020739040
paca = 0xc000000000538400
pid = 25259, comm = hotplug
enter ? for help
1:mon> t
[c00000005d95abf0] d000000000272a94 .probe_1+0x374/0x400 [stap_13972]
[c00000005d95ac90] d000000000272cf4 .dwarf_kprobe_1_enter+0x13c/0x1d8 [stap_13972]
[c00000005d95ad10] c00000000041959c .kprobe_exceptions_notify+0x334/0x5e8
[c00000005d95add0] c00000000041a134 .notifier_call_chain+0x68/0x98
[c00000005d95ae60] c000000000418834 .program_check_exception+0x114/0x5d0
[c00000005d95af00] c000000000004348 program_check_common+0xc8/0x100
--- Exception: 700 (Program Check) at c0000000000b0b94 .__find_get_block_slow+0x0/0x174
[link register ] c0000000000b1940 .__find_get_block+0x110/0x278
[c00000005d95b1f0] c00000000027c6b0 .put_device+0x1c/0x30 (unreliable)
[c00000005d95b2d0] c0000000000b5184 .__getblk+0x44/0x2cc
[c00000005d95b390] c00000000013d678 .__ext3_get_inode_loc+0x1b0/0x42c
[c00000005d95b450] c00000000013e568 .ext3_reserve_inode_write+0x58/0x11c
[c00000005d95b500] c00000000013e650 .ext3_mark_inode_dirty+0x24/0x5c
[c00000005d95b5b0] c000000000140df0 .ext3_dirty_inode+0x8c/0xbc
[c00000005d95b640] c0000000000ddcb4 .__mark_inode_dirty+0x70/0x1e8
[c00000005d95b6e0] c0000000000d105c .update_atime+0xa4/0xbc
[c00000005d95b770] c0000000000802e8 .do_generic_mapping_read+0x41c/0x474
[c00000005d95b8c0] c000000000082b4c .__generic_file_aio_read+0x1b4/0x21c
[c00000005d95b990] c000000000082d5c .generic_file_aio_read+0x44/0x54
[c00000005d95ba20] c0000000000ae520 .do_sync_read+0xcc/0x124
[c00000005d95bba0] c0000000000ae65c .vfs_read+0xe4/0x1b8
[c00000005d95bc40] c0000000000bd7a4 .kernel_read+0x34/0x58
[c00000005d95bce0] c0000000000e87b4 .compat_do_execve+0x15c/0x2c8
[c00000005d95bd90] c000000000012744 .compat_sys_execve+0x7c/0xf8
[c00000005d95be30] c000000000008600 syscall_exit+0x0/0x18
--- Exception: c01 (System Call) at 000000000fef6004
SP (ffc403c0) is in userspace

If I read this correctly, .__find_get_block_slow suffered some kind of fault. Could you disassemble your kernel in its neighbourhood to figure out which part of that function triggered it?

Also, I don't understand how the kprobe was entered. The exception notification stuff should not result in launching into a kprobe. Systemtap does not set any "kp_fault_handler" at present. Does the "stap -p3" source code suggest any linkage of dwarf_kprobe_1_enter to kprobe_exceptions_notify? Might there simply be a structure initialization issue?

The following is the disassembly given by objdump.

Disassembly inside __find_get_block:

c0000000000b1934: mr r31,r6
c0000000000b1938: bne- cr7,c0000000000b1a68 <.__find_get_block+0x238>
c0000000000b193c: bl c0000000000b0b94 <.__find_get_block_slow>
c0000000000b1940: mr. r31,r3
c0000000000b1944: beq- c0000000000b1a68 <.__find_get_block+0x238>
c0000000000b1948: li r27,0
c0000000000b194c: mfmsr r0

Disassembly around __find_get_block_slow:

c0000000000b0b8c <.sys_fdatasync>:
c0000000000b0b8c: li r4,1
c0000000000b0b90: b c0000000000b0a10 <.do_fsync>
c0000000000b0b94 <.__find_get_block_slow>:
c0000000000b0b94: mflr r0
c0000000000b0b98: std r24,-64(r1)
c0000000000b0b9c: std r25,-56(r1)
c0000000000b0ba0: std r28,-32(r1)
c0000000000b0ba4: std r29,-24(r1)
c0000000000b0ba8: mr r24,r4

But I wonder whether the info given by xmon is useful. I tried several times; it crashed every time, and each time it showed a different exception and backtrace. I noticed that all of these errors include:

Unable to handle kernel paging request for data at address ...

--------------- Testing One ---------------------------------
Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xd000000000270ee4
cpu 0x1: Vector: 300 (Data Access) at [c000000040dab3f0]
pc: d000000000270ee4: ._stp_print_flush+0xb8/0x164 [stap_7259]
lr: d000000000273cb4: .probe_4+0x374/0x400 [stap_7259]
sp: c000000040dab670
msr: 8000000000001032
dar: 10
dsisr: 40000000
current = 0xc00000002a351040
paca = 0xc000000000538400
pid = 9179, comm = dbench
enter ? for help
1:mon> t
[c000000040dab720] d000000000273cb4 .probe_4+0x374/0x400 [stap_7259]
[c000000040dab7c0] d000000000273e6c .dwarf_kprobe_4_enter+0x12c/0x1c8 [stap_7259]
[c000000040dab840] c000000000419164 .trampoline_probe_handler+0xb0/0x150
[c000000040dab8e0] c00000000041959c .kprobe_exceptions_notify+0x334/0x5e8
[c000000040dab9a0] c00000000041a134 .notifier_call_chain+0x68/0x98
[c000000040daba30] c000000000418834 .program_check_exception+0x114/0x5d0
[c000000040dabad0] c000000000004348 program_check_common+0xc8/0x100
--- Exception: 700 (Program Check) at c00000000002a3bc kretprobe_trampoline+0x0/0x8
[c000000040dabe30] c00000000002a3bc kretprobe_trampoline+0x0/0x8
--- Exception: c01 (System Call) at 000000000ff201b8
SP (ff9000b0) is in userspace
1:mon>

----------- Testing Two -----------------------------------
localhost.localdomain login: Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xd000000000270ee4
cpu 0x1: Vector: 300 (Data Access) at [c000000066eeb500]
pc: d000000000270ee4: ._stp_print_flush+0xb8/0x164 [stap_3949]
lr: d0000000002736dc: .probe_3+0x374/0x400 [stap_3949]
sp: c000000066eeb780
msr: 8000000000001032
dar: 10
dsisr: 40000000
current = 0xc000000002423040
paca = 0xc000000000538400
pid = 17224, comm = env
enter ? for help
1:mon> t
[c000000066eeb830] d0000000002736dc .probe_3+0x374/0x400 [stap_3949]
[c000000066eeb8d0] d0000000002738a4 .dwarf_kprobe_3_enter+0x13c/0x1d8 [stap_3949]
[c000000066eeb950] c00000000041959c .kprobe_exceptions_notify+0x334/0x5e8
[c000000066eeba10] c00000000041a134 .notifier_call_chain+0x68/0x98
[c000000066eebaa0] c000000000418834 .program_check_exception+0x114/0x5d0
[c000000066eebb40] c000000000004348 program_check_common+0xc8/0x100
--- Exception: 700 (Program Check) at c00000000000ae38 .ppc_newuname+0x14/0x120
[link register ] c00000000002a3bc kretprobe_trampoline+0x0/0x8
[c000000066eebe30] c000000000004760 .handle_page_fault+0x20/0x54 (unreliable)
--- Exception: c01 (System Call) at 000000000ffe2958
SP (fff6a970) is in userspace
1:mon>
----------------------------------------------------------

kprobe_exceptions_notify can be triggered by a breakpoint or by a single-step trap. It checks the cause, and if it was triggered by a breakpoint it invokes kprobe_handler, which in turn invokes kprobe->pre_handler, i.e. the probe handler. And the stap -p3 output shows:

dwarf_kprobe_1[i].pre_handler = &dwarf_kprobe_1_enter;

So I think the exception notification stuff *could* result in launching into a kprobe. Am I missing something?

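The dispatch path described in the comment above can be modelled in user space. This is only a rough sketch under stated assumptions: trap_kind, struct probe_model, model_exceptions_notify and count_hit are hypothetical names, not kernel APIs, and the real kprobe_exceptions_notify does considerably more than this.

```c
#include <stddef.h>

/* Hypothetical trap kinds standing in for the kernel's notifier event codes. */
enum trap_kind { TRAP_BREAKPOINT, TRAP_SINGLE_STEP };

/* Simplified stand-in for struct kprobe: only the two handlers and a counter. */
struct probe_model {
    int  (*pre_handler)(struct probe_model *p);   /* run when the breakpoint fires */
    void (*post_handler)(struct probe_model *p);  /* run after the single step */
    int hits;
};

/* Model of the notifier: dispatch on the trap kind.
 * Returns 1 if the event was handled, 0 if there was no probe to handle it. */
int model_exceptions_notify(enum trap_kind kind, struct probe_model *p)
{
    if (p == NULL)
        return 0;
    switch (kind) {
    case TRAP_BREAKPOINT:
        if (p->pre_handler)
            p->pre_handler(p);      /* this is where a probe body would run */
        return 1;
    case TRAP_SINGLE_STEP:
        if (p->post_handler)
            p->post_handler(p);
        return 1;
    }
    return 0;
}

/* Example pre_handler: just count how often the probe was entered. */
int count_hit(struct probe_model *p)
{
    p->hits++;
    return 0;
}
```

In this model, only a breakpoint event reaches pre_handler, which matches the path seen in the backtraces: program check exception, notifier chain, then the probe's enter function.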
I tried the 2.6.15.1 to 2.6.15.4 and 2.6.16-rc5 kernels, and all of them gave almost the same error:

Unable to handle kernel paging request for data at address ...

If I don't use systemtap's -b option, it seems to run for a long time without a kernel panic.

I also noticed that the kernel reported I/O errors even when I wasn't running systemtap and was only doing some simple write operations:

end_request: I/O error, dev sda, sector 17445
end_request: I/O error, dev sda, sector 17447
end_request: I/O error, dev sda, sector 17449
Aborting journal on device sda2.
ext3_abort called.
EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

The same version of systemtap runs very well with 2.6.9-30EL, so it is a bug in the mainline kernel.

If you are seeing problems even when not using SystemTap, then this is probably something outside of SystemTap. I suggest following this up on the linux-kernel and linuxppc64-dev mailing lists to see if the problem is located in the kernel.

We should mark this bug as rejected until it's proven that it is a SystemTap problem.

(In reply to comment #4)
> If you are seeing problems even when not using SystemTap, then this is probably
> something outside of SystemTap. I suggest following this up on the linux-kernel
> and linuxppc64-dev mailing lists to see if the problem is located in the kernel.
>
> We should mark this bug as rejected until it's proven that it is a SystemTap problem.

The error:

end_request: I/O error, dev sda, sector 17445 ...

happens without running systemtap. It occurs after I copy something into that partition. But I am not sure whether it is what causes the kernel panic when running systemtap.

The error:

Unable to handle kernel paging request for data at address

happens when running stap with the -b option. But I agree with Jose that it may not be a systemtap bug, because systemtap works quite well on the Red Hat shipped kernels (2.6.9-30.EL, 2.6.9-27.EL).

It should not be a hardware failure, because I tried it on different machines, and even after reformatting the partition, and all of them show the same error.

The 2.6.15 kernel has some changes to the Power architecture code (ppc64 moved into the powerpc directory), and its relayfs differs a lot from the one in the RH shipped kernel. I tried leaving relayfs out of the 2.6.15* kernel build so that systemtap would compile its own copy, but that failed: the relayfs shipped with systemtap can't be compiled because some function signatures have changed. If I have time I'll try to replace relayfs.

(In reply to comment #5) > (In reply to comment #4) > > If you are seen problem even when not using SystemTap the this is probably > > something outside of SystemTap. I suggest following this up on the linux-kernel > > and linuxppc64-dev mailing list to see if the problems is located in the kernel. > > > > We should mark this bug as rejected until its proven that it is a SystemTap > problem. > > the error : end_request: I/O error, dev sda, sector 17445 ... > will happen without running systemtap. It will occur after I copied something > into that partition. But I am not sure if it is the reason of causing kernel > panic when running systemtap. > > The error: > Unable to handle kernel paging request for data at address > will happed when running stap with -b option. > But I agree with Jose that it may not be a systemtap bug, because systemtap > could work quite well on the redhat shipped kernels(2.6.9-30.EL, 2.6.9-27.EL). > > It should not be a hardware failure because I tried it on different machines, > and even after reformat the partition. all of them have the same error. > > The 2.6.15 kernel has some changes about power arch(move ppc64 to powerpc > directory), and the relayfs diffs a lot from RH shipped kernel. I tried not to > compile relayfs in 2.6.15* and want systemtap compile it, but failed. the > relayfs shipped with systemtap can't be compiled. some function signatures has > changed, and if I have time I'll try to replace relayfs. > > > > To get systemtap to use the relayfs in the 2.6.15 kernel, try putting #define RELAYFS_VERSION_GE_4 at the top of src/runtime/transport/relayfs.h. Tom
(In reply to comment #6)
> To get systemtap to use the relayfs in the 2.6.15 kernel, try putting #define
> RELAYFS_VERSION_GE_4 at the top of src/runtime/transport/relayfs.h.
>
> Tom

I don't know if this is or isn't the cause of the problem, since I'm not seeing it on my x86 test machine, but I do see that the wrong relayfs_fs.h header file (the one in runtime/relayfs/linux/ rather than the one in the installed kernel sources) is being used to generate the probe module when running a 2.6.15 kernel without the RELAYFS_VERSION_GE_4 define in relayfs.h.

Can you go ahead and try adding that define and see if it helps? i.e. add #define RELAYFS_VERSION_GE_4 to src/runtime/transport/relayfs.h and then do a 'make install' to get it installed. Also make sure you have relayfs configured into your kernel.

If that's the problem, then this bug could probably be closed and would be fixed by bug 2406, which deals with autodetecting the proper relayfs version, including this one.

> I don't know if this is or isn't the cause of the problem, since I'm not seeing
> it on my x86 test machine, but I do see that the wrong relayfs_fs.h header file
> (the one in runtime/relayfs/linux/ rather than the one in the installed kernel
> sources) is being used to generate the probe module, when running a 2.6.15
> kernel without the RELAYFS_VERSION_GE_4 define in relayfs.h.
>
> Can you go ahead and try adding that define and see if it helps? i.e. add
> #define RELAYFS_VERSION_GE_4 to src/runtime/transport/relayfs.h and then do a
> 'make install' to get it installed. Also make sure you have relayfs configured
> into your kernel.
>
> If that's the problem, then this bug could probably be closed and would be fixed
> by 2406, which deals with autodetecting the proper relayfs version, including
> this one.

I tried it, and it worked. Thanks. It does not seem to crash any more.
But there are some errors (in fact, warnings) when stap compiles the module. I bypassed them by deleting -Werror in buildrun.cxx:

Running grep " [tT] " /proc/kallsyms | sort -k 1,8 -s -o /tmp/stap2iLdUc/symbols.sorted
Pass 3: translated to C into "/tmp/stap2iLdUc/stap_6318.c" in 280usr/1000sys/1294real ms.
Running make -C "/lib/modules/2.6.9-30.EL/build" M="/tmp/stap2iLdUc" modules V=1
make: Entering directory `/usr/src/kernels/2.6.9-30.EL-ppc64'
mkdir -p /tmp/stap2iLdUc/.tmp_versions
make -f scripts/Makefile.build obj=/tmp/stap2iLdUc
gcc -m64 -Wp,-MD,/tmp/stap2iLdUc/.stap_6318.o.d -nostdinc -iwithprefix include -D__KERNEL__ -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -g -Wdeclaration-after-statement -msoft-float -pipe -mminimal-toc -mtraceback=none -mcall-aixdesc -mtune=power4 -fno-unit-at-a-time -Wno-unused -Werror -I "/usr/local/share/systemtap/runtime" -I "/usr/local/share/systemtap/runtime/relayfs" -DMODULE -DKBUILD_BASENAME=stap_6318 -DKBUILD_MODNAME=stap_6318 -c -o /tmp/stap2iLdUc/.tmp_stap_6318.o /tmp/stap2iLdUc/stap_6318.c
In file included from /usr/local/share/systemtap/runtime/transport/transport.c:20,
                 from /usr/local/share/systemtap/runtime/io.c:14,
                 from /usr/local/share/systemtap/runtime/print.c:16,
                 from /usr/local/share/systemtap/runtime/runtime.h:61,
                 from /tmp/stap2iLdUc/stap_6318.c:30:
/usr/local/share/systemtap/runtime/transport/relayfs.c: In function `_stp_subbuf_start':
/usr/local/share/systemtap/runtime/transport/relayfs.c:33: warning: implicit declaration of function `relay_buf_full'
/usr/local/share/systemtap/runtime/transport/relayfs.c:39: warning: implicit declaration of function `subbuf_start_reserve'
/usr/local/share/systemtap/runtime/transport/relayfs.c: At top level:
/usr/local/share/systemtap/runtime/transport/relayfs.c:77: warning: initialization from incompatible pointer type
/usr/local/share/systemtap/runtime/transport/relayfs.c: In function `_stp_relayfs_open':
/usr/local/share/systemtap/runtime/transport/relayfs.c:129: warning: passing arg 5 of `relay_open' makes integer from pointer without a cast
/usr/local/share/systemtap/runtime/transport/relayfs.c:129: error: too few arguments to function `relay_open'
In file included from /usr/local/share/systemtap/runtime/transport/transport.c:45,
                 from /usr/local/share/systemtap/runtime/io.c:14,
                 from /usr/local/share/systemtap/runtime/print.c:16,
                 from /usr/local/share/systemtap/runtime/runtime.h:61,
                 from /tmp/stap2iLdUc/stap_6318.c:30:
/usr/local/share/systemtap/runtime/transport/procfs.c: In function `_stp_proc_read':
/usr/local/share/systemtap/runtime/transport/procfs.c:35: error: incompatible types in assignment
/usr/local/share/systemtap/runtime/transport/procfs.c:36: error: incompatible types in assignment
In file included from /usr/local/share/systemtap/runtime/io.c:14,
                 from /usr/local/share/systemtap/runtime/print.c:16,
                 from /usr/local/share/systemtap/runtime/runtime.h:61,
                 from /tmp/stap2iLdUc/stap_6318.c:30:
/usr/local/share/systemtap/runtime/transport/transport.c: In function `_stp_handle_buf_info':
/usr/local/share/systemtap/runtime/transport/transport.c:86: error: incompatible types in assignment
/usr/local/share/systemtap/runtime/transport/transport.c:87: error: incompatible types in assignment
make[1]: *** [/tmp/stap2iLdUc/stap_6318.o] Error 1
make: *** [_module_/tmp/stap2iLdUc] Error 2
make: Leaving directory `/usr/src/kernels/2.6.9-30.EL-ppc64'
Pass 4: compiled C into "stap_6318.ko" in 2820usr/220sys/2893real ms.
Pass 4: compilation failed. Try again with more '-v' (verbose) options.
Running rm -rf /tmp/stap2iLdUc

> I tried, and it worked. Thanks. It seems not crash any more.
> But there is some errors(in fact, warnings) when stap is compiling the module, I
> bypassed it by delete the -Werror in buildrun.cxx:

On the 2.6.15.3 kernel, with -Werror left in buildrun.cxx, the error is:

Running grep " [tT] " /proc/kallsyms | sort -k 1,8 -s -o /tmp/stap5mvGWl/symbols.sorted
Pass 3: translated to C into "/tmp/stap5mvGWl/stap_12492.c" in 220usr/90sys/313real ms.
Running make -C "/lib/modules/2.6.15.3/build" M="/tmp/stap5mvGWl" modules V=1
make: Entering directory `/usr/src/linux-2.6.15.3'
mkdir -p /tmp/stap5mvGWl/.tmp_versions
make -f scripts/Makefile.build obj=/tmp/stap5mvGWl
gcc -m64 -Wp,-MD,/tmp/stap5mvGWl/.stap_12492.o.d -nostdinc -isystem /usr/lib/gcc/ppc64-redhat-linux/3.4.5/include -D__KERNEL__ -Iinclude -include include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer -g -msoft-float -pipe -mminimal-toc -mtraceback=none -mcall-aixdesc -mtune=power4 -mno-altivec -funit-at-a-time -mstring -Wa,-maltivec -Wdeclaration-after-statement -Wno-unused -Werror -I "/usr/local/share/systemtap/runtime" -I "/usr/local/share/systemtap/runtime/relayfs" -DMODULE -DKBUILD_BASENAME=stap_12492 -DKBUILD_MODNAME=stap_12492 -c -o /tmp/stap5mvGWl/.tmp_stap_12492.o /tmp/stap5mvGWl/stap_12492.c
In file included from /usr/local/share/systemtap/runtime/transport/transport.c:20,
                 from /usr/local/share/systemtap/runtime/io.c:14,
                 from /usr/local/share/systemtap/runtime/print.c:16,
                 from /usr/local/share/systemtap/runtime/runtime.h:61,
                 from /tmp/stap5mvGWl/stap_12492.c:30:
/usr/local/share/systemtap/runtime/transport/relayfs.c:77: warning: initialization from incompatible pointer type
make[1]: *** [/tmp/stap5mvGWl/stap_12492.o] Error 1
make: *** [_module_/tmp/stap5mvGWl] Error 2
make: Leaving directory `/usr/src/linux-2.6.15.3'
Pass 4: compiled C into "stap_12492.ko" in 2210usr/250sys/2104real ms.
Pass 4: compilation failed. Try again with more '-v' (verbose) options.
Running rm -rf /tmp/stap5mvGWl

So do we need to add some explicit type casts to eliminate such warnings?

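On the question of casts: "initialization from incompatible pointer type" is what gcc reports when a function-pointer member is initialised with a function whose signature differs from the member's declared type. A cast silences the warning but leaves the mismatch in place; a thin adapter with the expected signature is one alternative. The following is a minimal sketch with hypothetical types. demo_callbacks is not the real relayfs callback table, and the actual mismatch at relayfs.c:77 is not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical callback table; NOT the real relayfs callback structure. */
struct demo_callbacks {
    int (*subbuf_start)(void *buf, void *subbuf, size_t prev_padding);
};

/* Older-style handler whose last parameter has a different type. Using it
 * directly as the .subbuf_start initialiser would draw the
 * "initialization from incompatible pointer type" warning. */
static int legacy_subbuf_start(void *buf, void *subbuf, unsigned int prev_padding)
{
    (void)buf;
    (void)subbuf;
    return prev_padding == 0;   /* accept only if nothing was padded */
}

/* Adapter with exactly the signature the table expects. */
static int subbuf_start_adapter(void *buf, void *subbuf, size_t prev_padding)
{
    return legacy_subbuf_start(buf, subbuf, (unsigned int)prev_padding);
}

/* The initialiser now matches the member type, so no warning is produced. */
static struct demo_callbacks demo_cbs = {
    .subbuf_start = subbuf_start_adapter,
};

/* Call through the table, the way a caller of the callbacks would. */
int demo_start(size_t prev_padding)
{
    return demo_cbs.subbuf_start(NULL, NULL, prev_padding);
}
```

Whether an adapter or a corrected prototype is appropriate for the systemtap runtime depends on what the kernel's relayfs header actually declares, which would need to be checked against the installed kernel sources.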
(In reply to comment #8)
> I tried, and it worked. Thanks. It seems not crash any more.
> But there is some errors(in fact, warnings) when stap is compiling the module, I
> bypassed it by delete the -Werror in buildrun.cxx:
> [...]

Hmm, where did you put the #define?

I get these warnings if I put it at the bottom of relayfs.h, but putting it at the top, just above

#ifdef RELAYFS_VERSION_GE_4
#include <linux/relayfs_fs.h>
...

it works fine for me...

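Why the position of the #define matters can be shown with a stand-alone file. This is a minimal sketch that relies only on the C preprocessor; uses_kernel_relayfs and RELAYFS_CHOICE are illustrative names, and no relayfs header is actually included.

```c
/* The macro is defined BEFORE the #ifdef that tests it,
 * as suggested for the top of relayfs.h. */
#define RELAYFS_VERSION_GE_4

#ifdef RELAYFS_VERSION_GE_4
#define RELAYFS_CHOICE 1   /* would select <linux/relayfs_fs.h> from the kernel tree */
#else
#define RELAYFS_CHOICE 0   /* would select the copy bundled with the runtime */
#endif

/* Returns 1 if the kernel's own relayfs header would be picked, 0 otherwise. */
int uses_kernel_relayfs(void)
{
    return RELAYFS_CHOICE;
}
```

Moving the #define below the #endif makes the function return 0, because the preprocessor evaluates #ifdef at the point where it appears in the file.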
> Hmm, where did you put the #define?
>
> I get these warnings if I put it at the bottom of relayfs.h, but putting it at
> the top, just above
>
> #ifdef RELAYFS_VERSION_GE_4
> #include <linux/relayfs_fs.h>
> ...
>
> it works fine for me...

The file I used:

#ifndef _TRANSPORT_RELAYFS_H_ /* -*- linux-c -*- */
#define _TRANSPORT_RELAYFS_H_
#define RELAYFS_VERSION_GE_4

/** @file relayfs.h
 * @brief Header file for relayfs transport
 */

#ifdef RELAYFS_VERSION_GE_4
#include <linux/relayfs_fs.h>
#else
#include "../relayfs/linux/relayfs_fs.h"
#endif /* RELAYFS_VERSION_GE_4 */

struct rchan *_stp_relayfs_open(unsigned n_subbufs,
                                unsigned subbuf_size,
                                int pid,
                                struct dentry **outdir);
void _stp_relayfs_close(struct rchan *chan, struct dentry *dir);

#endif /* _TRANSPORT_RELAYFS_H_ */

So is it due to the gcc version? My gcc is:
gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)
I checked the code, and it is just a warning about the assignment:
int *ptr <--- static int *ptr

But I hit another problem. I use my testcase to stress test systemtap:

-bash-3.00# ./test.sh -f lgl.cfg -I tapsets/tapsets1/
The tapsets is tapsets/tapsets1/
don't probe app : dbench
TIMES : 1
TIMES : 2
probe app : dbench
TIMES : 1
TIMES : 2
error opening file stpd_cpu0.
ERROR: couldn't unlink percpu file stpd_cpu0: errcode = No such file or directory

Do you have any idea what causes such errors? I have never seen them before.
I raised MAXDSKIPPED when running my testcases.

(In reply to comment #11)
> So is it due to the gcc version? My gcc is:
> gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)
> I checked the codes, and it is just a warning of the assignment:
> int *ptr <--- static int *ptr

I'm using gcc 4.1.0

> But I met another problem, I use my testcase to stress test systemtap:
>
> -bash-3.00# ./test.sh -f lgl.cfg -I tapsets/tapsets1/
> The tapsets is tapsets/tapsets1/
> don't probe app : dbench
> TIMES : 1
> TIMES : 2
> probe app : dbench
> TIMES : 1
> TIMES : 2
> error opening file stpd_cpu0.
> ERROR: couldn't unlink percpu file stpd_cpu0: errcode = No such file or directory
>
> Do you have any ideas of such errors? I never met it before.
> I raise the MAXDSKIPPED when running my testcases

No, I haven't seen that before either.

> -bash-3.00# ./test.sh -f lgl.cfg -I tapsets/tapsets1/
> The tapsets is tapsets/tapsets1/
> don't probe app : dbench
> TIMES : 1
> TIMES : 2
> probe app : dbench
> TIMES : 1
> TIMES : 2
> error opening file stpd_cpu0.
> ERROR: couldn't unlink percpu file stpd_cpu0: errcode = No such file or directory
>
> Do you have any ideas of such errors? I never met it before.
> I raise the MAXDSKIPPED when running my testcases

It may be due to my testcase. I run stap in the background, and when the benchmark tool has finished running, I just do:

kill -s SIGINT -- stappid stpdpid

I should terminate stap and stpd in the right order. I think this is the cause.

I think this bug could be closed.

*** This bug has been marked as a duplicate of 2406 ***
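
Signalling two processes one at a time, rather than both at once, could be sketched as below. This is an illustration under stated assumptions: the thread does not say which of stap and stpd should be stopped first, so the order is left to the caller, and wait_for_exit and stop_in_order are hypothetical helper names. Only the POSIX kill() and nanosleep() calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Poll until `pid` no longer exists or `timeout_ms` has elapsed.
 * Returns 0 once the process is gone, -1 on timeout.
 * Note: an exited child that has not been reaped still exists for kill(). */
int wait_for_exit(pid_t pid, int timeout_ms)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int waited;

    for (waited = 0; ; waited += 10) {
        if (kill(pid, 0) == -1 && errno == ESRCH)
            return 0;               /* no such process: it has exited */
        if (waited >= timeout_ms)
            return -1;
        nanosleep(&step, NULL);
    }
}

/* Send `sig` to each pid in array order, waiting for each one to exit
 * before signalling the next. Returns how many were gone in time. */
size_t stop_in_order(const pid_t *pids, size_t n, int sig, int timeout_ms)
{
    size_t i, stopped = 0;

    for (i = 0; i < n; i++) {
        if (kill(pids[i], sig) == -1 && errno == ESRCH) {
            stopped++;              /* already gone before we signalled it */
            continue;
        }
        if (wait_for_exit(pids[i], timeout_ms) == 0)
            stopped++;
    }
    return stopped;
}
```

A test harness would pass the two process IDs in whichever order turns out to be correct, for example with SIGINT and a timeout of a few seconds, and treat a return value smaller than the array length as a failed shutdown.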