tapset/x86_64/syscalls.stp and tapset/x86_64/nd_syscalls.stp declare a couple of syscall names that don't actually exist: syscall.pipe32, syscall.mmap32 and maybe some others. These seem to have been added to probe 32on64 executables calling a particular syscall. But it seems strange to expose these as different syscall probe names. If one would like to filter on such syscall usage, one could use the probing_32bit_app() function in the script. If we do expose them as syscall probe variants, then a different naming scheme (syscall.32on64.<name>) might be better. We would also need some guidance on when to (also) expose a 32-bit variant on a 64-bit kernel, since currently we only seem to expose some.
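To make the filtering suggestion concrete, here is a minimal, untested sketch of a script that keeps only pipe() calls made by 32-bit processes. It assumes that syscall.pipe also fires for the 32-bit entry point on the tapset in use; where the tapset splits that out, syscall.pipe32 may have to be probed as well. The script name filter32.stp used below is made up for illustration.

```systemtap
# Sketch only: report pipe() calls made by 32-bit processes on a 64-bit kernel.
# Assumption: syscall.pipe covers the compat (32on64) entry point on this
# tapset version; otherwise add syscall.pipe32 to the probe list.
probe syscall.pipe
{
	# Skip calls coming from native 64-bit processes.
	if (!probing_32bit_app())
		next

	# name and argstr are the usual syscall tapset convenience variables.
	printf("%s(%d): %s %s\n", execname(), pid(), name, argstr)
}
```

It could be run against a 32-bit test binary with something like "stap filter32.stp -c ./pipe32".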
On x86_64, the 64-bit pipe syscall (sys_pipe) has a syscall number of 22.
On x86_64, the 32-bit pipe syscall (sys32_pipe) has a syscall number of 42.

unistd_32.h:#define __NR_pipe 42
unistd_64.h:#define __NR_pipe 22

Admittedly, most of the 32-bit variants on x86_64 are just wrappers around the 64-bit functions (with some argument modification). But they are still different system calls. If you did a process.syscall probe looking for a '$syscall' of 22, you'd never see it when executing a 32-bit exe that calls pipe().

Here's what I get when running systemtap against a small C program that opens a pipe and closes it (compiled for both 64-bit and 32-bit):

# stap -ve 'probe syscall.* { printf("%s\n", probefunc()) }' -c pipe64
...
sys_pipe
sys_close
sys_close
sys_exit_group
do_exit
sys_wait4
sys_write

# stap -ve 'probe syscall.* { printf("%s\n", probefunc()) }' -c pipe32
...
sys32_pipe
sys_close
sys_close
sys_exit_group
do_exit
sys_wait4
sys_write

That output looks reasonable to me. I'm confused as to what the problem is here.
The "problem" to me is that we don't do this splitting of 32on64 versus "pure 64" bit syscalls consistently. Why do we have syscall.pipe32 and syscall.mmap32, but not syscall.fstat32 for example? syscall.fstat is a nice example since we make it match any syscall variant that is called "fstat" whether it is the "plain" one or the compat/32on64 version.
(In reply to comment #2)
> The "problem" to me is that we don't do this splitting of 32on64 versus "pure
> 64" bit syscalls consistently. Why do we have syscall.pipe32 and syscall.mmap32,
> but not syscall.fstat32 for example?
>
> syscall.fstat is a nice example since we make it match any syscall variant that
> is called "fstat" whether it is the "plain" one or the compat/32on64 version.

I believe the reasons are mostly historical. I'd guess that when there were argument differences between the 32-bit and 64-bit syscall, the '32' probe variant would be created. (I realize this doesn't account for the syscall.pipe/syscall.pipe32 case, but who said we were 100% consistent?)

If this tapset were written today, it would probably be written like (untested):

====
probe _syscall.foo = kernel.function("sys_foo")
{
	# handle arguments...
}
probe _syscall.foo32 = kernel.function("sys_foo32") ?
{
	# handle 32-bit arguments...
}
probe syscall.foo = _syscall.foo, _syscall.foo32
{
}
====

Now the question we have to think about is: if we refactor the syscall.foo32 probes, how many existing scripts do we break?
Another wrinkle since 2.6.33:

commit f8b7256096a20436f6d0926747e3ac3d64c81d24
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Nov 30 17:37:04 2009 -0500

    Unify sys_mmap*

    New helper - sys_mmap_pgoff(); switch syscalls to using it.

    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

This means everything is gated through sys_mmap_pgoff now on both x86_64 and i386. But that also makes it hard to distinguish syscall.mmap (what we call sys_mmap on x86_64) and syscall.mmap2 (what we call sys32_mmap on i386). I don't immediately see how we can keep providing syscall.mmap2 without also triggering syscall.mmap for the user.
This has been fixed, as much as is possible. For newer syscalls and new code for old syscalls, we try to squash the differences between the 32-bit and 64-bit syscalls. Even on old probes (that we need to keep around for compatibility reasons), the "name" convenience variable now returns "foo" instead of "foo32".