Bug 11263 - exposing foo32 syscalls
Summary: exposing foo32 syscalls
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: tapsets
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-09 12:59 UTC by Mark Wielaard
Modified: 2015-06-19 17:06 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Mark Wielaard 2010-02-09 12:59:24 UTC
tapset/x86_64/syscalls.stp and tapset/x86_64/nd_syscalls.stp declare a couple of
syscall names that don't actually exist: syscall.pipe32, syscall.mmap32 and
maybe some others.

These seem to have been added to probe 32on64 executables calling a particular
syscall. But it seems strange to expose these as different syscall probe names.
If one would like to filter on such syscall usage, one could use the
probing_32bit_app() function in the script.

If we do expose them as syscall probe variants, then a different naming scheme
(syscall.32on64.<name>) might be better. We would also need some guidance on
when we (also) expose a 32bit variant on a 64bit kernel, since currently we only
seem to expose some.
Comment 1 David Smith 2010-02-09 15:18:39 UTC
On x86_64, the 64-bit pipe syscall (sys_pipe) has a syscall number of 22.  On
x86_64, the 32-bit pipe syscall (sys32_pipe) has a syscall number of 42.

unistd_32.h:#define __NR_pipe 42
unistd_64.h:#define __NR_pipe 22

Admittedly, most of the 32-bit variants on x86_64 are just wrappers around the
64-bit functions (with some argument modification).  But, they are still
different system calls.

If you did a process.syscall probe looking for a '$syscall' of 22, you'll never
see it when executing a 32-bit exe that calls pipe().

Here's what I get when running systemtap against a small C program that opens a
pipe and closes it (compiled for both 64-bit and 32-bit):

# stap -ve 'probe syscall.* { printf("%s\n", probefunc()) }' -c pipe64
...
sys_pipe
sys_close
sys_close
sys_exit_group
do_exit
sys_wait4
sys_write

# stap -ve 'probe syscall.* { printf("%s\n", probefunc()) }' -c pipe32
...
sys32_pipe
sys_close
sys_close
sys_exit_group
do_exit
sys_wait4
sys_write

That output looks reasonable to me.  I'm confused as to what the problem is here.
Comment 2 Mark Wielaard 2010-02-09 15:43:14 UTC
The "problem" to me is that we don't do this splitting of 32on64 versus "pure
64" bit syscalls consistently. Why do we have syscall.pipe32 and syscall.mmap32,
but not syscall.fstat32 for example?

syscall.fstat is a nice example since we make it match any syscall variant that
is called "fstat" whether it is the "plain" one or the compat/32on64 version.
Comment 3 David Smith 2010-02-09 19:02:46 UTC
(In reply to comment #2)
> The "problem" to me is that we don't do this splitting of 32on64 versus "pure
> 64" bit syscalls consistently. Why do we have syscall.pipe32 and syscall.mmap32,
> but not syscall.fstat32 for example?
> 
> syscall.fstat is a nice example since we make it match any syscall variant that
> is called "fstat" whether it is the "plain" one or the compat/32on64 version.

I believe the reasons are mostly historical.  I'd guess that when there were
argument differences between the 32-bit and 64-bit syscall, the '32' probe
variant would be created.  (I realize this doesn't account for the
syscall.pipe/syscall.pipe32 case, but who said we were 100% consistent?)

If this tapset were written today, it would probably look like this (untested):

====
probe _syscall.foo = kernel.function("sys_foo") {
  # handle arguments...
}
probe _syscall.foo32 = kernel.function("sys_foo32") ? {
  # handle 32-bit arguments...
}
probe syscall.foo = _syscall.foo, _syscall.foo32
====

Now the question we have to think about is: If we refactor the syscall.foo32
probes, how many existing scripts do we break?
Comment 4 Mark Wielaard 2010-02-14 10:44:22 UTC
Another wrinkle since 2.6.33:


commit f8b7256096a20436f6d0926747e3ac3d64c81d24
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Nov 30 17:37:04 2009 -0500

    Unify sys_mmap*
    
    New helper - sys_mmap_pgoff(); switch syscalls to using it.
    
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

This means everything is gated through sys_mmap_pgoff now on both x86_64 and
i386. But that also makes it hard to distinguish syscall.mmap (what we call
sys_mmap on x86_64) and syscall.mmap2 (what we call sys32_mmap on i386). I don't
immediately see how we can keep providing syscall.mmap2 without also triggering
syscall.mmap for the user.
Comment 5 David Smith 2015-06-19 17:06:27 UTC
This has been fixed, as much as is possible. For newer syscalls and new code for old syscalls, we try to squash the differences between the 32-bit and 64-bit syscalls. Even on old probes (that we need to keep around for compatibility reasons), the "name" convenience variable now returns "foo" instead of "foo32".