This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug runtime/22847] ARM OABI syscall tracing issues


https://sourceware.org/bugzilla/show_bug.cgi?id=22847

--- Comment #21 from David Smith <dsmith at redhat dot com> ---
(In reply to Gustavo Moreira from comment #20)
> (In reply to David Smith from comment #19)
> > (sorry for the delay in responding)
> > 
> > (In reply to Gustavo Moreira from comment #18)
> > > I ended up modifying the kernel to update thread_info struct with the
> > > syscall number. Then I just call the original kernel syscall_get_nr()
> > > function from SystemTap, which is working like a charm.
> > 
> > Good deal. Have you tried getting the kernel patch upstream?
>
> Not yet. Do you think they could be interested?

Yes. Without getting your patch into the upstream kernel, your work here will
only be useful to you.

> > ... stuff deleted ...
> > 
> > > However, for instance, when it's used with your strace.stp which uses probe
> > > alias, it doesn't work ... it doesn't report the syscalls. Even using an
> > > EABI binary it doesn't report the syscalls. (See staprun_output_eabi.log and
> > > staprun_output_oabi.log)
> > > 
> > > I also noticed that, for instance from tapset/linux/sysc_connect.stp,
> > > __syscall_gate() is called to filter the syscalls, so I've crafted some code
> > > (see syscalls_stpm.patch) to avoid being filtered in case the syscall number
> > > doesn't match the constants.
> > > 
> > > I'm not getting what is happening from the SystemTap side, it seems the
> > > syscalls are being filtered somewhere ... could you please help me out?
> > 
> > You'll need to break down the @__syscall_gate macro into smaller pieces and
> > see where it is calling "next". Another idea, perhaps simpler, would be to
> > stick printf calls in that macro (and all that it calls) to let you know
> > which macro is calling "next". My guess would be that the
> > @__syscall_gate_compat_simple macro is doing the filtering, but you'll need
> > to test that theory.
> 
> Actually, the patches are fully working. The probes weren't being called due
> to the MAXSKIPPED limit:
> So, I've suppressed the time limits checks (--suppress-time-limits). I could
> also increase the limit to a specific value but anyway I wonder why it's
> happening now after these changes.
> 
> What do you think about the changes in syscalls.stpm? Do they look good?

I've got some problems with the changes to syscalls.stpm. Besides having debug
printfs present, your changes bypass the filtering if you've got an OABI
executable. You'll end up with syscall nesting that way, something we
definitely try to avoid. Also, you'd need similar changes in the other macros
(__syscall_gate2, __syscall_compat_gate, etc.).

Earlier, you said: In OABI the syscall convention is svc 0x900000 + SYSCALL_NR.
If that is true, couldn't your changes be simplified to:

    %( CONFIG_OABI_COMPAT == "y" %?
        # If _stp_syscall_nr() fails, that means we aren't in user
        # context. So, skip this call.
        try { __nr = _stp_syscall_nr() } catch { next }

        # In ARM, if it is an OABI call, the syscalls are
        # > __NR_OABI_SYSCALL_BASE
        if (__nr > @const("__NR_OABI_SYSCALL_BASE")) {
             __nr = __nr - @const("__NR_OABI_SYSCALL_BASE")
        }
        if (__nr != @syscall_nr) next
    %:
...
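
As a rough illustration of the arithmetic in the snippet above (not part of
the proposed macro), the normalization can be sketched outside the kernel in
Python. The constant value 0x900000 comes from the convention quoted earlier;
the function name normalize_syscall_nr and the example number 85 are made up
for illustration only.

```python
# Assumed value, taken from the "svc 0x900000 + SYSCALL_NR" convention above.
# Check it against the headers of the kernel you are actually targeting.
NR_OABI_SYSCALL_BASE = 0x900000


def normalize_syscall_nr(nr: int) -> int:
    """Strip the OABI base so OABI and EABI numbers can be compared directly.

    Hypothetical helper: it mirrors the strict '>' comparison used in the
    macro sketch, nothing more.
    """
    if nr > NR_OABI_SYSCALL_BASE:
        return nr - NR_OABI_SYSCALL_BASE
    return nr


# 85 is only an illustrative syscall number.
print(normalize_syscall_nr(NR_OABI_SYSCALL_BASE + 85))  # OABI-style number
print(normalize_syscall_nr(85))                         # EABI-style number
```

Like the macro sketch, this uses a strict ">" comparison, so a value exactly
equal to the base is left untouched; whether ">=" would be more appropriate is
worth checking against the kernel headers.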

The next thing I wonder about is that there has got to be more of a difference
than just syscall numbers between the two ABIs. I assume structures are laid
out differently, along with perhaps other changes. You'll have to account for
that.

Poking around the arch/arm directory, I'd guess you might need to probe the
sys_oabi_* functions and implement a way of knowing whether we're in an OABI
executable (like setting a thread flag).


> It also shows two warnings in the output:
> 
> WARNING: Skipped due to missed kretprobe/2 on
> 'kprobe.function("sys_readlink").return?': 1
> WARNING: Skipped due to missed kprobe on 'kprobe.function("sys_readlink")?':
> 1
> 
> I don't think it would be important but anyway it would be nice if we could
> fix it as well. Any clue?

Actually, it is important (and probably why MAXSKIPPED is being hit). Let's
start with the definition of MAXSKIPPED: "Maximum number of skipped probes
before an exit is triggered, default 100."

So, the first question to answer is "why are you getting so many skipped
probes?". You might start by seeing if the kernel outputs any messages when
this happens.

