Bug 10454 - Raw number statement probes won't work without dwarf info
Summary: Raw number statement probes won't work without dwarf info
Status: RESOLVED WONTFIX
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-28 13:32 UTC by Mark Wielaard
Modified: 2015-06-19 16:50 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
uprobe_stmt_num.exp (818 bytes, text/plain)
2009-07-29 11:25 UTC, Mark Wielaard
Details
uprobe_stmt_num.c (144 bytes, text/plain)
2009-07-29 11:26 UTC, Mark Wielaard
Details
uprobe_stmt_num.stp (96 bytes, text/plain)
2009-07-29 11:26 UTC, Mark Wielaard
Details
uprobes_stmt_num.exp (971 bytes, text/plain)
2009-07-29 12:11 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2009-07-28 13:32:31 UTC
Something like the following on a function without dwarf info won't work:

$ nm /lib64/libpthread-2.10.1.so | grep -w __lll_lock_wait
00000030ac20d910 t __lll_lock_wait

$ stap -d /lib64/libpthread-2.10.1.so -g -e 'probe
process("/lib64/libpthread-2.10.1.so").statement(0x00000030ac20d910) { log
("__lll_lock_wait called"); }'

Note that we need to use -g otherwise we get:

semantic error: address 0x30ac20d910 does not match the beginning of a statement
(no line info found for '../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S',
in module '/lib64/libpthread-2.10.1.so')

We hit this part of tapsets.cxx (query_cu):

          // Verify that a raw address matches the beginning of a
          // statement. This is a somewhat lame check that the address
          // is at the start of an assembly instruction.  Mark probes are in the
          // middle of a macro and thus not strictly at a statement beginning.
          // Guru mode may override this check.

It might be an idea to allow "misplaced statement expressions" for user space
probes always since it isn't very "dangerous". Also note that "assembly
instruction" isn't really what is being tested. It is really whether there is
dwarf line info that says the statement given starts on a line.

Given guru mode, the next part of query_cu () then calls
dwlpp.iterate_over_functions(), with query_dwarf_func() as callback.
query_dwarf_func() will then populate the dwarf_query filtered_functions vector.
But since there is no dwarf info for the covered statement, the dwarf_query
callback is never called, so filtered_functions stays empty.

So this last part of query_cu() "// Otherwise, simply probe all resolved
functions." will not call query_func_info () so query_statement () is also never
called, so no probe will be registered.
Comment 1 Mark Wielaard 2009-07-28 13:40:37 UTC
There is an oddity that I don't understand with this code (when it does work).
It looks like if a probe containing a statement_num does have dwarf info then it
gets registered through query_func_info (i->entrypc, *i, q); which calls
query_statement() as follows:

          if (fi.prologue_end != 0)
            {
              query_statement (fi.name, fi.decl_file, fi.decl_line,
                               &fi.die, fi.prologue_end, q);
            }
          else
            {
              query_statement (fi.name, fi.decl_file, fi.decl_line,
                               &fi.die, entrypc, q);
            }

But that seems wrong. Why is it using the prologue_end or entrypc of the
function in which the statement is instead of the value of the statement num itself?
Comment 2 Josh Stone 2009-07-28 19:30:39 UTC
(In reply to comment #0)
> We hit this part of tapsets.cxx (query_cu):
> 
>           // Verify that a raw address matches the beginning of a
>           // statement. This is a somewhat lame check that the address
>           // is at the start of an assembly instruction.  Mark probes are in the
>           // middle of a macro and thus not strictly at a statement beginning.
>           // Guru mode may override this check.
> 
> It might be an idea to allow "misplaced statement expressions" for user space
> probes always since it isn't very "dangerous". Also note that "assembly
> instruction" isn't really what is being tested. It is really whether there is
> dwarf line info that says the statement given starts on a line.

Well, being the beginning of a statement is a sufficient condition for being the
beginning of an assembly instruction, right?  Sure, that's not a necessary
condition, but that's probably why this is a "somewhat lame check".

As for danger -- a misplaced probe in a user app won't bring down the system,
but it will likely break or even crash the application, which still violates our
philosophy of harmless probing.
Comment 3 Josh Stone 2009-07-28 19:34:26 UTC
(In reply to comment #1)
> There is an oddity that I don't understand with this code (when it does work).
> It looks like if a probe containing a statement_num does have dwarf info then it
> gets registered through query_func_info (i->entrypc, *i, q); which calls
> query_statement() as follows:

I think the problem is in the conditional before that, which says:

  if (q->has_statement_str)
    query_statement (i->name, i->decl_file,
                     lineno, // NB: not q->line !
                     &(i->die), addr, q);
  else
    query_func_info (i->entrypc, *i, q);

We really want the first case for has_statement_num too.  The else case is for
function probes, where we want to clamp the address up to the function entry.
Comment 4 Mark Wielaard 2009-07-29 11:21:57 UTC
(In reply to comment #3)
> (In reply to comment #1)
> > There is an oddity that I don't understand with this code (when it does work).
> > It looks like if a probe containing a statement_num does have dwarf info then it
> > gets registered through query_func_info (i->entrypc, *i, q); which calls
> > query_statement() as follows:
> 
> I think the problem is in the conditional before that, which says:
> 
>   if (q->has_statement_str)
>     query_statement (i->name, i->decl_file,
>                      lineno, // NB: not q->line !
>                      &(i->die), addr, q);
>   else
>     query_func_info (i->entrypc, *i, q);
> 
> We really want the first case for has_statement_num too.  The else case is for
> function probes, where we want to clamp the address up to the function entry.

I wrote a testcase for this and it seems we do get this right (a statement probe
with address on a process/function with debuginfo gets set exactly on the given
statement). So, I must be reading the code wrong.

I'll attach the testcase. Not checked in yet, since it depends on resolving bug
#10458 first, so we get predictable uaddr() results.
Comment 5 Mark Wielaard 2009-07-29 11:25:40 UTC
Created attachment 4096 [details]
uprobe_stmt_num.exp

This testcase puts probes on all lines in a function, collects the addresses
and then creates a script using statement probes for all these addresses. The
runs of both these scripts should be the same.

The idea is to extend this testcase to do the same after we stripped away the
debuginfo (and use -g) to simulate the original issue.

Note that this currently fails because of bug #10458 since the result of
uaddr() is one off. If you work around that by always substracting one for
uaddr() the test passes.
Comment 6 Mark Wielaard 2009-07-29 11:26:10 UTC
Created attachment 4097 [details]
uprobe_stmt_num.c
Comment 7 Mark Wielaard 2009-07-29 11:26:35 UTC
Created attachment 4098 [details]
uprobe_stmt_num.stp
Comment 8 Mark Wielaard 2009-07-29 12:11:13 UTC
Created attachment 4099 [details]
uprobes_stmt_num.exp

This is the full test. Still depends on a resolution of bug #10458.

The testcase that only strips away the lineinfo would pass if only the WARNING
messages about the lineinfo missing could be suppressed:
WARNING: address 0x4004d1 does not match the beginning of a statement (no line
info found for
'/home/mark/src/systemtap/testsuite/systemtap.context/uprobe_stmt_num.c', in
module '/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num')

The last test fails because of two issues. The WARNING message cannot be
suppressed. And now the statement probes aren't set at all.
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195600)
semantic error: no probes found
Comment 9 Mark Wielaard 2009-07-29 14:48:08 UTC
(In reply to comment #8)

> The testcase that only strips away the lineinfo would pass if only the WARNING
> messages about the lineinfo missing could be suppressed:
> WARNING: address 0x4004d1 does not match the beginning of a statement (no line
> info found for
> '/home/mark/src/systemtap/testsuite/systemtap.context/uprobe_stmt_num.c', in
> module '/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num')

This is now bug #10459
Comment 10 Mark Wielaard 2009-07-31 18:21:33 UTC
Testcase added:

commit 793e611dc277b6943100a5f81274961d526f9b02
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Jul 31 19:54:53 2009 +0200

    Add testcase for PR10458, PR10459 and PR10454.
    
    Last test currently disabled because PR10454 is still open.
    
    * testsuite/systemtap.context/uprobe_stmt_num.exp: New file.
    * testsuite/systemtap.context/uprobe_stmt_num.stp: Likewise.
    * testsuite/systemtap.context/uprobe_stmt_num.c: Likewise.

The last test is:

# XXX Last test still fails. PR10454.
return

# Now strip away all debuginfo. Since the script doesn't need any it
# should still work.
set strip_cmd [list "strip" "-g"]
lappend strip_cmd "$testexe"
send_log "Executing: $strip_cmd\n"
eval exec $strip_cmd

stap_run3 $test-run-statements-nodebuginfo -w -g $test-run-statements.stp -c
$testexe

Running this test by hand, removing the -w and adding -vv shows:

stap -vv -g uprobe_stmt_num-run-statements.stp -c
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_numSystemTap
translator/driver (version 0.9.8/0.141 commit release-0.9.8-218-gf1312b2 + changes)
Copyright (C) 2005-2009 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Session arch: x86_64 release: 2.6.29.6-213.fc11.x86_64
Created temporary directory "/tmp/stapsGOXP5"
Searched '/usr/local/systemtap/share/systemtap/tapset/x86_64/*.stp', found 3
Searched '/usr/local/systemtap/share/systemtap/tapset/*.stp', found 56
Pass 1: parsed user script and 59 library script(s) in 90usr/0sys/98real ms.
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195537)
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195548)
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195555)
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195570)
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195588)
WARNING: cannot find module
/usr/local/build/systemtap-obj/testsuite/uprobe_stmt_num debuginfo: No DWARF
information found
semantic error: no match while resolving probe point
process("uprobe_stmt_num").statement(4195600)
semantic error: no probes found
Pass 2: analyzed script: 0 probe(s), 0 function(s), 0 embed(s), 0 global(s) in
10usr/70sys/85real ms.
Pass 2: analysis failed.  Try again with another '--vp 01' option.
Running rm -rf /tmp/stapsGOXP5

So, when no dwarf info is found the statement probe will just fail even though
no dwarfinfo is needed by the probe body. All probes look like:
probe process("uprobe_stmt_num").statement(0x4004d1)
{
  printf("0x%x\n", uaddr());
}
Comment 11 Frank Ch. Eigler 2009-09-08 21:00:41 UTC
The consensus appears to be that we should permit this situation, by
one or both of these methods: (transcribing from irc):

(a)
fche does query_cu_containing_address(addr) return 0  (without throwing an
exception) here?
fche if so, we could change query_cu to accept a null cudie
fche and have it perform the #if-0 PR5787 #endif block -- conditional on guru_mode

(b)
implementing the documented but nonexistent process(*).statement(NUM).absolute
probe point
Comment 12 Josh Stone 2009-09-08 21:15:14 UTC
(In reply to comment #11)
> (b)
> implementing the documented but nonexistent process(*).statement(NUM).absolute
> probe point

I think this is the better choice.  It does exist already for process(PID), as
documented, just not for process(NAME).  Because of relocations, allowing NAME
has a high potential to do the wrong thing, especially if NAME is a system library.

Roland suggested that the address might be interpreted as relative to the
relocated module base.  I think that's would be more useful, though perhaps that
semantic should be under ".relative".
Comment 13 Frank Ch. Eigler 2015-06-19 16:50:09 UTC
no recent need; virtually-addressed uprobes not as easily available in inode-uprobes land anyway