3672 – Support formatted dump of struct $pointers

Status:	RESOLVED FIXED

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	translator
Version:	unspecified

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Josh Stone

URL:
Keywords:

Duplicates (2):	5954 6837
Depends on:
Blocks:

Reported:	2006-12-07 01:47 UTC by Mike Mason
Modified:	2010-10-25 18:17 UTC
CC List:	3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
devel snapshot (7.81 KB, patch) 2010-04-14 22:09 UTC, Josh Stone

Description Mike Mason 2006-12-07 01:47:05 UTC

Add a function or extend printf to dump a kernel structure in a human-readable
format, similar to what gdb can do.  Does DWARF give us enough info to do this
without requiring the script to specify the structure type?

For example, something like this:

print_struct($sock)

where $sock is a pointer to a struct socket could output this:

{
  state = 1,
  flags = 6,
  ops = 0x83337444,
  fasync_list = 0x84435355,
  file = 0x54338234,
  sk = 0x73322556,
  wait = {
    lock = 1,
    task_list = {
      next = 0x822334455,
      prev = 0x855443322
    }
  },
  type = 0
}
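
As a rough illustration of the requested output format (plain Python, not
SystemTap code), a nested mapping can be rendered in this gdb-like layout with
a short recursive function.  The name dump_struct and the dict-based model of
a struct are assumptions made only for this sketch; pointer values are passed
in as preformatted hex strings.

```python
def dump_struct(value, indent=0):
    """Render a nested dict as a gdb-style multi-line struct dump."""
    if not isinstance(value, dict):
        # Leaf member: ints print in decimal, hex strings print verbatim.
        return str(value)
    pad = " " * (indent + 2)
    lines = [
        f"{pad}{name} = {dump_struct(member, indent + 2)}"
        for name, member in value.items()
    ]
    return "{\n" + ",\n".join(lines) + "\n" + " " * indent + "}"


# Hypothetical data loosely modeled on the struct socket example above.
sock = {
    "state": 1,
    "flags": 6,
    "ops": "0x83337444",
    "wait": {
        "lock": 1,
        "task_list": {"next": "0x822334455", "prev": "0x855443322"},
    },
    "type": 0,
}
print(dump_struct(sock))
```

Each nesting level adds two spaces of indentation, and members are separated
by commas with no trailing comma before a closing brace.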

Comment 1 Frank Ch. Eigler 2008-03-18 01:39:23 UTC

*** Bug 5954 has been marked as a duplicate of this bug. ***

Comment 2 Frank Ch. Eigler 2008-08-15 22:28:32 UTC

Another possible syntax for this is inspired by the $$vars introduced
recently.  Expanding struct contents could be represented like so:

    $var$  =>  a string representation of $var's fields: like
               0xfoo
          or   {.foo=0xbeef, .zoo=0xp00}

To control the depth of nesting expansion, we could add extra "$"s at the end:

    $var$$ =>  {.foo=0xbeef, .bar={.so=0x44, .po=0x848}}

This could compose with the $$ variables too:

  $$vars$  =>  var1=0xdead var2={.foo=0xbeef, .zoo=0xp00}
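
A minimal sketch of the depth idea, in plain Python rather than translator
code: each trailing "$" allows one more level of nesting to be expanded.  The
function name format_fields is hypothetical, and showing unexpanded
substructures as "{...}" is an assumption of this sketch, since the proposal
above does not say how they would appear.

```python
def format_fields(value, depth):
    """Format a dict as {.name=value, ...}, expanding `depth` nesting levels."""
    if not isinstance(value, dict):
        return hex(value)  # scalar member, shown in hex as in the examples
    if depth == 0:
        return "{...}"  # assumed placeholder for an unexpanded substructure
    inner = ", ".join(
        f".{name}={format_fields(member, depth - 1)}"
        for name, member in value.items()
    )
    return "{" + inner + "}"


var = {"foo": 0xBEEF, "bar": {"so": 0x44, "po": 0x848}}
print(format_fields(var, 1))  # like $var$
print(format_fields(var, 2))  # like $var$$
```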

Comment 3 Masami Hiramatsu 2008-08-15 23:35:24 UTC

(In reply to comment #2)
> Another possible syntax for this is inspired by the $$vars introduced
> recently.  Expanding struct contents could be represented like so:
> 

How about a $$$var syntax?
It could also compose with $$vars, as $$$vars or $$$parms,
and the depth could be increased by adding more $ ($$$$var).

>     $var$  =>  a string representation of $var's fields: like
>                0xfoo
>           or   {.foo=0xbeef, .zoo=0xp00}

I like the latter format :-)

Thanks,

Comment 4 Josh Stone 2009-11-12 00:26:56 UTC

(I'm dumping some thoughts as I look at implementing this...)

I feel like we should have some token separation instead of a single token
$foo$.  My first idea is to list it as a trailing dereference, like so:

        $foo->$
        $foo->$$
        $$parms->$
        $foo->bar[i]->$
        @cast(foo, "foo_t")->$

Treating it as a new dereferencing component makes it clearer to me that it's
digging into the structure, and also makes it clearer IMO to connect to arrays
and @casts.  Otherwise we have to do a token peek on things like $foo[i]$ or
@cast(foo, "foo_t")$ to see that they're followed by a dollar sign.  Maybe
that's not so bad though...

Should we traverse pointers as well?  Maybe that is what is meant by "depth of
nesting expansion".  So, $ would print the entire struct, including any nested
structs.  Then $$ would expand a single level of pointers beyond that, and so on.

Unions are another open question, especially if pointers are involved.
I could see us quickly getting bad derefs by walking down a wrong union
branch.  My first inclination is to skip over unions and print them as a "{...}"
black box.  Or, maybe we can print them, but then skip them in pointer traversal
as a special case.
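
To make the two spellings concrete, here is a small string-level sketch in
Python that splits an expression into its base and the number of trailing "$"
characters, accepting both the bare form and the "->" form.  This only
illustrates the syntax being discussed; it is not how the translator
tokenizes, and split_pretty_suffix is a hypothetical name.

```python
import re

# Base expression, an optional "->", then one or more trailing "$".
_SUFFIX = re.compile(r"^(?P<base>.+?)(?:->)?(?P<dollars>\$+)$")


def split_pretty_suffix(expr):
    """Return (base expression, count of trailing '$'); count is 0 if none."""
    match = _SUFFIX.match(expr)
    if match is None:
        return expr, 0
    return match.group("base"), len(match.group("dollars"))


for expr in ["$foo", "$foo$", "$foo->$$", "$$parms->$",
             "$foo->bar[i]->$", '@cast(foo, "foo_t")->$']:
    print(expr, "=>", split_pretty_suffix(expr))
```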

Comment 5 Albert Strasheim 2010-03-26 21:45:57 UTC

Recursive structs (linked lists and the like) could also be quite tricky.
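
One common way to keep a printer from looping on such structures is to
remember which objects have already been visited.  The Python sketch below
shows that general technique on a two-node circular list; it is not taken
from the SystemTap sources, and format_node and the "<seen>" marker are
hypothetical.

```python
def format_node(node, seen=None):
    """Format a dict-based node, following references but never revisiting one."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return "<seen>"  # already printed on this walk: stop here
    seen.add(id(node))
    parts = []
    for name, value in node.items():
        if isinstance(value, dict):
            parts.append(f".{name}={format_node(value, seen)}")
        elif value is None:
            parts.append(f".{name}=NULL")
        else:
            parts.append(f".{name}={value}")
    return "{" + ", ".join(parts) + "}"


# Two nodes pointing at each other, like a circular linked list.
a = {"data": 1, "next": None}
b = {"data": 2, "next": a}
a["next"] = b
print(format_node(a))
```

Because the visited set is shared across the whole walk, a node reachable by
two different paths is also printed only once, not just nodes on a cycle.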

Comment 6 Josh Stone 2010-04-14 22:09:15 UTC

Created attachment 4729
devel snapshot

I haven't worked on this in a bit, but here's the snapshot of where I stopped.
The function dwarf_pretty_print_target_symbol lacks any working implementation,
but this patch at least has comments there of what I was trying to do.  I
/think/ that function can glue the rest together fairly simply, but that's
where I stalled out...

Comment 7 Frank Ch. Eigler 2010-05-06 23:51:13 UTC

*** Bug 6837 has been marked as a duplicate of this bug. ***

Comment 8 Josh Stone 2010-05-27 23:28:52 UTC

commit 5f36109ef05d8399e6369c0487a0a17d40ad3267
Author: Josh Stone <jistone@redhat.com>
Date:   Thu May 27 15:54:01 2010 -0700

    PR3672: Add pretty-printing for compound types
    
    This adds a new syntax for pretty-printing variables as strings:
    
      $var$         $var$$         $var->$            $var->$$
      $@cast(...)$  $@cast(...)$$  $@cast(...)->$     $@cast(...)->$$
      $var->foo->$  $var[1]->$     $@cast(...)->foo$  $@cast(...)[2]$
    
    This is still a work in progress, but I deemed it now useful enough to
    share.  See PR3672 for discussion of work remaining.


I'll follow up shortly with status & remaining issues...

Comment 9 Josh Stone 2010-05-28 00:05:07 UTC

I think it's basically in good shape, but here are the things that I know are
lacking:

• Determine the size of arrays.  I don't know how to read that from DWARF, so
for now we just print the first element and "..." the rest.  Even when we do
know the full size, we will probably only want the first few anyway.

• Truncate huge types with "...".  Right now I've tested that structs like the
kernel's task_struct and stap's systemtap_session both generate
reasonable-looking code for pass-2, but both are way too big to fit in a normal
string.  Pass-3 will actually reject these for having too many parameters for
the stack anyway.

• Using base10 or base16 -- I've chosen to represent everything with %c, %i, %u,
or %p, but in other parts of our code we tend to use just %#x.  I think that
decimal is generally more human-friendly for numbers that aren't pointers,
although flag variables are nice in hex.  There's no DW_ATE_flag though...

• Hide the dirty laundry of inheritance -- C++ types with virtual functions get
members like "_vptr.foo" to resolve the functions, but that's not really useful
for users to see.  We could automatically skip members with that naming pattern.

• Test test test -- I haven't written any testcases yet; shame on me...
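
Two of the items above (eliding array contents, and hiding "_vptr." members)
can be sketched in a few lines of Python.  This is only an illustration of the
described behavior: format_members is a hypothetical name, Python lists stand
in for C arrays, and the "[first, ...]" rendering is an assumption rather than
the actual output format.

```python
def format_members(members):
    """Format (name, value) pairs, eliding arrays and hiding vtable pointers."""
    parts = []
    for name, value in members:
        if name.startswith("_vptr."):
            continue  # C++ vtable pointer: not useful to show
        if isinstance(value, list):
            # Array of unknown size: print the first element, elide the rest.
            shown = f"[{value[0]}, ...]" if value else "[]"
        else:
            shown = str(value)
        parts.append(f".{name}={shown}")
    return "{" + ", ".join(parts) + "}"


print(format_members([("_vptr.foo", "0x1"), ("len", 3), ("buf", [7, 8, 9])]))
```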


Enhancements TODO:

• Stringify char*/char[] with user_string or kernel_string, as merged here from
PR6837.

• Add magic for STL types.  For example, std::string currently looks like
"{._M_dataplus={._M_p=%p}}", but we could hide the layout and turn it into
user_string on that _M_p instead.


Future implementation cleanups:

• Refactor comp_pretty_print -- given that this "component" only ever makes
sense at the tail of the target_symbol, it may be overkill to be a component at
all.  It might be cleaner to just have a pretty_print_depth member instead.

• Symbol/function referent tracking - I had to directly assign the referents in
my generated code, because later parts of the translator never touched them.  I
think it's just missing the step that walks over new functions to resolve referents.

• Merge various function generators -- tapsets.cxx has a few places now that do
almost the same thing to create a new function for variable access.  It would be
nice to share more code among these.

Comment 10 Josh Stone 2010-05-28 02:05:33 UTC

Here's some clarification on what is currently implemented, as it was a little
confusing and controversial on IRC, and we may want to revise it.

$var$ and $var->$ are identical, and will print the entire flattened structure
and all substructures.  If $var happens to be a pointer to start with, that
pointer is dereferenced for free.  If any members happen to be pointers, they
will be printed as %p and not traversed.

Using $var$$ or $var->$$ digs deeper, which currently means it will expand one
level of pointers/references within the structure.  $var$$$ and $var->$$$ will
expand yet another level of pointers; continue ad nauseam.  A current deficiency
is that pointers are blindly attempted, even if NULL or otherwise bad, which
will error out the script -- we should probably add try-catch for this.
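
The behavior described here, plus the suggested guard against bad pointers,
can be modeled roughly as follows.  This is a Python sketch under stated
assumptions, not the generated code: Ptr, memory, and fmt are hypothetical,
target memory is modeled as a dict keyed by address, and levels=0 corresponds
to a single "$" with each extra "$" adding one level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int


def fmt(value, memory, levels):
    """Expand substructures fully; follow pointers at most `levels` deep."""
    if isinstance(value, Ptr):
        if levels > 0:
            try:
                target = memory[value.addr]
            except KeyError:
                # NULL or otherwise bad pointer: print the address instead
                # of aborting, which is the guard suggested above.
                return hex(value.addr)
            return fmt(target, memory, levels - 1)
        return hex(value.addr)
    if isinstance(value, dict):
        # Embedded substructure: always expanded, pointer budget unchanged.
        inner = ", ".join(
            f".{name}={fmt(member, memory, levels)}"
            for name, member in value.items()
        )
        return "{" + inner + "}"
    return str(value)


memory = {
    0x1000: {"id": 7, "next": Ptr(0x2000)},
    0x2000: {"id": 8, "next": Ptr(0)},
}
var = {"count": 2, "head": Ptr(0x1000)}
for levels in range(4):
    print(levels, fmt(var, memory, levels))
```

With three levels the walk reaches the NULL pointer at the end of the list,
and the guard prints its address rather than raising.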


Some proposed modifications I got from hecklers:

- Forget the $var->$ syntax and just go with $var$.

- Or give them separate meaning, e.g. $var$ on a pointer will just print the
pointer value, $var->$ will dereference first.

- Change the "depth" to refer to substructures instead of pointers, and then
never follow pointers at all.  This might even be bimodal, so $ means no
substructures, $$ means fully deep into all substructures, and then don't bother
with $$$...

- If we keep the idea of controlled depth, then offer a more compressed form,
perhaps $var$10$ instead of $var$$$$$$$$$$$.  (The kookiness is apparently
contagious.)

Comment 11 Josh Stone 2010-05-28 23:58:33 UTC

(In reply to comment #9)
> Enhancements TODO:

Another idea is to print enums by name.

Comment 12 Josh Stone 2010-06-03 01:50:11 UTC

(In reply to comment #10)
> - Change the "depth" to refer to substructures instead of pointers, and then
> never follow pointers at all.  This might even be bimodal, so $ means no
> substructures, $$ means fully deep into all substructures, and then don't bother
> with $$$...

I've made this change in commit 7d11d8c9.

Comment 13 Josh Stone 2010-06-17 01:33:55 UTC

Basic docs and tests are added in commit 34af38d.  We can consider the other
discussed enhancements as incremental efforts in the future.