This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH 1/2] Fast tracepoint for powerpc64le

From: "Ulrich Weigand" <uweigand at de dot ibm dot com>
To: cole945 at gmail dot com (Wei-cheng Wang), palves at redhat dot com
Cc: gdb-patches at sourceware dot org
Date: Fri, 27 Feb 2015 20:52:27 +0100 (CET)
Subject: Re: [PATCH 1/2] Fast tracepoint for powerpc64le
Authentication-results: sourceware.org; auth=none
Wei-cheng Wang wrote:

> These patches implement fast tracepoint for PowerPC64.
> 
> The first part includes required porting for PowerPC64 (and 32-bit) target.
> Including
> * Install fast tracepoint jump pad
> * Agent expression bytecode compilation for powerpc64 only.
>    For 32-bit, bytecode interpreter is used instead.
> * IPA (libinproctrace.so)
> * Implement required gdbarch hooks.
> * Enable tracepoint testing for powerpc.

Excellent!  Thanks for working on this.

I'm still looking at the actual patches, but let me reply right now to
the extra questions you raise.

Pedro, I'd also appreciate your comments on some of the gdbserver
tracepoint.c issues ...


> * collection.exp fails are DWARF issues.  x86 failed too.
>    (https://sourceware.org/bugzilla/show_bug.cgi?id=15081)
> * ftrace.exp: x86 has the same issue (KFAIL in x86)
>    (https://sourceware.org/bugzilla/show_bug.cgi?id=13808)
> * no-attach-trace.exp: x86 has the same issue.
>    Tracepoint is not supported when target is `exec'.
>    I think this should be XFAIL?
> * unavailable.exp: x86 has the same issue.

For the time being, I think it's fine to fail on Power if we fail on
x86 too.  Maybe should mark the test KFAIL in that case, however ...

> * tspeed.exp:  This case is used to test whether fast tracepoints
>    are *faster* than regular tracepoints.  The case itself uses
>    sys+user time to find a proper iteration count for measurement.
>    (quote: "Total test time should be between 2 and 5 seconds.")
>    However, in my environment, 2 seconds of sys+user time means
>    2 minutes wall clock, so this case failed due to timeout.

The tspeed.exp file already has:

# Typically we need a little extra time for this test.
set timeout 180

Is that still not enough?

> * entry-values fails: The casea try to backtrace at
>    an inline-asm-inserted symbol without debug information.
>    The prologue analyzer is confused.

OK.  Maybe this can be fixed by enhancing the analyzer (but that
can be done in a separate patch).

> * tfind.exp: One of the tracepoint is inserted at
>    `*gdb_recursion_test'.  It's not hit because local-entry is called
>    instead.  The 18 FAILs are off-by-one error.

That's really a testcase issue, we had similar problems with setting
breakpoints on "*func" on powerpc64le.  This patch contains examples
how I handled it for breakpoints:
https://sourceware.org/ml/gdb-patches/2014-01/msg01102.html

This test case seem a bit more complicated, we may need to split it
in two parts; one that uses a normal "trace gdb_recursion_test"
without the "*", and possibly a second one that specifically tests
that "trace *func" works, using a source file that makes sure to
call func via a function pointers (as in step-bt.c).


> The main reason why PowerPC64 big-endian doesn't work is
> calling convention (function descriptors) issue.
>    When installing a tracepoint in inferior memory, gdbserver
> asks the address of "gdb_collect" (and etc.) using qSymbol packet,
> and it generate a sequence of instructions to calling that address.
>    However, gdb-client "return the start of code instead of
> any data function descriptor."
>    See commenting in remote_check_symbols/remote.c,
> https://sourceware.org/ml/gdb-patches/2007-06/msg00389.html
> and gen_call() in this patch.
>    In order for powerpc64be to work, qSymbol packet should be
> extend for function descriptors.

This is annoying.  This was done to support libthread_db in gdbserver.
Unfortunately, there are a number of components involved here:
- To support debugging multi-threaded inferiors, gdbserver links
  against libthread_db (provided by glibc).
- At startup, gdbserver's thread_db_enable_reporting routine
  calls into libthread_db's td_ta_event_addr.
- td_ta_event_addr calls back into gdbserver's ps_pglobal_lookup
  to retrieve the address of __nptl_create_event in the inferior
- In order to implement ps_pglobal_lookup, gdbserver issues a
  qSymbol packet back to GDB.
- GDB looks the symbol up in the symbol table, and sends a result
  packet to gdbserver.
- ps_pglobal_lookup returns that address to td_ta_event_addr.
- td_ta_event_addr returns that address to thread_db_enable_reporting.
- thread_db_enable_reporting uses set_breakpoint_at to install a
  breakpoint at that address (i.e. the __nptl_create_event routine
  in the inferior).

Now, the equivalent action happens in GDB itself when debugging
natively.  Here, the equivalent enable_thread_event_reporting routine
calls td_ta_event_addr, which calls ps_pglobal_lookup, which does a
regular symbol lookup, and returns the address back.  Now it is
enable_thread_event_reporting itself that translates this address
into the function code address required to set a breakpoint at,
in helper routine enable_thread_event:

  /* Set up the breakpoint.  */
  gdb_assert (exec_bfd);
  (*bp) = (gdbarch_convert_from_func_ptr_addr
           (target_gdbarch (),
            /* Do proper sign extension for the target.  */
            (bfd_get_sign_extend_vma (exec_bfd) > 0
             ? (CORE_ADDR) (intptr_t) notify.u.bptaddr
             : (CORE_ADDR) (uintptr_t) notify.u.bptaddr),
            &current_target));
  create_thread_event_breakpoint (target_gdbarch (), *bp);

With gdbserver, however, remote.c already replies with the code
address to the qSymbol command, and gdbserver passes the code
address through td_ta_event_addr back to itself.

It's really not correct to have ps_pglobal_lookup return different
addresses (code vs. descriptor), depending on whether GDB or
gdbserver calls it.  It happens to work because libthread_db doesn't
really look at the address except for passing it through, but it
still all seems quite weird.


So I guess there's two ways to fix this.   One would be to change
gdbserver to work more like GDB here.  This would involve removing
the descriptor->code address conversion in remote.c, and instead
performing the conversion in gdbserver's thread_db_enable_reporting.
Now, there is no gdbarch_convert_from_func_ptr_addr in gdbserver,
so a similar mechanism would have to be invented there.  (I guess
this would mean a new target hook.)  Fortunately, the only platform
that uses function descriptors *and* supports libthread_db debugging
in gdbserver is ppc64-linux, so we'd only have to add that new
mechanim on this platform.

This has the advantage that qSymbol could now be used to lookup
function symbols and get the descriptor address as expected.
On the other hand, this would mean an incompatible change in the
remote protocol: if you used a new GDB together with an old
gdbserver (or vice versa), thread debugging would stop working.
However, I guess that could be fixed by having gdbserver request
the new behavior from GDB by specifying a feature code.  With old
GDBs gdbserver would have to skip the descriptor->code conversion.


The second alternative would be to extend qSymbol to support
returning two different types of addresses for function symbols:
the symbol value (i.e. function pointer value, i.e. descriptor
on PPC64), and a code address suitable to set a breakpoint on
function entry.  This could be either by having gdbserver
request one or the other via an additional flag on the qSymbol
request, or else by GDB simply always returning both values
in two fields.  Again, this would be an incompatible protocol
change that would need to be guarded by a qFeature check.

In this case, gdbserver would use the "normal" symbol values
for most purposes (e.g. tracepoint routine lookup), but would
use the "code address" values to return from ps_pglobal_lookup.
With old GDBs, gdbserver would disable fast tracepoint support
on powerpc64.

If we do that, then for consistency it might also be useful
to pass code addresses to ps_pglobal_lookup in GDB itself.
In addition, the "code address" could be changed to skip
the local entry point prolog on powerpc64le to ensure that
the breakpoint is set correctly.  (This does not matter in
practice since __nptl_create_event has no local entry point,
but it would seem more fully correct in general.)


Overall, the second alternative seems a bit better to me,
but I'd certainly appreciate feedback on this ...


> For powerpc32 to work, some data structure/function in tracepoint.c
> need to be fixed.  For example,
> 
> * write_inferior_data_ptr should be fixed for big-endian.
>    If sizeof (CORE_ADDR) is larger than sizeof (void*), zeros are written.
>    BTW, I thnink write_inferior_data_pointer provides the same functionality
>    without this issue.  I'm not sure why write_inferior_data_ptr is needed?

This is odd, I don't see the point of this either.   Of course, as the
comment says, much of this stuff will break anyway if gdbserver is
compiled differently than the inferior (e.g. a 64-bit gdbserver
debugging a 32-bit inferior), because it assumes the structure layout
is identical.  However, if we do have a 32-bit gdbserver, then I don't
see why it shouldn't be possible to debug a 32-bit inferior, just
because CORE_ADDR is a 64-bit type ...

> * Data structure layout between gdbserver and IPA is not consistent.
> 
>    There are two versions of tracepoint_action one for gdbserver,
>    and antoher for inferior (IPA side).
> 
>    -    struct tracepoint_action
>    |    {
>    |    #ifndef IN_PROCESS_AGENT
>    |      const struct tracepoint_action_ops *ops;
>    | -  #endif
>    | |    char type;
>    - -  };
> 
>    It is the base object for action objects.
> 
>    struct collect_memory_action
>    {
>      struct tracepoint_action base;  <--
>      {
>        const struct tracepoint_action_ops *ops;
>    -   char type;
>    | }
>    |
>    | ULONGEST addr;
>    | ULONGEST len;
>    - int32_t basereg;
>    };
> 
>    When gdbserver downloading the action object to inferior,
>    it copies the object from offsetof(type) to the end.
>    (See m_tracepoint_action_download/tracepoint.c for example)
>    Howevery, the object layouts may not be consistent between
>    the two versions (with or without ops fields.)
>    It depends the the alignment requirement of addr (first data member
>    after base object), and the padding of tracepoint_action.
> 
>    In this case, the distance from "type" to "addr" changes
> 
>       Wihtout ops           with ops
>       0   1   2   3         0   1   2   3
>     0 type| PADDING...    0 ops-------------|
>     4 ................    4 type|PADDING....|
>     8 addr------------    8 addr-------------
>     c ---------------|    c ----------------|
>    10 len-------------   10 len--------------
>    14 ---------------|   14 ----------------|
>    18 basereg--------|   18 basereg---------|

Ugh.  That's a strange construct, and extremely dependent on alignment
rules (as you noticed).  I'm not really sure what the best way to fix
this would be.  My preference right now would be get rid of "ops" on
the gdbserver side too, and just switch on "type" in the two places
where the ops->send and ops->download routines are called right now.

This makes the data structures the same on gdbserver and IPA, which
simplifies downloading quite a bit.  (Also, it keeps the data structure
identical in IPA, which should avoid compatibility issues between
versions.)

Bye,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com
References:
- [PATCH 1/2] Fast tracepoint for powerpc64le
  - From: Wei-cheng Wang
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]