[PATCH 1/3 v2] Fast tracepoint for powerpc64le

Wei-cheng Wang cole945@gmail.com
Sun Mar 29 19:27:00 GMT 2015


Hi Ulrich and Pedro,

Thank you for reviewing my patch, the suggestions are really helpful.

Sorry for a very late reply. I wasn't free until this weekend.
I've almost finished the new patches as you suggested, and I probably will propose
the new patch set by tomorrow.

Just a few comments for now.

 >> +static void
 >> +ppc64_emit_reg (int reg)
 >> +{
 >> +  unsigned char buf[10 * 4];
 >> +  unsigned char *p = buf;
 >> +
 >> +  p += GEN_LD (p, 3, 31, bc_framesz - 32);
 >> +  p += GEN_LD (p, 3, 3, 48);	/* offsetof (fast_tracepoint_ctx, regs) */
 >
 > This seems a bit fragile, it would be better to determine the offset
 > automatically ...   (I don't quite understand why the x86 code works
 > either, as it is right now ...)

Hi Predro,

I checked the implementation of x86 emit_reg and it seems the implementation
is wrong.  It assumes the first argument is ctx->regs, but it's actually 'ctx'

if (tpoint->compiled_cond)
   err = ((condfn) (uintptr_t) (tpoint->compiled_cond)) (ctx, &value);

I think probably either we could pass ctx->regs to compiled_cond instead,
or move the declarations of fast_tracepoint_ctx (and others) to tracepoint.h,
so we can use "offsetof (fast_tracepoint_ctx, regs)" instead.
Any suggestion?



The following are specific to PowerPC.

 >> +  if ((imm >> 8) == 0)
 >> +    {
 >> +      /* li	reg, imm[7:0] */
 >> +      p += GEN_LI (p, reg, imm);
 > Actually, you can load values up to 32767 with a single LI.

How about this fix?  So we can load a small negative number with LI.

if ((imm + 32768) < 65536)
   {
     /* li     reg, imm[7:0] */
     p += GEN_LI (p, reg, imm);
   }

 >> p += gen_atomic_xchg (p, lockaddr, 0, 1);
 >> /* Call to collector.  */
 >> p += gen_call (p, collector);
 >> p += gen_atomic_xchg (p, lockaddr, 1, 0);
 > This seems wrong.  Shouldn't *lockaddr be set to the address
 > of a collecting_t object, and not just "1"?

AFAIK, lockaddr only matters to the two lines above,
so simply put '1' for LOCKED should be fine.  Or am I missing anything?

 >> +  /* Now, insert the original instruction to execute in the jump pad.  */
 >> +  *adjusted_insn_addr = buildaddr + (p - buf);
 >> +  *adjusted_insn_addr_end = *adjusted_insn_addr;
 >> +  relocate_instruction (adjusted_insn_addr_end, tpaddr);
 > Hmm.  This calls back to GDB to perform the relocation of the
 > original instruction.  On PowerPC, there are only a handful of
 > instructions that need to be relocated; I'm not sure it is really
 > necessary to call back to GDB.  Can't those just be relocated
 > directly here?   This might even make the code simpler overall.

I just follow the design of Predo.
I could move the code to gdbserver side if you suggest.

 > Note that the stack frame layout as above only applies to ELFv1, but
 > you're actually only supporting ELFv2 at the moment.  For ELFv2, there
 > is no parameter save area (for this specific call), there is no compiler
 > or linker doubleword, and the TOC save area is at SP + 24.  (So this
 > location probably shouldn't be used to save something else ...)
 > This is also a bit bigger than required for ELFv2.  On the other hand,
 > having a larger buffer doesn't hurt.

Oops, I have to admit I looked into the wrong ABI document.
Hopefully we will support ppc64be soon, so I suggest still use 112-byte for
minimual frame size for simplicity?

 >> +/* Return true if ADDR is a valid address for tracepoint.  Set *ISZIE
 >> +   to the number of bytes the target should copy elsewhere for the
 >> +   tracepoint.  */
 >> +
 >> +static int
 >> +ppc_fast_tracepoint_valid_at (struct gdbarch *gdbarch,
 >> +			      CORE_ADDR addr, int *isize, char **msg)
 >> +{
 >> +  if (isize)
 >> +    *isize = gdbarch_max_insn_length (gdbarch);
 >> +  if (msg)
 >> +    *msg = NULL;
 >> +  return 1;
 >> +}
 >
 > Should/can we check here where the jump to the jump pad will be in
 > range?  Might be better to detect this early ...

Client has no idea about where the jump pad will be installed.
If it's out of range, gdbserver will report it right after user
entered 'tstart' command

 > Well, we really should handle the other cases too; there's no reason
 > to simply fail if this happens to be a branch on count or such ...

In the new patch, I will handle other cases as such,

   bdnz   .Lgoto
1:INSN1

is transform to

   bdz
   b      .Lgoto
1:INSN1

   and

   bdnzt  eq, .Lgoto

is transfrom to

   bdz    1f (+12)
   bf     eq, 1f (+8)
   b      .Lgoto
1:INSN1

Is it right?

 >> +      frame.frameless = 0;
 >> +      frame.lr_offset = 16;
 >
 > This isn't correct for ppc32 ...

frame.lr_offset = 4;
Right?

 > In any case, this code makes many assumptions that may not always be
 > true.  But then again, the same is true for the i386 case, so maybe this
 > is OK for now ...   In general, if we have DWARF CFI for the function,
 > it would be much preferable to refer to that in order to determine the
 > exact stack layout.

This function will only be used if user want to collect return address
(using 'collect $_ret' action command)

I agree that we should use DWARF CFI, but I have no idea how can I
get the information in this function.  Any suggestion where I can look into?

Thanks,
Wei-cheng



More information about the Gdb-patches mailing list