This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PR threads/10729: x86 hw watchpoints and non-stop mode


On Sunday 11 December 2011 23:38:11, Jan Kratochvil wrote:
> Hi Pedro,
> 
> on simple non-hit (inferior does not touch "j" at all) watchpoint case:
> strace -o 2 -q ./gdb -nx ./36 -ex start -ex 'watch j' -ex stepi -ex 'set confirm no' -ex q
> grep 'PTRACE_....USER' 1 >1b; grep 'PTRACE_....USER' 2 >2b
> 
> It has performance regression of 15 ptrace syscalls -> 27 ptrace syscalls.

Hmm.  Before, a single step (si) (instead of all syscalls fro the
begining) does:

$ tail -f 1 | grep PTRACE_P...USER
ptrace(PTRACE_POKEUSER, 15618, offsetof(struct user, u_debugreg), 0x601028) = 0
ptrace(PTRACE_PEEKUSER, 15618, offsetof(struct user, u_debugreg) + 48, [0x4000]) = 0
ptrace(PTRACE_POKEUSER, 15618, offsetof(struct user, u_debugreg) + 48, 0x4000) = 0
ptrace(PTRACE_POKEUSER, 15618, offsetof(struct user, u_debugreg) + 56, 0xd0101) = 0
ptrace(PTRACE_PEEKUSER, 15618, offsetof(struct user, u_debugreg) + 48, [0x4000]) = 0
ptrace(PTRACE_POKEUSER, 15618, offsetof(struct user, u_debugreg), 0) = 0
ptrace(PTRACE_POKEUSER, 15618, offsetof(struct user, u_debugreg) + 56, 0xd0100) = 0

So, 7 related ptrace calls.

After the patch, just one single-step does:

$ tail -f 2 | grep PTRACE_P...USER

ptrace(PTRACE_POKEUSER, 16139, offsetof(struct user, u_debugreg), 0x601028) = 0
ptrace(PTRACE_POKEUSER, 16139, offsetof(struct user, u_debugreg) + 8, 0) = 0
ptrace(PTRACE_POKEUSER, 16139, offsetof(struct user, u_debugreg) + 16, 0) = 0
ptrace(PTRACE_POKEUSER, 16139, offsetof(struct user, u_debugreg) + 24, 0) = 0
ptrace(PTRACE_POKEUSER, 16139, offsetof(struct user, u_debugreg) + 56, 0xd0101) = 0
ptrace(PTRACE_PEEKUSER, 16139, offsetof(struct user, u_debugreg) + 48, [0x4000]) = 0
ptrace(PTRACE_PEEKUSER, 16139, offsetof(struct user, u_debugreg) + 56, [0xd0101]) = 0

Still 7 related ptrace syscalls.

The 15 -> 27 jump in all PTRACE_P...USER syscalls is because for all stops,
we're now reading both DR_STATUS and DR_CONTROL, while before
we were only reading DR_STATUS.  And there are stops in starting up an inferior
until it reaches main.  That looks easily fixed.

In addition, before, we'd only poke to DR[N] if the register was used,
now (i386|amd64)_linux_prepare_to_resume always writes all of DR0-3,
even when only one DR is used.  That's looks easily fixed too.

I'm trying out this patch:

 gdb/amd64-linux-nat.c |    3 ++-
 gdb/i386-nat.c        |   26 ++++++++++++++++++--------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
index 9699f84..dc6a735 100644
--- a/gdb/amd64-linux-nat.c
+++ b/gdb/amd64-linux-nat.c
@@ -390,7 +390,8 @@ amd64_linux_prepare_to_resume (struct lwp_info *lwp)
       struct i386_debug_reg_state *state = i386_debug_reg_state ();
 
       for (i = DR_FIRSTADDR; i <= DR_LASTADDR; i++)
-	amd64_linux_dr_set (lwp->ptid, i, state->dr_mirror[i]);
+	if (state->dr_ref_count[i] > 0)
+	  amd64_linux_dr_set (lwp->ptid, i, state->dr_mirror[i]);
 
       amd64_linux_dr_set (lwp->ptid, DR_CONTROL, state->dr_control_mirror);
 
diff --git a/gdb/i386-nat.c b/gdb/i386-nat.c
index 94306a1..6d59b14 100644
--- a/gdb/i386-nat.c
+++ b/gdb/i386-nat.c
@@ -607,22 +607,32 @@ i386_stopped_data_address (struct target_ops *ops, CORE_ADDR *addr_p)
   int rc = 0;
   unsigned status;
   unsigned control;
+  unsigned control_p = 0;
 
   /* Get the current values the inferior has.  If the thread was
      running when we last changed watchpoints, the mirror no longer
      represents what was set in this thread's debug registers.  */
   status = i386_dr_low.get_status ();
-  control = i386_dr_low.get_control ();
 
   ALL_DEBUG_REGISTERS(i)
     {
-      if (I386_DR_WATCH_HIT (status, i)
-	  /* This second condition makes sure DRi is set up for a data
-	     watchpoint, not a hardware breakpoint.  The reason is
-	     that GDB doesn't call the target_stopped_data_address
-	     method except for data watchpoints.  In other words, I'm
-	     being paranoiac.  */
-	  && I386_DR_GET_RW_LEN (control, i) != 0)
+      if (!I386_DR_WATCH_HIT (status, i))
+	continue;
+
+      /* Fetching DR_CONTROL may require another syscall.  Avoid when
+	 possible.  */
+      if (!control_p)
+	{
+	  control = i386_dr_low.get_control ();
+	  control_p = 1;
+	}
+
+      /* This second condition makes sure DRi is set up for a data
+	 watchpoint, not a hardware breakpoint.  The reason is
+	 that GDB doesn't call the target_stopped_data_address
+	 method except for data watchpoints.  In other words, I'm
+	 being paranoiac.  */
+      if (I386_DR_GET_RW_LEN (control, i) != 0)
 	{
 	  addr = i386_dr_low.get_addr (i);
 	  rc = 1;


With those changes, we're actually even better than before.  Your
example when from 15 -> 12.  A single stepi that doesn't trigger a
watchpoint only does:

ptrace(PTRACE_POKEUSER, 23332, offsetof(struct user, u_debugreg), 0x601028) = 0
ptrace(PTRACE_POKEUSER, 23332, offsetof(struct user, u_debugreg) + 56, 0xd0101) = 0
ptrace(PTRACE_PEEKUSER, 23332, offsetof(struct user, u_debugreg) + 48, [0x4000]) = 0

That is, write DR0 and DR_CONTROL on resume, and, read DR_STATUS on stop.

> On Fri, 09 Dec 2011 17:30:20 +0100, Pedro Alves wrote:
> > @@ -513,22 +499,7 @@ i386_update_inferior_debug_regs (struct i386_debug_reg_state *new_state)
> >    ALL_DEBUG_REGISTERS (i)
> >      {
> >        if (I386_DR_VACANT (new_state, i) != I386_DR_VACANT (&dr_mirror, i))
> > -	{
> > -	  if (!I386_DR_VACANT (new_state, i))
> > -	    {
> > -	      i386_dr_low.set_addr (i, new_state->dr_mirror[i]);
> > -
> 
> 
> > -	      /* Only a sanity check for leftover bits (set possibly only
> > -		 by inferior).  */
> > -	      if (i386_dr_low.unset_status)
> > -		i386_dr_low.unset_status (I386_DR_WATCH_MASK (i));
> 
> Deleting this part is a regression.  Testcase for that part is attached.

Thanks a lot, I'll take a look.  I had assumed this bit:

  if (lwp->stopped_by_watchpoint)
    amd64_linux_dr_set (lwp->ptid, DR_STATUS, 0);

in amd64_linux_prepare_to_resume would fix the issue.

> 
> > +	  && I386_DR_GET_RW_LEN (control, i) != 0)
> >  	{
> > -	  addr = state->dr_mirror[i];
> > +	  addr = i386_dr_low.get_addr (i);
> 
> Why to do this change?  Why we can no longer trust DR_MIRROR?  This is
> a performance regression.

This is non-stop, so threads can be running while we change the
global state->dr_mirror (and friends).  Say, we set a watchpoint,
and let the threads rusume.  Now, say you delete the watchpoint, or
add/remove watchpoints such that state->dr_mirror[*] changes.  Inserting/deleting
watchpoines updates state->dr_mirror[*].  Now threads haven't been updated
with the mirror yet, and say a thread has meanwhile hit an old watchpoint,
but we haven't handled the SIGTRAP yet.  If we trusted state->dr_mirror[*],
we'd mistake the real trapped address to whatever was
currently state->dr_mirror[i].  So state->dr_mirror now represents
intention.  To get at the address that trapped, we need to read the
state the thread had when it trapped.  I'll add some comments to the code.

-- 
Pedro Alves


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]