[+rfc] Re: [patch v6 00/21] record-btrace: reverse

Thu Nov 28 10:54:00 GMT 2013

> -----Original Message-----
> From: Jan Kratochvil [mailto:jan.kratochvil@redhat.com]
> Sent: Wednesday, November 27, 2013 7:57 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [+rfc] Re: [patch v6 00/21] record-btrace: reverse
> 
> On Thu, 07 Nov 2013 16:41:40 +0100, Metzger, Markus T wrote:
> > I hacked a first prototype of this (see below).  It passes most tests but
> > results in three fails in the record_goto suite.
> >
> > One thing that it shows, though, is that it only removes the 'mostly
> > harmless' hack in the various goto functions shown above.
> >
> > The more serious hacks in record_btrace_start_replaying
> >
> > 	  /* Make sure we're not using any stale registers.  */
> > 	  registers_changed_ptid (tp->ptid);
> >
> > 	  /* We just started replaying.  The frame id cached for stepping is
> > 	     based on unwinding, not on branch tracing.  Recompute it.  */
> > 	  frame = get_current_frame_nocheck ();
> > 	  insn = btrace_insn_get (replay);
> > 	  sal = find_pc_line (insn->pc, 0);
> > 	  set_step_info (frame, sal);
> >
> > and record_btrace_stop_replaying
> >
> > 	  /* Make sure we're not leaving any stale registers.  */
> > 	  registers_changed_ptid (tp->ptid);
> >
> > however, are not removed by this.
> 
> In such case it is not finished.  These hacks should not be needed.

See below.

> > They are required when reverse-stepping the first time or when
> > stepping past the end of the execution trace.
> 
> I have patched what you describe as the problem.  But as I do not have a box
> with reliably working BTS so it is difficult for me to say whether it works or
> not.  I can look at other problems if you describe them from a reliable box.

Those hacks are not related to "record goto" and are thus also not affected
by the patch to implement "record goto" via wait/resume.

Let me try to describe the problem.  It is also exposed by the next.exp test.

Assume we enable btrace and next over a function call.  We will end up
right after the call instruction.

(gdb) record btrace 
(gdb) n
50        return 0;     /* main.3 */
(gdb) record instruction-history -
31         0x0000000000400590 <fun1+0>: push   %rbp
32         0x0000000000400591 <fun1+1>: mov    %rsp,%rbp
33         0x0000000000400594 <fun1+4>: leaveq 
34         0x0000000000400595 <fun1+5>: retq   
35         0x000000000040059f <fun2+9>: leaveq 
36         0x00000000004005a0 <fun2+10>:        retq   
37         0x00000000004005af <fun3+14>:        leaveq 
38         0x00000000004005b0 <fun3+15>:        retq   
39         0x00000000004005c4 <fun4+19>:        leaveq 
40         0x00000000004005c5 <fun4+20>:        retq   
(gdb) disas
Dump of assembler code for function main:
   0x00000000004005c6 <+0>:     push   %rbp
   0x00000000004005c7 <+1>:     mov    %rsp,%rbp
   0x00000000004005ca <+4>:     callq  0x4005b1 <fun4>
=> 0x00000000004005cf <+9>:     mov    $0x0,%eax
   0x00000000004005d4 <+14>:    leaveq 
   0x00000000004005d5 <+15>:    retq   
End of assembler dump.
(gdb)

If we now do a reverse-next, we end up inside the function
we were supposed to step over.

(gdb) reverse-next
fun4 () at record_goto.c:44
44      }               /* fun4.5 */
(gdb) record instruction-history -
30         0x000000000040059a <fun2+4>: callq  0x400590 <fun1>
31         0x0000000000400590 <fun1+0>: push   %rbp
32         0x0000000000400591 <fun1+1>: mov    %rsp,%rbp
33         0x0000000000400594 <fun1+4>: leaveq 
34         0x0000000000400595 <fun1+5>: retq   
35         0x000000000040059f <fun2+9>: leaveq 
36         0x00000000004005a0 <fun2+10>:        retq   
37         0x00000000004005af <fun3+14>:        leaveq 
38         0x00000000004005b0 <fun3+15>:        retq   
39      => 0x00000000004005c4 <fun4+19>:        leaveq 
(gdb) disas
Dump of assembler code for function fun4:
   0x00000000004005b1 <+0>:     push   %rbp
   0x00000000004005b2 <+1>:     mov    %rsp,%rbp
   0x00000000004005b5 <+4>:     callq  0x400590 <fun1>
   0x00000000004005ba <+9>:     callq  0x400596 <fun2>
   0x00000000004005bf <+14>:    callq  0x4005a1 <fun3>
=> 0x00000000004005c4 <+19>:    leaveq 
   0x00000000004005c5 <+20>:    retq   
End of assembler dump.
(gdb)

The reason is the way GDB implements next/reverse-next.

We store the frame_id of the current frame and do a single-step.  
Then we try to detect stepping into a subroutine by unwinding
the stack frames and comparing the frame_id's with our stored
frame_id.

The stored frame_id has been computed using dwarf2 frame unwind.
After single-stepping, we're replaying the recorded execution, and
the frame_id's are now computed using btrace frame unwind.

Our caller's frame_id therefore does not compare equal to the
stored frame_id, and we fail to detect that we just reverse-stepped
into a subroutine.

The s/w record implementation does not suffer from this problem
because it traces data and is hence able to use the dwarf2 frame
unwinder also when replaying.

The way I tried to overcome this is to recompute all frame_id's
when we start replaying.  This will cause us to store a btrace
frame_id in the stepping algorithm.  Now we are able to detect
that we reverse-stepped into a subroutine.
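
A toy model in C may make the mismatch and the workaround concrete.
This is not GDB code: struct toy_frame_id, the toy_* functions, and the
key values are made up for illustration.  The only assumption taken
from the description above is that ids computed by different unwinders
for the same frame do not compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; GDB's real struct frame_id has different fields.  */
enum toy_unwinder { TOY_DWARF2, TOY_BTRACE };

struct toy_frame_id
{
  enum toy_unwinder unwinder;  /* Which unwinder computed this id.  */
  uint64_t key;                /* Unwinder-specific identity of the frame.  */
};

static bool
toy_frame_id_eq (struct toy_frame_id a, struct toy_frame_id b)
{
  return a.unwinder == b.unwinder && a.key == b.key;
}

/* The check made after a single-step: if the caller of the frame we are
   now in is the frame we started stepping from, we stepped into a
   subroutine and must keep going until we are back in that frame.  */
static bool
toy_stepped_into_subroutine (struct toy_frame_id stored,
                             struct toy_frame_id caller_now)
{
  return toy_frame_id_eq (stored, caller_now);
}

/* Id of main's frame as each (toy) unwinder would compute it.  The key
   values are invented.  */
static struct toy_frame_id
toy_main_frame_id (enum toy_unwinder unwinder)
{
  struct toy_frame_id id;

  assert (unwinder == TOY_DWARF2 || unwinder == TOY_BTRACE);
  id.unwinder = unwinder;
  id.key = (unwinder == TOY_DWARF2) ? UINT64_C (0x7ffc0000e2d0)
                                    : UINT64_C (1);
  return id;
}

/* Reverse-step from main into fun4.  RECOMPUTE says whether the stored
   step frame id is recomputed with the btrace unwinder when replaying
   starts.  Returns true if the step into fun4 is detected.  */
bool
toy_reverse_next_detects_call (bool recompute)
{
  /* Before the step we are live, so the id comes from dwarf2.  */
  struct toy_frame_id stored = toy_main_frame_id (TOY_DWARF2);

  if (recompute)
    stored = toy_main_frame_id (TOY_BTRACE);

  /* After the step we are replaying inside fun4; its caller, main, is
     now unwound by btrace.  */
  struct toy_frame_id caller_now = toy_main_frame_id (TOY_BTRACE);

  return toy_stepped_into_subroutine (stored, caller_now);
}
```

In this model toy_reverse_next_detects_call (false) returns false, which
corresponds to stopping inside fun4 as in the session above, and
toy_reverse_next_detects_call (true) returns true, which corresponds to
the behaviour with the recompute in record_btrace_start_replaying.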

Do you have a better idea?

Regards,
Markus.
