[discuss] semantics, "replay debugging" vs. "reverse debugging"

Tue Oct 21 07:27:00 GMT 2008

> Just to make sure we're all on the same page,
> I'm gonna state what I believe is true, and invite
> discussion or contradiction.
> 
> Replay debugging --> ability to record an execution
> sequence and "play it back" (repeat it) with some
> degree of determinism.

I think there are two very important different types of recording in existence:

* Complete CPU state tracing, like you get with the trace boxes from GreenHills
and Lauterbach, for example.  They have a limited reach as they are basically
reusing a circular buffer, but the CPU execution can be replicated 100%.
However, you can only work within the length of the recording.

* Recording system execution at a higher level and then forcing the execution path
at important points, such as the Zealcore/Enea solutions.  Here, you force task
switches and asynch IO to occur at certain points, and hope that the system
underneath is deterministic enough that you see the same code execution. 
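
To make the "limited reach" of the first kind of recording concrete, here is a
minimal sketch of a circular trace buffer. It is only an illustration: the
TraceBuffer class, its capacity and the toy register file are hypothetical and
do not describe how any real trace box works.

```python
from collections import deque


class TraceBuffer:
    """Hypothetical trace box: keeps only the last `capacity` CPU states."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest entry when it is full.
        self.states = deque(maxlen=capacity)

    def record(self, pc, regs):
        # Store a copy so later register updates do not alter the trace.
        self.states.append((pc, dict(regs)))

    def reach(self):
        """Oldest program counter that can still be replayed, or None."""
        return self.states[0][0] if self.states else None


trace = TraceBuffer(capacity=4)
regs = {"r0": 0}
for pc in range(10):          # execute ten toy "instructions"
    regs["r0"] += pc
    trace.record(pc, regs)

oldest_pc = trace.reach()                       # start of the recording
recorded_pcs = [pc for pc, _ in trace.states]   # what can still be replayed
```

Everything inside the buffer can be reproduced exactly, but anything older than
oldest_pc is gone.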

I am not familiar with gdb process recording; I would assume it falls into the
first category, but running on a host rather than using JTAG etc.

> Reverse debugging --> ability to make the inferior
> process "back up" to a previous state, eg. reverse
> step and reverse continue-to-breakpoint.

Sounds good. 

> They're related but not identical.  One could theoretically
> have one without the other, although in practice all
> presently existing reverse-debug targets (that I know of)
> are implemented by using record and replay.

Well, depends on how you do the recording... what you need is some way to
reconstruct the previous state in a consistent manner. This can be done in a
few ways:

* Complete recording, as above

* Completely deterministic target, where you can reset to start, and then just
reexecute until the point of interest.  Requires a controlled world that has no
asynch inputs.  For example, booting a machine inside a full-system simulator
works like this most of the time (if you script all serial inputs needed, and
provide simulation models for network services).  Simics does this with ease,
for example. It does require that the simulation system completely imposes
semantics on all timed events, which is true in most cases.

* Deterministic target with recording of asynch IO.  This is what Simics does
when a simulated set of machines is connected to a physical network connection
or subject to unscripted user input.  
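
As a rough sketch of the last variant: if the target is deterministic apart
from its asynchronous inputs, it is enough to log those inputs together with
the step at which they arrived. Going backwards then means resetting and
re-executing to one step earlier. The Machine class, run_to, reverse_step and
io_log below are hypothetical names for illustration, not Simics interfaces.

```python
class Machine:
    """Toy deterministic target with one accumulator register."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.step_count = 0
        self.acc = 0

    def step(self, async_input=None):
        # Deterministic part of the execution.
        self.acc = (self.acc * 3 + 1) % 1000
        # An asynchronous input is the only source of nondeterminism.
        if async_input is not None:
            self.acc += async_input
        self.step_count += 1


def run_to(machine, target_step, io_log):
    """Reset, then re-execute up to target_step, injecting logged inputs."""
    machine.reset()
    while machine.step_count < target_step:
        machine.step(io_log.get(machine.step_count))


def reverse_step(machine, io_log):
    """Back up one step by replaying from the start."""
    run_to(machine, max(0, machine.step_count - 1), io_log)


# Recording run: inputs arrived at steps 2 and 5 and were logged as such.
io_log = {2: 7, 5: 40}
m = Machine()
history = []                  # kept only to check the replay afterwards
for _ in range(8):
    m.step(io_log.get(m.step_count))
    history.append(m.acc)

reverse_step(m, io_log)       # now at step 7, in the state seen before
```

With an empty io_log the same code covers the completely deterministic case,
where nothing has to be recorded at all.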

> One could have reverse without record/replay if,
> for instance, one had a machine architecture where
> all instructions were reversible, ie. the machine
> itself could reverse-execute an instruction.

Not particularly likely, though? Most machine instructions destroy information,
like XOR R0,R0...
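
A small sketch of why this is so, assuming an 8-bit register for brevity: every
possible value of R0 maps to the same result, so the instruction has no
inverse. One way around it is to log the overwritten value before executing.
The helper names (execute_logged, undo, undo_log) are hypothetical.

```python
def xor_r0_r0(r0):
    """XOR R0,R0: the result is 0 whatever R0 held before."""
    return r0 ^ r0


# Every starting value ends up at 0, so the old value cannot be recovered
# from the result alone.
results = {xor_r0_r0(v) for v in range(256)}

undo_log = []


def execute_logged(r0):
    undo_log.append(r0)       # save what is about to be destroyed
    return xor_r0_r0(r0)


def undo():
    return undo_log.pop()     # restore the saved register value


r0 = execute_logged(0x5A)     # r0 is now 0
restored = undo()             # the log gives back the old value
```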

> And an example of a record/replay implementation
> without reverse debugging capability would be
> Michael Chastain's (circa 1999) implementation
> of Linux system-call based record and replay, which
> could deterministically replay a recorded program
> execution, but did not have reverse-step or
> reverse-continue-to-breakpoint.

Sounds like my recording type 2 above, actually. 

Best regards,

/jakob

_______________________________________________________

Jakob Engblom, PhD, Technical Marketing Manager

Virtutech                   Direct: +46 8 690 07 47    
Drottningholmsvägen 14      Mobile: +46 709 242 646   
11243 Stockholm             Web:    www.virtutech.com  
Sweden
________________________________________________________