This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.



Re: [RFC] a prototype checkpoint-restart using core files


Michael Snyder wrote:

Folks, this isn't for commit, just for discussion.

Attached is an experimental patch that adds a command
"restore-core-file" or "rcore", which is the inverse of
"generate-core-file" (gcore).  Instead of copying the
memory and register state of a process into a file,
it takes an existing corefile, and copies its memory
and register state into the child process.

My prototype is even lamer :-) It uses target read/write operations to collect state, but it can step backwards. An improved version in the works uses vfork() to make core images more cheaply.
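
To make the "cheap image" idea concrete, here is a minimal sketch of taking a checkpoint by forking, so that the kernel's copy-on-write pages hold the old user state. It is an illustration only, not the prototype described above: it uses fork() rather than vfork() (a vfork() child shares the parent's memory, so it could not hold a separate image in this form), it runs inside the process rather than under a debugger, and the names checkpoint_take, checkpoint_resume and checkpoint_demo are hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take a checkpoint by forking.  The child is a copy-on-write image of
 * the caller's user state and stops itself at once.  Returns the child's
 * pid in the original process, 0 in a copy that has been resumed later,
 * and -1 on failure. */
pid_t checkpoint_take(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);          /* frozen here until SIGCONT arrives */
        return 0;                /* resumed: we are the restored copy */
    }
    /* Wait until the child has really stopped, so that a later SIGCONT
     * cannot arrive before the SIGSTOP and get lost. */
    int status;
    if (waitpid(pid, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;
    return pid;
}

/* Let the frozen image run again from the checkpoint. */
int checkpoint_resume(pid_t checkpoint)
{
    return kill(checkpoint, SIGCONT);
}

/* Hypothetical demo: returns the value of `counter` as seen by the
 * restored copy, which should be the value at checkpoint time. */
int checkpoint_demo(void)
{
    volatile int counter = 1;
    pid_t cp = checkpoint_take();
    if (cp < 0)
        return -1;
    if (cp == 0)
        _exit(counter);          /* restored copy reports the old value */
    counter = 2;                 /* original process moves on */
    if (checkpoint_resume(cp) < 0)
        return -1;
    int status;
    if (waitpid(cp, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

As with the corefile approach, only user state is captured this way; open files, sockets and other kernel state are shared with or lost to the copy.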


The idea was to experiment with the concept of doing checkpoint and restore, by using a corefile as the checkpoint file. Obviously it has limitations -- it doesn't save any kernel state, I/O state, etc. Just user state.

But it turns out that if you avoid those limitations,
it works!  As a conservative rule of thumb, you can
go back to an earlier state so long as you don't cross
a system call.  And in practice there are lots of
system calls that can be regarded as "stateless",
or that change only user state -- so you can cross
those.

One idea I've considered is getting the OS to set up some kind of notification at system calls, and then using it to warn a user who tries to resume the inferior after rolling back. Besides the obvious corruption issues, you can also get some funky Heisenbugs, for instance if the code of interest is inside "if (!file_exists()) {". A forewarned user can then decide whether to press on or just rerun.
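
The bookkeeping side of that idea might look roughly like the sketch below. It assumes some mechanism (not shown) delivers the number of each system call the inferior makes after a checkpoint; everything here, including the struct, the function names and the short list of calls treated as "stateless", is made up for illustration and is Linux-specific only in its use of the SYS_* constants.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* What we remember about system calls made since the checkpoint. */
struct checkpoint_log {
    unsigned long unsafe_calls;  /* calls that may have changed kernel state */
    long first_unsafe;           /* number of the first such call, or -1 */
};

/* Assumed, deliberately tiny list of calls that do not change kernel
 * state.  A real list would need much more care. */
bool syscall_is_stateless(long nr)
{
    static const long safe[] = { SYS_getpid, SYS_getppid, SYS_getuid, SYS_gettid };
    for (size_t i = 0; i < sizeof safe / sizeof safe[0]; i++)
        if (safe[i] == nr)
            return true;
    return false;
}

/* Call when a checkpoint is taken. */
void checkpoint_log_reset(struct checkpoint_log *log)
{
    log->unsafe_calls = 0;
    log->first_unsafe = -1;
}

/* Call from the (hypothetical) per-system-call notification. */
void checkpoint_log_note(struct checkpoint_log *log, long nr)
{
    if (syscall_is_stateless(nr))
        return;
    if (log->unsafe_calls++ == 0)
        log->first_unsafe = nr;
}

/* Call when the user resumes after rolling back to the checkpoint. */
bool rollback_needs_warning(const struct checkpoint_log *log)
{
    return log->unsafe_calls > 0;
}

/* Hypothetical demo: a getpid is crossed silently, a write is not.
 * Returns 1 if the warning fires only after the write. */
int rollback_demo(void)
{
    struct checkpoint_log log;
    checkpoint_log_reset(&log);
    checkpoint_log_note(&log, SYS_getpid);
    bool before = rollback_needs_warning(&log);
    checkpoint_log_note(&log, SYS_write);
    bool after = rollback_needs_warning(&log);
    return !before && after && log.first_unsafe == SYS_write;
}
```

Recording the first offending call lets the warning say what was crossed, which is what the user needs in order to decide between pressing on and rerunning.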

On shared memory, there's an old Mark Linton paper (1988
debug workshop, I think) where they deal with shared memory
and replay by using the compiler to instrument all memory
refs that might be to shmem, basically adding a test to see
if the address is in shmem and if so, updating the shared
memory bits from a saved version. A hairy solution to a
hairy problem...
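
For a feel of what such instrumentation amounts to, here is a rough sketch of the check a compiler might emit in place of a plain load. It follows only the description above, not the paper's actual scheme, and the struct and function names are invented for the example; an ordinary array stands in for the shared segment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One shared segment plus the copy of it recorded for replay. */
struct shm_replay {
    unsigned char *base;          /* start of the live shared segment */
    size_t len;                   /* its length in bytes */
    const unsigned char *saved;   /* contents saved earlier, same length */
};

/* Does [addr, addr + n) lie entirely inside the shared segment? */
static int in_shm(const struct shm_replay *r, const void *addr, size_t n)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t b = (uintptr_t)r->base;
    return a >= b && n <= r->len && a - b <= r->len - n;
}

/* Stand-in for an instrumented load of *addr: if the address is in
 * shared memory, refresh the live bytes from the saved version first. */
int instrumented_load_int(const struct shm_replay *r, int *addr)
{
    if (in_shm(r, addr, sizeof *addr)) {
        size_t off = (size_t)((unsigned char *)addr - r->base);
        memcpy(addr, r->saved + off, sizeof *addr);
    }
    return *addr;
}

/* Hypothetical demo: `live` plays the shared segment after another
 * process has scribbled on it; `local` is ordinary private memory. */
int shm_replay_demo(void)
{
    int live[2] = { 7, 7 };
    const int saved[2] = { 1, 2 };
    int local = 5;
    struct shm_replay r = {
        (unsigned char *)live, sizeof live, (const unsigned char *)saved
    };
    int from_shm = instrumented_load_int(&r, &live[1]);   /* replayed: 2 */
    int from_local = instrumented_load_int(&r, &local);   /* untouched: 5 */
    return from_shm * 10 + from_local;
}
```

The cost is an address-range test on every load that might touch shared memory, which is presumably why it needs compiler support to be bearable.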

Stan

