This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.


I was just wondering how difficult it would be to run the gdb simulator to back out of a kernel core file. [Frankenstein Crash Analyser]

From: Piet/Pete Delaney <piet at sgi dot com>
To: Bharata B Rao <bharata at in dot ibm dot com>
Cc: Piet Delaney <piet at sgi dot com>, lkcd-devel at lists dot sourceforge dot net, crash at oss dot missioncriticallinux dot com, gdb at sources dot redhat dot com, obrien at freebsd dot org, linux-engr at sgi dot com
Date: Fri, 10 May 2002 00:24:48 -0700
Subject: I was just wondering how difficult it would be to run the gdb simulator to back out of a kernel core file. [Frankenstein Crash Analyser]
References: <OFC43B08DA.4388D1A8-ONC1256BB3.0035CB3C@de.ibm.com> <20020508175245.J982@in.ibm.com> <20020508171342.A23748@sgi.com> <20020509100849.K982@in.ibm.com> <20020509035727.H23748@sgi.com> <20020510113945.O982@in.ibm.com>

On Fri, May 10, 2002 at 11:39:46AM +0530, Bharata B Rao wrote:
> On Thu, May 09, 2002 at 03:57:27AM -0700, Piet/Pete Delaney wrote:
> > 
> > With or without gdb's help. By compiling the kernel -g and stripping the
> > copy in /boot/efi we can have small kernels where space is tight and 
> > informative kernels where space is cheap.
> > 
> > The kerntypes and map file information is all in vmlinux if compiled -g,
> > same for the commands and libraries. If we just use the objects like
> > gdb does it's nice and simple.
> > 
> 
> Agreed that size/disk space is not a concern, we can have vmlinux compiled 
> with -g. But I wonder why original designers of lcrash preferred kerntypes 
> to -g compiled vmlinux.

It used to be a problem about 8 years ago, when disks were about 400 MB to
2 GB. You had to allocate a complete disk just for gdb to swap to while
reading the symbol table. Folks used to compile only part of the kernel
-g, for example at HP about 8 years ago on HP 10.0. It turned out that,
due to a C compiler bug, even then it was in fact better to just
compile the whole kernel -g. Also, while you're kernel hacking you never
know where you're going to end up.

Crash goes way back to AT&T System V. Sun Microsystems is still stripping
out all of the stab information and only leaving global type information
in mdb. That's good for us, because it makes it more difficult for Sun to fix
bugs quickly and they invest in other competing paradigms like Kernel Vision
where the -g information is added after the release; IMHO it's a real mess.
So, we take market share from the Enterprise Servers in the next year
or two as Linux overtakes Solaris. I'm long Linux and FreeBSD and short Sun.

While reading Hiroshi's idea I was thinking ...

It would be real slick to be able to back out the pc in, say, panic() and
have gdb use the simulator code on the dump to return up to
a high-level function that called panic, move the pc to just
before the call to the function that made the kernel decide
to call panic, and re-execute the code under the simulator to
understand why the results were such that panic had to be called.

For example, yesterday I hit a panic in the ia64 page fault handler:

    101         switch (handle_mm_fault(mm, vma, address, (mask & VM_WRITE) != 0)) {
    102               case 1:
    103                 ++current->min_flt;
    104                 break;
    105               case 2:
    106                 ++current->maj_flt;
    107                 break;
    108               case 0:
    109                 /*
    110                  * We ran out of memory, or some other thing happened
    111                  * to us that made us unable to handle the page fault
    112                  * gracefully.
    113                  */
    114                 signal = SIGBUS;
    115                 if (ia64_do_page_fault_panic_enabled) {
  ->116                         panic("ia64_do_page_fault");
    117                 }
    118                 ia64_do_page_fault_panic_count++;
    119                 goto bad_area;
    120               default:
    121                 goto out_of_memory;
    122         }
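
The dispatch on the return value of handle_mm_fault() can be modelled in user
space. The following is only a minimal sketch: classify_fault(), the
fault_outcome names and struct fault_counters are hypothetical stand-ins, and
the meaning of the return codes (1 = minor fault, 2 = major fault, 0 = failure,
anything else = out of memory) is taken from the excerpt above only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the paths through the switch in the excerpt. */
enum fault_outcome {
    FAULT_MINOR,         /* case 1: ++current->min_flt */
    FAULT_MAJOR,         /* case 2: ++current->maj_flt */
    FAULT_PANIC,         /* case 0 with the panic switch enabled */
    FAULT_BAD_AREA,      /* case 0 with the panic switch disabled: SIGBUS */
    FAULT_OUT_OF_MEMORY  /* default: goto out_of_memory */
};

/* Stand-in for the counters touched by the excerpt. */
struct fault_counters {
    unsigned long min_flt;
    unsigned long maj_flt;
    unsigned long panic_count;
};

/* Map a handle_mm_fault()-style return code to the path the excerpt takes. */
enum fault_outcome classify_fault(int ret, bool panic_enabled,
                                  struct fault_counters *c)
{
    assert(c != 0);
    switch (ret) {
    case 1:
        c->min_flt++;
        return FAULT_MINOR;
    case 2:
        c->maj_flt++;
        return FAULT_MAJOR;
    case 0:
        if (panic_enabled)
            return FAULT_PANIC;   /* panic() never returns in the kernel */
        c->panic_count++;         /* only reached when panic is disabled */
        return FAULT_BAD_AREA;
    default:
        return FAULT_OUT_OF_MEMORY;
    }
}
```

With panic_enabled set, a return code of 0 is the only input that reaches the
panic path, which is why the interesting question is how handle_mm_fault()
came to return zero.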

So when we start the analysis we are in dump_execute(), having been called from panic().
Now, let's imagine we have a nice GUI like ddd interfaced to the crash analyser and we
just move the PC with the mouse to the end of dump_execute() and then hit single step.
The crash analyser simulates the execution and then leaves us at the call to dump_execute()
in panic(). So we do the same thing and return from panic() up to the above page
fault code.

So now we move the mouse over line 116, pick up the PC icon, move it to line
101 and hit the single-step button. So now the simulator takes us down into handle_mm_fault()
and we can find out exactly how it came about that it returned zero.
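
The "move the PC back and single-step" idea can be illustrated with a toy
simulator. This is a rough sketch under heavy assumptions, not gdb's simulator
interface: the three-instruction machine, struct machine, sim_step() and
sim_rerun() are all hypothetical and exist only to show state being taken from
a dump, the PC being moved to an earlier point, and the code being re-executed.

```c
#include <assert.h>

/* Toy instruction set, invented for this illustration. */
enum op { OP_LOADI, OP_ADD, OP_HALT };

struct insn {
    enum op op;
    int dst;   /* destination register */
    int a;     /* immediate (OP_LOADI) or first source register (OP_ADD) */
    int b;     /* second source register (OP_ADD) */
};

/* Register state as it might be recovered from a dump. */
struct machine {
    int pc;
    int reg[4];
};

/* Execute one instruction. Returns 0 on OP_HALT, 1 otherwise. */
int sim_step(struct machine *m, const struct insn *prog, int len)
{
    assert(m->pc >= 0 && m->pc < len);
    const struct insn *i = &prog[m->pc];

    switch (i->op) {
    case OP_LOADI:
        m->reg[i->dst] = i->a;
        break;
    case OP_ADD:
        m->reg[i->dst] = m->reg[i->a] + m->reg[i->b];
        break;
    case OP_HALT:
        return 0;
    }
    m->pc++;
    return 1;
}

/*
 * Work on a copy of the dumped state: move the PC back to `target`,
 * then single-step until `stop_pc` is reached or the program halts.
 */
struct machine sim_rerun(struct machine dump, const struct insn *prog,
                         int len, int target, int stop_pc)
{
    dump.pc = target;
    while (dump.pc != stop_pc && sim_step(&dump, prog, len))
        ;
    return dump;
}
```

Because sim_rerun() takes the dump by value, the saved state stays untouched
and the re-run can be repeated from different starting points. The re-run only
retraces the original path if the inputs it reads are still intact in the dump.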

I wonder how hard this would be to do. I used to use the backout facility on HP's kernel
debugger all the time and use it in user space applications every day. It sure would be
sweet to be able to do it on a core file.

-piet
