[RFC] Modernize HP-UX core file handling

Joel Brobecker brobecker@adacore.com
Thu Dec 16 19:53:00 GMT 2004


> After Mark's changes, I get something much better:
> 
>     (gdb) bt
>     #0  0xc01f5e38 in kill () from /usr/lib/libc.2
>     #1  0x00002988 in cause_crash () from ./crash
>     Cannot access memory at address 0xffffffed
> 
> I'll look at the unwinding failure when I have a moment.

Hmmm, I had a look at the problem above, and I'm a bit nonplussed...

First, the code:

        #include <sys/types.h>
        #include <signal.h>
        #include <unistd.h>   /* for getpid () */
        
        void
        cause_crash (void)
        {
          kill (getpid (), SIGABRT);
        }
        
        int
        main (void)
        {
          cause_crash ();
        
          return 0;
        }

Do the following (I used something close to GCC 3.4.3):

        % gcc -o crash crash.c
        % ulimit -c 100000
        % ./crash

This should produce a core file that GDB should be able to debug.
But the backtrace command fails:

        (gdb) core core
        (gdb) bt
        #0  0xc01f5e38 in kill () from /usr/lib/libc.2
        #1  0x00002988 in cause_crash ()
        warning: Cannot access memory at address 0xffffffed

Here is the disassembly of cause_crash():

        (gdb) disass
        Dump of assembler code for function cause_crash:
        0x00002958 <cause_crash+0>:     stw rp,-14(,sp)
        0x0000295c <cause_crash+4>:     copy r3,r1
        0x00002960 <cause_crash+8>:     copy sp,r3
        0x00002964 <cause_crash+12>:    stw,ma r1,40(,sp)
        0x00002968 <cause_crash+16>:    stw r3,-4(,sp)
        0x0000296c <cause_crash+20>:    b,l 0x2928 <getpid>,rp
        0x00002970 <cause_crash+24>:    nop
        0x00002974 <cause_crash+28>:    copy ret0,r19
        0x00002978 <cause_crash+32>:    copy r19,r26
        0x0000297c <cause_crash+36>:    ldi 6,r25
        0x00002980 <cause_crash+40>:    b,l 0x2940 <kill>,rp
        0x00002984 <cause_crash+44>:    nop
        0x00002988 <cause_crash+48>:    ldw -14(,r3),rp
        0x0000298c <cause_crash+52>:    ldo 40(r3),sp
        0x00002990 <cause_crash+56>:    ldw,mb -40(,sp),r3
        0x00002994 <cause_crash+60>:    bv,n r0(rp)
        End of assembler dump.

As you can see, rp is saved at sp - 20 @+0 (all good; the displacements
in the disassembly are in hex, so -14 is -20), and then sp is copied
into r3 @+8 before sp is incremented by 64 bytes (hex 40) @+12. That's
where GDB says the function prologue ends.

However, the instruction right after that, at +16, saves r3 on the
stack at sp - 4. GDB didn't see that. Instead, some logic in GDB says
that if Save_SP is set in the function's unwind record and the FP
register is non-zero, then the FP register holds the frame base:

     if (frame_pc_unwind (next_frame) >= prologue_end
         && u->Save_SP && fp != 0)

The problem is that somehow r3 got clobbered during the call to kill,
and no longer contains the frame base address (it contains 0x1). I am
a bit surprised by this, since r3 is supposed to be callee-saved.  In any
case, GDB later fails when it uses that value as the base address from
which to read the saved RP: 0x1 - 20 wraps around to 0xffffffed, the
address in the error message above.
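
To make the failure concrete, here is a minimal sketch in C of that
check and of the address computation that follows it. This is not GDB's
code: the struct, the function names, the fallback_base parameter and
the RP_SLOT_OFFSET constant are made up for illustration. Only the
condition itself and the sp - 20 slot for RP come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single unwind-record bit discussed
   here; the real unwind record has many more fields.  */
struct unwind_entry_sketch
{
  unsigned int save_sp : 1;
};

/* The prologue stores RP with "stw rp,-14(,sp)".  The displacement is
   printed in hex, so the slot is 20 bytes below the frame base.  */
#define RP_SLOT_OFFSET (-0x14)

/* Mirror the quoted condition: past the prologue, with Save_SP set and
   a non-zero FP, trust FP (r3) as the frame base.  FALLBACK_BASE is an
   assumption standing in for whatever else the unwinder would do.  */
static uint32_t
pick_frame_base (uint32_t pc, uint32_t prologue_end,
                 const struct unwind_entry_sketch *u,
                 uint32_t fp, uint32_t fallback_base)
{
  assert (u != NULL);

  if (pc >= prologue_end && u->save_sp && fp != 0)
    return fp;
  return fallback_base;
}

/* Address from which the saved RP is read, with 32-bit wraparound.  */
static uint32_t
saved_rp_address (uint32_t frame_base)
{
  return (uint32_t) (frame_base + (uint32_t) RP_SLOT_OFFSET);
}
```

With the values from this core file (pc 0x2988, Save_SP set, r3 = 0x1,
and assuming the prologue end is taken to be +16, i.e. 0x2968),
pick_frame_base returns 0x1 and saved_rp_address then yields
0xffffffed, which is the address GDB complains about.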

The current code seems a bit more complicated than it needs to be,
at least to me. There is also a bit of duplication between the code that
analyzes the prologue and the code that just skips it.  Mixing
information taken from the unwind record with other information
derived from analyzing the prologue probably adds to the complexity
of the code. On top of that, the fact that the HP native compiler and
GCC do not use the same convention for the frame (and the unwind
record) makes it even more fragile. I am wondering whether it might be
possible to simplify all this by using the usual approach, relying
mostly on prologue analysis and register tracking.
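
As a rough illustration of what "prologue analysis and register
tracking" could look like, here is a small C sketch that steps through
a pre-decoded prologue and records where each register's entry value
ends up relative to the SP at function entry. Everything in it (the
types, the names, the pre-decoded instruction form) is hypothetical and
much simpler than what a real PA-RISC analyzer would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum reg { REG_R1, REG_R3, REG_RP, REG_SP, NUM_REGS };

/* What a register is known to hold while stepping through the prologue.  */
enum val_kind { VAL_ENTRY_REG, VAL_ENTRY_SP_PLUS };

struct val
{
  enum val_kind kind;
  enum reg reg;         /* VAL_ENTRY_REG: whose entry value this is.  */
  int32_t off;          /* VAL_ENTRY_SP_PLUS: offset from the entry SP.  */
};

/* Hypothetical pre-decoded prologue instruction.  */
enum op { OP_COPY, OP_STORE };

struct insn
{
  enum op op;
  enum reg src;
  enum reg dst;         /* OP_COPY: destination; OP_STORE: base register.  */
  int32_t disp;         /* OP_STORE: displacement from the base.  */
  int32_t post_modify;  /* OP_STORE: added to the base after the store.  */
};

#define NOT_SAVED INT32_MIN

struct prologue_info
{
  /* Entry-SP-relative slot holding each register's entry value.  */
  int32_t saved_off[NUM_REGS];
  /* SP after the scanned instructions, minus SP at entry.  */
  int32_t frame_size;
};

static void
analyze_prologue_sketch (const struct insn *insns, size_t count,
                         struct prologue_info *info)
{
  struct val vals[NUM_REGS];
  size_t i;
  int r;

  assert (insns != NULL && info != NULL);

  /* At entry every register holds its own entry value, and SP is
     trivially "entry SP + 0".  */
  for (r = 0; r < NUM_REGS; r++)
    {
      vals[r].kind = VAL_ENTRY_REG;
      vals[r].reg = (enum reg) r;
      vals[r].off = 0;
      info->saved_off[r] = NOT_SAVED;
    }
  vals[REG_SP].kind = VAL_ENTRY_SP_PLUS;

  for (i = 0; i < count; i++)
    {
      const struct insn *in = &insns[i];

      if (in->op == OP_COPY)
        vals[in->dst] = vals[in->src];
      else if (vals[in->dst].kind == VAL_ENTRY_SP_PLUS)
        {
          const struct val *v = &vals[in->src];

          /* Only the first store of a register's entry value counts.  */
          if (v->kind == VAL_ENTRY_REG
              && info->saved_off[v->reg] == NOT_SAVED)
            info->saved_off[v->reg] = vals[in->dst].off + in->disp;
          vals[in->dst].off += in->post_modify;
        }
    }

  info->frame_size = vals[REG_SP].off;
}
```

Fed the five prologue instructions of cause_crash above, this records
RP at entry SP - 20, the caller's r3 at entry SP + 0, and a frame size
of 64. Assuming SP is not adjusted again later in the function, that
would in principle let the frame base be computed from SP rather than
from whatever happens to be in r3.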

I'm just thinking aloud right now, sharing some general ideas, as fixing
the problem above within the current implementation does not seem easy
without resorting to a hack. And it almost feels like it would be a
hack on top of a pile of other hacks (no offense intended, it may be
my lack of knowledge of the pa/hpux architecture too). For instance,
if you look at the prologue skipping routine, you'll find that we
include all instructions at the beginning of a function until we have
found where all registers are saved. We know before scanning the
instructions that the SP was saved, but we don't know that the register
containing the old SP has itself been saved on the stack. How do we extend
the loop condition to include that possibility? The code computing
the frame cache also seems to be doing a lot of guessing... Perhaps
the same sort of cleanup as what we did with Andrew for mips would
be beneficial. It's something I'll keep on my long-term TODO list.
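
On the loop-condition question above, one possible shape (a sketch
only, not what GDB does today) is to let the scan add a new "pending"
item once it sees SP copied into the frame pointer, so that the loop
keeps going until that register is stored too. The instruction classes,
the NEED_* flags and the function name below are all hypothetical, and
the instructions are assumed to be already decoded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, already-decoded instruction classes.  Real code would
   have to decode PA-RISC instruction words instead.  */
enum insn_class
{
  INSN_OTHER,
  INSN_SAVE_RP,         /* e.g. stw rp,-14(,sp)    */
  INSN_COPY_SP_TO_FP,   /* e.g. copy sp,r3         */
  INSN_ADJUST_SP,       /* e.g. stw,ma r1,40(,sp)  */
  INSN_STORE_FP         /* e.g. stw r3,-4(,sp)     */
};

/* What the scan still expects to find.  */
#define NEED_SAVE_RP    0x1u
#define NEED_ADJUST_SP  0x2u
#define NEED_STORE_FP   0x4u

/* Return the index of the first instruction after the prologue.
   PENDING is what we know up front must happen (in GDB this would come
   from the unwind record).  If TRACK_FP_STORE is non-zero, seeing SP
   copied into FP makes the scan also wait for FP to be stored.  COUNT
   bounds the scan, so a store that never comes cannot run it away.  */
static size_t
skip_prologue_sketch (const enum insn_class *insns, size_t count,
                      unsigned int pending, int track_fp_store)
{
  size_t i;
  size_t end = 0;       /* One past the last prologue insn seen.  */

  assert (insns != NULL);

  for (i = 0; i < count && pending != 0; i++)
    {
      switch (insns[i])
        {
        case INSN_SAVE_RP:
          pending &= ~NEED_SAVE_RP;
          end = i + 1;
          break;
        case INSN_COPY_SP_TO_FP:
          if (track_fp_store)
            pending |= NEED_STORE_FP;
          end = i + 1;
          break;
        case INSN_ADJUST_SP:
          pending &= ~NEED_ADJUST_SP;
          end = i + 1;
          break;
        case INSN_STORE_FP:
          if (pending & NEED_STORE_FP)
            {
              pending &= ~NEED_STORE_FP;
              end = i + 1;
            }
          break;
        case INSN_OTHER:
          break;
        }
    }

  return end;
}
```

For the cause_crash prologue, starting with NEED_SAVE_RP and
NEED_ADJUST_SP pending, the scan without tracking stops at index 4
(+16), which is where GDB currently puts the end of the prologue. With
tracking enabled it stops at index 5 (+20), after the store of r3. If
the store never shows up, the result falls back to the last prologue
instruction that was actually recognized.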

Anyway, just some thoughts.

-- 
Joel


