[ECOS] ARM HAL Problems

Fri Jul 7 17:06:00 GMT 2006

Thanks.  I think I found the problem.  I had programmed RedBoot (ROM
startup) into the flash on my board.  RedBoot boots and runs very quickly,
before I can start GDB using the JTAG debugger.  Therefore, the caches were
already enabled and active when the JTAG debugger took over.  After loading
my application code via the JTAG interface, I would run my application,
which would disable and flush the caches.  This cache flushing would
overwrite portions of my program/data (somewhat randomly), causing it to
crash.  I have fixed this by adding commands to my GDB init script that
disable and flush the caches before I load my application.  All works again.
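
For anyone hitting the same thing, here is a rough sketch of the bit
arithmetic behind "disable the caches".  It assumes the usual ARM9 layout of
the CP15 control register (c1), where bit 2 enables the data cache and bit 12
enables the instruction cache.  The helper names are made up for
illustration; the actual commands for reading and writing CP15 depend on the
JTAG debugger and are not shown here.

```c
#include <stdint.h>

/* CP15 register 1 (control) bits, as laid out on ARM9 cores
 * such as the ARM940T. */
#define CP15_C1_DCACHE_ENABLE (1u << 2)   /* C bit: data cache enable */
#define CP15_C1_ICACHE_ENABLE (1u << 12)  /* I bit: instruction cache enable */

/* Hypothetical helper: non-zero if either cache is enabled in the
 * given control register value. */
static int cp15_caches_enabled(uint32_t control)
{
    return (control & (CP15_C1_DCACHE_ENABLE | CP15_C1_ICACHE_ENABLE)) != 0;
}

/* Hypothetical helper: the control value to write back so that both
 * caches are disabled and every other bit is left unchanged. */
static uint32_t cp15_caches_off(uint32_t control)
{
    return control & ~(CP15_C1_DCACHE_ENABLE | CP15_C1_ICACHE_ENABLE);
}
```

For example, a control value of 0x00001005 (both caches and bit 0 set)
becomes 0x00000001 after cp15_caches_off().  Clearing these bits only turns
the caches off; the cache contents still have to be flushed separately.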

Jay

-----Original Message-----
From: Andrew Lunn [mailto:andrew@lunn.ch]
Sent: Thursday, July 06, 2006 3:15 PM
To: Jay Foster
Subject: Re: [ECOS] ARM HAL Problems

On Thu, Jul 06, 2006 at 02:58:00PM -0700, Jay Foster wrote:
> I'm working on an ARM9 HAL (ARM940T core, gcc 3.4.3), and am having problems
> with the code crashing pseudo randomly.  The frustrating part is that this
> was working great last week.  This week, I can not get it to work at all.  I
> am loading the code (RAM startup) onto the target board using a JTAG
> emulator.  It either crashes on an ASSERT or dies with a data abort or
> prefetch abort.  After a couple of days of fruitless debugging, the best I
> can determine is that the CPU registers are getting corrupted by the RTC
> interrupt, causing the code to run off into the weeds in random ways.  I
> can't figure out how (if) this is happening.  I'm using only IRQ (no FIQ)
> interrupts to avoid nesting problems.   Any helpful debugging tips?

A few random things to check, from my past experience.

1) I assume you have asserts enabled?  Well, yes, you do, since you
   say it sometimes dies with an assert.

2) When it has crashed, take a look at the interrupt vector code at
   0x0-0x40. Have you de-referenced a null pointer and so corrupted
   the vectors? Also check the eCos list of interrupts, not just the
   ARM vectors. It is less likely, but still possible. I once spent a
   week looking for a bug like this. Something corrupts the IRQ
   vector. 10ms later the timer tick goes off and then you die. Nasty
   to find.

3) Make all your stacks bigger, just in case.

4) Check the processor mode when it goes wrong. Interrupt mode? System
   mode?

5) Try backtracking from an assert/data abort by decoding the
   stack(s) by hand. Check the other stacks as well, not just the
   current CPU mode stack.
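
Checks 2 and 4 can be partly mechanised on the host once you have dumped
0x0-0x40 and the CPSR from the debugger.  What follows is only a sketch: the
function names are made up for illustration, and the known-good copy of the
vector region is assumed to be a dump you took yourself right after loading,
before anything had a chance to scribble on it.  The vector offsets and CPSR
mode encodings are the standard ARM ones.

```c
#include <stddef.h>
#include <stdint.h>

#define VECTOR_WORDS 16u  /* 0x00-0x3F viewed as 32-bit words */

/* Hypothetical helper: compare a dump of 0x0-0x40 with a known-good
 * copy.  Returns the byte offset of the first differing word, or -1
 * if the two copies match. */
static int first_corrupt_vector(const uint32_t *dump, const uint32_t *good)
{
    for (size_t i = 0; i < VECTOR_WORDS; i++) {
        if (dump[i] != good[i])
            return (int)(i * sizeof(uint32_t));
    }
    return -1;
}

/* Name of the ARM exception vector at a byte offset in 0x00-0x1C. */
static const char *arm_vector_name(unsigned offset)
{
    switch (offset) {
    case 0x00: return "Reset";
    case 0x04: return "Undefined instruction";
    case 0x08: return "Software interrupt";
    case 0x0C: return "Prefetch abort";
    case 0x10: return "Data abort";
    case 0x14: return "Reserved";
    case 0x18: return "IRQ";
    case 0x1C: return "FIQ";
    default:   return "not a vector slot";
    }
}

/* Decode the mode field (low five bits) of a CPSR value. */
static const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "invalid";
    }
}
```

If first_corrupt_vector() returns 0x18, for instance, the IRQ vector has been
overwritten, which matches the "timer tick goes off and then you die"
pattern.  A CPSR whose low five bits decode as "invalid" is itself a hint that
the saved registers were corrupted.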

   Andrew

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss