This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



ARM7 stack issue (was ARM vectors.S hang issue)


Hi All,
This issue really has us perplexed.  I thought I would
provide some more facts and hopefully someone can give
us some insight.

Summarizing from the previous email, we are porting
eCos to the Gameboy Advance.  We are debugging timer
interrupt code that causes the platform to hang if our
user program makes calls to some routines such as
diag_printf or sprintf. (see thread below)

It appears that the hang is due to the IRQ code in
vectors.S overwriting stack contents that belong to
the interrupted user (non-IRQ) code.  Thus, when the
interrupt routine returns, the user code loses its lr,
among other things, and crashes.  This explains why
staying in IRQ mode in the IRQ routine (not switching
to supervisor mode) fixes the issue: eCos and client
code run in supervisor mode, so the handler shares
their stack once it switches.  Also, if we create a
bigger stack frame in the IRQ routine (say 128 bytes
instead of 76), the problem goes away.  Or, if in the
IRQ routine we switch to supervisor mode and use an
entirely new stack (restoring the old stack pointer on
the mode switch back), the problem goes away as well.
 Note, at this point, we have allocated a very large
16K user stack to rule out overflow issues, and we've
verified that the stack never grows beyond 1k or so.
 
Some other random facts:
-- We are using an older distribution of GNUPro for
thumb-elf targets.  The gcc version is 2.95.  We have
noticed that there is a patch for gcc
(ecos-gcc-2952.pat).  We are unsure what it fixes for
ARM targets (we'll probably try this next).
-- The user code is compiled in thumb mode (it's
faster on the Gameboy Advance).
-- We have verified that all registers are being
restored correctly upon IRQ routine exit: sp_svc,
lr_svc, spsr_svc, lr_irq, spsr_irq, and r0 through
r12.  (sp_irq doesn't matter.)  That should be all of
them.
-- We have also looked at the assembly generated for
the sprintf and diag_printf routines -- we have
scrutinized them for poor stack management (such as
modifying stack contents BEFORE adjusting the stack
pointer, or perhaps stacks that grow up instead of
down) and nothing has turned up as strange, but then
again this is a difficult task.  It seems that poor
(wrong) stack management in the user code (gcc
generated) is the only explanation for all this.  We
are open to any suggestions!
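
The failure mode suspected above (a store below sp before sp is
adjusted, with an interrupt arriving in between) can be modelled
with a small simulation.  This is only a sketch under stated
assumptions, not the real vectors.S logic: sim_stack, sim_irq,
push_store_first and push_adjust_first are hypothetical names,
and the model assumes the handler reserves its frame below the
interrupted sp and writes the saved registers at the low end of
that frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64
#define IRQ_MARK    0xFFFFFFFFu  /* stands in for registers saved by the IRQ */

typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t   sp;  /* index of the lowest word in use (full-descending) */
} sim_stack;

static void sim_init(sim_stack *s)
{
    size_t i;

    for (i = 0; i < STACK_WORDS; i++)
        s->mem[i] = 0;
    s->sp = STACK_WORDS;  /* empty stack */
}

/*
 * Model of an interrupt that borrows the interrupted code's stack:
 * reserve frame_words below sp and write saved_words at the lowest
 * addresses of that frame.  sp itself is unchanged on return.
 */
static void sim_irq(sim_stack *s, size_t frame_words, size_t saved_words)
{
    size_t i, frame;

    assert(saved_words <= frame_words && frame_words <= s->sp);
    frame = s->sp - frame_words;
    for (i = 0; i < saved_words; i++)
        s->mem[frame + i] = IRQ_MARK;
}

/* Buggy order: store below sp, get interrupted, then adjust sp. */
static uint32_t push_store_first(sim_stack *s, uint32_t v,
                                 size_t frame_words, size_t saved_words)
{
    s->mem[s->sp - 1] = v;
    sim_irq(s, frame_words, saved_words);
    s->sp -= 1;
    return s->mem[s->sp];
}

/* Correct order: adjust sp, store, and only then get interrupted. */
static uint32_t push_adjust_first(sim_stack *s, uint32_t v,
                                  size_t frame_words, size_t saved_words)
{
    s->sp -= 1;
    s->mem[s->sp] = v;
    sim_irq(s, frame_words, saved_words);
    return s->mem[s->sp];
}
```

With a 19-word (76-byte) frame that is fully written, the
store-first order loses the value, while a 32-word (128-byte)
frame with the same 19 words saved at its low end leaves it
intact.  That matches the symptoms described, but it is only
consistent with the hypothesis and does not confirm it.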

thanks,
--bill

> Bill Diehls wrote:
> > 
> > Hi All,
> > I have been working with a couple of friends porting
> > eCos to the Gameboy Advance which uses an ARM7TDMI
> > core.  We have ported the hal and we have Redboot up
> > and running with console and flash support.  We also
> > have the debug stubs working under Redboot.
> 
> You probably won't be surprised by this, but I'm sure people would want you
> to contribute this back to the main sources so that everyone can have a go
> (when it's done). See
> http://sources.redhat.com/ecos/faq.html#contrib_assign
> for the formalities.
> 
> 
> > But we are experiencing a strange issue in the ISR routine in
> > vectors.S.  When running the periodic timer test
> > program (intr.c) for 1000 ticks or so, we get a hang
> > in the program -- but only if we call diag_printf to
> > indicate the number of ticks in the while loop.  If we
> > remove diag_printf, the problem goes away completely.
> > 
> > More interestingly though, if we remove the switch to
> > supervisor mode (and the switch back to irq mode) in
> > the IRQ routine in vectors.S, the hang problem
> > disappears as well, regardless of calling diag_printf.
> >  (Note, sprintf instead of diag_printf exhibits the
> > same behavior, as do other homemade routines that
> > don't use variable args.)   This would probably
> > indicate a stack overflow in the supervisor stack, but
> > resizing __startup_stack to large values (say 8k) has
> > no effect.  Any thoughts on what might be happening
> > would be greatly appreciated.
> 
> It's not the startup stack size you should change; change the
> CYGNUM_HAL_COMMON_INTERRUPTS_STACK_SIZE config option, assuming you are
> using a separate interrupt stack
> (CYGIMP_HAL_COMMON_INTERRUPTS_USE_INTERRUPT_STACK).
> 
> But if this _is_ an interrupt problem, as seems entirely possible,
> diagnostic output in itself should not be enough to cause interrupts
> because it's polled. What this may instead mean is that your interrupts are
> unmasked for the device doing the diag output. You should ensure those
> interrupts are masked in your platform initialization in
> cyg_hal_plf_comms_init(). Try adding an explicit mask of those interrupts
> in the test.
> 
> Also you could try turning off interworking with the ROM monitor
> (CYGSEM_HAL_USE_ROM_MONITOR) to be sure that any problems are only in your
> app, so if you change config options they definitely take effect.
> 
> Also, have you enabled assertion support, and the stack checking options?
> If you've enabled assertion support you should get told about a spurious
> (i.e. unhandled) interrupt if there is one.
> 
> Jifl
> -- 
> Red Hat, Rustat House, Clifton Road, Cambridge, UK.
> Tel: +44 (1223) 271062
> Maybe this world is another planet's Hell -Aldous Huxley || Opinions==mine



