[ECOS] DSR stops running after heavy interrupts. Bug found?

Joe Porthouse jporthouse@toptech.com
Sat Apr 8 04:18:00 GMT 2006


Found it!!!  

It took two days to figure out what was happening but I think I have a
handle on it.  See if this sounds right.

After an ISR executes, if there is an associated DSR to execute, the DSR is
added to the DSR list and a scheduler lock is made.  Since DSRs are run with
interrupts enabled, the scheduler lock will prevent the application code
from running until all DSRs finish and release each of the scheduler locks.
After adding the DSR to the list, if there is only one scheduler lock (the
one just added), then a call must be made to start the first DSR executing.
If more than one scheduler lock is in place, then execution must resume from
where it left off (DSR or other critical section).  The DSR will start after
the next scheduler unlock is called.

If the ISR does not have an associated DSR, nothing is added to the DSR list
and the scheduler lock is not made, allowing the application or DSR to
resume when the ISR finishes.
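
To make the sequence above concrete, here is a minimal C++ sketch of it. This is a toy model, not kernel code: ToyScheduler, isr_finished() and the other names are made up for illustration, and real DSRs are posted from the HAL vector code rather than through a std::function queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model only; every name here is hypothetical, not the eCos API.
struct ToyScheduler {
    unsigned sched_lock = 0;                 // nesting count; 0 means threads may run
    std::deque<std::function<void()>> dsrs;  // pending DSR list

    void lock() { ++sched_lock; }

    void unlock() {
        assert(sched_lock > 0);
        if (sched_lock == 1) {
            // Last lock about to be released: drain the DSR list first,
            // while application code is still held off.
            while (!dsrs.empty()) {
                std::function<void()> dsr = std::move(dsrs.front());
                dsrs.pop_front();
                dsr();
            }
        }
        --sched_lock;
    }

    // Called when an ISR returns. Without a DSR, nothing is queued and
    // no lock is taken, as described above.
    void isr_finished(bool wants_dsr, std::function<void()> dsr) {
        if (!wants_dsr) return;
        lock();
        dsrs.push_back(std::move(dsr));
        unlock();  // runs the DSR now only if this was the sole lock
    }
};
```

If isr_finished() is called while another lock is already held, the DSR stays queued and runs when that outer lock is released by the next unlock().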

The problem is in the /hal/arm/arch/current/src/vectors.S file at line 951.

  // The return value from the handler (in r0) will indicate whether a 
  // DSR is to be posted. Pass this together with a pointer to the
  // interrupt object we have just used to the interrupt tidy up routine.

  // don't run this for spurious interrupts!
  cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE   <-- Incorrectly references R4

  cmp     r0,#CYGNUM_HAL_INTERRUPT_NONE   <-- Change to this

The wrong register is referenced to determine if the ISR has a DSR to add to
the DSR list.  Since any value in R4 other than 0x0001 will call the
routines to add a DSR, and I assume most ISRs have a DSR, the default
behavior seems to work by chance in most configurations.

In my application my ISR does NOT have an associated DSR.  Even though the
correct 0x0001 is returned by the ISR, the call to add the DSR is still
made.  This includes performing a scheduler lock since it expects to release
it after the DSR runs, but there is no DSR.  I believe there is some type of
race condition here that allows the lock to not be released correctly since
there is no corresponding DSR in the DSR list.
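
As an illustration of the symptom only (it does not show where the extra lock comes from), the following C++ sketch models what happens once a single lock goes unmatched. LockCounter and its members are hypothetical names; the unlock() here merely mirrors the shape of the inline Cyg_Scheduler::unlock() quoted further down.

```cpp
#include <cassert>

// Hypothetical stand-in for the scheduler lock counter; not eCos code.
struct LockCounter {
    unsigned count = 0;         // scheduler lock nesting count
    unsigned pending_dsrs = 0;  // DSRs queued but not yet run
    unsigned dsrs_run = 0;      // DSRs that actually executed

    void lock() { ++count; }

    // Only the unlock that takes the count to zero runs the pending DSRs.
    void unlock() {
        assert(count > 0);
        unsigned next = count - 1;
        if (next == 0) {
            dsrs_run += pending_dsrs;
            pending_dsrs = 0;
        }
        count = next;
    }

    // One interrupt: lock, post a DSR, unlock. With leak_lock set, one
    // extra lock is taken that nothing ever releases.
    void interrupt(bool leak_lock) {
        lock();
        if (leak_lock) lock();  // the suspected unmatched lock
        ++pending_dsrs;
        unlock();
    }
};
```

After one leaking interrupt the count never returns to zero, so every later interrupt queues its DSR but none of them run, which matches DSRs piling up in the list without being serviced.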

Modifying only the above line has so far completely solved my issue of
lost DSR execution.

Can someone review the proposed change and, if warranted, commit it to
CVS?  This problem could/will affect any ARM eCos application.  Since "v1"
may have correctly referenced "r0" at some time in the past, the other half
dozen "v1, v2...v6" references in vectors.S could also be incorrect.

Joe Porthouse
Toptech Systems, Inc.

-----Original Message-----
From: ecos-discuss-owner@ecos.sourceware.org
[mailto:ecos-discuss-owner@ecos.sourceware.org] On Behalf Of Andrew Lunn
Sent: Thursday, April 06, 2006 5:19 PM
To: Joe Porthouse
Cc: ecos-discuss@ecos.sourceware.org
Subject: Re: [ECOS] DSR stops running after heavy interrupts.

On Thu, Apr 06, 2006 at 05:08:45PM -0400, Joe Porthouse wrote:
> Stefan, thanks.  I'm glad to know I'm not the only one experiencing this
> problem.
> 
> I have made a little more progress.
> 
> I still can't explain the issues with the code listed in my first message
> with the code checking the return value from the ISR, but I believe it is
> somehow working correctly.  I still believe there may be a problem with R4
> being checked instead of R0.  I did verify that the memory was the same as
> my code window, as well as the flash image.
> 
> This is what I did find.
> 
> DSR calls are being added to the table... thousands of them... just not
> getting serviced.  All the calls that lead to "call_pending_DSRs" seem to
> originate from unlock_inner() being called.  This routine stops getting
> called when the problem occurs.  (You can see the logic below.)
> 
> 
> inline void Cyg_Scheduler::unlock() 
> { 
>     // This is an inline wrapper for the real scheduler unlock function in
>     // Cyg_Scheduler::unlock_inner().
>
>     // Only do anything if the lock is about to go zero, otherwise we simply
>     // decrement and return. As with lock() we do not need any special code
>     // to decrement the lock counter.
>      
>     CYG_INSTRUMENT_SCHED(UNLOCK,get_sched_lock(),0); 
>           
>     HAL_REORDER_BARRIER(); 
>           
>     cyg_ucount32 __lock = get_sched_lock() - 1; 
>          
>     if( __lock == 0 )
>       unlock_inner(0); 
>     else
>       set_sched_lock(__lock); 
>    
>     HAL_REORDER_BARRIER(); 
> }
> 
> Upon examination the __lock value is "6" when unlock() is called at the end
> of the ISR, thus unlock_inner() never gets called.  If I set the variable
> behind get_sched_lock() back to 1, my DSR calls resume.
> Mmmmmmm....
> 
> So somehow locks are being taken without matching unlocks.  I am at a loss
> to figure out how this is occurring since I do not make lock calls in any
> of my code.  Could interrupt preemption somehow be occurring?  Do the
> hal_disable/enable interrupt calls mess with the lock?
> 
> Any good ideas on how to track this down?

Kernel instrumentation. 

CYG_INSTRUMENT_SCHED(UNLOCK,get_sched_lock(),0); 

locks and unlocks are logged. See if you can find a case of a lock
without an unlock.
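
One way to search such a log offline is sketched below in C++. It assumes the lock and unlock records have already been extracted into a simple in-order list; the Event type, the Balance struct and check_balance() are hypothetical and do not reflect the real instrumentation buffer layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified trace record: one entry per logged call.
enum class Event { Lock, Unlock };

struct Balance {
    int final_depth;           // locks minus unlocks at the end of the trace
    std::ptrdiff_t last_zero;  // index of the last record that returned the
                               // depth to zero, or -1 if it never did
};

// Replays the trace and tracks the nesting depth. A non-zero final_depth
// means the trace is unbalanced, and the unmatched call must lie somewhere
// after index last_zero.
Balance check_balance(const std::vector<Event>& trace) {
    Balance b{0, -1};
    for (std::size_t i = 0; i < trace.size(); ++i) {
        b.final_depth += (trace[i] == Event::Lock) ? 1 : -1;
        if (b.final_depth == 0) {
            b.last_zero = static_cast<std::ptrdiff_t>(i);
        }
    }
    return b;
}
```

A positive final_depth points at a lock that was never released; the records after last_zero are the ones worth reading by hand.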

        Andrew

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
