[ECOS] Simultaneous interrupts crash a networking application

HG henri@broadbandnetdevices.com
Thu May 1 11:55:00 GMT 2003

Hi All,

While troubleshooting the randomly occurring crashes (about 50% of the
time) of tftp_client_test.c (reported a few days ago), I observed the
following with a logic analyser:

- Every crash I observed had a timer tick interrupt occurring close in
  time to the interrupt request from the 82559 (within about 200 ns
  before or after the 82559's IRQ).

- The time offset of the timer tick before or after the IRQ is very
  deterministic: it is about the same from one crash to the next.

- The application crashes by itself: the TFTP server is a FreeBSD unit
  and issues no UDP packets other than the TFTP ones (as observed with
  tcpdump).

The timer ticks happen at regular offsets because the CPU's 100 MHz clock
and the 82559's state machine clock are derived from sources phase locked to
a single 10 MHz oscillator. If they had been derived from separate crystals,
their exact frequency and phase relationship would have been random. That
would have made a timer tick and an Ethernet interrupt landing in close
proximity a truly random, rarely occurring event. But eventually that event
would still occur and crash the application.
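
The phase-locking argument can be illustrated with a small model. This is
only a sketch under strong assumptions: both interrupt sources are treated
as strictly periodic (real network traffic is not), and the periods used in
the usage note (a 10 ms tick and a 1.5 ms IRQ period) are hypothetical
values chosen for illustration, not measurements from my board. The
function name distinct_offsets is likewise made up for this example. It
counts how many different tick-to-IRQ phase offsets can occur over a number
of ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Number of distinct phase offsets (tick time modulo IRQ period) seen over
 * n_ticks consecutive timer ticks, for two strictly periodic sources.
 * The sequence (k * tick_ns) mod irq_ns repeats every irq_ns / gcd ticks,
 * and all offsets within one such cycle are different.
 * Hypothetical helper for illustration only.
 */
uint64_t distinct_offsets(uint64_t tick_ns, uint64_t irq_ns, uint64_t n_ticks)
{
    assert(tick_ns > 0 && irq_ns > 0);
    uint64_t cycle = irq_ns / gcd_u64(tick_ns, irq_ns);
    return n_ticks < cycle ? n_ticks : cycle;
}
```

With both periods being multiples of a common reference, for example
distinct_offsets(10000000, 1500000, 1000), only 3 offsets ever occur, so a
"bad" offset is either hit over and over or never. With a slightly detuned
period such as 1500007 ns, the same 1000 ticks produce 1000 different
offsets, so any particular close coincidence becomes rare.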

I don't dispute that this eCos application has been well tested. However, I
think that my particular hardware platform greatly increases the frequency
of a case that is otherwise known to be difficult to test: unrelated
interrupts happening quasi-simultaneously.

In the configuration tool, under eCos HAL, for both RedBoot and the RAM
configuration:

- In HAL interrupt handling, "Allow nested interrupts" is deselected.
- In HAL context switch support, "Use minimum thread context" is
  deselected.
- The interrupt stack is 32k.
- The stack for the thread is 30k.

Does anyone have ideas for other config items that could be tried?

Otherwise, I guess the areas of code to check are the ISR and DSR of the
82559 driver and the code that handles the timer tick?



