[ECOS] Network code unstable (Solved for real this time).

Wed Mar 6 09:57:00 GMT 2002

On Wed, 2002-03-06 at 10:11, Pieter Truter wrote:
> 
> After a lot of testing and debugging I found out that the CS8900 is losing
> interrupts under heavy network load. This is more prominent when running
> from flash which is slower.
> 
> Looking at if_cs8900a.c I think I found the cause of my problem. The time
> between the interrupt and acknowledge() is too long. I then moved the
> acknowledge() in cs8900a_deliver() to cs8900a_isr() just after the mask()
> and now everything works great.

So, this was a case of new interrupts from the device not causing the ISR
to run, possibly because of edge triggering.  I don't understand why the
'while()' loop in the interrupt handling routine doesn't cause this to be
retriggered, but maybe it's just a chip problem.

> 
> I am still concerned about masking the interrupt for so long but I
> understand that this is probably done to be able to use the BSD stack with a
> realtime OS.

It's only the device interrupt which is masked.  I don't see how you can
avoid that - you've got to keep the device from [re]interrupting the driver 
while it handles the current one.  Also note that the "deliver" function gets 
called from a network processing thread, not directly by the DSR code.  This 
probably accounts for most of the delay.
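
To make the suspected failure mode concrete, here is a toy model of an
edge-triggered interrupt line.  It is only a sketch under my own assumptions,
not the code in if_cs8900a.c: the `irq_line` struct and the `irq_*` helpers
are hypothetical stand-ins for the interrupt controller, and I'm assuming the
controller can latch just one pending edge and that acknowledge clears that
latch.  It compares acknowledging in the ISR (right after the mask) with
acknowledging late, at the end of the deliver step.

```c
#include <stdbool.h>

/* Hypothetical model of one edge-triggered interrupt line. */
struct irq_line {
    bool latched;   /* an edge was seen and has not been acknowledged yet */
    bool masked;    /* the line is masked at the interrupt controller */
};

/* The device signals an event; the controller can remember only one edge. */
static void irq_edge(struct irq_line *l)   { l->latched = true; }
static void irq_mask(struct irq_line *l)   { l->masked = true; }
static void irq_unmask(struct irq_line *l) { l->masked = false; }
/* Acknowledge clears whatever is latched, old or new. */
static void irq_ack(struct irq_line *l)    { l->latched = false; }

/* Would the ISR run now? */
static bool irq_would_fire(const struct irq_line *l)
{
    return l->latched && !l->masked;
}

/*
 * Replays one sequence: a first event, the ISR, a long delay until the
 * network thread runs the deliver step, and a second event arriving after
 * deliver has finished draining the device but before it returns.
 * Returns true if that second event still causes the ISR to run.
 */
static bool second_edge_survives(bool ack_in_isr)
{
    struct irq_line line = { false, false };

    irq_edge(&line);            /* first event from the device */

    /* ISR: mask the device interrupt, optionally acknowledge right away */
    irq_mask(&line);
    if (ack_in_isr)
        irq_ack(&line);

    /* deliver step, run later from the network thread */
    irq_edge(&line);            /* second event arrives in the window */
    if (!ack_in_isr)
        irq_ack(&line);         /* late acknowledge also wipes the new edge */
    irq_unmask(&line);

    return irq_would_fire(&line);
}
```

In this model second_edge_survives(true) reports that the ISR runs again,
while second_edge_survives(false) reports that the second event is silently
dropped.  The longer the gap between the ISR and the deliver step, the wider
the window, which would fit with the problem being worse when running from
flash.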

> 
> The big problem with losing an interrupt from the CS8900a chip is that you
> have to clean up all the info in the chip, otherwise it will not generate any
> further interrupts. And if you do not know that you missed an interrupt you
> don't know when to clean up. ;-(

Every ethernet device seems to have these quirks and, sadly, we have to deal
with them all, each in their own way :-(

I'll adjust the code per your findings.

Thanks.

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss