[ECOS] Network code unstable (Solved for real this time).
Gary Thomas
gthomas@redhat.com
Wed Mar 6 09:57:00 GMT 2002
On Wed, 2002-03-06 at 10:11, Pieter Truter wrote:
>
> After a lot of testing and debugging I found out that the CS8900 is losing
> interrupts under heavy network load. This is more prominent when running
> from flash which is slower.
>
> Looking at if_cs8900a.c I think I found the cause of my problem. The time
> between the interrupt and acknowledge() is too long. I then moved the
> acknowledge() in cs8900a_deliver() to cs8900a_isr() just after the mask()
> and now everything works great.
So, this was a case of new interrupts from the device not causing the ISR
to run, possibly because of edge triggering. I don't understand why the
'while()' loop in the interrupt handling routine doesn't pick up the new
events anyway, but maybe it's just a chip problem.
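
To illustrate the edge-triggering hypothesis, here is a toy model in plain C.
It is only a sketch, not the if_cs8900a.c code: the struct, the toy_* names
and the single "pending" latch are all assumptions made for illustration, and
a real interrupt controller may well behave differently. It counts how often
the ISR is entered when a second edge arrives while the first interrupt is
still waiting for the deferred deliver step.

```c
#include <stdbool.h>

/* Hypothetical model of one edge-triggered interrupt line. */
struct irq_line {
    bool pending;   /* an edge has been latched by the controller */
    bool masked;    /* the driver has masked this line */
    int  isr_runs;  /* number of times the ISR was entered */
};

/* The device raises an edge; a second edge merges into a set latch. */
static void toy_edge(struct irq_line *l)   { l->pending = true; }

static void toy_mask(struct irq_line *l)   { l->masked = true; }
static void toy_unmask(struct irq_line *l) { l->masked = false; }

/* Acknowledge clears the latch, including any edge merged into it. */
static void toy_ack(struct irq_line *l)    { l->pending = false; }

/* ISR: always masks; acknowledges here only in the "early ack" variant. */
static void toy_isr(struct irq_line *l, bool ack_in_isr)
{
    l->isr_runs++;
    toy_mask(l);
    if (ack_in_isr)
        toy_ack(l);
}

/* Deferred work, run later from a thread in this model. */
static void toy_deliver(struct irq_line *l, bool ack_in_isr)
{
    if (!ack_in_isr)
        toy_ack(l);        /* "late ack" variant: clears merged edges too */
    /* ... the chip would be serviced here ... */
    toy_unmask(l);
}

/* The controller enters the ISR only for a pending, unmasked line. */
static void toy_dispatch(struct irq_line *l, bool ack_in_isr)
{
    if (l->pending && !l->masked)
        toy_isr(l, ack_in_isr);
}

/* Two edges, the second arriving between the ISR and the deliver step.
 * Returns how many times the ISR was entered. */
int toy_run(bool ack_in_isr)
{
    struct irq_line l = { false, false, 0 };

    toy_edge(&l);
    toy_dispatch(&l, ack_in_isr);  /* ISR runs for the first edge */
    toy_edge(&l);                  /* second edge while line is masked */
    toy_dispatch(&l, ack_in_isr);  /* masked, so nothing happens */
    toy_deliver(&l, ack_in_isr);   /* deferred processing, then unmask */
    toy_dispatch(&l, ack_in_isr);  /* is anything still pending? */
    return l.isr_runs;
}
```

In this model toy_run(false), with the acknowledge deferred to the deliver
step, enters the ISR once, so the second edge is swallowed. toy_run(true),
with the acknowledge in the ISR right after the mask, enters it twice.
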
>
> I am still concerned about masking the interrupt for so long but I
> understand that this is probably done to be able to use the BSD stack with a
> realtime OS.
It's only the device interrupt which is masked. I don't see how you can
avoid that - you've got to keep the device from [re]interrupting the driver
while it handles the current one. Also note that the "deliver" function gets
called from a network processing thread, not directly by the DSR code. This
probably accounts for most of the delay.
>
> The big problem with losing an interrupt from the CS8900a chip is that you
> have to clean up all the info in the chip, otherwise it will not generate any
> other interrupts. And if you do not know that you missed an interrupt, you
> don't know when to clean up. ;-(
Every ethernet device seems to have these quirks and, sadly, we have to deal
with them all, each in their own way :-(
I'll adjust the code per your findings.
Thanks.