This is the mail archive of the
ecos-discuss@sourceware.org
mailing list for the eCos project.
RE: FreeBSD Netstack EPIPE error
- From: "Bell, Andrew [Allen & Heath UK]" <Andrew dot Bell at allen-heath dot com>
- To: "Andrew Lunn" <andrew at lunn dot ch>
- Cc: <ecos-discuss at sourceware dot org>
- Date: Wed, 11 Oct 2006 15:31:43 +0100
- Subject: RE: [ECOS] FreeBSD Netstack EPIPE error
Hi All,
Many thanks for all the replies.
I've already upped CYGNUM_FILEIO_NFILE.
Alas, we're not using the eCos flash drivers; we're using a nocache
write-through:
CYGARC_MEMDESC_NOCACHE( 0xfe000000, 0x01000000 ), // ROM region
with an API which is called from a single thread, so nowhere do we
disable interrupts. To remove the lengthy flash write from the equation
I've replaced it with a busy wait (while (1)) in my worker thread. It
still falls over. Hmm.
I'm assuming, as I'm seeing full RX packet dumps during the xmit
starvation, that the delivery thread is still running.
There seems to be a correlation between my user thread context being
busy shortly after a high volume of eth TX and the netstack then failing
to ack subsequent requests from the front end. If I remove the busy
wait, the netstack survives.
Is there any way I can confirm the DSR call to wake the delivery thread
gets performed during this period? Or any reason why the TX side of the
netstack would stall whilst the RX is fine?
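One way to check whether the DSR is still waking the delivery thread is to count both events and compare the counts across the busy period. What follows is only a minimal sketch: delivery_trace_t, trace_dsr(), trace_delivery() and delivery_stalled() are hypothetical names, not eCos APIs, and it assumes the two trace calls are added by hand in the driver's DSR and in the delivery thread's loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instrumentation; none of these names come from eCos. */
typedef struct {
    volatile uint32_t dsr_wakeups; /* bumped where the DSR signals the delivery thread */
    volatile uint32_t deliveries;  /* bumped each time the delivery thread does a pass */
} delivery_trace_t;

/* Plain copy of the counters taken at one point in time. */
typedef struct {
    uint32_t dsr_wakeups;
    uint32_t deliveries;
} trace_snapshot_t;

/* Call from the DSR. Each counter has a single writer, so no locking is used. */
static inline void trace_dsr(delivery_trace_t *t) { t->dsr_wakeups++; }

/* Call from the delivery thread each time it wakes and processes work. */
static inline void trace_delivery(delivery_trace_t *t) { t->deliveries++; }

static inline trace_snapshot_t trace_snapshot(const delivery_trace_t *t)
{
    trace_snapshot_t s = { t->dsr_wakeups, t->deliveries };
    return s;
}

/* True if the DSR fired between the two snapshots but the delivery
 * thread never ran. Uses != so counter wrap-around is harmless. */
static inline bool delivery_stalled(trace_snapshot_t before, trace_snapshot_t after)
{
    return after.dsr_wakeups != before.dsr_wakeups &&
           after.deliveries == before.deliveries;
}
```

Taking one snapshot before the busy wait and another after it, then printing both, would show whether the wake-ups are being posted and whether the delivery thread is acting on them.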
I'm currently writing a test harness to attempt a replication of the
issue which is hardware independent. At present it works perfectly.
Grrr.
Anyone with any clues, you'd be saving my sanity.
Cheers
Andrew.
> From what I understand the protocol stack is interrupt driven. Here's
> the final twist. My main worker thread in eCos is very busy during the
> time I observe the netstack xmit starvation (writing to flash) for a
> period of around 16 seconds.
>
> I've made sure my main thread priority is lower (higher in integer
> terms) than the internal netstack delivery thread's priority.
>
> Is there any way a user thread can cause netstack starvation? BTW I'm
> not locking out interrupts during this time.
Does the device ping afterwards?
It could be the device driver is locking up because of a bug in
handling full receive buffers. Transferring packets from the device
driver into the stack happens in a thread. If your threads are not
running for 16 seconds, it could be the ethernet device has been
signalling received packets but the stack has not had a chance to get
them from the device. The device then overflows its receive
buffer. Once threads start running again, maybe the correct action is
not being taken to recover from running out of receive buffers. Some
Ethernet devices need telling to start again.
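The failure mode described above can be sketched with a toy model. This is not a real driver and not an eCos API: fake_nic_t, nic_frame_arrives() and nic_deliver() are made-up names, and the assumption is a device that latches an overrun flag and stops receiving until the driver explicitly restarts it.

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SLOTS 4

/* Toy model of an Ethernet device with a small receive ring. */
typedef struct {
    int    slots[RX_RING_SLOTS];
    size_t head;        /* index of the oldest undelivered frame */
    size_t count;       /* frames waiting in the ring */
    bool   rx_enabled;  /* receiver running? */
    bool   overrun;     /* latched when a frame arrives with the ring full */
} fake_nic_t;

static void nic_init(fake_nic_t *n)
{
    n->head = 0;
    n->count = 0;
    n->rx_enabled = true;
    n->overrun = false;
}

/* "Hardware" side: a frame arrives. Returns false if the frame is lost. */
static bool nic_frame_arrives(fake_nic_t *n, int frame)
{
    if (!n->rx_enabled)
        return false;
    if (n->count == RX_RING_SLOTS) {
        n->overrun = true;      /* ring full: latch the error ... */
        n->rx_enabled = false;  /* ... and stop until told to restart */
        return false;
    }
    n->slots[(n->head + n->count) % RX_RING_SLOTS] = frame;
    n->count++;
    return true;
}

/* "Driver" side, run from the delivery thread: hand every queued frame
 * to the stack. If restart_after_overrun is false the latched overrun is
 * ignored, which models the suspected bug. Returns frames delivered. */
static size_t nic_deliver(fake_nic_t *n, bool restart_after_overrun)
{
    size_t delivered = 0;
    while (n->count > 0) {
        n->head = (n->head + 1) % RX_RING_SLOTS;
        n->count--;
        delivered++;
    }
    if (n->overrun && restart_after_overrun) {
        n->overrun = false;
        n->rx_enabled = true;   /* tell the device to start again */
    }
    return delivered;
}
```

With restart_after_overrun set to false the ring drains but the receiver stays off, so the device never receives again even though threads are running; with it set to true reception resumes after the first delivery pass.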
Andrew
--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss