[ECOS] Two TCP/IP stack issues...

Grant Edwards grante@visi.com
Tue Apr 11 13:23:00 GMT 2000


First, TCP/IP panics
--------------------

> My application that's running eCos (ARM7TDMI) and the TCP/IP
> stack is seeing a panic from the stack after a minute or two
> under certain conditions.  The panics we're seeing are
> 
>     m_copydata: null mbuf in skip
>     m_copydata: null mbuf
>     sbdrop

After a bit of rooting around with gdb, we've determined that
these panics are the result of an sb struct getting corrupted.

The TCP/IP routines appear to be using two macros to attempt to
provide mutually exclusive access to the sb struct:

[from tcpip/v1_0a1/include/sys/socketvar.h]

/*
 * Set lock on sockbuf sb; sleep if lock is already held.
 * Unless SB_NOINTR is set on sockbuf, sleep is interruptible.
 * Returns error without lock if sleep is interrupted.
 */
#define sblock(sb, wf) ((sb)->sb_flags & SB_LOCK ? \
		(((wf) == M_WAITOK) ? sb_lock(sb) : EWOULDBLOCK) : \
		((sb)->sb_flags |= SB_LOCK), 0)

/* release lock on sockbuf sb */
#define	sbunlock(sb) { \
	(sb)->sb_flags &= ~SB_LOCK; \
	if ((sb)->sb_flags & SB_WANT) { \
		(sb)->sb_flags &= ~SB_WANT; \
		wakeup((caddr_t)&(sb)->sb_flags); \
	} \
}


These are used by normal foreground tasks, not DSRs or ISRs,
right?  If so, a context switch can occur between the test of
sb_flags and the update that sets SB_LOCK, allowing two tasks to
access the sb struct simultaneously and corrupt it.
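
To make the suspected window concrete, here is a minimal sketch that
replays one bad interleaving by hand.  It is not the stack's code: the
toy_ names, the flags-only struct and the SB_LOCK value are stand-ins
invented for illustration, and the "context switch" is simply the order
of the calls.

```c
#include <assert.h>
#include <stddef.h>

#define SB_LOCK 0x01  /* stand-in value; the real flag lives in socketvar.h */

/* Hypothetical, flags-only stand-in for the real sockbuf. */
struct toy_sockbuf { int sb_flags; };

/* First half of sblock(): test the flag.  Returns 1 if the lock looks free. */
static int toy_lock_test(const struct toy_sockbuf *sb)
{
    assert(sb != NULL);
    return (sb->sb_flags & SB_LOCK) == 0;
}

/* Second half of sblock(): set the flag. */
static void toy_lock_set(struct toy_sockbuf *sb)
{
    assert(sb != NULL);
    sb->sb_flags |= SB_LOCK;
}

/* Replays one interleaving and returns how many tasks think they own the lock. */
static int toy_race_owners(void)
{
    struct toy_sockbuf sb = { 0 };
    int owners = 0;

    int a_saw_free = toy_lock_test(&sb);  /* task A tests: free */
    /* ... context switch to a higher-priority task B ... */
    int b_saw_free = toy_lock_test(&sb);  /* task B tests: still free */
    if (b_saw_free) { toy_lock_set(&sb); owners++; }
    /* ... B blocks, A resumes just after its own test ... */
    if (a_saw_free) { toy_lock_set(&sb); owners++; }

    return owners;
}
```

In this replay both tasks end up believing they hold the lock, which
is the kind of double entry that could corrupt an sb struct.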

When we set our user-task priority equal to that of the eCos
network task, the corruption of the sb structs stopped.
[We have time-slicing disabled, so tasks of equal priority
can't preempt each other.]


Q: Don't we need to serialize accesses to sb structs with
   mutexes?
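
For illustration, this is roughly what I mean by serializing: a
minimal sketch in which the test and the set of SB_LOCK happen under
a real mutex.  POSIX threads are used here only so the sketch can be
built on a host; on eCos the kernel calls cyg_mutex_init(),
cyg_mutex_lock() and cyg_mutex_unlock() would presumably take their
place.  The guarded_ names, the struct layout and the SB_LOCK value
are assumptions, not anything from the stack, and only the
non-blocking (EWOULDBLOCK) path is shown.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define SB_LOCK 0x01  /* stand-in value; the real flag lives in socketvar.h */

/* Hypothetical sockbuf stand-in with a mutex guarding sb_flags. */
struct guarded_sockbuf {
    int             sb_flags;
    pthread_mutex_t guard;
};

static void guarded_init(struct guarded_sockbuf *sb)
{
    assert(sb != NULL);
    sb->sb_flags = 0;
    pthread_mutex_init(&sb->guard, NULL);
}

/* Non-blocking lock attempt: 0 on success, EWOULDBLOCK if already held. */
static int guarded_trylock(struct guarded_sockbuf *sb)
{
    int rc = EWOULDBLOCK;

    assert(sb != NULL);
    pthread_mutex_lock(&sb->guard);
    if ((sb->sb_flags & SB_LOCK) == 0) {
        sb->sb_flags |= SB_LOCK;   /* test and set now happen as one unit */
        rc = 0;
    }
    pthread_mutex_unlock(&sb->guard);
    return rc;
}

/* Release the sockbuf lock; waking sleepers (SB_WANT) is left out here. */
static void guarded_unlock(struct guarded_sockbuf *sb)
{
    assert(sb != NULL);
    pthread_mutex_lock(&sb->guard);
    sb->sb_flags &= ~SB_LOCK;
    pthread_mutex_unlock(&sb->guard);
}
```

With the guard in place, a second attempt while the lock is held gets
EWOULDBLOCK instead of slipping through between the test and the set.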




Second, tcp_echo 
----------------

> In trying to duplicate the problem with some of the eCos test
> programs, I tried to lower the buffersize in tcp_source.  It
> seems to work fine down to about 100 bytes, but below that
> starts to fail.  I've tried buffer sizes of 60-70 bytes and
> after a second or two, the data transfer just stops.  Sometimes
> I get "setsoftnet" messages on the diagnostic port.
> 
> Q: Should buffer sizes of 60-70 bytes work in the tcp_echo
>    test? I can't see anything in the source that leads me to
>    believe short buffer sizes should fail.

When running the tcp_echo program with the default task
priorities, there are long pauses in IP traffic (1 to 10
seconds).  Sometimes things clog up long enough that we run out
of mbufs and panic. If the priorities are changed to

#define IDLE_THREAD_PRIORITY     CYGPKG_NET_THREAD_PRIORITY+3
#define LOAD_THREAD_PRIORITY     CYGPKG_NET_THREAD_PRIORITY+1
#define MAIN_THREAD_PRIORITY     CYGPKG_NET_THREAD_PRIORITY-0

then data flows smoothly, and things never back up by more than
one Ethernet packet.

Having the main thread priority at the old, higher level
(CYGPKG_NET_THREAD_PRIORITY-2) appears to prevent the network
task from running and processing incoming packets.  
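
To spell out the arithmetic, here is a small sketch, assuming the
usual eCos convention that a numerically smaller value means a higher
priority.  The value 7 is only a placeholder for
CYGPKG_NET_THREAD_PRIORITY (the real value comes from the
configuration), and the TOY_ names and toy_preempts() are made up for
illustration.

```c
#include <assert.h>

/* Placeholder only; the real CYGPKG_NET_THREAD_PRIORITY is set by the
 * eCos configuration. */
#define TOY_NET_PRIORITY  7

#define TOY_MAIN_OLD  (TOY_NET_PRIORITY - 2)  /* old main thread priority */
#define TOY_MAIN_NEW  (TOY_NET_PRIORITY - 0)  /* changed main thread priority */

/* Returns 1 if a runnable thread at priority a preempts one at priority b,
 * assuming a smaller number means a higher priority. */
static int toy_preempts(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return a < b;
}
```

Under that assumption the old main thread outranks the network task
whenever both are runnable, while the changed one merely ties with it.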


Q: Why can't the network task run when the main thread is
   blocked on a read()?



I don't know if these two situations are related or not, but
I'd like to take a shot at trying to fix them -- any clues
anybody would care to lend would be appreciated.

-- 
Grant Edwards
grante@visi.com

