[ECOS] network problem

Tue Aug 28 09:00:00 GMT 2007

Andrew,

I am using a snapshot from the 2005 era. Just after I sent my e-mail, I went
through the archives and found a thread from 16-Nov-2005 with the subject

Possible sockets/fd race condition.

I did what they did in socreate() in uipc_socket.c, and it appears to have
fixed my problem. The latest eCos repository does not contain this fix.
Below is the socreate() code: I added a call to splnet() and the matching
calls to splx() on every return path. This affects both
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c and
net/tcpip/current/src/sys/kern/uipc_socket.c.

I guess I need to submit a patch, because this issue is still present in the
latest eCos repository, and I am getting ready to use the latest eCos on a new
project/processor.

Below is the new socreate function in
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c

int
socreate(dom, aso, type, proto, p)
	int dom;
	struct socket **aso;
	register int type;
	int proto;
	struct proc *p;
{
	register struct protosw *prp;
	register struct socket *so;
	register int error;
	/* Block network processing for the whole of socket creation.
	 * Every return path below must restore the old level with splx(s). */
	int s = splnet();

	if (proto)
		prp = pffindproto(dom, proto, type);
	else
		prp = pffindtype(dom, type);

	if (prp == 0 || prp->pr_usrreqs->pru_attach == 0) {
		splx(s);
		return (EPROTONOSUPPORT);
	}
	if (prp->pr_type != type) {
		splx(s);
		return (EPROTOTYPE);
	}
	so = soalloc(p != 0);
	if (so == 0) {
		splx(s);
		return (ENOBUFS);
	}

	TAILQ_INIT(&so->so_incomp);
	TAILQ_INIT(&so->so_comp);
	so->so_type = type;
	so->so_proto = prp;
	error = (*prp->pr_usrreqs->pru_attach)(so, proto, p);
	if (error) {
		/* SS_NOFDREF lets sofree() actually release the socket. */
		so->so_state |= SS_NOFDREF;
		sofree(so);
		splx(s);
		return (error);
	}
	*aso = so;
	splx(s);
	return (0);
}
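Every return in the new socreate() has to pair the splnet() taken on entry
with an splx(s). A common BSD-style alternative to repeating splx(s) before
each return is a single exit through a label. The sketch below models only
that control flow: splnet()/splx() are simplified, hypothetical stand-ins (one
global level, not the eCos implementation, which per the report uses a mutex,
splx_mutex), and socreate_model()/socreate_leaky() are illustrative names, not
eCos functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the stack's spl primitives: one global
 * level instead of the real eCos implementation. SPL_NET = 4 matches
 * the spl_state value quoted in the original report. */
#define SPL_NONE	0
#define SPL_NET		4

static int spl_level = SPL_NONE;

static int splnet(void)
{
	int old = spl_level;	/* caller restores this with splx() */

	spl_level = SPL_NET;
	return old;
}

static void splx(int s)
{
	assert(s == SPL_NONE || s == SPL_NET);
	spl_level = s;
}

/* Model of socreate()'s control flow with a single exit. 'fail_at'
 * picks which check fails (0 = success); each failure returns the
 * errno the real function returns at that point. */
static int socreate_model(int fail_at)
{
	int error = 0;
	int s = splnet();

	if (fail_at == 1) {		/* no protocol, or no pru_attach */
		error = EPROTONOSUPPORT;
		goto out;
	}
	if (fail_at == 2) {		/* protocol type mismatch */
		error = EPROTOTYPE;
		goto out;
	}
	if (fail_at == 3) {		/* soalloc() failed */
		error = ENOBUFS;
		goto out;
	}
out:
	splx(s);			/* every path restores the level */
	return error;
}

/* The same flow with one early return that skips splx(): the level is
 * left at SPL_NET after the call returns. */
static int socreate_leaky(int fail_at)
{
	int s = splnet();

	if (fail_at == 1)
		return EPROTONOSUPPORT;	/* BUG: missing splx(s) */
	splx(s);
	return 0;
}
```

After socreate_leaky(1) returns, spl_level is still SPL_NET, so any later code
that needs the network lock would wait on it indefinitely, much like the
stuck state described in the original report.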

Thanks for your response,
Rick Davis

-----Original Message-----
From: Andrew Lunn [mailto:andrew@lunn.ch] 
Sent: Monday, August 27, 2007 4:12 AM
To: Rick Davis
Cc: ecos-discuss@ecos.sourceware.org
Subject: Re: [ECOS] network problem

On Mon, Aug 27, 2007 at 02:48:42AM -0400, Rick Davis wrote:
> I have a device using the MPC859T processor that has a small web server
> running using the standard eCos web server. I have a status page that
> auto-refreshes every 15 seconds, and I am pinging the unit every second
> (yes, I have a customer that is actually doing this). I don't really know
> what other network activity is occurring at the customer's site, but my test
> lab has Windows network chatter going on. After about 12 hours the web
> server stops responding and the unit can no longer be pinged. The FEC
> Ethernet driver is receiving packets and is calling eth_drv_dsr, but the
> deliver function is never called.
> 
> I have been tracking this down for some time and have noticed the
> following...
> 
> 1. The alarm thread in timeout.c is getting blocked when calling
> splx_internal() just before the call to eth_drv_run_deliveries().
> 2. The current value of spl_state in sync.c is 4 (SPL_NET)
> 
> Any ideas why the network would not release the splx_mutex?
> Any suggestions on how to track this down further?
> I don't have a GDB interface on my platform. :(

What vintage of eCos are you using? If you go back far enough into the
mists of time, there was at least one bug fix for alarms. But that is
a long time ago.

Do you have asserts enabled? They might give some clues.

You could also enable CYGIMPL_TRACE_SPLX and call show_sched_events()
when you hit the deadlock. That should tell you which function is
holding the mutex. You might want to add __builtin_return_address(0)
to the log structure, so you can see one more level up the call
stack. Otherwise I think you will just get spi_slpnet, which is
not much use.
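The suggestion above amounts to recording, for each spl change, the address
the spl function will return to. A minimal sketch, assuming GCC or Clang (for
__builtin_return_address) and using hypothetical names (struct spl_trace,
trace_record(), traced_splnet(), traced_splx(), show_trace()); it illustrates
the idea, not the record layout eCos actually uses for CYGIMPL_TRACE_SPLX, and
a single global level stands in for the real mutex-based state.

```c
#include <assert.h>
#include <stdio.h>

#define SPL_NONE	0
#define SPL_NET		4	/* level value quoted in the report */
#define TRACE_SLOTS	16

/* Hypothetical trace record, not the CYGIMPL_TRACE_SPLX layout. */
struct spl_trace {
	const char *func;	/* spl function that ran */
	int line;		/* line inside that function */
	int level;		/* spl level after the call */
	void *caller;		/* return address: one frame further up */
};

static struct spl_trace trace_log[TRACE_SLOTS];
static unsigned trace_next;	/* events recorded so far */
static int spl_level = SPL_NONE;

/* Store one event in the ring buffer, overwriting the oldest when full. */
static void trace_record(const char *func, int line, int level, void *caller)
{
	struct spl_trace *t = &trace_log[trace_next % TRACE_SLOTS];

	assert(func != NULL);
	t->func = func;
	t->line = line;
	t->level = level;
	t->caller = caller;
	trace_next++;
}

/* noinline keeps a real call frame, so __builtin_return_address(0)
 * is an address inside whoever called traced_splnet(). */
static __attribute__((noinline)) int traced_splnet(void)
{
	int old = spl_level;

	spl_level = SPL_NET;
	trace_record(__func__, __LINE__, spl_level, __builtin_return_address(0));
	return old;
}

static __attribute__((noinline)) void traced_splx(int s)
{
	spl_level = s;
	trace_record(__func__, __LINE__, spl_level, __builtin_return_address(0));
}

/* Print surviving entries oldest first; resolve each caller address
 * with the linker map or binutils' addr2line. */
static void show_trace(void)
{
	unsigned n = trace_next < TRACE_SLOTS ? trace_next : TRACE_SLOTS;

	for (unsigned i = trace_next - n; i != trace_next; i++) {
		const struct spl_trace *t = &trace_log[i % TRACE_SLOTS];

		printf("%s:%d level=%d caller=%p\n",
		    t->func, t->line, t->level, t->caller);
	}
}
```

When the deadlock hits, the last traced_splnet() entry with no traced_splx()
after it should point at the code that still holds the level.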

    Andrew

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss