This is the mail archive of the
ecos-discuss@sourceware.org
mailing list for the eCos project.
RE: network problem
Andrew,
I am using a snapshot from the 2005 era. I went through the archives just
after I sent my e-mail and found a thread from 16-Nov-2005 with the subject
"Possible sockets/fd race condition".
I applied the same change they made to socreate in uipc_socket.c, and it
appears to have fixed my problem. The latest eCos repository does not contain
this fix.
Below is the socreate code. I added a call to splnet and the appropriate
calls to splx. This affects both
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c and
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c.
I guess I need to submit a patch, because this issue is still in the latest
eCos repository, which I am getting ready to use for a new
project/processor.
Below is the new socreate function in
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c
int
socreate(dom, aso, type, proto, p)
    int dom;
    struct socket **aso;
    register int type;
    int proto;
    struct proc *p;
{
    register struct protosw *prp;
    register struct socket *so;
    register int error;
    int s = splnet();

    if (proto)
        prp = pffindproto(dom, proto, type);
    else
        prp = pffindtype(dom, type);
    if (prp == 0 || prp->pr_usrreqs->pru_attach == 0)
    {
        splx (s);
        return (EPROTONOSUPPORT);
    }
    if (prp->pr_type != type)
    {
        splx (s);
        return (EPROTOTYPE);
    }
    so = soalloc(p != 0);
    if (so == 0) {
        splx (s);
        return (ENOBUFS);
    }
    TAILQ_INIT(&so->so_incomp);
    TAILQ_INIT(&so->so_comp);
    so->so_type = type;
    so->so_proto = prp;
    error = (*prp->pr_usrreqs->pru_attach)(so, proto, p);
    if (error) {
        so->so_state |= SS_NOFDREF;
        sofree(so);
        splx (s);
        return (error);
    }
    *aso = so;
    splx (s);
    return (0);
}
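
The fix only holds if every return path calls splx, so a variant with a
single exit point may be easier to keep correct. The following is only a
minimal sketch of that pattern, not the eCos code: splnet/splx here are
simple stand-ins that count nesting, and create_guarded, spl_depth, type_ok
and alloc_ok are hypothetical names used for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins so the sketch compiles on its own. The real splnet()/splx()
 * take and release the network stack's lock; these only count nesting. */
static int spl_depth = 0;               /* hypothetical nesting counter */

static int splnet(void)
{
    return spl_depth++;                 /* return the previous level */
}

static void splx(int s)
{
    spl_depth--;
    assert(spl_depth == s);             /* each splnet() needs one splx() */
}

/* Hypothetical, simplified stand-in for socreate(): it models only the
 * early-exit structure, not the real socket set-up. */
static int create_guarded(int type_ok, int alloc_ok)
{
    int error = 0;
    int s = splnet();

    if (!type_ok) {
        error = EPROTOTYPE;
        goto out;
    }
    if (!alloc_ok) {
        error = ENOBUFS;
        goto out;
    }
    /* ... socket set-up would go here ... */
out:
    splx(s);                            /* single release point */
    return error;
}
```

With one release point, adding a new error check later cannot leave the lock
held by accident.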
Thanks for your response,
Rick Davis
-----Original Message-----
From: Andrew Lunn [mailto:andrew@lunn.ch]
Sent: Monday, August 27, 2007 4:12 AM
To: Rick Davis
Cc: ecos-discuss@ecos.sourceware.org
Subject: Re: [ECOS] network problem
On Mon, Aug 27, 2007 at 02:48:42AM -0400, Rick Davis wrote:
> I have a device using the MPC859T processor that has a small web server
> running using the standard eCos web server. I have a status page that
> auto-refreshes every 15 seconds and I am pinging the unit every second
> (Yes, I have a customer that is actually doing this). I don't really know what
> other network activity is occurring at the customer's site but my test lab
> has Windows network chatter going on. After about 12 or so hours the web
> server stops responding and the unit can no longer be pinged. The FEC Ethernet
> driver is receiving packets and is calling the eth_drv_dsr but the deliver
> function is never called.
>
> I have been tracking this down for some time and have noticed the
> following...
>
> 1. The alarm thread in timeout.c is getting blocked when calling
> splx_internal() just before the call to eth_drv_run_deliveries().
> 2. The current value of spl_state in sync.c is 4 (SPL_NET)
>
> Any ideas why the network would not release the splx_mutex?
> Any suggestion on how to further track this down?
> I don't have a GDB interface on my platform. :(
What vintage of eCos are you using? If you go back far enough into the
mists of time, there was at least one bug fix for alarms. But that is
a long time ago.
Do you have asserts enabled? It might give some clues.....
You could also enable CYGIMPL_TRACE_SPLX and call show_sched_events()
when you hit the deadlock. That should tell you what function is
holding the mutex. You might want to add to the log structure
__builtin_return_address(0), so you can see one more level up the
call stack. Otherwise I think you will just get spi_slpnet, which is
not much use.
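
A rough sketch of recording the caller's address in a trace log, assuming
GCC or a compatible compiler. The structure and function names
(spl_trace_entry, trace_record, traced_splnet) are made up for illustration
and are not the ones used by CYGIMPL_TRACE_SPLX; only
__builtin_return_address is a real GCC builtin.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_DEPTH 8                   /* hypothetical ring-buffer size */

struct spl_trace_entry {                /* hypothetical log record */
    const char *func;                   /* function that took the lock */
    void *caller;                       /* address it will return to */
};

static struct spl_trace_entry trace_log[TRACE_DEPTH];
static unsigned trace_next = 0;

static void trace_record(const char *func, void *caller)
{
    struct spl_trace_entry *e = &trace_log[trace_next++ % TRACE_DEPTH];

    assert(func != NULL);
    e->func = func;
    e->caller = caller;
}

/* Hypothetical wrapper: the return address is sampled here, in the
 * function whose caller we want to identify, and passed to the logger.
 * noinline keeps the sampled address pointing into the real caller. */
__attribute__((noinline))
static int traced_splnet(void)
{
    trace_record(__func__, __builtin_return_address(0));
    return 0;                           /* stand-in for the previous level */
}
```

The stored address can then be matched against the linker map to find the
function that called in.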
Andrew
--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss