[ECOS] Question about ecos server performance

NavEcos ecos@navosha.com
Tue Aug 13 08:15:00 GMT 2002


On Tuesday 13 August 2002 07:56, Gary Thomas wrote:
> On Tue, 2002-08-13 at 08:51, NavEcos wrote:
> > On Tuesday 13 August 2002 07:04, Gary Thomas wrote:
> > > On Tue, 2002-08-13 at 08:03, NavEcos wrote:
> > > > [SNIP]
> > > >
> > > > > > The bug is as follows:
> > > > > >
> > > > > > 1) The server (eCos app) starts,
> > > > > > 2) Connect to the server with telnet, port 4000
> > > > >
> > > > > Then what?  What do you have to do [from the "client" side] to
> > > > > evoke the crash?
> > > >
> > > > Connect.  That's all.  My crash happens in less than 3000 bytes
> > > > of transferred data, always.
> > > >
> > > > If you want, I can send you my entire environment but before I
> > > > do that I'll update CVS.  Maybe it was a bad day when I downloaded?
> > >
> > > No, I was able to duplicate this.  I just asked before trying it as
> > > I didn't want to waste time if there was more that was necessary.
> > >
> > > The problem is obvious and, indeed, the program tells you exactly why.
> > > It's reporting "too many mbufs to tx", which comes from the logical
> > > network layer which tries to pack up a packet to be sent and give it
> > > to the physical driver.  However, in this case, the data structure
> > > which represents the packet has [perhaps] hundreds of little tiny
> > > pieces in it.  The method used by the physical layer can't handle that
> > > [currently].  I'll have to think a bit about how to fix this.
> >
> > Well, the documentation states that running out of mbufs will not
> > crash the TCP/IP layer.  Why does it?  I suspected that it was
> > because there were a bunch of tiny pieces but I didn't debug it.  I
> > did see the error message, of course, and assumed they may be linked.
> > I probably should have mentioned that.
> >
> > Maybe incorporating a counting semaphore to cause threads allocating
> > mbufs to block would do it?  I am not sure how much overhead there
> > would be in doing that, but it would nicely block the threads when there
> > were no more mbufs.
>
> This has *nothing* to do with running out of mbufs.  That's not what
> that message says at all.  It says that it [currently] can't handle
> a data packet which is composed of so many mbufs.

Sorry.  As I said, I didn't do much work in debugging it.
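
For the record, the guts of my test program are basically just this:
after accepting the telnet connection on port 4000 it sits in a loop
doing single-byte writes.  (A sketch from memory, with error checking
stripped; the real code is a little larger, and the names are mine.)

    #include <network.h>     /* eCos BSD socket API */
    #include <string.h>

    static void spam_server(void)
    {
        struct sockaddr_in sa;
        int s, c;
        char byte = 'x';

        s = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sa, 0, sizeof(sa));
        sa.sin_family      = AF_INET;
        sa.sin_len         = sizeof(sa);
        sa.sin_addr.s_addr = INADDR_ANY;
        sa.sin_port        = htons(4000);
        bind(s, (struct sockaddr *)&sa, sizeof(sa));
        listen(s, 1);
        c = accept(s, NULL, NULL);

        for (;;) {
            /* each tiny write apparently ends up as its own little mbuf,
               and TCP then builds one packet out of hundreds of them */
            write(c, &byte, 1);
        }
    }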

> > > I would say that this is an aberrant program, though, and it just happened to
> > > run into this limitation.
> >
> > Well, I agree, it's an atypical example but it's still a serious problem
> > when you can crash it for whatever reason.  The code is legal.
> >
> > I don't care about performance for such a program, and I do not
> > think ANYBODY would write awful code like that.  But what concerns
> > me is that the stack crashes.  There are cases in which you may
> > end up sending a bunch of small packets.
> >
> > For example, say you have a profiler that sends out the PC at the
> > time of an interrupt at regular intervals.  If you get the interval
> > just right, you'll crash the box.  You might also do this with a
> > low-priority thread that sends all available data.  In a quiet
> > system, it will almost always end up sending just 4 bytes.
>
> But probably not continuously, as your example does.

In most cases, no, but in a critical system (a medical device, for
example) it would be dangerous.
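
To make that concrete, the kind of profiler I have in mind would look
something like this (purely hypothetical code, not something I've
actually written for eCos; locking between the ISR and the thread is
omitted):

    #include <cyg/kernel/kapi.h>
    #include <network.h>

    #define MAX_SAMPLES 256
    static cyg_uint32 samples[MAX_SAMPLES];   /* filled by a timer ISR/DSR */
    static volatile int n_samples;
    static int prof_sock;                     /* TCP socket to the host */

    /* Low-priority drain thread: send whatever samples have piled up.
       On a quiet system that is almost always a single 4-byte PC value,
       so the stack sees an endless stream of tiny sends. */
    static void profiler_drain(cyg_addrword_t data)
    {
        for (;;) {
            int n = n_samples;
            if (n > 0) {
                send(prof_sock, samples, n * sizeof(cyg_uint32), 0);
                n_samples = 0;
            }
            cyg_thread_delay(1);              /* yield to everything else */
        }
    }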

> I agree that there is a problem with the stack.  It's simply not a
> scenario I ever imagined (nor, until today, experienced).  It's been
> filed as a bug and will get fixed [someday].
>
> Of course, you're free to fix it yourself.  Remember that for the
> most part, eCos is now a *volunteer* project.  I'm certainly not
> getting paid to fix this (any more). Things will get fixed if and
> when there is time.

I am well aware that it's a volunteer project.  I've contributed
several patches for the XScale board, and I fixed a driver problem
with the stack back in September of last year, before it became a
volunteer project.  That bug also caused a crash.

If you could give me some advice as to what exactly is going on and
where, I'll submit a patch as time permits.  I'm busy too, but I think
I can look into it before next week.  I am trying to master this OS,
so I'll be happy to try to fix it.  I only wrote the list to confirm
it was indeed a bug, since I don't have a huge sampling of hardware
here.
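
In case it helps, the direction I would probably explore is having the
code that hands a packet to the driver coalesce an overly long mbuf
chain into a single cluster first.  Very roughly (completely untested;
MAX_TX_FRAGS and the helper name are made up, the real limit presumably
lives in each physical driver, and the include paths may differ inside
the stack):

    #include <sys/param.h>
    #include <sys/mbuf.h>

    #define MAX_TX_FRAGS 16   /* made-up figure; the real limit is per-driver */

    /* If the chain has more fragments than the driver can describe,
       flatten it into one cluster mbuf instead of bailing out with
       "too many mbufs to tx".  A real fix would also carry the rest of
       the packet header (rcvif, checksum flags, ...) across. */
    static struct mbuf *coalesce_if_needed(struct mbuf *m)
    {
        struct mbuf *frag, *n;
        int frags = 0;

        for (frag = m; frag != NULL; frag = frag->m_next)
            frags++;
        if (frags <= MAX_TX_FRAGS)
            return m;                      /* driver can take it as-is */
        if (m->m_pkthdr.len > MCLBYTES)
            return NULL;                   /* too big to flatten; give up */

        MGETHDR(n, M_DONTWAIT, MT_DATA);
        if (n == NULL)
            return NULL;
        MCLGET(n, M_DONTWAIT);
        if ((n->m_flags & M_EXT) == 0) {
            m_freem(n);
            return NULL;
        }

        /* copy the whole chain into the single cluster */
        m_copydata(m, 0, m->m_pkthdr.len, mtod(n, caddr_t));
        n->m_len = n->m_pkthdr.len = m->m_pkthdr.len;
        m_freem(m);
        return n;
    }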

-Rich

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss


