[ECOS] TCP/IP preemption fix
Gary Thomas
gthomas@redhat.com
Fri Apr 14 03:41:00 GMT 2000
On 14-Apr-00 Grant Edwards wrote:
> On Thu, Apr 13, 2000 at 04:50:46PM -0600, Gary Thomas wrote:
>
>> >> Can you see if these patches fix [at least] the sockbuf corruption
>> >> problem you were seeing?
>> >
>> > After some additional testing, it seems the problem is still
>> > there. It looks like there are routines that access sb structs
>> > without calling sblock/sbunlock.
>> >
>> > The ones we've found are in code called by the network task:
>> > tcp_input, tcp_output, etc. There are calls to sbappend and
>> > similar functions/macros that result in unprotected accesses to
>> > sb struct fields.
>> >
>> > My original e-mail pointing out the unreliability of sblock and
>> > sbunlock didn't identify the entire problem.
>> >
>> >> The basic idea I've incorporated is to use the eCos scheduler
>> >> lock to emulate the user/kernel behaviour from the BSD world
>> >> (i.e. kernel code cannot be preempted)
>> >
>> > I think that sblock and sbunlock should work now, but I don't
>> > think they're called in enough places.
>>
>> So this seems to be a start. I'll try and investigate more.
>> BTW did you see any improvement at all (just to make sure we are
>> hunting the right fox)?
>
> Well, some of the throughput tests showed an improvement of about 5%, but I
> don't know why the changes should have done that. When we spent some more
> time analyzing things it looks like the sb corruptions have to be happening
> due to conflicts between our user tasks and functions called from the
> network task.
>
> AFAICT, the sblock/unlock calls are in routines called from user tasks, but
> not in the functions called by the network task. So the patch should
> prevent conflicts between user tasks. We have two user tasks that do
> TCP/IP via a single socket, but one handles input and the other handles
> output, so it doesn't appear they can be conflicting with each other, since
> there are separate input and output sb structs.
>
> The conflict we seem to be running into is between our user tasks and the
> code run from the network task such as tcp_input and tcp_output.
>
>> Do you have a good way to duplicate the failures?
>
> If we set our user tasks to a higher priority than the network task, we will
> almost always see a panic within a minute or two. Sometimes it will run for
> as long as several minutes.
>
>> Is it something I can set up here?
>
> Not really. The application that fails most predictably depends on custom
> target hardware and some specific software on a host for it to talk to. I've
> been trying to come up with a simple test configuration based on one of the
> existing eCos test programs that demonstrates the problem, and will
> continue to try. But, I haven't been able to come up with the right
> combination of cpu loading and TCP/IP traffic patterns to make it fail
> predictably, so I have to build a library, throw it over the wall and let
> the application development guy try it out.
>
>> Any/all information on this will be useful.
>
> I'm trying to come up with a simple test case -- I've got one more idea to
> try out tomorrow. I might also add some sblock/sbunlock pairs to some of
> the functions like sbappend to see if that has an effect. (I've convinced
> myself it's got to.)
>
If it is caused by interaction between the network task and your threads, try
this patch:
cvs diff: Diffing net/tcpip/current/src/ecos
Index: net/tcpip/current/src/ecos/support.c
===================================================================
RCS file: /local/cvsfiles/ecc/ecc/net/tcpip/current/src/ecos/support.c,v
retrieving revision 1.7
diff -u -5 -p -r1.7 support.c
--- net/tcpip/current/src/ecos/support.c 2000/03/08 19:11:20 1.7
+++ net/tcpip/current/src/ecos/support.c 2000/04/14 10:38:37
@@ -494,10 +494,11 @@ cyg_netint(cyg_addrword_t param)
 {
     cyg_flag_value_t curisr;
     while (true) {
         curisr = cyg_flag_wait(&netint_flags, NETISR_ANY,
                                CYG_FLAG_WAITMODE_OR|CYG_FLAG_WAITMODE_CLR);
+        cyg_scheduler_lock(); // This code should not be preempted
 #ifdef INET
         if (curisr & (1 << NETISR_ARP)) {
             // Pending ARP requests
             arpintr();
         }
@@ -510,10 +511,11 @@ cyg_netint(cyg_addrword_t param)
         if (curisr & (1 << NETISR_IPV6)) {
             // Pending IPv6 input
             ip6intr();
         }
 #endif
+        cyg_scheduler_unlock();
     }
 }
 //
 // Network initialization
Basically, just keeping other threads from running while the network "task"
performs its housekeeping.
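
To illustrate the idea off-target, here is a minimal sketch of the same
pattern applied to both sides of a shared buffer. Everything prefixed with
demo_ is hypothetical and is not part of the eCos TCP/IP stack, and
cyg_scheduler_lock()/cyg_scheduler_unlock() are stubbed with a simple nesting
counter so that the fragment compiles on a host; on a real target they would
come from <cyg/kernel/kapi.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side stand-ins (assumption): a counter models the nesting scheduler lock. */
static int sched_lock_depth = 0;
static void cyg_scheduler_lock(void)   { sched_lock_depth++; }
static void cyg_scheduler_unlock(void) { assert(sched_lock_depth > 0); sched_lock_depth--; }

/* Hypothetical, much simplified stand-in for a BSD struct sockbuf. */
struct demo_sockbuf {
    char   data[64];
    size_t cc;              /* bytes currently buffered */
};

/* Unprotected primitive, in the spirit of sbappend: caller must hold the lock. */
static int demo_sbappend(struct demo_sockbuf *sb, const char *p, size_t len)
{
    assert(sched_lock_depth > 0);
    if (len > sizeof(sb->data) - sb->cc)
        return -1;          /* not enough room */
    memcpy(sb->data + sb->cc, p, len);
    sb->cc += len;
    return 0;
}

/* Unprotected primitive, in the spirit of sbdrop: caller must hold the lock. */
static size_t demo_sbdrop(struct demo_sockbuf *sb, size_t len)
{
    assert(sched_lock_depth > 0);
    if (len > sb->cc)
        len = sb->cc;
    memmove(sb->data, sb->data + len, sb->cc - len);
    sb->cc -= len;
    return len;             /* bytes actually removed */
}

/* User-thread side: queue data for sending without being preempted. */
static int demo_user_send(struct demo_sockbuf *sb, const char *p, size_t len)
{
    int rc;
    cyg_scheduler_lock();
    rc = demo_sbappend(sb, p, len);
    cyg_scheduler_unlock();
    return rc;
}

/* Network-thread side: one housekeeping pass, bracketed as in the patch. */
static size_t demo_netint_pass(struct demo_sockbuf *sb, size_t acked)
{
    size_t dropped;
    cyg_scheduler_lock();
    dropped = demo_sbdrop(sb, acked);
    cyg_scheduler_unlock();
    return dropped;
}
```

The asserts in the two primitives only document the assumption that every
caller holds the scheduler lock. Since no other thread can run while the lock
is held, the locked regions should be kept as short as possible.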