[ECOS] BSD socket stall

Bernd Edlinger bernd.edlinger@hotmail.de
Sat Sep 15 17:44:00 GMT 2012


Hi Henry,
> We will try out this change, but we have already changed our application
> to have separate sockets for each thread. I am not sure we could provide
> an answer to the question of whether this change fixes the issue in our
> case.
>
> I am just surprised that an issue like this could still be in the BSD
> stack after all these years. I mean eCos and the FreeBSD stack have been
> out for what, 10+ years? Am I clueless for having assumed that having two
> threads send on one socket was OK? I believed that sockets would be
> thread safe; I guess that is not the case.
> Thanks, 
> Henry 
For UDP sockets that use case is perfectly OK.
You will see, the new version will handle this correctly,
except for the still possible priority inversion.
For TCP sockets the results are undefined, because TCP is a
byte stream and the data of concurrent sends may interleave,
except when the message size is always exactly 1 byte.
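
To illustrate the TCP remark, here is a minimal sketch that models the
byte stream as a plain buffer. It is not code from the eCos or FreeBSD
stack: stream_model, model_send and two_senders are hypothetical names,
and the strict turn-taking is just one assumed schedule for two threads
whose send calls return short counts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a TCP byte stream: an append-only buffer. */
struct stream_model {
    char   data[64];
    size_t len;
};

/* Model of a send call that accepts at most max_chunk bytes per call,
 * as a real send may do when the socket buffer is nearly full.
 * Returns the number of bytes accepted. */
size_t model_send(struct stream_model *s, const char *buf,
                  size_t n, size_t max_chunk)
{
    size_t take = n < max_chunk ? n : max_chunk;
    assert(s->len + take < sizeof s->data);   /* keep room for the NUL */
    memcpy(s->data + s->len, buf, take);
    s->len += take;
    s->data[s->len] = '\0';
    return take;
}

/* Two senders push one message each and take turns after every call.
 * This is an assumed schedule, chosen only to make the effect visible. */
void two_senders(struct stream_model *s, const char *a, const char *b,
                 size_t max_chunk)
{
    size_t la = strlen(a), lb = strlen(b);
    while (la > 0 || lb > 0) {
        if (la > 0) {
            size_t k = model_send(s, a, la, max_chunk);
            a += k;
            la -= k;
        }
        if (lb > 0) {
            size_t k = model_send(s, b, lb, max_chunk);
            b += k;
            lb -= k;
        }
    }
}
```

With a chunk limit of 2, the messages "AAAA" and "BBBB" end up in the
stream as "AABBAABB", so the receiver cannot tell where either message
ends. With 1-byte messages there is nothing to split, which is why that
case is the exception. For UDP each send is one datagram, so message
boundaries are kept.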

Why did that problem not occur before? Hard to tell.
For instance, I believe that the problem arises from
interrupting this statement in sb_lock():
sb->sb_flags |= SB_WANT;
On Intel that will be a single atomic instruction like
OR [bx],0x2, but on ARM it takes at least 3 assembler
instructions (load, OR, store).
So you should have no problem at all on Intel.
Still, this is a perfect example of what happens
when you use a condition object without a mutex.
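
The lost update can be made visible with a small single-threaded model.
This is only a sketch under assumptions: sockbuf_model, flags_serialized
and flags_interleaved are hypothetical names, the flag values are the
ones I recall from FreeBSD's sys/socketvar.h, and the interleaving shown
is one plausible schedule, not a confirmed trace of the stall.

```c
#include <assert.h>

/* Flag values assumed from FreeBSD's sys/socketvar.h. */
#define SB_LOCK 0x01   /* lock on data queue */
#define SB_WANT 0x02   /* someone is waiting to lock */

/* Hypothetical stand-in for struct sockbuf, reduced to the flags word. */
struct sockbuf_model { short sb_flags; };

/* Both updates run to completion one after the other, as when each
 * is a single read-modify-write instruction. */
int flags_serialized(void)
{
    struct sockbuf_model sb = { SB_LOCK };
    sb.sb_flags |= SB_WANT;     /* waiter announces itself */
    sb.sb_flags &= ~SB_LOCK;    /* owner releases the lock */
    return sb.sb_flags;
}

/* The waiter's OR is split into load, OR, store, and the owner's
 * release runs between the load and the store. */
int flags_interleaved(void)
{
    struct sockbuf_model sb = { SB_LOCK };
    int reg = sb.sb_flags;      /* waiter: load */
    sb.sb_flags &= ~SB_LOCK;    /* owner: release, sees no SB_WANT yet */
    reg |= SB_WANT;             /* waiter: OR in a register */
    sb.sb_flags = (short)reg;   /* waiter: store brings back stale SB_LOCK */
    assert(sb.sb_flags & SB_WANT);
    return sb.sb_flags;
}
```

In the serialized case only SB_WANT remains set. In the interleaved case
the flags end up as SB_LOCK | SB_WANT although nobody holds the lock any
more, so in this model the waiter would sleep for a wakeup that never
comes. Holding a mutex around both the flag update and the wait closes
that window.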

But you should also check for spurious interrupts.
They are likely to occur due to the "Tickle Loop"
in the BSD stack, especially at a high rate.
My latest AT91 Ethernet driver does not need this
any more, and avoids the spurious interrupts even
if the stack polls the IRQ from time to time.

Therefore I would recommend you check this list of
important patches which we at Softing developed over
the last year (I must apologize, the list is too long,
but we walk on thin ice as you know, and most of these
bug fixes are obviously badly needed):
Bug   20804: Misbehavior of printf %e/%g format
Bug 1001522: Array index out of bounds in tftp_server.c
Bug 1001629: bsd stack uses wrong timeout values if hz != 100
Bug 1001633: DHCP Client may hang
Bug 1001634: A code review of dlmalloc.cxx revealed several weaknesses
Bug 1001635: wrong results from Cyg_StdioStream::read
Bug 1001637: fcntl() fails to handle F_GETFL, F_SETFL
Bug 1001639: Problems with i2c.cxx
Bug 1001641: Erase function in flashiodev.c and flashiodevlegacy.c handle "err_address" differently
Bug 1001645: Recursive Posix Mutexes
Bug 1001648: flash_init() behaves differently if CYGHWR_IO_FLASH_DEVICE==1
Bug 1001649: AT91 hal extension
Bug 1001654: diag_printf truncates the values in %llu and %llx formats
Bug 1001655: eth_drv_send stack_corruption with CYGFUN_LWIP_MODE_SIMPLE
Bug 1001656: FreeBSD: add AF_PACKET socket family
Bug 1001657: httpd server should parse request header lines
It might help to understand what the application for these patches is,
especially the new transacted PHY interface and the packet sockets.
Think of PTPv2: here we have to exchange very complex data over SMI
with the PHY, and the PTP packets may be in raw Ethernet format.
That is what finally led to these enhancements.
Regards,
Bernd Edlinger



