This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.


Re: [SMP]serious bug in synchronisation primitives

From: sandeep <shimple0 at yahoo dot com>
To: Nick Garnett <nickg at ecoscentric dot com>
Cc: ecos-discuss at sources dot redhat dot com
Date: Mon, 29 Nov 2004 08:15:15 -0800 (PST)
Subject: Re: [ECOS] [SMP]serious bug in synchronisation primitives

>>>The DESTRUCT and BREAK wake reasons are explicitly intended for cases
>>>where the mutex will not be locked. This is why they set the result to
>>is there any list of tests (if any) in ecos cvs that do not intend to
>>lock mutex directly/indirectly?
> 
> I'm not sure what you are asking here. The kill and release tests test
> the basic mechanisms.
What I meant was: are there any tests in the current eCos CVS that call mutex
lock but do not expect to return from that call as the owner of the mutex (i.e.
lock() returns false in their case)? Knowing about such tests would be helpful
during debugging.

> The simple fix for this particular problem is to replace 
> 
>         pthread_mutex.lock();
> 
> with
> 
>         while( !pthread_mutex.lock() )
>             continue;
> 
> in pthread_setcancelstate().
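The retry idiom quoted above can be sketched outside eCos as follows. This is a minimal, self-contained illustration only: `BreakableMutex`, `lock_until_owned()` and `demo_retry_after_release()` are hypothetical names built on standard C++ `std::mutex` and `std::condition_variable`, not on the eCos kernel classes. The assumption, taken from this thread, is that `lock()` returns false when the waiter is woken by `release()` (standing in for a BREAK-style wakeup) instead of being granted the mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of a kernel mutex whose lock() can fail: it returns
// false if the waiter was woken by release() rather than given ownership.
class BreakableMutex {
public:
    bool lock() {
        std::unique_lock<std::mutex> g(m_);
        const unsigned gen = release_gen_;
        while (locked_) {
            ++waiters_;
            cv_.wait(g);
            --waiters_;
            if (release_gen_ != gen)
                return false;           // woken by release(): NOT the owner
        }
        locked_ = true;
        return true;
    }
    void unlock() {
        std::lock_guard<std::mutex> g(m_);
        assert(locked_ && "unlock() of a mutex that is not locked");
        locked_ = false;
        cv_.notify_one();
    }
    void release() {                    // wake all waiters without granting ownership
        std::lock_guard<std::mutex> g(m_);
        ++release_gen_;
        cv_.notify_all();
    }
    int waiters() {
        std::lock_guard<std::mutex> g(m_);
        return waiters_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool locked_ = false;
    unsigned release_gen_ = 0;
    int waiters_ = 0;
};

// The idiom from the thread: retry until lock() really grants ownership.
// Returns the number of failed attempts, for illustration only.
int lock_until_owned(BreakableMutex& mx) {
    int failed = 0;
    while (!mx.lock())
        ++failed;
    return failed;
}

// Scenario: main holds the mutex, a worker blocks on it, then main calls
// release() followed by unlock(). Returns the worker's failed attempts.
int demo_retry_after_release() {
    BreakableMutex mx;
    mx.lock();                          // uncontended, so this succeeds
    int failed = -1;
    std::thread worker([&] {
        failed = lock_until_owned(mx);
        mx.unlock();                    // safe: the loop guarantees ownership
    });
    while (mx.waiters() != 1)
        std::this_thread::yield();      // wait until the worker is blocked
    mx.release();                       // worker's lock() now returns false
    mx.unlock();                        // worker's retry can now succeed
    worker.join();
    return failed;
}
```

In this model the worker's first `lock()` fails once because of `release()`; the loop then blocks again until the holder really unlocks, so the worker's final `unlock()` is always made by the owner.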
After trying that, some new conditions came into the picture. I don't know yet
whether they are effects of this modification, or whether the code flow has
altered slightly because of the additional code, so that some races which did
not show up earlier now exhibit themselves.

I tried running only the compat-posix tests after the above modification and
got some new asserts, for example:
- CYG_ASSERTCLASS( current, "Bad current thread" ); in unlock_inner
- some new asserts in pthread.cxx (I forgot to note them down)

I also observed that the sigsetjmp test now sometimes ends up making illegal
accesses, and that in some tests only the idle threads remain runnable.

The problem with these race conditions is that you can run the same test again
and again in the GUI debugging tool for the architecture and the problem
observed in batch runs doesn't surface. Even across successive batch runs, the
problem sometimes surfaces and sometimes doesn't.

One more question: why only the pthread_mutex.lock() case? There are many
places in the eCos code that follow the pattern "lock-some-mutex ...
unlock-this-mutex".

The race condition surfaces when a thread is sleeping while waiting on a mutex
and, during that time, release() (or some other function) is called on that
mutex, so that when the thread wakes up the locking call returns false.

Shouldn't every such lock-some-mutex be handled in the suggested while-loop way?
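To illustrate the race described above, the sketch below runs the unchecked "lock-some-mutex ... unlock-this-mutex" pattern in a worker thread while the main thread calls `release()`. It is a hypothetical standard C++ model, not eCos code: `BreakableMutex`, its `owned_by_caller()` helper, the owner check in `unlock()`, `Outcome` and `demo_unchecked_lock()` exist only in this sketch, and `lock()` is assumed to return false after a `release()`, as described in this thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of a kernel mutex whose lock() returns false if the
// waiter was woken by release() instead of being granted ownership.
class BreakableMutex {
public:
    bool lock() {
        std::unique_lock<std::mutex> g(m_);
        const unsigned gen = release_gen_;
        while (locked_) {
            ++waiters_;
            cv_.wait(g);
            --waiters_;
            if (release_gen_ != gen)
                return false;           // woken by release(): NOT the owner
        }
        locked_ = true;
        owner_ = std::this_thread::get_id();
        return true;
    }
    void unlock() {
        std::lock_guard<std::mutex> g(m_);
        assert(locked_ && owner_ == std::this_thread::get_id() &&
               "unlock() by a thread that does not own the mutex");
        locked_ = false;
        owner_ = std::thread::id();
        cv_.notify_one();
    }
    void release() {                    // wake all waiters without granting ownership
        std::lock_guard<std::mutex> g(m_);
        ++release_gen_;
        cv_.notify_all();
    }
    bool owned_by_caller() {
        std::lock_guard<std::mutex> g(m_);
        return locked_ && owner_ == std::this_thread::get_id();
    }
    int waiters() {
        std::lock_guard<std::mutex> g(m_);
        return waiters_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool locked_ = false;
    std::thread::id owner_;
    unsigned release_gen_ = 0;
    int waiters_ = 0;
};

struct Outcome {
    bool lock_result;                   // what the worker's lock() returned
    bool worker_owned;                  // did the worker actually own the mutex?
};

// A worker sleeps in lock(); main calls release() while still holding it.
Outcome demo_unchecked_lock() {
    BreakableMutex mx;
    mx.lock();                          // main thread owns the mutex
    Outcome out{true, true};
    std::thread worker([&] {
        out.lock_result = mx.lock();    // the unchecked pattern ignores this
        out.worker_owned = mx.owned_by_caller();
        // ... the critical section would run here without protection ...
        // An unconditional mx.unlock() here would trip the owner assert.
    });
    while (mx.waiters() != 1)
        std::this_thread::yield();      // wait until the worker is blocked
    mx.release();
    worker.join();
    mx.unlock();                        // main is still the real owner
    return out;
}
```

In this model the worker returns from `lock()` with false and without ownership while the main thread still holds the mutex, which is why code that ignores the result cannot safely enter its critical section or unlock afterwards.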

> I suspect that a number of other calls to pthread_mutex.lock() would
> benefit from a similar modification. Others may require:
> 
>         if( !pthread_mutex.lock() )
>             PTHREAD_RETURN(EINTR);
> 
> However, that depends on the exact specification of the API call and
> whether it is permitted to return EINTR.
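For calls whose specification permits EINTR, the alternative quoted above reports the broken wait instead of retrying. A minimal sketch, again using a hypothetical standard C++ `BreakableMutex` model rather than eCos code; `hypothetical_api_call()` and `demo_interrupted_call()` are invented for illustration, and a plain `return EINTR;` stands in for the `PTHREAD_RETURN(EINTR)` used in the quoted suggestion.

```cpp
#include <cassert>
#include <cerrno>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of a kernel mutex whose lock() returns false if the
// waiter was woken by release() instead of being granted ownership.
class BreakableMutex {
public:
    bool lock() {
        std::unique_lock<std::mutex> g(m_);
        const unsigned gen = release_gen_;
        while (locked_) {
            ++waiters_;
            cv_.wait(g);
            --waiters_;
            if (release_gen_ != gen)
                return false;           // woken by release(): NOT the owner
        }
        locked_ = true;
        return true;
    }
    void unlock() {
        std::lock_guard<std::mutex> g(m_);
        assert(locked_ && "unlock() of a mutex that is not locked");
        locked_ = false;
        cv_.notify_one();
    }
    void release() {                    // wake all waiters without granting ownership
        std::lock_guard<std::mutex> g(m_);
        ++release_gen_;
        cv_.notify_all();
    }
    int waiters() {
        std::lock_guard<std::mutex> g(m_);
        return waiters_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool locked_ = false;
    unsigned release_gen_ = 0;
    int waiters_ = 0;
};

// Hypothetical API call that is allowed to fail with EINTR: if the wait in
// lock() is broken, report EINTR instead of retrying.
int hypothetical_api_call(BreakableMutex& mx) {
    if (!mx.lock())
        return EINTR;                   // woken without ownership: give up
    // ... work protected by mx ...
    mx.unlock();
    return 0;
}

// A worker makes the call while main holds the mutex; main then calls
// release(). Returns the worker's result code.
int demo_interrupted_call() {
    BreakableMutex mx;
    mx.lock();                          // uncontended, so this succeeds
    int rc = -1;
    std::thread worker([&] { rc = hypothetical_api_call(mx); });
    while (mx.waiters() != 1)
        std::this_thread::yield();      // wait until the worker is blocked
    mx.release();
    worker.join();
    mx.unlock();
    return rc;
}
```

Whether a given call should retry or return EINTR still depends on its API specification, as noted above; this sketch only shows the shape of the EINTR branch.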


All this would require some time to investigate, and it is surely a lower
priority than finding and fixing the causes of the races observed so far and of
any others that turn up in the process.

sandeep




References:
- Re: [SMP]serious bug in synchronisation primitives
  - From: Nick Garnett
