This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Why to signal condvar with mutex held?


Nick Garnett <nickg@ecoscentric.com> writes:
> Sergei Organov <osv@topconrd.ru> writes:
> 
> > Hi,
> > 
> > Browsing eCos sources and documentation, I've observed that condvars are
> > usually being signalled with associated mutex locked. However, as far as
> > I understand, it is sub-optimal from the point of view of execution time
> > as it could lead to additional context switches. Consider the following
> > scenario:
> > 
> > 1. thread1 signals condvar.
> > 2. thread2 wakes up and preempts thread1 (context switch 1) only to see
> >    that the mutex is locked.
> > 3. thread2 goes to wait on the locked mutex, thread1 is resumed (context
> >    switch 2).
> > 4. thread1 unlocks mutex.
> > 5. thread2 wakes up and preempts thread1 (context switch 3).
> > 
> > Thus, we end up with 3 context switches instead of 1.
> > 
> > So there are the questions:
> > 
> > 1. Does this problem actually exist, or am I missing something [1]?
> 
> This can happen in some circumstances.
> 
> I guess that "wait morphing" is where the thread is taken off the
> condvar thread queue and inserted on the mutex thread queue and its
> state changed, without being woken up. I considered this during the
> design phase and decided against it. This would be the only instance
> where such a thing was needed, and putting intimate details of the
> threads state into the condvar code would have made some configuration
> options more difficult to handle.

Well, as I neither understood what "intimate details of the thread state"
besides those the condvar already knows are required to implement wait
morphing, nor I really understood your later comments in this thread:

> The way the C++ classes interact also means that the inner state of
> the thread is just not visible to the condition variable
> class. Opening up this information would potentially lead to losing
> the abstraction and structure of the system.

, I decided to take an exercise to implement the wait morphing
optimization myself to actually see the troubles are you talking about.

To my surprise the implementation basically took only changes in the
mutex.cxx file and didn't require exposition of additional thread state
to the condvar code. In fact, the only changes to other files I've made
were making Cyg_Thread::set_sleep_reason() and Cyg_Thread::clear_timer()
routines non-static. My best attempt to see why those two (and in fact a
few more) were made static in the first place didn't reveal anything
significant, so if calling these routines for the thread when another
thread is current and scheduler is being locked may bring some troubles,
please let me know.

While I can't claim 100% bug-free implementation without more thorough
testing and review, now I'm almost sure an implementation of wait
morphing that doesn't break the design of the eCos does exist.

But first, I've prepared an addition to the kernel/tm_basic.cxx test to
test condvars behavior (the condvars test is somehow missed there). I've
specially crafted the test to focus on the worst-case behavior of the
condvars, and here are the (unfortunate) results:

                                 Confidence
     Ave     Min     Max     Var  Ave  Min  Function
  ======  ======  ======  ====== ========== ========
    5.87    5.00    6.00    0.23   87%   12%  Signal cond [no waiters]
    6.00    6.00    6.00    0.00  100%  100%  Broadcast cond [no waiters]
   64.32   64.00   65.00    0.44   67%   67%  Signal/wait cond [mutex unlocked]
  163.97  163.00  164.00    0.06   96%    3%  Signal/wait cond [mutex locked]
   64.42   64.00   65.00    0.49   58%   58%  Broadcast/wait cond [mutex unlocked,  1 waiter]
  164.42  164.00  165.00    0.49   58%   58%  Broadcast/wait cond [mutex   locked,  1 waiter]
  271.32  271.00  272.00    0.44   67%   67%  Broadcast/wait cond [mutex unlocked, 28 waiters]
 1361.13 1361.00 1362.00    0.22   87%   87%  Broadcast/wait cond [mutex locked, 28 waiters]

Signal/wait and broadcast/wait are times from the moment low-priority
thread signals/broadcasts the condvar to the moment waiting
high-priority thread returns from the condvar wait function. Please
notice how much is the difference between [mutex unlocked] and [mutex
locked] variants of the same operation. These cases differ only in the
relative order of mutex unlock and condvar signal/broadcast operations.
The last two strings correspond to the test where besides those
high-priority thread there are a few medium-priority threads waiting on
the same condvar. The results show that there is definitely an
opportunity for improvements.

After the implementation of wait morphing, the following times using the
same test case and the same hardware have been observed:

                                 Confidence
     Ave     Min     Max     Var  Ave  Min  Function
  ======  ======  ======  ====== ========== ========
    6.00    6.00    6.00    0.00  100%  100%  Signal cond [no waiters]
    6.61    6.00    7.00    0.47   61%   38%  Broadcast cond [no waiters]
   60.74   60.00   61.00    0.38   74%   25%  Signal/wait cond [mutex unlocked]
  101.65  101.00  102.00    0.46   64%   35%  Signal/wait cond [mutex locked]
   62.13   62.00   63.00    0.22   87%   87%  Broadcast/wait cond [mutex unlocked,  1 waiter]
  103.74  103.00  104.00    0.38   74%   25%  Broadcast/wait cond [mutex   locked,  1 waiter]
  277.84  277.00  278.00    0.27   83%   16%  Broadcast/wait cond [mutex unlocked, 28 waiters]
  286.71  286.00  287.00    0.41   70%   29%  Broadcast/wait cond [mutex   locked, 28 waiters]


IMHO, the improvement in the worst case times is amazing while none
of the rest of times degrade significantly. Though I still think that
initial decision against wait morphing was probably right at the early
times of eCos development as premature optimization is indeed evil, I
believe the decision made could now be re-considered.

-- 
Sergei.


-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]