This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [BZ 5240][PATCH] Pthread hang where there are still waiters when mutex is in "unlocked" state.


On Fri, 2007-11-16 at 14:14 -0800, Ulrich Drepper wrote: 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Ryan S. Arnold wrote:
> > Any ideas where to go from here?
> 
> It's your platform.  I don't see any problems here.  You'll have to
> investigate it.  There is no special trick, look at the state (variable
> content) and think hard.

Hi Ulrich,
I thought hard and concur with Ryan that there still is a problem. The
sequence of events as described in the original bug report:

A: lll_lock(), lock value == 1
B: lll_timedlock(), didn't get the lock -> __lll_timedlock_wait(),
B: tries to exchange the lock value 1 with 2, this succeeds
B: lll_futex_timed_wait(), lock_value == 2, B goes to sleep
C: lll_lock(), didn't get the lock -> __lll_lock_wait()
C: lll_futex_wait(), lock_value == 2, C goes to sleep
A: lll_unlock(), lock value == 0, old value == 2 
A: lll_futex_wake(), B gets woken, lock value == 0
A: lll_lock(), lock value == 1
B: tries to exchange the lock value 0 with 2, this fails
B: exits __lll_timedlock_wait() with ETIMEDOUT
A: lll_unlock(), lock value == 0, old value == 1, no wake up

but now C is still sleeping on the lock.
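
The trace above can be replayed on a toy, single-threaded model of the lock
word. This is only a sketch under stated assumptions: struct lock_model,
model_cmpxchg(), model_unlock() and replay_trace() are hypothetical names
invented for illustration, not glibc functions, and the wake-up is modelled
by clearing a flag rather than by a real futex call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): 0 = unlocked, 1 = locked,
   2 = locked and waiters may be sleeping. */
struct lock_model {
    int  value;      /* simulated lock word */
    bool b_asleep;   /* B is sleeping in the timed futex wait */
    bool c_asleep;   /* C is sleeping in the untimed futex wait */
};

/* Compare-and-exchange on the simulated lock word; returns the old value. */
static int model_cmpxchg(struct lock_model *m, int expected, int desired)
{
    int old = m->value;
    if (old == expected)
        m->value = desired;
    return old;
}

/* Unlock: store 0 and report whether a wake-up is owed (old value was 2). */
static bool model_unlock(struct lock_model *m)
{
    int old = m->value;
    m->value = 0;
    return old == 2;
}

/* Replays the reported sequence; returns true if C is still asleep. */
static bool replay_trace(struct lock_model *m)
{
    m->value = 0;
    m->b_asleep = false;
    m->c_asleep = false;

    /* A: lll_lock(), lock value == 1 */
    model_cmpxchg(m, 0, 1);
    assert(m->value == 1);

    /* B: timed lock fails, exchanges 1 with 2, goes to sleep */
    if (model_cmpxchg(m, 0, 1) != 0 && model_cmpxchg(m, 1, 2) == 1)
        m->b_asleep = true;
    assert(m->value == 2 && m->b_asleep);

    /* C: lock fails, lock word is already 2, goes to sleep */
    if (model_cmpxchg(m, 0, 1) != 0 && m->value == 2)
        m->c_asleep = true;
    assert(m->c_asleep);

    /* A: unlock sees old value 2 and wakes one sleeper (B in the trace) */
    if (model_unlock(m))
        m->b_asleep = false;
    assert(m->value == 0 && !m->b_asleep);

    /* A: takes the lock again before B gets to run */
    model_cmpxchg(m, 0, 1);

    /* B: exchange of 0 with 2 fails, time is up, B leaves without a wake */
    int b_old = model_cmpxchg(m, 0, 2);
    assert(b_old == 1 && m->value == 1);
    (void)b_old;

    /* A: unlock sees old value 1, so no wake-up is issued */
    bool woke = model_unlock(m);
    assert(!woke);
    (void)woke;

    return m->c_asleep;
}
```

Calling replay_trace() on a fresh struct lock_model ends with the lock word
at 0 and c_asleep still set, which is the state reported above.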

Another way to look at it: after B has changed the lock value from 1 to
2 the lock holder has the obligation to do a wake up call. After the
unlock this obligation gets transferred to the woken thread. The lock
value is 0 after the unlock but there might be other threads still
waiting for the lock and only the woken thread knows about this. That is
why __lll_lock_wait() does a compare-and-exchange that changes the lock
value from 0 to 2. The unlock following a successful compare and
exchange in __lll_lock_wait() will see the lock value of 2 and do the
wake up call. This is NOT the case for a __lll_timedlock_wait call if
the lock value is 1 after the lll_futex_timed_wait() sleep and the time
is up. The obligation to wake other threads that might still be sleeping
on the lock is lost.
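
One way to picture passing the obligation on is a timeout path that issues
the wake-up itself. The following is a minimal sketch with C11 atomics, not
the glibc code and not a proposed patch: stub_futex_wake(), woken_try_lock(),
woken_timed_out() and sketch_unlock() are hypothetical names, and the futex
wake is replaced by a counter so the hand-off can be observed. It also
assumes that a thread woken this way re-runs its own wait loop and marks the
lock word as 2 again before going back to sleep.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a futex wake; it only counts calls. */
static int wake_calls;

static void stub_futex_wake(atomic_int *futex)
{
    (void)futex;
    wake_calls++;
}

/* A woken waiter tries to take the lock as 2 (not 1), so that the next
   unlock sees 2 and wakes the remaining sleepers.
   Returns 0 on success, EBUSY if the lock word was not 0. */
static int woken_try_lock(atomic_int *futex)
{
    int expected = 0;
    return atomic_compare_exchange_strong(futex, &expected, 2) ? 0 : EBUSY;
}

/* A woken timed waiter whose deadline has passed. If it cannot take the
   lock, it hands the wake-up obligation to another sleeper before
   giving up, instead of dropping it. */
static int woken_timed_out(atomic_int *futex)
{
    if (woken_try_lock(futex) == 0)
        return 0;
    stub_futex_wake(futex);   /* explicit wake-up in the timeout path */
    return ETIMEDOUT;
}

/* Unlock: a wake-up is issued only if the old lock value was 2. */
static void sketch_unlock(atomic_int *futex)
{
    if (atomic_exchange(futex, 0) == 2)
        stub_futex_wake(futex);
}
```

With the lock word at 1, woken_timed_out() returns ETIMEDOUT and records one
wake-up, so a second sleeper would get a chance to mark the lock as contended
again. With the lock word at 0 it takes the lock as 2, and the following
sketch_unlock() issues the wake-up.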

I think that any code that does not have an explicit wakeup call in the
timeout path is broken. The above sequence can definitely hit s390 as
well. I've tried the testcase on a 64-bit machine:

Thread #0: trying to acquire lock.
Thread #0: locked, mutex value = 1
Thread #1: trying to acquire timedlock.
Thread #2: trying to acquire lock.
Thread #0: (still) locked, mutex value = 2
Thread #0: unlocked, mutex value = 0
Thread #0: locked (2), mutex value = 1
Thread #1: pthread_mutex_timerlock returned ETIMEDOUT, mutex value = 1
Thread #1 finished
Thread #0: unlocked (2), mutex value = 0
Thread #0 finished
Thread #2 has not been woken up for at least 10 seconds!
mutex value = 0

This is true for the old implementation of __lll_timedlock_wait and the
current one.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




