This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Cancellation investigations.
- To: Ulrich Drepper <drepper at cygnus dot com>
- Subject: Cancellation investigations.
- From: Kaz Kylheku <kaz at ashi dot footprints dot net>
- Date: Tue, 26 Sep 2000 13:40:53 -0700 (PDT)
- cc: libc-alpha at sources dot redhat dot com
I found a bug: someone :) moved the function
__pthread_set_own_extricate_if
into spinlock.h and removed the locking that it performs. The locking is,
unfortunately, necessary because it eliminates the following race condition
against pthread_cancel:
thread1 thread2
1. calls waiting operation
2. sets extricate pointer
3. calls pthread_cancel
4. gets thread's extricate pointer
5. finishes wait
6. sets extricate to null
7. returns from null
8. uses extricate pointer. *BOOM*
Since the extricate pointer refers to an auto object, we must ensure that
it's never used by thread2 after thread1 finishes.
I'm very sorry about not commenting this obscurity in the source code; it
really does look like the lock is not necessary!
Steps 3, 4 and 8 in thread2 are done under the lock; the original code acquired
the lock in 2. and 6 also. It seems okay for step 2 to avoid acquiring the
lock, but it's essential in 6 to delay the execution of the thread so that its
auto objects remain valid while in use by the other thread.
If locks are undesirable, something else needs to be invented to delay thread1
while someone is using its extricate struct. It's clear that most of the time,
there is no problem; eliminating a rare race condition should not impose a big
penalty on the frequent case. Some signalling arrangement with flags, relying
on memory coherency, could be worked out. Perhaps detecting that a delay is
needed and having the thread call suspend() in that case or something like
that.
Moving the extricate struct into non-local storage (like the thread descriptor)
won't help, because the race will then happen over the pointer to the object
being waited on. (e.g. pointer to the condition variable or what have you).
For now, I think the lock should just be put back, and a task related to
the optimization of this be added to a TODO list.