This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: "GNU C. Library" <libc-alpha at sourceware dot org>, Rich Felker <dalias at aerifal dot cx>
- Date: Wed, 10 Sep 2014 18:47:58 -0300
- Subject: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
Hi all,
I have summarized in [1] the current issues with the GLIBC pthread cancellation system,
the current GLIBC implementation, and the solution proposed by Rich Felker, with the
adjustments required to enable it in GLIBC.
It is still heavily WIP and I'm still planning to add more content, so any questions,
comments, or advice are welcome.
The GLIBC adjustment to the proposed solution is in fact the work I'm currently doing to
rewrite the pthread cancellation subsystem [2]. My code still needs a *lot* of cleanup,
but initial results are promising. It builds on both powerpc64 and x86_64
(it won't build on other platforms, basically because I rewrote the way cancelable
syscalls are done).
All current NPTL test cases pass except:
FAIL: nptl/tst-cancel-wrappers
FAIL: nptl/tst-cancel20
FAIL: nptl/tst-cancel21-static
FAIL: nptl/tst-cancel4
FAIL: nptl/tst-cancel5
FAIL: nptl/tst-cancelx20
FAIL: nptl/tst-cancelx21
FAIL: nptl/tst-cancelx4
FAIL: nptl/tst-cancelx5
FAIL: nptl/tst-detach1
The 'nptl/tst-cancel-wrappers' failure is expected, since I got rid of the
enable_asynccancel/disable_asynccancel functions, but the others are due to the fact that
cancellation now *will not* be acted upon in one important case:
* the syscall is blocked but some side effects have already taken place (for
instance a partial read/write/send/etc.)
This is the case for tst-cancel[4/5], which check cancelable write and send:
the way the tests are coded, the instruction pointer (IP) address that the kernel
reports to the signal handler is *after* the syscall, indicating a partial read/send.
A similar case occurs for tst-cancel[20|21], where, when reading from a pipe, the
read has already returned from the syscall. I'm still checking nptl/tst-detach1.
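To make the decision described above concrete, here is a minimal sketch of how a cancellation signal handler could classify the interrupted instruction pointer. It is an illustration only, not the code from [2]: the names `cancel_region`, `cancel_action` and `classify_cancel`, and the idea of describing the cancelable syscall stub by a half-open address range, are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical result of inspecting the interrupted instruction pointer.  */
enum cancel_action
{
  CANCEL_ACT_NOW,  /* Inside the stub: syscall not completed, safe to cancel.  */
  CANCEL_DEFER     /* Outside the stub: the syscall may already have had
                      side effects, so leave its result to the caller.  */
};

/* Hypothetical description of the cancelable syscall stub.  BEGIN is the
   address of the final cancellation check, END the address of the first
   instruction after the syscall instruction.  */
struct cancel_region
{
  uintptr_t begin;
  uintptr_t end;
};

/* Classify the instruction pointer PC found in the signal context.
   Only an address in [begin, end) is acted upon immediately.  An
   address at or after END means the syscall has returned (possibly
   with a partial result); an address before BEGIN means the thread
   is not in the stub at all.  Both are deferred.  */
static enum cancel_action
classify_cancel (const struct cancel_region *r, uintptr_t pc)
{
  if (pc >= r->begin && pc < r->end)
    return CANCEL_ACT_NOW;
  return CANCEL_DEFER;
}
```

Under these assumptions, the tst-cancel[4/5] situation corresponds to `pc >= r->end`: the handler sees an address after the syscall, so the partial result is returned to the caller instead of being discarded by an immediate cancellation.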
Anyway, I would now like comments on the proposed solution, and on whether the
behavior behind the new failures should not be allowed or whether the test cases
should now be adjusted.
I also note that this new implementation shows the correct behavior on the test cases
from the bug report, as replicated on Bugzilla: the first one does not show leaked
file descriptors and the second correctly hangs.
[1] https://sourceware.org/glibc/wiki/Release/2.21/bz12683
[2] https://github.com/zatrazz/glibc/commits/master-bz12683