Sources Bugzilla – Bug 9813
pselect implementation (when not implemented by the kernel) aggravates the race
Last modified: 2012-12-19 10:41:58 UTC
pselect is an operation that must be performed atomically. As such, the only race-free implementation is one done in the kernel. If the race exists, "select" can hang until the timeout (or forever), because the signal that the programmer expected to wake it up arrived before "select" was called.

The glibc implementation exists only as a stopgap for platforms where the kernel does not provide the function, to encourage people to use it anyway, and is known not to cover 100% of the cases. That being said, the current pselect implementation makes the race condition worse, almost guaranteeing that the race will take place.

The current implementation looks like this:

1: sigprocmask // Enable the signals
2: select // Perform the actual select
3: sigprocmask // Re-disable the signals

A typical use scenario would be:

4: while
5:   pselect
6:   if( signal happened ) ...
7:   Do something not signal related
8: loop over the while

In the current implementation, any signal arriving after the sigprocmask in line 3 and before the "select" in line 2 is GUARANTEED to trigger the race condition: the signal stays pending while it is blocked and is delivered as soon as the sigprocmask in line 1 takes place, necessarily before the select in line 2. This means the chances for the race are directly proportional to the relative amount of time the program spends doing something other than waiting on the select.

I am attaching a modified implementation of pselect that greatly reduces the window in which the race can take effect, limiting it to only within the actual pselect function.
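The lost wakeup described above can be reproduced deterministically by leaving a signal pending before the wait starts. The following is a minimal sketch, not the glibc source and not the attached patch: the names emulated_pselect, on_signal and wait_with_pending_signal are hypothetical, SIGUSR1 and the 200 ms timeout are arbitrary choices, and the emulation only mirrors the three steps listed in the report.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Set by the handler; the caller inspects it after each wait. */
static volatile sig_atomic_t sig_happened = 0;

static void on_signal(int signo)
{
    (void)signo;
    sig_happened = 1;
}

/* Hypothetical stand-in for the three-step userspace emulation. */
static int emulated_pselect(int nfds, fd_set *rd, fd_set *wr, fd_set *ex,
                            const struct timespec *ts, const sigset_t *mask)
{
    struct timeval tv, *tvp = NULL;
    sigset_t saved;
    int ret, saved_errno;

    if (ts != NULL) {
        tv.tv_sec = ts->tv_sec;
        tv.tv_usec = ts->tv_nsec / 1000;
        tvp = &tv;
    }
    /* Step 1: any pending signal is delivered here, before select runs. */
    sigprocmask(SIG_SETMASK, mask, &saved);
    /* Step 2: the wakeup has already been consumed, so this just waits. */
    ret = select(nfds, rd, wr, ex, tvp);
    saved_errno = errno;
    /* Step 3: restore the caller's mask. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    errno = saved_errno;
    return ret;
}

/* Returns 1 if the wait was interrupted by the signal (EINTR), 0 if it
 * was not (for example, it slept through the timeout), -1 on setup
 * failure. */
static int wait_with_pending_signal(int use_emulation)
{
    struct sigaction sa;
    sigset_t block, orig, wait_mask;
    struct timespec timeout = { 0, 200 * 1000 * 1000 }; /* 200 ms */
    int ret, interrupted;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Block SIGUSR1, as the loop does while doing unrelated work. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &orig);

    sig_happened = 0;
    raise(SIGUSR1); /* stays pending because it is blocked */

    /* Mask to use during the wait: the original one, SIGUSR1 allowed. */
    wait_mask = orig;
    sigdelset(&wait_mask, SIGUSR1);

    if (use_emulation)
        ret = emulated_pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    else
        ret = pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    interrupted = (ret == -1 && errno == EINTR);

    sigprocmask(SIG_SETMASK, &orig, NULL);
    return interrupted;
}
```

Calling wait_with_pending_signal(1) is expected to run the handler and then sleep for the full timeout, while wait_with_pending_signal(0) should return at once with EINTR on a system whose pselect is implemented in the kernel. On a system where the C library itself emulates pselect, both calls would be expected to behave like the first.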
Created attachment 3710
Proposed patch to narrow the race window

Proposed patch to the problem
Forgot to add: in the above patch, NSIG_LONGS is undefined. Here is its definition:

// Number of __vals in sigset_t that actually contain useful data
#define NSIG_LONGS (_NSIG/(8*sizeof(((sigset_t *)NULL)->__val[0])))

Shachar
Created attachment 3712
Program demonstrating the problem

This program demonstrates the problem. Under a kernel with pselect support, it prints:

sig_happened=1
sig_happened=1
sig_happened=1
sig_happened=1
sig_happened=1

and exits almost immediately.
Shachar, I suspect that it's not worth trying to make the fix you suggest. The fix will only appear in modern glibc, and any modern system will have a kernel-supported pselect(). The fundamental problem can't be remedied: the idea to add a userspace implementation of pselect() was extremely muddleheaded, and worsens portability problems for applications. The portability question goes from being "do I have pselect() or not?" to "do I have a pselect() or not, and if I do, is it one that works?"; the last part of the second question can only be answered by checking the kernel (and glibc) versions.