Bug 9813 - pselect implementation (when not implemented by the kernel) aggravates the race
Status: NEW
Product: glibc
Classification: Unclassified
Component: libc
Version: unspecified
Importance: P2 normal
Assigned To: Not yet assigned to anyone
Reported: 2009-02-04 10:05 UTC by Shachar Shemesh
Modified: 2012-12-19 10:41 UTC (History)

Attachments
Proposed patch to narrow the race window (2.63 KB, patch)
2009-02-04 10:36 UTC, Shachar Shemesh
Program demonstrating the problem (857 bytes, text/x-csrc)
2009-02-04 10:52 UTC, Shachar Shemesh

Description Shachar Shemesh 2009-02-04 10:05:33 UTC
pselect is an operation that must be performed atomically. As such, the only
race free implementation is one done in the kernel. If the race exists, then it
is possible that "select" will hang until the timeout (or forever), because the
signal that the programmer thought would wake it up happened before "select" was
called. The glibc implementation exists only as a stopgap for platforms where
the kernel does not provide the function, to encourage people to use pselect
anyway, and it is known not to cover 100% of the cases.

That being said, the current pselect implementation makes the race condition
worse, almost guaranteeing that the race will take place.

The current implementation looks like this:
1: sigprocmask // Enable the signals
2: select // Perform the actual select
3: sigprocmask // Re-disable the signals

A typical use scenario would be:

4: while
5: pselect
6: if( signal happened ) ...
7: Do something not signal related
8: loop over the while

In the current implementation, any signal arriving after the sigprocmask in line
3 of one iteration, and before the "select" in line 2 of the next, is GUARANTEED
to trigger the race condition, as the signal will take effect as soon as the
sigprocmask in line 1 takes place, necessarily before the select in line 2. This
means the chances for the race are directly proportional to the relative amount
of time the program spends doing something other than waiting on the select.

I am attaching a modified implementation of pselect that greatly reduces the
window in which the race can take effect, limiting it to only within the actual
pselect function.
Comment 1 Shachar Shemesh 2009-02-04 10:36:40 UTC
Created attachment 3710 [details]
Proposed patch to narrow the race window

Proposed patch to the problem
Comment 2 Shachar Shemesh 2009-02-04 10:42:25 UTC
Forgot to add - in the above patch, NSIG_LONGS is undefined. Here is its
definition:

// Number of __vals in sigset_t that actually contain useful data
#define NSIG_LONGS (_NSIG/(8*sizeof(((sigset_t *)NULL)->__val[0])))

Shachar
Comment 3 Shachar Shemesh 2009-02-04 10:52:12 UTC
Created attachment 3712 [details]
Program demonstrating the problem

This program demonstrates the problem. Under a kernel with pselect support, it
prints:
sig_happened=1
sig_happened=1
sig_happened=1
sig_happened=1
sig_happened=1

And exits almost immediately.
Comment 4 Michael Kerrisk 2012-02-19 22:06:49 UTC
Shachar, I suspect that it's not worth trying to make the fix you suggest. The
fix will only appear in modern glibc, and any modern system will have a
kernel-supported pselect(). The fundamental problem can't be remedied: the idea to add a
userspace implementation of pselect() was extremely muddleheaded, and worsens
portability problems for applications. The portability question goes from being
"do I have pselect() or not?" to "do I have a pselect() or not, and if I do, is
it one that works?"; the last part of the second question can only be verified
with a check of the kernel (and glibc) versions.
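
As a rough illustration of the kernel half of such a check, the sketch below
parses the release string reported by uname(). The 2.6.16 threshold is an
assumption taken from the Linux pselect man page, not from this bug report;
it is Linux-specific, may differ by architecture, and says nothing about
whether the installed glibc actually uses the system call. The function names
are hypothetical.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if the release string is at least 2.6.16 (assumed threshold),
 * 0 if it is older, -1 if it cannot be parsed. */
static int release_has_kernel_pselect(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    /* Accept "X.Y" as well as "X.Y.Z-suffix"; a missing patch counts as 0. */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;
    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return patch >= 16;
}

/* Same check applied to the running kernel; -1 if uname() fails. */
static int running_kernel_has_pselect(void)
{
    struct utsname u;

    if (uname(&u) == -1)
        return -1;
    return release_has_kernel_pselect(u.release);
}
```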