This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



Linux nanosleep investigation.


I have been doing some digging in the nanosleep system call and the
timespec <-> jiffies conversion functions.

I have the following findings to report:

(1) timespec_to_jiffies rounds up to the next whole jiffy.

(2) If HZ is not a divisor of one billion, then the timespec value
    { 0, 999999999 } leads to a larger jiffies value than { 1, 0 }.
    For example, if HZ is 1024, then { 0, 999999999 } converts to 1025
    jiffies, whereas { 1, 0 } converts to 1024 jiffies.

(3) Converting from jiffy to timespec and back to jiffy recovers the
    original jiffy value. This is good!

(4) If nanosleep calls the scheduler and wakes up within the same clock tick
    period, then the remaining time does not decrease (or is
    even inflated by the rounding).

(5) The sys_nanosleep system call adds one jiffy to any non-zero timeout
    value. See the line which reads: 

	expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);

    This one extra jiffy contributes to the repeated inflation of the remaining
    time reported by Kevin Hendricks in his problem report. That's because the
    remaining time is computed from this adjusted expire time.
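
Findings (2) and (3) can be checked outside the kernel with a small
user-space model. The sketch below assumes the conversion divides by the
truncated integer 1000000000 / HZ nanoseconds per jiffy and rounds up, which
reproduces the numbers quoted in (2). The names model_timespec_to_jiffies and
model_jiffies_to_timespec and the explicit hz parameter are mine, not the
kernel's.

```c
#include <assert.h>
#include <time.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Hypothetical user-space model of the kernel conversion. hz is passed
 * explicitly so that different HZ values can be compared. */
static unsigned long model_timespec_to_jiffies(const struct timespec *ts, long hz)
{
    assert(hz > 0);
    long nsec_per_jiffy = MODEL_NSEC_PER_SEC / hz; /* truncates: 976562 for hz = 1024 */

    /* Round the nanosecond part up to a whole number of jiffies. */
    long nsec_jiffies = (ts->tv_nsec + nsec_per_jiffy - 1) / nsec_per_jiffy;

    return (unsigned long)hz * (unsigned long)ts->tv_sec
        + (unsigned long)nsec_jiffies;
}

/* Inverse direction, using the same truncated nanoseconds-per-jiffy value. */
static void model_jiffies_to_timespec(unsigned long jiffies, long hz,
                                      struct timespec *ts)
{
    assert(hz > 0);
    ts->tv_sec = (time_t)(jiffies / (unsigned long)hz);
    ts->tv_nsec = (long)(jiffies % (unsigned long)hz) * (MODEL_NSEC_PER_SEC / hz);
}
```

With hz = 1024 the model gives 1025 for { 0, 999999999 } and 1024 for
{ 1, 0 }, and feeding the output of model_jiffies_to_timespec back into
model_timespec_to_jiffies returns the original jiffy count.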

Of these findings, (4) is the obvious cause of our problems.  There is no
way to fix it, and hence the naive __libc_nanosleep(&rem, &rem); algorithm
cannot possibly work. (Why didn't I see this obvious fact before?) It is an
inherent problem in all relative waits against a quantized clock: short waits
look like they are zero length waits. Our sampling of the jiffies value fails
to catch transitions; this aliasing makes it look like time is standing still,
or moving very slowly.

If a thread is flooded with signals while waiting in pthread_cond_timedwait,
the only way it will eventually return is if it catches enough clock tick
transitions while executing the nanosleep system call. Even that assumes
that (5) is fixed: as it stands, a thread flooded with signal wakeups
keeps incrementing the remaining time by 1 jiffy on each call.
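
The inflation can be made concrete with a toy model of a single interrupted
call. The sketch assumes the signal arrives before the next clock tick, so
the remaining time handed back is simply the expire value from (5) converted
back to a timespec. model_interrupted_nanosleep and the hz parameter are
hypothetical names, and real scheduling is ignored.

```c
#include <assert.h>
#include <time.h>

/* Toy model: one nanosleep call that is interrupted by a signal before
 * the next clock tick. Overwrites *t with the remaining time the caller
 * would see and returns that remaining time in jiffies. */
static unsigned long model_interrupted_nanosleep(struct timespec *t, long hz)
{
    assert(hz > 0);
    long nsec_per_jiffy = 1000000000L / hz;

    /* Round the request up to whole jiffies, as in finding (1). */
    unsigned long expire = (unsigned long)hz * (unsigned long)t->tv_sec
        + (unsigned long)((t->tv_nsec + nsec_per_jiffy - 1) / nsec_per_jiffy);

    /* Finding (5): one extra jiffy for any non-zero timeout. */
    expire += (t->tv_sec || t->tv_nsec);

    /* No tick has elapsed, so the whole expire value is still remaining. */
    t->tv_sec = (time_t)(expire / (unsigned long)hz);
    t->tv_nsec = (long)(expire % (unsigned long)hz) * nsec_per_jiffy;
    return expire;
}
```

Starting from a 5 ms request at hz = 100, three consecutive interrupted
calls report 20 ms, 30 ms and 40 ms remaining, so a loop that feeds the
remaining time straight back into nanosleep makes no progress under these
conditions.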

My conclusion: until Linus provides us with a sys_nanosleep_abs() call, we must
call gettimeofday(&now, NULL) before each nanosleep and recompute the relative
timeout from the absolute deadline.
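
A minimal sketch of that workaround, assuming the caller already holds an
absolute deadline as a struct timeval. The helper names time_remaining and
sleep_until are hypothetical and are not part of glibc; a real
pthread_cond_timedwait would also have to re-check its condition after each
wakeup instead of simply looping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: store (deadline - now) in *rem.
 * Returns 0 if the deadline has already passed, 1 otherwise. */
static int time_remaining(const struct timeval *now,
                          const struct timeval *deadline,
                          struct timespec *rem)
{
    long sec = (long)(deadline->tv_sec - now->tv_sec);
    long usec = (long)(deadline->tv_usec - now->tv_usec);

    if (usec < 0) {             /* borrow one second */
        usec += 1000000L;
        sec -= 1;
    }
    if (sec < 0 || (sec == 0 && usec == 0))
        return 0;
    rem->tv_sec = sec;
    rem->tv_nsec = usec * 1000L;
    return 1;
}

/* Hypothetical helper: sleep until the absolute time *deadline.
 * Returns 0 once the deadline has passed, -1 on an unexpected error. */
static int sleep_until(const struct timeval *deadline)
{
    struct timeval now;
    struct timespec rem;

    assert(deadline != NULL);
    for (;;) {
        /* Re-sample the clock before every sleep. */
        if (gettimeofday(&now, NULL) != 0)
            return -1;
        if (!time_remaining(&now, deadline, &rem))
            return 0;
        /* The kernel's "remaining" output is ignored on purpose. */
        if (nanosleep(&rem, NULL) != 0 && errno != EINTR)
            return -1;
    }
}
```

Because the relative timeout is recomputed from the wall clock on every
iteration, rounding in the kernel's remaining-time value can no longer
accumulate: each interrupted call costs at most the rounding of one sleep.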

