This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Remove signal handling for nanosleep (bug 16364)


On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:

> I do not think this check is strictly required: the other failures (EFAULT and
> EINVAL) should also indicate something very flaky on the system.

We should always check for unexpected errors.  We might have been
overlooking something.
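Florian's point about unexpected errors can be made concrete. The sketch below assumes a hypothetical helper, `checked_nanosleep` (not part of glibc or of the patch), that classifies every nanosleep outcome instead of testing only for EINTR, so an EFAULT or EINVAL is reported rather than passing silently.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not from the patch: sleep for *REQ and classify
   the result.  Returns 0 if the full sleep completed, 1 if it was
   interrupted (EINTR), and -1 for any other error such as EFAULT or
   EINVAL, which is reported instead of being ignored.  */
static int
checked_nanosleep (const struct timespec *req)
{
  struct timespec rem;
  if (nanosleep (req, &rem) == 0)
    return 0;

  int err = errno;  /* Save errno before calling anything else.  */
  if (err == EINTR)
    {
      fprintf (stderr, "nanosleep: interrupted, %lld.%09ld s left\n",
               (long long) rem.tv_sec, rem.tv_nsec);
      return 1;
    }

  fprintf (stderr, "nanosleep: unexpected error: %s\n", strerror (err));
  return -1;
}
```

For example, a request with `tv_nsec` set to 1000000000 is out of range, so the helper reports EINVAL and returns -1 instead of treating the call as a successful sleep.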

>> The test is racy in two ways: the child could exit before nanosleep in
>> the parent starts, or the child's exit could be delayed until after
>> nanosleep in the parent ends.  I'm not sure if there is a way to make
>> this more reliable.
> 
> I am aware of that; that's why I tried to mitigate it by making the
> parent sleep twice as long as the child, and some orders of magnitude
> longer than the syscalls themselves.  However, I also can't see a way to
> make this test entirely reliable: even with some synchronization (either
> shared semaphores or a pthread barrier) and/or sending SIGCHLD directly,
> the two race scenarios you describe would still have a small window to
> occur.  I am not sure which strategy would be better, and I think we
> should not rely on, or add hacks to mitigate, such kernel failures.
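The test design described in the quoted reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the test from the patch: the function name `sigchld_during_nanosleep` and the `child_ns` parameter are hypothetical, and the child's lifetime must stay below half a second so that twice that value is still a valid `tv_nsec`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical test body: fork a child that lives for CHILD_NS
   nanoseconds (must be below 500000000) while the parent sleeps twice
   as long.  With SIGCHLD at its default disposition the child's exit
   must not disturb the parent's nanosleep.  Returns 0 on success.
   The race remains: the child may exit before the parent enters
   nanosleep, in which case the test passes without exercising it.  */
static int
sigchld_during_nanosleep (long child_ns)
{
  signal (SIGCHLD, SIG_DFL);

  pid_t pid = fork ();
  if (pid < 0)
    {
      perror ("fork");
      return 1;
    }
  if (pid == 0)
    {
      struct timespec ts = { 0, child_ns };
      nanosleep (&ts, NULL);
      _exit (0);
    }

  struct timespec req = { 0, 2 * child_ns };
  int ret = nanosleep (&req, NULL);
  int err = errno;
  waitpid (pid, NULL, 0);  /* Reap the child.  */

  if (ret != 0)
    {
      fprintf (stderr, "nanosleep: %s\n", strerror (err));
      return 1;
    }
  return 0;
}
```

Calling `sigchld_during_nanosleep (100000000L)` has the child exit after about 100 ms while the parent sleeps for about 200 ms.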

I think you could make it more likely to hit the window if you forked
several child processes.
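One way to follow the suggestion of forking several children is sketched below. The staggered exit times, the `MAX_CHILDREN` limit, and the names `ns_to_timespec` and `children_during_nanosleep` are all illustrative assumptions: child *i* exits after (*i* + 1) steps and the parent sleeps one step past the last child, so several SIGCHLDs are likely, though not guaranteed, to arrive while the parent is inside nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHILDREN 32  /* Arbitrary illustrative limit.  */

/* Convert a nanosecond count into a normalized struct timespec.  */
static struct timespec
ns_to_timespec (long long ns)
{
  struct timespec ts = { (time_t) (ns / 1000000000LL),
                         (long) (ns % 1000000000LL) };
  return ts;
}

/* Hypothetical variant: child I exits after (I + 1) * STEP_NS and the
   parent sleeps for (NCHILDREN + 1) * STEP_NS.  Returns 0 if the
   parent's nanosleep completed normally.  */
static int
children_during_nanosleep (int nchildren, long long step_ns)
{
  pid_t pids[MAX_CHILDREN];
  int started = 0;
  int failed = 0;

  if (nchildren > MAX_CHILDREN)
    nchildren = MAX_CHILDREN;
  signal (SIGCHLD, SIG_DFL);

  for (int i = 0; i < nchildren; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
        {
          perror ("fork");
          failed = 1;
          break;
        }
      if (pid == 0)
        {
          struct timespec ts = ns_to_timespec ((i + 1) * step_ns);
          nanosleep (&ts, NULL);
          _exit (0);
        }
      pids[started++] = pid;
    }

  if (!failed)
    {
      struct timespec req = ns_to_timespec ((nchildren + 1) * step_ns);
      if (nanosleep (&req, NULL) != 0)
        {
          fprintf (stderr, "nanosleep: %s\n", strerror (errno));
          failed = 1;
        }
    }

  for (int i = 0; i < started; i++)
    waitpid (pids[i], NULL, 0);  /* Reap every child that was started.  */
  return failed;
}
```

The staggering spreads the exits across the parent's sleep, so even if the first child exits before the parent reaches nanosleep, later ones still fall inside the window.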

>> The larger question is whether the EINTR check is sufficient, or if a
>> time-based check is needed as well.  That is, whether the kernel bug
>> consisted of a silent early termination of nanosleep.
> 
> My understanding is that on old kernels nanosleep was not restarted
> (via restart_syscall), so nanosleep would terminate early.

To be sure, you should check how much time has elapsed in the nanosleep
calls, with clock_gettime(CLOCK_MONOTONIC).  This would cover both the
-1/EINTR case and the return value 0 case.  The existing code does not
look at EINTR, so if there was a bug, it was the return value 0 scenario.
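The time-based check suggested above could look roughly like this. It is a sketch: `nanosleep_slept_fully` and `timespec_to_ns` are hypothetical names, and the comparison relies on Linux measuring nanosleep against CLOCK_MONOTONIC (POSIX specifies CLOCK_REALTIME), so a correct sleep should never measure shorter than the request on the same clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static long long
timespec_to_ns (struct timespec ts)
{
  return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical check: measure the time actually spent in nanosleep with
   CLOCK_MONOTONIC.  Returns 1 only if nanosleep returned 0 and at least
   the requested time elapsed, so an early return is caught whether it
   shows up as -1/EINTR or as a silent return value of 0.  */
static int
nanosleep_slept_fully (const struct timespec *req)
{
  struct timespec start, end;

  if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
    {
      perror ("clock_gettime");
      return 0;
    }
  int ret = nanosleep (req, NULL);
  int err = errno;
  if (clock_gettime (CLOCK_MONOTONIC, &end) != 0)
    {
      perror ("clock_gettime");
      return 0;
    }

  long long elapsed = timespec_to_ns (end) - timespec_to_ns (start);
  long long wanted = timespec_to_ns (*req);
  if (ret != 0)
    fprintf (stderr, "nanosleep returned -1: %s\n", strerror (err));
  if (elapsed < wanted)
    fprintf (stderr, "nanosleep woke early: %lld of %lld ns\n",
             elapsed, wanted);
  return ret == 0 && elapsed >= wanted;
}
```

In a test like the ones discussed in this thread, the parent's bare nanosleep call would be replaced by this check, so a kernel that silently cut the sleep short would fail the test even though nanosleep returned 0.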

Florian


