This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
- From: Florian Weimer <fweimer at redhat dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 9 Nov 2015 18:18:52 +0100
- References: <1447023171-31542-1-git-send-email-adhemerval dot zanella at linaro dot com> <56405109 dot 9070404 at redhat dot com> <56408F14 dot 1040600 at linaro dot org> <56409561 dot 7050707 at redhat dot com> <5640A89D dot 80804 at linaro dot org> <5640AD98 dot 5030105 at redhat dot com> <5640CA33 dot 1040608 at linaro dot org>
On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:
> I do not think this check is strictly required: the other failures (EFAULT and
> EINVAL) should also indicate something very flaky on the system.
We should always check for unexpected errors. We might have been
overlooking something.
>> The test is racy in two ways: the child could exit before nanosleep in
>> the parent starts, or the child exit could be delayed after nanosleep in
>> the parent ends. I'm not sure if there is a way to make this more reliable.
>
> I am aware of that, which is why I try to mitigate it by making the sleep in
> the parent twice as long as in the child, and some orders of magnitude longer
> than the syscalls themselves. However, I also can't see a way to make this
> test entirely reliable: even with some synchronization (either shared
> semaphores or a pthread barrier) and/or sending SIGCHLD directly, the two
> race scenarios you describe will still have a small window in which to
> occur. I am not sure which strategy would be better, and I think we should
> not rely on or add hacks to mitigate such kernel failures.
I think you could make it more likely to hit the window if you forked
several child processes.
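
A minimal sketch of that idea, not taken from the patch: the helper names
spawn_staggered_children and reap_children are hypothetical, and the step
between child exits is an assumed parameter. Each child exits at a different
offset, so at least one SIGCHLD is more likely to arrive while the parent is
inside nanosleep.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork COUNT children.  Child I sleeps for
   (I + 1) * STEP_NS nanoseconds and then exits, so the exits are
   spread over the time the parent spends sleeping.  Returns the
   number of children actually started.  */
static int
spawn_staggered_children (int count, long step_ns)
{
  int started = 0;
  for (int i = 0; i < count; ++i)
    {
      pid_t pid = fork ();
      if (pid < 0)
        break;                  /* fork failed; stop spawning.  */
      if (pid == 0)
        {
          long long delay = (long long) (i + 1) * step_ns;
          struct timespec ts = { .tv_sec = delay / 1000000000LL,
                                 .tv_nsec = delay % 1000000000LL };
          nanosleep (&ts, NULL);
          _exit (0);            /* Child never returns to the caller.  */
        }
      ++started;
    }
  return started;
}

/* Hypothetical helper: reap up to COUNT children and return how
   many were collected.  */
static int
reap_children (int count)
{
  int reaped = 0;
  while (reaped < count && wait (NULL) > 0)
    ++reaped;
  return reaped;
}
```

The parent would call spawn_staggered_children, then nanosleep for longer
than count * step_ns, and finally reap_children. This only widens the window;
it does not remove either race.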
>> The larger question is whether the EINTR check is sufficient, or if a
>> time-based check is needed as well. That is, whether the kernel bug
>> consists of silent early termination of nanosleep.
>
> My understanding is that on old kernels the nanosleep call was not
> restarted (with restart_syscall), so nanosleep will terminate early.
To be sure, you should check how much time has elapsed in the nanosleep
calls, with clock_gettime(CLOCK_MONOTONIC). This would cover both the
-1/EINTR case and the return value 0 case. The existing code does not
look at EINTR, so if there was a bug, it was the return value 0 scenario.
Florian