This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Remove signal handling for nanosleep (bug 16364)

From: Florian Weimer <fweimer at redhat dot com>
To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
Cc: libc-alpha at sourceware dot org
Date: Mon, 9 Nov 2015 18:18:52 +0100
Subject: Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
Authentication-results: sourceware.org; auth=none
References: <1447023171-31542-1-git-send-email-adhemerval dot zanella at linaro dot com> <56405109 dot 9070404 at redhat dot com> <56408F14 dot 1040600 at linaro dot org> <56409561 dot 7050707 at redhat dot com> <5640A89D dot 80804 at linaro dot org> <5640AD98 dot 5030105 at redhat dot com> <5640CA33 dot 1040608 at linaro dot org>

On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:

> I do not think this check is strictly required: the other failures (EFAULT and
> EINVAL) should also indicate something very flaky on the system.

We should always check for unexpected errors.  We might have been
overlooking something.

>> The test is racy in two ways: the child could exit before nanosleep in
>> the parent starts, or the child exit could be delayed until after nanosleep
>> in the parent ends.  I'm not sure if there is a way to make this more
>> reliable.
> 
> I am aware of that, which is why I try to mitigate it by making the sleep
> time in the parent twice that of the child and some orders of magnitude
> longer than the syscalls themselves.  However, I also can't see a way to
> make this test entirely reliable: even with some synchronization (either
> shared semaphores or a pthread barrier) and/or sending SIGCHLD directly
> as a signal, the two race scenarios you describe will still have a small
> window in which to occur.  I am not sure which strategy would be better,
> and I think we should not rely on or add hacks to try to mitigate such
> kernel failures.

I think you could make it more likely to hit the window if you forked
several child processes.
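
A rough sketch of what I mean, not a proposed test.  The helper name, the
child count limit and the staggering scheme are made up for illustration,
and it assumes SIGCHLD is left at its default disposition so that a correct
kernel does not interrupt the sleep:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/* Hypothetical helper: fork NCHILDREN children that exit at staggered
   points inside the parent's sleep of PARENT_MS milliseconds, so that at
   least some exits land while the parent is blocked in nanosleep.
   Returns the nanosleep result (0, or -1 with errno set), or -2 if a
   fork failed.  */
static int
sleep_with_exiting_children (int nchildren, long parent_ms)
{
  pid_t pids[MAX_CHILDREN];
  int forked = 0;

  if (nchildren > MAX_CHILDREN)
    nchildren = MAX_CHILDREN;

  /* Assumption: the test runs with the default SIGCHLD disposition.  */
  signal (SIGCHLD, SIG_DFL);

  for (int i = 0; i < nchildren; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
        break;
      if (pid == 0)
        {
          /* Child i exits after (i + 1) / (nchildren + 1) of the
             parent's sleep time.  */
          long ms = parent_ms * (i + 1) / (nchildren + 1);
          struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
          nanosleep (&ts, NULL);
          _exit (0);
        }
      pids[forked++] = pid;
    }

  struct timespec req = { parent_ms / 1000, (parent_ms % 1000) * 1000000L };
  int ret = nanosleep (&req, NULL);
  int saved_errno = errno;

  /* Reap only the children created above.  */
  for (int i = 0; i < forked; i++)
    while (waitpid (pids[i], NULL, 0) < 0 && errno == EINTR)
      continue;

  if (forked < nchildren)
    return -2;
  errno = saved_errno;
  return ret;
}
```

Something like sleep_with_exiting_children (8, 2000) spreads eight child
exits across a two-second sleep.  This does not remove the races, it only
makes it less likely that every exit misses the window.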

>> The larger question is whether the EINTR check is sufficient, or if a
>> time-based check is needed as well.  That is, if the kernel bug
>> consists of silent early termination of nanosleep.
> 
> My understanding is that on old kernels the nanosleep call was not
> restarted (with restart_syscall), so nanosleep would terminate
> early.

To be sure, you should check how much time has elapsed in the nanosleep
calls, with clock_gettime(CLOCK_MONOTONIC).  This would cover both the
-1/EINTR case and the return value 0 case.  The existing code does not
look at EINTR, so if there was a bug, it was the return value 0 scenario.
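
Roughly along these lines.  This is only a sketch: check_nanosleep and its
return codes are invented for illustration, and it assumes (as on Linux)
that the elapsed CLOCK_MONOTONIC time is never shorter than the requested
interval when nanosleep runs to completion:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Return A - B in nanoseconds.  */
static long long
timespec_diff_ns (const struct timespec *a, const struct timespec *b)
{
  return (a->tv_sec - b->tv_sec) * 1000000000LL
         + (a->tv_nsec - b->tv_nsec);
}

/* Hypothetical helper: sleep for *REQ and verify with CLOCK_MONOTONIC
   that at least that much time has passed.  Returns 0 for a full sleep,
   1 if nanosleep came back early (whether it returned 0 or -1/EINTR),
   and 2 for any unexpected error.  */
static int
check_nanosleep (const struct timespec *req)
{
  struct timespec start, end;

  if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
    return 2;
  int ret = nanosleep (req, NULL);
  int saved_errno = errno;
  if (clock_gettime (CLOCK_MONOTONIC, &end) != 0)
    return 2;

  /* Anything other than success or EINTR (for example EFAULT or EINVAL)
     is reported as an unexpected error.  */
  if (ret != 0 && saved_errno != EINTR)
    {
      printf ("error: nanosleep failed: %s\n", strerror (saved_errno));
      return 2;
    }

  long long requested = req->tv_sec * 1000000000LL + req->tv_nsec;
  long long elapsed = timespec_diff_ns (&end, &start);

  /* The time check covers both the -1/EINTR case and a silent early
     return with value 0.  */
  if (elapsed < requested)
    {
      printf ("error: nanosleep returned %d after %lld ns, expected %lld ns\n",
              ret, elapsed, requested);
      return 1;
    }
  return 0;
}
```

The point of comparing elapsed time instead of only looking at errno is that
a silent early return with value 0 is caught by the same check.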

Florian

Follow-Ups:
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella

References:
- [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
