This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.

From: Torvald Riegel <triegel at redhat dot com>
To: Ondřej Bílka <neleai at seznam dot cz>
Cc: GLIBC Devel <libc-alpha at sourceware dot org>
Date: Tue, 09 Dec 2014 11:16:00 +0100
Subject: Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
Authentication-results: sourceware.org; auth=none
References: <1417804668 dot 22797 dot 108 dot camel at triegel dot csb> <1417805577 dot 25868 dot 4 dot camel at triegel dot csb> <20141206135040 dot GA16212 at domone> <1418038997 dot 25868 dot 34 dot camel at triegel dot csb> <20141208222857 dot GB13499 at domone>

On Mon, 2014-12-08 at 23:28 +0100, Ondřej Bílka wrote:
> On Mon, Dec 08, 2014 at 12:43:17PM +0100, Torvald Riegel wrote:
> > On Sat, 2014-12-06 at 14:50 +0100, Ondřej Bílka wrote:
> > > On Fri, Dec 05, 2014 at 07:52:57PM +0100, Torvald Riegel wrote:
> > > >    alarm (1);
> > > >  
> > > >    int res = sem_wait (&s);
> > > > -  if (res == 0)
> > > > -    {
> > > > -      puts ("wait succeeded");
> > > > -      return 1;
> > > > -    }
> > > > -  if (res != -1 || errno != EINTR)
> > > > +  /* We accept all allowed behavior: Implementations that return EINTR and
> > > > +     those that rely on correct interruption through signals to use sem_post
> > > > +     in the signal handler.  */
> > > > +  if (res != 0 && !(res == -1 && errno == EINTR))
> > > >      {
> > > > -      puts ("wait didn't fail with EINTR");
> > > > +      puts ("wait neither succeeded nor failed with EINTR");
> > > >        return 1;
> > > >      }
> > > >  
> > > 
> > > That does change test logic as it originally failed when wait succeeded.
> > 
> > Yes, but why do you think that this is inconsistent?  The previous test
> > didn't add a token in the signal handler, so if wait succeeded, then the
> > test should fail.
> > 
> > However, the correct way to interrupt the semaphore with a signal is to
> > add a token.  My patch does that.  Second, if we do not want to make
> > timing assumptions (which the existing test would do if we add a token
> > to the semaphore in the signal handler), then we need to accept that the
> > (first) signal handler execution might happen before sem_wait actually
> > executes.  Therefore, we must not fail in this case.
> > 
> > We have to correctly interrupt with signals because as the futex
> > documentation stands (allowing return of EINTR on spurious wake-ups),
> > there's no way to implement the specific behavior of the existing
> > implementation (which assumes EINTR is returned *only* on signals).
> > 
> > IOW, the existing test does white-box testing with timing assumptions;
> > with this patch, we do make a slightly different black-box test with no
> > timing assumptions.
> 
> Which misses point of test which is to find bugs.

Please read the POSIX spec.  It allows both outcomes, and without timing
assumptions etc., we can't drive executions towards just one of the
outcomes.

> What if in new fooarch 
> assembly one forget to return -1 on interrupt which breaks user application 
> which will assume that semaphore is locked,

I can strengthen the test on res==0, checking whether there is no token
left.  I don't think it buys us much though.

Another option would be to disallow failure; however, this only works if
sem_assume_only_signals_cause_futex_EINTR remains 0 and is not set to a
different value by a distribution (see Patch 3/3).

> one can deal with rare false 
> positives in tests.

I think you'd be creating false negatives with what you have in mind.
The false positive would be not failing when 0 is returned incorrectly,
i.e., your example.

False negatives are a pain.  You can deal with them, of course, but they
stand in the way of doing continuous integration and such.  What we de
facto do is just ignore all tests that we know can have false negatives,
which makes those tests useless.

> One does not need justify this fact by forward progress/fairness
> assumptions. Just assumption that OS which keeps all CPU idle for one
> second while there is suitable task is simply broken.

Well, that *is* a timing assumption.  There is nothing broken about this
in general.  Remember that we do have tests failing now and then just
because of that.  Which is awful for testing.

> It would be better
> to add serialization so no other test runs in parallel with this.

You can't prevent other load on the machine in general.

Also, please review the actual background for this change.  See patch
3/3 for that.  The change to the test is not ideal, but please see the
trade-offs.

Follow-Ups:
- Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
  - From: Ondřej Bílka

References:
- [PATCH 0/3] Fix semaphore destruction (BZ #12674)
  - From: Torvald Riegel
- [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
  - From: Torvald Riegel
- Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
  - From: Ondřej Bílka
- Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
  - From: Torvald Riegel
- Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6.
  - From: Ondřej Bílka
