This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro

From: Alexandre Oliva <aoliva at redhat dot com>
To: Torvald Riegel <triegel at redhat dot com>
Cc: libc-alpha at sourceware dot org, carlos at redhat dot com, mtk dot manpages at gmail dot com
Date: Tue, 26 Nov 2013 20:02:31 -0200
Subject: Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro

On Nov 25, 2013, Torvald Riegel <triegel@redhat.com> wrote:

> On Fri, 2013-11-22 at 04:56 -0200, Alexandre Oliva wrote:

>> Yup.  My assignment was to audit our implementation and document
>> thread-safety properties in it.

> That's why I asked whether you had also documented these cases, because
> they are potentially or likely unsafe.

I don't see that they are.  Without synchronization, there's no
happens-before; since it's read-only access to an atomic word, it's
potentially stale without consequences.  That's my assessment, and
that's the basis for my decision that this was not a safety issue.

>> > IOW, it would be preferable to give such cases an "incorrectsync" tag
>> > or similar
>> 
>> For this situation, I'd use xguargs, although I wouldn't mark feof as
>> MT-Unsafe, because, well, it really isn't; it's combining it with other
>> calls in MT-Unsafe ways that would be, and one way to perform such
>> combination is to perform this sort of unexpected inlining that LTO on a
>> static glibc might end up performing.

> Can you elaborate?

I'm not sure what you want me to elaborate on, but I've detailed the
issue in the proposed additional paragraphs for the MT-Safe definition
that deal with lack of synchronization and inlining across library
implementation barriers.  It's like the terminal-settings or
temporary-signal issues: although the system calls that obtain and
modify the terminal settings are atomic, calling them both in sequence
makes room for another thread to change settings in between, and then
the modified settings you write will override the other thread's change.
Likewise, signal is atomic and returns the old handler, but if you
restore it later with another call to signal, also atomic, you may
overwrite the handler someone else installed.  Thus the composition of
functions that are MT-Safe may fail to be safe itself.  Inlining may
compound the problem, because it permits additional reordering that,
without inlining, was guaranteed not to happen.
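
A minimal sketch of the signal save-and-restore pitfall, replayed in a
single thread so that the interleaving is deterministic.  The handler
names and the function lost_handler_demo are made up for illustration;
only signal() and SIGUSR1 come from the C library.

```c
#include <signal.h>

typedef void (*handler_fn) (int);

/* Hypothetical handlers; their bodies are irrelevant to the example.  */
static void handler_a (int sig) { (void) sig; }
static void handler_b (int sig) { (void) sig; }

/* Replays, in program order, an interleaving that two threads could
   produce.  Every call to signal() is atomic on its own, yet the
   sequence as a whole loses the handler that "thread B" installed.
   Returns 1 if B's handler was overwritten, 0 otherwise.  */
int
lost_handler_demo (void)
{
  /* "Thread A": install a temporary handler, remember the old one.  */
  handler_fn old = signal (SIGUSR1, handler_a);

  /* "Thread B": installs its own handler in between.  */
  signal (SIGUSR1, handler_b);

  /* "Thread A": restores what it saw earlier, overwriting B's change.  */
  signal (SIGUSR1, old);

  /* Query the current disposition (and reset it to the default).  */
  handler_fn now = signal (SIGUSR1, SIG_DFL);
  return now != handler_b;
}
```

The same shape applies to the terminal-settings case: a get followed by
a set is two atomic steps with a window between them.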

>> > I wouldn't say that they are safe.  First, what non-inlining gives us is
>> > that we get a (hopefully) high likelihood that the code that the
>> > compiler will generate is similar to the code it would generate for an
>> > atomic access with relaxed memory order.  Having this high likelihood
>> > might be fine for now, but I think it would be better to change this to
>> > atomic memory accesses eventually.
>> 
>> I agree.  But it doesn't follow from this that they are not safe as they
>> are, does it?

Sorry, the "I agree" above was misleading.  I only agreed that it would
be better to change this to atomic memory accesses.  Although, on second
thought, I'm not even convinced this would be an improvement to the cases
at hand, for it would force unnecessary synchronization.

> Is "high likelihood" sufficient for you?

Given the code at hand, we have more than that; the compiler doesn't
have much choice.  There are only so many ways of reading a single
memory word, and which bits you select from it afterwards makes no
difference to the assessment.

> Whether your argument holds or not depends on your definition of
> MT-Safety.  I very much believe that a large number of people will
> expect that MT-Safety means something along the lines of sequential
> consistency

I haven't seen many people express that expectation; yours was a first
to me, and since upstream MT-Safe includes functions that provide for
explicit interleaving, as we discussed previously, I don't see how you
can possibly hold on to this assumption.

Anyway, we're speaking of a case in which there's no happens-before,
precisely because of the absence of user-initiated synchronization, so,
accessing the "stale" data (whatever was there at the last
synchronization point) or newer data (whatever any other threads might
have written afterwards) both work for such trivial assessments as
whether one or more bits of a single word are set or clear.  It's a
single hw read, and even if there was locking in place to guarantee the
returned value was based on the most recent global write, it could
become stale the moment the lock was released, even before the return
value reached the caller.
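
To illustrate the last point, here is a small sketch, assuming a
made-up struct flag_word protected by a pthread mutex (none of these
names exist in glibc).  The reader takes the lock, yet the value it
returns is only a snapshot: nothing stops a writer from changing the
word right after the unlock.

```c
#include <pthread.h>

/* Hypothetical flag word guarded by a mutex.  */
struct flag_word
{
  pthread_mutex_t lock;
  unsigned flags;
};

/* Reads the selected bits while holding the lock.  The result is
   accurate at the time of the read, but it can be out of date as soon
   as the lock is released, before the caller even sees it.  */
unsigned
read_flags_locked (struct flag_word *w, unsigned mask)
{
  pthread_mutex_lock (&w->lock);
  unsigned snapshot = w->flags & mask;
  pthread_mutex_unlock (&w->lock);
  return snapshot;
}

/* Sets the selected bits while holding the lock.  */
void
set_flags_locked (struct flag_word *w, unsigned mask)
{
  pthread_mutex_lock (&w->lock);
  w->flags |= mask;
  pthread_mutex_unlock (&w->lock);
}
```

A caller that keeps the result of read_flags_locked around and acts on
it later is in the same position as one that read the word without any
lock at all.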

> You seem to have based your MT-Safety properties on a weaker set of
> guarantees.  But this isn't explained anywhere in the docs.

Neither is the POSIX-incompatible notion of MT-Safety you invented.

The properties that are present are the ones mentioned in the definition
we and POSIX offer.

> Also, how should a user expect that feof doesn't have synchronization,
> as you call it (what do you mean precisely?)?  It's marked MT-Safe.

sin() doesn't have synchronization either, and it's perfectly MT-Safe.
Why should it synchronize with any other threads?

feof might want to synchronize, but nothing good would come out of it;
because it's effectively atomic, synchronization barriers would just
make it more expensive, without any advantage.

> Is it only feof that uses nonatomic accesses and no other synchronizing
> operations yet is marked MT-Safe, or do you remember other functions as
> well?

I remember there were various other read-word-and-apply-bitmask macros
and functions for which the same reasoning applied.
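
For concreteness, a sketch of what such a read-word-and-apply-bitmask
macro looks like.  The struct, the flag values and the my_ prefixed
names are assumptions made for this example; glibc's real stream
structure and flag names differ.

```c
/* Hypothetical flag bits kept in a single word of the stream.  */
#define MY_EOF_SEEN 0x10
#define MY_ERR_SEEN 0x20

/* Hypothetical stream object; only the flags word matters here.  */
struct my_stream
{
  int flags;
};

/* Each macro performs one load of the flags word and tests one bit.
   No lock is taken and no other state is consulted.  */
#define my_feof_unlocked(s)   (((s)->flags & MY_EOF_SEEN) != 0)
#define my_ferror_unlocked(s) (((s)->flags & MY_ERR_SEEN) != 0)
```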

> But you don't give a complete definition of what it means to be safe;

I think I do, but it's not the POSIX-incompatible definition you're
looking for.

> thus, even if you are right, it's not something people can be sure to
> understand it in the way you intended it.

That much is true.  But it's also hopeless.  If you, who're an expert in
the field, keep on insisting on an assumption you've long held that's
not only not backed by the standard, but that's contradicted by the
standard, in spite of my recurring efforts to show that, how could I
hope people who are NOT experts in the field will understand it
precisely the way I mean it? :-(

> Hmm :(  Do you have a rough recollection of which subsystems this
> happened in?  Or any other estimate of how widespread this problem might
> be?

If I had to document this, I'd grep for bitwise operators in macro
definitions and then for uses of these macros.  This would even catch
feof_unlocked and ferror_unlocked, both defined as macros that I
documented as MT-Safe with a note about the safe bitwise uses.  To my
surprise, feof and ferror both perform locking!

Apologies for misremembering where the pitfall was; we can now carry
on the discussion with s/feof/feof_unlocked/, debating whether the
explicitly-unlocked function is MT-Safe.

>> > Maybe that's best, assuming that readers understand undefined behavior.
>> > Perhaps we could add that it could have similar effects as a data race?

>> I think this might give the impression the potential damage is far more
>> limited than it could be, because data races aren't quite as dangerous
>> as destroying the universe ;-)

> No, they are as dangerous (ie, they are undefined behavior).

That's where looking under the hood can make a difference.  Not all
forms of undefined behavior are equally dangerous.

A load from a single word in a non-inlinable function, even if
technically a data race due to lack of synchronization, given the
constraints provided by the function call barrier, can't possibly be
more dangerous than having that single word load performed between a
lock acquire and a release, or as an atomic load.  Or can it? (example,
please)
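
The two variants being compared can be sketched as follows.  MY_FLAG
and both function names are invented for the example, and the noinline
attribute is a GCC extension standing in for the function call barrier.
Under the C11 model the plain load is formally a data race if another
thread writes the word concurrently, while the relaxed atomic load is
not; whether that formal difference can cause observable harm for a
single-word read is exactly the open question.

```c
#include <stdatomic.h>

#define MY_FLAG 0x20u   /* hypothetical flag bit */

/* Plain load behind a call boundary: the compiler cannot merge or
   reorder this read with the caller's code through inlining.  */
__attribute__ ((noinline)) int
test_flag_plain (const unsigned *word)
{
  return (*word & MY_FLAG) != 0;
}

/* Same test using a C11 atomic load with relaxed ordering: it imposes
   no ordering on other memory accesses, but the read itself is
   well-defined even when writers run concurrently.  */
int
test_flag_relaxed (atomic_uint *word)
{
  return (atomic_load_explicit (word, memory_order_relaxed) & MY_FLAG) != 0;
}
```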

>> The constraints are for cases in which you want to step *out* of the
>> MT-Safety zone, as in, you want to call MT-Unsafe functions in MT
>> programs.  In order to do that, you may have to take additional care.

> But then they are also, effectively, constraints on MT-Safe functions;

The MT-Safe functions can be called at will, as long as you don't call
MT-Unsafe ones.  If the second part of my previous sentence is a
constraint, well, then yes.  Otherwise, no, they're not constraints on
MT-Safe functions.  They *become* constraints once you step out of the
Safe perimeter, and get into the, let's say, extended safe perimeter.
Then, and only then, do the annotated MT-Safe functions become unsafe to
call if additional constraints aren't observed, because some other
function that makes them unsafe is to be called.

> if you need to have these requirements to make a generally MT-Unsafe
> function MT-Safe under these requirements, then this is equivalent to a
> constraint on an MT-Safe function, right?

Sorry, I can't see it that way.  Within the MT-Safe perimeter (i.e., as
long as you only call MT-Safe functions), they are safe.  Outside it,
they aren't.  Just like nearly any other MT-Safe function!

> Perhaps it would be more useful to explain the structure of your safety
> model right at the top, instead of explaining it bitwise as one reads
> along?  For example, say that we got generally Safe and Unsafe, and that
> for some functions we consider generally Unsafe to be actually Safe
> under some constraints, and that we'll discuss the constraints in detail
> further down.  Etc.

I think there's still some faulty assumption in there, for this is not
quite what I mean.

  There's Safe and Unsafe, full stop.  You need not read any further if
  that's good enough for you.

  Now, if it's not good enough, in some cases you may be able to get
  away calling some Unsafe functions, as long as the entire program
  observes some specific constraints.  If you want that, read on.

To me that's a very reasonable way to describe it.  Bringing the
exceptions into the rule turns it upside-down and makes it confusing.

> Just considering MT-Safety without an intended loss of generality.
> Sorry if that made it look like confusion on my side...

Ah, good!

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer

