This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro

From: Torvald Riegel <triegel at redhat dot com>
To: Alexandre Oliva <aoliva at redhat dot com>
Cc: libc-alpha at sourceware dot org, carlos at redhat dot com, mtk dot manpages at gmail dot com
Date: Wed, 27 Nov 2013 17:10:32 +0100
Subject: Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
Authentication-results: sourceware.org; auth=none
References: <20131113081059 dot 3464 dot 51385 dot stgit at frit dot home> <20131113081132 dot 3464 dot 30409 dot stgit at frit dot home> <1384859432 dot 32326 dot 364 dot camel at triegel dot csb> <orsiurva0g dot fsf at livre dot home> <1384956325 dot 3152 dot 591 dot camel at triegel dot csb> <orhab6t6m8 dot fsf at livre dot home> <1385051174 dot 3152 dot 1637 dot camel at triegel dot csb> <orvbzksz47 dot fsf at livre dot home> <1385409288 dot 3152 dot 3539 dot camel at triegel dot csb> <or8uwaom88 dot fsf at livre dot home>

On Tue, 2013-11-26 at 20:02 -0200, Alexandre Oliva wrote:
> On Nov 25, 2013, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > On Fri, 2013-11-22 at 04:56 -0200, Alexandre Oliva wrote:
> 
> >> Yup.  My assignment was to audit our implementation and document
> >> thread-safety properties in it.
> 
> > That's why I asked whether you had also documented these cases, because
> > they are potentially or likely unsafe.
> 
> I don't see that they are.

I know that *you* don't, but you'll also know that we don't really know
what POSIX really intents MT-Safe to be :)  It's an open question, and
therefore it would have been useful if we had documented cases that we
might have to revisit again after clarification of the standard's
intent.

> Without synchronization, there's no
> happens-before; since it's read-only access to an atomic word, it's
> potentially stale without consequences.  That's my assessment, and
> that's what I based my decision on that this was not a safety issue.
> 
> >> > IOW, it would be preferable to give such cases an "incorrectsync" tag
> >> > or similar
> >> 
> >> For this situation, I'd use xguargs, although I wouldn't mark feof as
> >> MT-Unsafe, because, well, it really isn't; it's combining it with other
> >> calls in MT-Unsafe ways that would be, and one way to perform such
> >> combination is to perform this sort of unexpected inlining that LTO on a
> >> static glibc might end up performing.
> 
> > Can you elaborate?
> 
> I'm not sure what you want me to elaborate on,

I couldn't understand "it's combining it with other calls in MT-Unsafe
ways that would be".  (And I still can't understand that part when
looking at what you said below.)

> but I've detailed the
> issue in the proposed additional paragraphs for the MT-Safe definition
> that deal with lack of synchronization and inlining across library
> implementation barriers.  It's like the terminal-settings or
> temporary-signal issues: although the system calls that obtain and
> modify the terminal settings are atomic, calling them both in sequence
> makes room for another thread to change settings in between, and then
> the modified settings you write will override the other thread's change.
> Likewise, signal is atomic and returns the old handler, but if you
> restore it later with another call to signal, also atomic, you may
> overwrite the handler someone else installed.  Thus the composition of
> functions that are MT-Safe may fail to be safe itself.  Inlining may
> compose with that because additional reordering may take place that,
> without inlining, was guaranteed not to take happen.
> 
> >> > I wouldn't say that they are safe.  First, what non-inlining gives us is
> >> > that we get a (hopefully) high likelihood that the code that the
> >> > compiler will generate is similar to the code it would generate for an
> >> > atomic access with relaxed memory order.  Having this high likelihood
> >> > might be fine for now, but I think it would be better to change this to
> >> > atomic memory accesses eventually.
> >> 
> >> I agree.  But it doesn't follow from this that they are not safe as they
> >> are, does it?
> 
> Sorry, the âI agreeâ above was misleading.  I only agreed that it would
> be better to change this to atomic memory accesses.  Although, on second
> thought, I'm not even convinced this would be an improvementto the cases
> at hand, for it would force unnecessary synchronization.

If any of the data is modified concurrently, we must use atomic
accesses; otherwise, there is no guarantee that we won't read garbage
(and thus violate the code's preconditions).  Also, if it is likely (or
the case given current compilers etc.) that we won't read garbage in the
current implementation, then changing those accesses to
relaxed-memory-order atomic accesses will not lead to runtime overheads
because the loads itself are then really those you need.  I'm not aware
of any architecture that would use something special for relaxed-MO
atomic loads/stores that are not the same instructions used for normal
loads/stores (considering word-sized or smaller accesses, and just plain
stores no read-modify-write ops, of course).

> > Is "high likelihood" sufficient for you?
> 
> Given the code at hand, we have more than that; the compiler doesn't
> have much choice.  There are only so many ways of reading a single
> memory word

The compiler can reload from memory, for example, making this more than
one load.  It could decide to split a 64b load into two 32b loads.
There were GCC bugs in this area that were revealed by fuzz testing, so
I'm not making that up.  I haven't looked at the particular GCC cases,
so I can't comment on whether we have a problem obviously, but I
wouldn't dismiss it.

> , and whatever bits you select from it afterwards doesn't
> make a difference as to the assessment.

Widening a *single* load is typically fine (as long as it doesn't cross
page boundaries, for example), I agree.

> > Whether your argument holds or not depends on your definition of
> > MT-Safety.  I very much believe that a large number of people will
> > expect that MT-Safety means something along the lines of sequential
> > consistency
> 
> I haven't seen many people express that expectation;

ISO C and C++ both went for sequential consistency as default.  You can
see that in the atomics, and it's also the implicit goal by many in the
committee, AFAICT, for functions that are their equivalent of MT-Safe.

Another example is the recent POSIX clarification for the ordering
guarantees on condition variables, which requires a stronger guarantee
that is like sequential consistency.

The reason behind this is that many in the field consider guarantees
weaker than roughly sequential consistency to be just to hard to reason
about.

> yours was a first
> to me, and since upstream MT-Safe includes functions that provide for
> explicit interleaving

Which cases are you referring to explicitly?  And do you mean
interleaving within a particular MT-Safe call, or among the composition
of several MT-Safe calls?

> , as we discussed previously, I don't see how you
> can possibly hold on to this assumption.
> 
> Anyway, we're speaking of a case in which there's no happens-before,
> precisely because of the absence of user-initiated synchronization, so,
> accessing the âstaleâ data (whatever was there at the last
> synchronization point) or newer data (whatever any other threads might
> have written afterwards) both work for such trivial assessments as
> whether one or more bits of a single word are set or clear.  It's a
> single hw read, and even if there was locking in place to guarantee the
> returned value was based on the most recent global write, it could
> become stale the moment the lock was released, even before the return
> value reached the caller.

Establishing an inter-thread happens-before when there's a reads-from
relation is *not* about ensuring atomicity with subsequent decisions.
It is just about establishing ordering of other things that would have
happened *before* if one would execute all the steps made by all threads
in some sequential order (with interleaving allowed).

> > You seem to have based your MT-Safety properties on a weaker set of
> > guarantees.  But this isn't explained anywhere in the docs.
> 
> Neither is the POSIX-incompatible notion of MT-Safety you invented.

You don't explain this.  I don't explain what I would think MT-Safety
should be either, but I'm not patching documentation.  So it doesn't
matter what I don't do, or what POSIX doesn't do.  But it very much
matters whether our documentation has a sufficient definition or not!

> The properties that are present are the ones mentioned in the definition
> we and POSIX offer.

Sorry, the definition is not sufficient.  If it were, you wouldn't have
to resort to having to mention examples of what certain functions do
that are MT-Safe, but would instead just point to the definition.

> > Also, how should a user expect that feof doesn't have synchronization,
> > as you call it (what do you mean precisely?)?  It's marked MT-Safe.
> 
> sin() doesn't have synchronization either, and it's perfectly MT-Safe.
> Why should it synchronize with any other threads?

Your argument is backwards.  You say that there can be MT-Safe functions
that don't need shared-memory synchronization because they don't access
shared memory.  Fine.  But how does that imply that functions that use
shared memory don't need shared-memory synchronization?

> feof might want to synchronize, but nothing good would come out of it;
> because it's effectively atomic, synchronization barriers would just
> make it more expensive, without any advantage.

There can be advantage, because if it would be sequentially consistent,
it allows a programmer to reason about other things in the program
without having to add his own synchronization in there.

Thus, both not providing and do providing stronger ordering guarantees
can have an advantage.  Thus, it's a choice we make, and not the only
possible outcome.  If it's a choice, we need to document what we chose.

> > Is it only feof that uses nonatomic accesses and no other synchronizing
> > operations yet is marked MT-Safe, or do you remember other functions as
> > well?
> 
> I remember there were various other read-word-and-apply-bitmask macros
> and functions for which the same reasoning applied.

Okay.  Were these internal macros, or ones part of the external
interface?

> > But you don't give a complete definition of what it means to be safe;
> 
> I think I do, but it's not the POSIX-incompatible definition you're
> looking for.

I disagree.  See above.

> > thus, even if you are right, it's not something people can be sure to
> > understand it in the way you intended it.
> 
> That much is true.  But it's also hopeless.  If you, who're an expert in
> the field, keep on insisting on an assumption you've long held that's
> not only not backed by the standard, but that's contradicted by the
> standard, in spite of my recurring efforts to show that, how could I
> hope people who are NOT experts in the field will understand it
> precisely the way I mean it? :-(

Simply by expanding your definition and making it sufficient complete
until it can be understood without hand-waving or pointing to examples
of what function X or Y would be doing.  Make it self-contained, and
sufficient to explain what's really going on.  It probably should also
be not too conservative to be useless, although anything correct but
conservative would be good enough for 2.19 I suppose.

As an expert in the field (as you said ;), I strongly believe that your
definition (and POSIX's, but that's a separate question) is insufficient
and not clear.  If you're looking for other evidence supporting this
claim, have a look at the C11 and C++11 models, and see the difference,
especially in which aspects they consider in their models; even if we
ignore some of the arcane memory orders, for example, their models just
consider aspects that aren't covered in the POSIX definition, nor in
yours (like the details of the ordering aspects).  I think it's very
likely that C11 didn't want a model that's more complex than really
necessary, so it should also be unlikely that the additional complexity
is just bloat.

Whether I think that POSIX's definition should be different than the one
you seem to have in mind is a completely different topic.  You do also
seem to think that the standard is free of contradictions, which I
believe it is not in the area of MT-Safety and related definitions.
Certainly not free of differences between what's specified and the
actual intent, as the condvar clarification example shows.  Saying that
"function X is MT-Safe, but couldn't ever be MT-Safe under a strong
MT-Safe definition" doesn't necessarily imply that we need a weak
MT-Safe definition; it could also be that X just isn't MT-Safe, or not
without additional constraints.

> 
> >> > Maybe that's best, assuming that readers understand undefined behavior.
> >> > Perhaps we could add that it could have similar effects as a data race?
> 
> >> I think this might give the impression the potential damage is far more
> >> limited than it could be, because data races aren't quite as dangerous
> >> as destroying the universe ;-)
> 
> > No, they are as dangerous (ie, they are undefined behavior).
> 
> That's where looking under the hood can make a difference.  Not all
> forms of undefined behavior are equally dangerous.
> 
> A load from a single word in a non-inlinable function, even if
> technically a data race due to lack of synchronization, given the
> constraints provided by the function call barrier, can't possibly be
> more dangerous than having that single word load performed between a
> lock acquire and a release, or as an atomic load.  Or can it? (example,
> please)

First, there's less ordering established by the pure load (ie, relaxed
memory order) compared to the lock, but I guess you didn't want to
discuss that particular difference ;)

Beyond that, there are the risks I mentioned above: compiler reloading
the value (which it is allowed to do because it can assume that this is
sequential code), and the compiler loading the value in more than one
step.  The latter is unlikely for a single word I *guess*.  The
likelihood of the former depends on the other code in this function, I
suppose.

I don't precisely recall what the related GCC bugs were that got fixed.
I think there was one reloading case, and there was another with a
speculative store -- but I don't remember off the top of my head.

> >> The constraints are for cases in which you want to step *out* of the
> >> MT-Safety zone, as in, you can to call MT-Unsafe functions in MT
> >> programs.  In order to do that, you may have to take additional care.
> 
> > But then they are also, effectively, constraints on MT-Safe functions;
> 
> They MT-Safe functions can be called at will, as long as you don't call
> MT-Unsafe ones.  If the second part of my previous sentence is a
> constraint, well, then yes.  Otherwise, no, they're not constraints on
> MT-Safe functions.  They *become* constraints once you step out of the
> Safe perimeter, and get into the, let's say, extended safe perimeter.
> Then, and only then, do the annotated MT-Safe functions become unsafe to
> call if additional constraints aren't observed, because some other
> function that makes them unsafe is to be called.

I agree when splitting into "core safe" and "extended safe".  Maybe put
that into the docs?

> > if you need to have these requirements to make a generally MT-Unsafe
> > function MT-Safe under these requirements, then this is equivalent to a
> > constraint on an MT-Safe function, right?
> 
> Sorry, I can see it that way.

Good! ;)

> Within the MT-Safe perimeter (i.e., as
> long as you only call MT-Safe functions), they are safe.  Outside it,
> they aren't.  Just like nearly any other MT-Safe function!

I think splitting into core-safe (or, safe without constraints),
extended safe (safe under certain constraints), and unsafe (never safe)
would help people to understand the categories more easily.  (Or
whatever you want to call those.)  Then the context of the constraints
(or, requirements) also becomes clear.

> > Perhaps it would be more useful to explain the structure of your safety
> > model right at the top, instead of explaining it bitwise as one reads
> > along?  For example, say that we got generally Safe and Unsafe, and that
> > for some functions we consider generally Unsafe to be actually Safe
> > under some constraints, and that we'll discuss the constraints in detail
> > further down.  Etc.
> 
> I think there's still some faulty assumption in there, for this is not
> quite what I mean.
> 
>   There's Safe and Unsafe, full stop.  You need not read any further if
>   that's good enough for you.
> 
>   Now, if it's not good enough, in some cases you may be able to get
>   away calling some Unsafe functions, as long as the entire program
>   observes some specific constraints.  If you want that, read on.
> 
> To me that's a very reasonable way to describe it.  Bringing the
> exceptions into the rule turns it upside-down and makes it confusing.

That works too, because you split it into core-safe, extended-safe,
unsafe, and let readers know that extended-safe is with constraints.
That's what I was looking for.

Follow-Ups:
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva

References:
- [MTASCsft PATCH WIP5 00/33] MT-, AS- and AC-Safety docs
  - From: Alexandre Oliva
- [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]