This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro


On Wed, 2013-11-27 at 16:40 -0200, Alexandre Oliva wrote:
> On Nov 27, 2013, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > I know that *you* don't, but you'll also know that we don't really know
> > what POSIX really intends MT-Safe to be :)
> 
> I know that you don't, but I believe I have a very clear picture in my
> mind.

I think I do understand the options, and the differences between the
options.  That I think that the definitions out there are not clear
enough doesn't mean that my understanding is muddled (or, to be precise,
my understanding of my understanding, and so on...).

I also don't claim that the picture *in your mind* isn't clear.  But I
haven't seen a *definition* from you that would reflect that clarity.

> I've just reread all the relevant parts of POSIX, and I didn't
> see any hint of a conflict with my understanding

It's not unlikely that an incomplete definition wouldn't be in conflict
with some other more precise definition.  If you don't cover a certain
aspect, or leave it vague, you're not creating a conflict with something
more detailed.  But that doesn't mean that the vague thing is as precise
as the more detailed definition it doesn't conflict with.

> that POSIX means
> MT-Safe as functions that are safe to call in the presence of other
> threads,

Where is "safe" defined?  If this definition exists, is it complete?
(If the rest of the sentence is supposed to be the definition of "safe",
then see below.)

> i.e., that will behave as documented regardless of what other
> threads are doing

Our functions have sequential specifications, that is, there are
preconditions and postconditions involving program state, arguments,
return value.  IOW, a function is specified as a mapping from the state
before to the state after its execution.  (If there are exceptions to
this, let's discuss them separately.  Given that your patch adds MT-Safe
docs in the first place, I doubt there is a significant number of
functions that have a concurrent specification.)  Thus, the behavior is
documented as one-step state transitions.

With that in mind, to reason about whether a function's execution
behaves as documented, we thus need to compare against the possible
one-step state transitions (that is, the function specs we have).  This
reasoning
includes having to choose possible program states (so that we can check
which preconditions apply to each state and whether the postcondition
holds).  We can't pick intermediate states produced during the
execution of another concurrent (MT-Safe) function because those
intermediate states are simply not defined by the function specs; if we
allowed arbitrary imaginary intermediate states, even though undefined,
every MT-Safe function could effectively pick randomly from each of its
allowed effects.  This is hardly useful; thus, we need to reason about
states right before or after the execution of functions, not about
intermediate states.

Once we do that, we have a couple of options regarding what to consider
to "behave as documented" (think about function executions as edges,
states as vertices, and how to possibly connect them):
1) Equivalence to some sequential execution: The allowed behavior /
states are exactly those produced by some sequential execution of
functions.  If an execution consists of a multi-set of function calls,
the allowed behavior is some order of those functions (whether or not
this order is per-thread doesn't matter that much right now).  (We
likely don't want to introduce arbitrary "phantom" calls that weren't
initiated in the program, so we restrict it to the calls run by the
program.)
2) Drop effects of certain functions (ie, edges going into some dead,
never used nor observable state).  Lost updates are an example.
3) Have two or more edges (/function calls) start at one common state,
and produce some specially combined state.  If the outcome is not
representable by a sequential execution (i.e., as in 1) ), we either
have a new state that might not be covered/considered by the sequential
specification, or we have effectively added a custom state transition
that applies when we have several functions concurrently in flight.  The
latter won't be covered by a sequential specification, in particular not
one that so far didn't have any docs about its behavior under
concurrency.

2) and 3) are thus not really practical, and certainly not the common
case.  (BTW, having 1) to 3) already shows that what "safe" means isn't
as clear-cut as it may seem.)
In any case, both need extra documentation beyond the sequential
specification.  Thus, if we start out with just a sequential
specification, we need to reason about correctness by looking at what
would be allowed by a sequential execution.  In turn, this means that
MT-Safe would need to be *virtually atomic* (in the sense that MT-Safe
functions will create states that could be created by some sequential
execution).

Note that we're not considering any ordering constraints on the
equivalent sequential execution yet.  In particular, (a) whether
per-thread program order constrains the candidates for the sequential
execution that we compare against, and (b) whether the equivalent
sequential execution can be different in each thread.

Regarding (a), we have the choice whether a thread executing one
function X after another function Y should be able to rely on X being
ordered after Y in the equivalent sequential execution.  Not giving this
guarantee certainly makes it harder for programmers to reason about a
program (e.g., it is hard to have to consider that your thread's program
order might be shuffled around just because another MT-Safe function
executed concurrently).

I don't want to discuss (b) in too much detail, but there again we have
a choice: We can make sure that two threads agree on how parts of the
equivalent sequential execution look once some information bleeds from
one of the threads to the other, or we can ignore that to some extent.
If we ignore it, then we create difficulties for the programmer
reasoning about allowed behavior because we get effects similar to 2)
and 3) above (imagine a graph; the difference to what's described above
is that we have per-thread sequences that we need to merge, or where
some sequences' effects are dropped).

Obviously, this is somewhat simplified, and still incomplete.  But it
should show that we have several aspects for which we have a choice with
more than one reasonable option.  It's not my primary goal right now to
argue which choice is better, but to start with showing that there *is*
a choice that we have to make.  If we find such choice when exploring
what "safe" actually means, then it should be clear that "safe" is not
sufficiently defined -- otherwise, the definition would make clear how
to choose.

> as long as they all refrain from invoking undefined
> behavior.

Does "invoking undefined behavior" cover anything MT-Unsafe, plus
anything violating the preconditions of the sequential specifications?
Or something else?

> This is perfectly sufficient as a user-visible definition.

No, see above.  There are choices we have to make and reflect in the
definition.

> It doesn't
> bring with it any inter-thread ordering requirements;

That's related to choice (b) above, but also choice (a) I believe
(indirectly in the form of coherency rules).  Even if you ignore both,
and implicitly assume the lack of coverage of these choices in the
definition of "safe" to mean that there is no inter-thread ordering
whatsoever, there are still choices your definition doesn't make.

> those arise from
> the user-visible synchronization primitives only, which no functions are
> required to use internally, for the simple fact that no function is
> required to perform synchronization.

If they don't access shared state, they are obviously not required to
synchronize.  If they do access shared state, they will have to
synchronize in practice, because then several threads access the same
state.  If you don't want to read or produce garbage (in the sense of
states beyond what's specified in the sequential spec) while doing that,
you need to synchronize in some way; even if it's single-word state, you
need to synchronize to prevent the compiler from assuming that this is
sequential code (and thus reloading, or applying any other optimization
that is only safe in sequential code).

If you mean something else by synchronization, then please define it.  I
already asked for a proper definition in a prior email.

> That possibility is  explicitly
> stated as conditional: when necessary.
> 
> The internal library implementation may need to be aware of the
> underlying hardware and language memory models, and the application may
> want to take advantage of additional flexibility provided by them, but
> those are outside the scope of POSIX, and of the definition of whether
> POSIX-defined interfaces are MT-Safe.

If POSIX doesn't want to rely on a language's memory model (which would
be unwise in the long term IMHO), fine.  But then it needs to define
its own memory model precisely.  Just compare what C11 or Java does
with what POSIX says.  You will notice that the former are much more
detailed.  Do you think that's just bloat?

> I refuse to repeat the discussion we've had before.  I proved that some
> of your now-repeated statements were false, but you insist on them, and
> you don't provide any arguments to support your position that
> contradicts POSIX explicit statements.

Pardon?

> From that, I conclude it's
> pointless for me to spend any more time on it.

Yeah, maybe it is...

> >> Sorry, the "I agree" above was misleading.  I only agreed that it would
> >> be better to change this to atomic memory accesses.  Although, on second
> >> thought, I'm not even convinced this would be an improvement to the cases
> >> at hand, for it would force unnecessary synchronization.
> 
> > If any of the data is modified concurrently, we must use atomic
> > accesses; otherwise, there is no guarantee that we won't read garbage
> 
> I'm not familiar with any hw memory model in which an unsynchronized
> single-word read could actually read garbage, rather than a value that
> may be outdated.

First, this is about compilers too, not just the HW.  The only
counter-example that comes to my mind right now is self-modifying code
on x86.  But the fact that neither of us knows of a counter-example is
no guarantee that there are none...

> > The compiler can reload from memory,
> 
> It can't.  It's only one access.

It can.  Why could it not?  In the absence of synchronizing
instructions, it can assume sequential code.  If it can figure out the
aliasing, and the assumed sequential code doesn't write to the memory
location between the original load and the reload, it can assume that
nobody else modified it, and so reload from memory.

It couldn't reload if it were a volatile.  It couldn't if it were asm
code (e.g., like in atomic_forced_read()).  And it couldn't if it were
an explicit atomic access, of course.  But you didn't mention any of
those, so I assume we're discussing normal C/C++ code here.

> > It could decide to split a 64b load into two 32b loads.
> 
> It can't, it's a single word load.

Why not (unless volatile/asm)?  The compiler only has to make sure that
the outcome is equivalent to what the abstract machine would do.  Sure,
it can do the split if the target supports that.

> > There were GCC bugs in this area that were revealed by fuzz testing, so
> > I'm not making that up.
> 
> That would be a compiler bug.  Compiler bugs can cause misbehavior.

Those bugs were cases in which the compiler did stuff that's perfectly
fine in sequential code.  And you were speaking about sequential code
(from the compiler's POV).

> But
> in these cases, it's the compiler that's wrong, not the application (or
> the library) that it miscompiles.  Under the language model it ought to
> implement, aligned-word-single-read is safe.
> 
> [snipped discussion whose repetition I refuse to take part of]
> 
> >> sin() doesn't have synchronization either, and it's perfectly MT-Safe.
> >> Why should it synchronize with any other threads?
> 
> > Your argument is backwards.  You say that there can be MT-Safe functions
> > that don't need shared-memory synchronization because they don't access
> > shared memory.  Fine.  But how does that imply that functions that use
> > shared memory don't need shared-memory synchronization?
> 
> It doesn't.  A single counter-example is enough to disprove what
> purports to be a general statement.

Exactly, it doesn't.  So *your* claim that MT-Safe functions generally
don't need to synchronize is wrong.

> What begs proving is whether POSIX mandates synchronization for any
> case.  It offers synchronization operations and mutexes as examples of
> what an implementation might internally use to achieve safety, but the
> only interfaces for which synchronization is mandatory are the ones used
> for synchronization (mutexes, locks and condition variables)

In the choices I outlined above, you can certainly make the choice that
once more than one thread accesses the same virtually shared state, they
essentially "fork" the shared state into per-thread state and never
synchronize on it again.  If that's your mental model, make sure the
definition says so.

(I think it's no surprise if I find such a guarantee to not be really
useful.)

> 
> >> I remember there were various other read-word-and-apply-bitmask macros
> >> and functions for which the same reasoning applied.
> 
> > Okay.  Were these internal macros, or ones part of the external
> > interface?
> 
> I can't say they were all internal, but there aren't very many macros
> that are part of the external interface.  Those that come to mind are
> the non-wide ctype macros, that access read-only memory, and the file
> type macros for stat structs, that would require user synchronization of
> the stat structs in the unlikely case of inter-thread use.
> 
> > As an expert in the field
> 
> You know what?  Maybe you're an expert in concurrency and memory models,
> and that twists your view of MT-Safe under the looks-like-a-nail
> principle.  This possibility appears to fit the observable behavior.

You do know what memory models are built for, don't you?  They define
allowed behavior under multi-threaded executions.  Why would MT-Safe not
be a subset of this topic?

> > I agree when splitting into "core safe" and "extended safe".  Maybe put
> > that into the docs?
> 
> I don't want people to get the wrong idea that there's an extended
> safe.  That only complicates the model.  See, you got that nonsense out
> of earlier misphrasing, and see how hard it is to take that out of your
> mind now!

Sorry, that's no nonsense to me.

