This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro

From: Alexandre Oliva <aoliva at redhat dot com>
To: Torvald Riegel <triegel at redhat dot com>
Cc: libc-alpha at sourceware dot org, carlos at redhat dot com, mtk dot manpages at gmail dot com
Date: Fri, 22 Nov 2013 04:56:56 -0200
Subject: Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
Authentication-results: sourceware.org; auth=none
References: <20131113081059 dot 3464 dot 51385 dot stgit at frit dot home> <20131113081132 dot 3464 dot 30409 dot stgit at frit dot home> <1384859432 dot 32326 dot 364 dot camel at triegel dot csb> <orsiurva0g dot fsf at livre dot home> <1384956325 dot 3152 dot 591 dot camel at triegel dot csb> <orhab6t6m8 dot fsf at livre dot home> <1385051174 dot 3152 dot 1637 dot camel at triegel dot csb>

On Nov 21, 2013, Torvald Riegel <triegel@redhat.com> wrote:

> On Thu, 2013-11-21 at 08:02 -0200, Alexandre Oliva wrote:
>> On Nov 20, 2013, Torvald Riegel <triegel@redhat.com> wrote:
>> 
>> > What about linking statically and using LTO?

>> That's something I'd considered at some point, before concluding that a
>> lot of code in glibc just wasn't suited for this sort of use.

> But that would be a limitation of the current glibc implementation, and
> not an inherent property of any glibc implementation, right?

Yup.  My assignment was to audit our implementation and document
thread-safety properties in it.

> IOW, it would be preferable to give such cases an "incorrectsync" tag
> or similar

For this situation, I'd use xguargs, although I wouldn't mark feof as
MT-Unsafe, because, well, it really isn't; it's combining it with other
calls in MT-Unsafe ways that would be, and one way to perform such
combination is to perform this sort of unexpected inlining that LTO on a
static glibc might end up performing.

> Either case, I think we should state that data races can lead to
> undefined behavior

s/can lead to/are/ :-)

>> A number of very small functions (feof comes to mind) are safe in
>> spite of avoiding synchronization primitives because we assume they
>> won't be inlined.  What makes them safe currently is their single
>> access to a single word;

> I wouldn't say that they are safe.  First, what non-inlining gives us is
> that we get a (hopefully) high likelihood that the code that the
> compiler will generate is similar to the code it would generate for an
> atomic access with relaxed memory order.  Having this high likelihood
> might be fine for now, but I think it would be better to change this to
> atomic memory accesses eventually.

I agree.  But it doesn't follow from this that they are not safe as they
are, does it?  Now let's see whether the second argument accomplishes
that...

> Second, accesses with relaxed memory order are often not sufficient to
> produce sequential consistency.  Thus, on weak-memory-model
> architectures, we can't anymore guarantee equivalence of a
> multi-threaded execution of MT-Safe functions to some sequential
> execution of the same functions; we just don't get a total order that
> programmers can reason more easily about.

Ok, this gets closer.  So, my counter-argument, when it comes
specifically to feof, is that it's ok if we get outdated information
because, well, without synchronization, you risk getting outdated
information, and there's no standard-mandated synchronization in feof.
If you want to ensure you get up-to-date access, you perform additional
synchronization, just like you would if you wanted to ensure that two
operations on a stream are performed without intervening, conflicting
operations on the same stream.

So, I still stand by my analysis that it is safe.

>> if we were to inline it along with other calls that take
>> such shortcuts, this would no longer make for consistent executions,
>> because of the very reordering possibilities you bring up.

> Did you make a list of these cases?

'fraid not :-(

>> How about this:
>> 
>> ...  Calling them within such contexts invokes undefined behavior.
>> 
>> ?

> Maybe that's best, assuming that readers understand undefined behavior.
> Perhaps we could add that it could have similar effects as a data race?

I think this might give the impression the potential damage is far more
limited than it could be, because data races aren't quite as dangerous
as destroying the universe ;-) E.g., feof's access of stream status bits
without synchronization is technically a data race, but a Mostly
Harmless (TM) one :-) Calling an MT-Unsafe function, OTOH, could result
in crashes, corruption, and a lot more that isn't immediately associated
with data race!

> That is a useful summary; should it be made part of the documentation
> (in suitably rephrased form)?

I think it already is! :-( Now, I hope we can figure out what it was
that set you down a different path, in which you still are (see below),
and fix that...

>> Sorry, I don't follow.  Can you expand on what you have in mind?

> The documentation defines MT-Safe without mentioning that there are
> constraints.

Because there aren't.  Well, other than that of not calling MT-Unsafe
functions.

Anything else that you seem to have misunderstood as constraints on
general programs that apply even to MT-Safe functions are the result of
a misunderstanding: as long as you stick to MT-Safe functions (and
observe MT-Safety requirements on your own code, and don't otherwise
invoke undefined behavior), you're MT-Safe.

The constraints are for cases in which you want to step *out* of the
MT-Safety zone, as in, you can to call MT-Unsafe functions in MT
programs.  In order to do that, you may have to take additional care.

> I was speaking about "global" constraints because I incorrectly
> remembered "envromt" just being attached to those modifying the
> environment, not also to those environment-reading functions that are
> MT-Safe under the no-concurrent-env-modification constraint.

I'm pretty sure I'm responsible for this confusion myself; as the
definitions of the exceptions evolved, significant details changed.  For
example, right now, if you search for envromt in manual/* in the
lxoliva/thread-safety-docs branch, you (probably) won't find a
distinction between readers and writes, except for the fact that setters
are marked with uunguard.

I'm very close to deciding that these cases ought to be marked with
uunguard:<what>, and that the various cases of <what> (for uunguard and
any other notes that may have a counterpart) should be defined in a
separate itemization in the manual, i.e., not under:

  When a function is safe to call only under certain constraints, we
  will add keywords to the safety notes whose meanings are defined as
  follows:

nor

  Additional safety issues that cannot be worked around by constraining
  the program (other than by refraining from calling the affected
  functions) are also documented with keywords, whose meanings are
  defined as follows:

but under a different section headed:

  Additional keywords may be attached to functions, indicating features
  that do not make a function unsafe to call, but that may need to taken
  into account in certain classes programs:

I'd move (and adjust) glocale, envromt, memleak and fdleak under the
paragraph above.  I'd also move the itemization under the âsafe to call
only underâ after the âadditional safety issuesâ, and rewrite their
headers as follows:

  Functions that are unsafe to call in certain contexts are annotated
  with keywords that document their features that make them unsafe to
  call:
  @item staticbuf ...
  @item asi18n ...
  ...

and

  For some features that make functions unsafe to call in certain
  contexts, there are known ways to avoid the safety problem other than
  refraining from calling the function altogether.  The keywords that
  follow refer to such features, and each of their definitions indicate
  how the whole program needs to be constrained in order to remove the
  safety problem indicated by the keyword.  Only when all the reasons
  that make a function unsafe are observed and addressed by applying the
  documented constraints does the function become safe to call in a
  context.
  @item uunguard ...
  @item xguargs...
  ...

> This might show that I didn't pay enough attention or that I easily
> forget things; OTOH, it could also indicate that it would be useful to
> have a better explanation of how your annotations work :)

How's the above?

> That sounds alright to me.  For every function marked this way, is it
> documented which the "affected objects" are, and obvious which function
> can be writer?

No, but it became clear to me in this thread that they should be.  So,
I'm going to extend the definition of uunguard to indicate that, when it
is followed by a :<what>, the <what> may be documented in the new
itemization I mentioned above, or just that the <what> identifies the
group of functions that all operate on the same object, so that they all
need to be guarded by the same mutex.  Come to think of it, staticbuf
could get this sort of annotation as well, and then it could be promoted
to the group that can be worked around.  Or those cases in which it
could should become uunguard.  Or something like that ;-)

>> > If that's the case, why do you have uunguard at all, given that it's
>> > just the negation of MT-Safe (or, be equal to MT-Unsafe)?

>> It documents a *reason* for a so-marked function to be MT-Unsafe.

> But isn't it listed under MT-Safe constraints?  That is, this list:

> +When a function is safe to call only under certain constraints, we will
> +add keywords to the safety notes whose meanings are defined as follows:

I could tell that this sentence and my misuse of some notes set you down
the wrong path, of assuming that the constraints would apply even for
MT-Safe functions.  I hope the wording and rearrangement I propose in
this email clarifies that.

>> It's not a constraint; it's a (mis)feature that can be avoided by
>> constraining the program in certain ways.  In this case, by explicitly
>> guarding all potential readers and writers.

> Will the readers get an annotation as well?

Yeah.

> But see below; maybe it's just that you listed all of them under in one
> list about MT-Safe constraints...

Yeah, that was a big part of the problem.

But not that it's Safety constraints, not MT-Safe constraints.  Not all
keywords have to do with MT-safety issues; some are AS- and/or AC-safety
issues.  I'm still trying to figure out what drove you down this wrong
path of narrowing the keywords in that section to MT-Safety; any ideas?

>> > What I'm saying is that the definition of MT-Safe at the very top should
>> > link to the listing of constraints.

>> Hmm, is this a result of the disconnect/misunderstanding we've detected?
>> I don't think linking the definition of MT-Safe to misfeatures that may
>> cause functions to be MT-Unsafe makes much sense.

> I think it does, because otherwise readers can read just the definition
> of MT-Safe and aren't aware that it comes with constraints.

I tink we've now clarified that they don't come with constraints, right?
If so, is there anything left to be done WRT the suggestion upthread,
quoted here with â>> >â?

> These reasons also make sense, but I believe they can be worked around
> (at least as far as just glibc is concerned, ignoring the rest of the
> application and whether it wants to handle signals (by establishing
> process-wide consensus on whether some signals are to be handled or not
> using suitable synchronization).

I'll excuse myself from this theoretical/speculative part of the
conversation.  Feel free to propose improvements to functions marked
with tempsig stimer that would make them safer to call if you can find
out ways to accomplish that ;-) Just let me know, so that I can adjust
the documentation patchset, or adjust it yourself, should the patch be
already in.

> Well, maybe we should just use a more descriptive name then, so that it
> matches the definition? :)

How do you refer to a piece of code that fails to release a lock it
should have released?  Other than buggy, I mean ;-) lacksunlock? :-)

> I can't follow this reasoning.  Screen space isn't scarce, is it?  This
> is complex stuff, so reading the terms is really the smallest part of
> the costs.  I wouldn't mind at all being at least a little bit verbose
> here.  AFAIU, Joseph would also like to see keywords not optimized for
> size.

I've responded to this in response to his email, to avoid getting us
talking past each other about this in two subthreads.  Let's please
carry on this part of the conversation in that subthread from now on,
shall we?

Thanks for your help in debugging the docs! :-)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer

Follow-Ups:
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel

References:
- [MTASCsft PATCH WIP5 00/33] MT-, AS- and AC-Safety docs
  - From: Alexandre Oliva
- [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]