This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
- From: Torvald Riegel <triegel at redhat dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: libc-alpha at sourceware dot org, carlos at redhat dot com, mtk dot manpages at gmail dot com
- Date: Tue, 19 Nov 2013 12:10:32 +0100
- Subject: Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
- Authentication-results: sourceware.org; auth=none
- References: <20131113081059 dot 3464 dot 51385 dot stgit at frit dot home> <20131113081132 dot 3464 dot 30409 dot stgit at frit dot home>
On Wed, 2013-11-13 at 06:11 -0200, Alexandre Oliva wrote:
> for ChangeLog
>
> * manual/macros.texi: Introduce macros to document multi
> thread, asynchronous signal and asynchronous cancellation
> safety properties.
> * manual/intro.texi: Introduce the properties themselves.
> ---
> manual/intro.texi | 513 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> manual/macros.texi | 42 ++++
> 2 files changed, 555 insertions(+)
>
> diff --git a/manual/intro.texi b/manual/intro.texi
> index deaf089..569f696 100644
> --- a/manual/intro.texi
> +++ b/manual/intro.texi
> @@ -172,6 +172,519 @@ pattern matching facilities (@pxref{Pattern Matching}).
> @comment <wordexp.h> (not yet implemented)
> @comment confstr
>
> +The safety properties of @glibcadj{} functions, documented as MT-, AS-
> +and AC- -Safe and -Unsafe are assessed according to the criteria set
> +forth in the POSIX standard for Multi-Thread, Async-Signal and
> +Async-Cancel safety.
> +
> +Intuitive definition of these properties, that attempt to capture
> +the meaning of the standard definitions, follow:
Spelling: I believe that you should either drop the (first) comma(s), or
use "which" instead of "that".
> +
> +@itemize @bullet
> +
> +@item
> +@cindex MT-Safe
> +MT-Safe functions are safe to call in the presence of other threads. MT
> +stands for Multi Thread.
> +
> +@item
> +@cindex AS-Safe
> +AS-Safe functions are safe to call from asynchronous signal handlers.
> +AS stands for Asynchronous Signal.
> +
> +@item
> +@cindex AC-Safe
> +AC-Safe functions are safe to call when asynchronous cancellation is
> +enabled. AC stands for Asynchronous Cancellation.
> +
> +@item
> +@cindex MT-Unsafe
> +@cindex AS-Unsafe
> +@cindex AC-Unsafe
> +MT-Unsafe, AS-Unsafe, AC-Unsafe functions are not safe to call within
> +the contexts described above: they may cause deviations from the
> +specification in the behavior of the calls themselves, or of any other
> +concurrent, ongoing or subsequent calls.
If a program calls an MT-Unsafe function and thereby creates a data
race, then this data race can manifest in undefined behavior even
*before* the abstract machine would make the MT-Unsafe call. This can
happen because the compiler can assume that programs are data-race-free,
and use certain optimizations that wouldn't be allowed on (properly
synchronizing) MT-Safe versions of the called code. For example, if
there is sequential code after the call but depending on its outcome,
then the compiler could speculative and move it before the MT-Unsafe
call, which could cause a crash before the call actually happens.
Thus, I think it's worth pointing out that it's not just in subsequent
calls, even though this might not happen with most glibc functions as
they aren't inlined.
> +Functions not explicitly documented as Safe should be regarded as
> +Unsafe.
> +
> +@end itemize
> +
> +By ``safe to call'', we mean that, as long as the program does not
> +invoke undefined or unspecified behavior, the called functions will
> +behave as documented, and they won't cause any other functions to
> +deviate from their documented behavior.
> +
> +Although we strive to abide by the standards, in some cases our
> +implementation is safe even when the standard does not demand safety,
> +and in other cases our implementation does not meet the standard safety
> +requirements. At this point, we document the result of an assessment of
> +the properties of our implementation, so the safety documentation in
> +this manual is not to be regarded as a promise of future behavior: in
> +future releases, functions that are documented as safe may become
> +unsafe, and safety constraints may be removed or introduced. We
> +envision turning the results of the assessment into a set of promises as
> +stable as our interfaces, but we're not there yet.
> +
> +When a function is safe to call only under certain constraints, we will
> +add keywords to the safety notes whose meanings are defined as follows:
> +
> +@itemize @bullet
> +
> +@c glocale-revisit
> +@item @code{glocale}
> +@cindex glocale
> +
> +In threads that have not overridden the thread-local locale object by
> +calling @code{uselocale}, calling functions annotated with
> +@code{glocale} concurrently with @code{setlocale} may cause the
> +functions to behave in ways that don't correspond to either the previous
> +or the subsequent global locale.
> +
> +Although the @code{setlocale} function modifies the global locale object
> +while holding a lock, @code{glocale}-annotated functions may access this
> +global object multiple times, without any measures to ensure it doesn't
> +change while it's in use.
> +
> +Each of these unprotected uses will use either the previous or the
> +subsequent locale information, so they won't cause crashes or access to
> +uninitialized, unmapped or recycled memory. However, since some cases
> +use cached locale information while others access the effective locale
> +object anew, concurrent changes to the global locale object may cause
> +these functions to behave in ways that they could not behave should the
> +execution of @code{setlocale} and of the so-annotated functions be
> +atomic, or even should @code{setlocale} alone be atomic.
Can you break up this sentence?
> +The @code{glocale} constraint indicates functions are only safe to call
> +if the effective thread-local locale is not the global locale object
> +(because it was overridden with @code{uselocale}). Failing that,
> +@code{setlocale} should not be called while these functions are active.
> +
> +
> +@item @code{envromt}
> +@cindex envromt
> +
> +Functions marked with @code{envromt} access the environment with
> +@code{getenv} or similar, requiring the environment to be effectively
> +read-only for MT-Safe operation.
> +
> +Environment-modifying functions do not protect in any way against
> +concurrent modifications or access, so calling @code{envromt}-marked
> +functions concurrently with @code{setenv}, @code{putenv},
> +@code{unsetenv} or direct modifications of the global environment data
> +structures is ill-advised; external concurrency control must be
> +introduced by callers of these environment-modifying and
> +@code{envromt}-marked functions.
> +
> +Functions that modify the environment are also marked with
> +@code{envromt}, but they are not MT-Safe for the reasons above. Since
> +all environment-modifying functions are MT-Unsafe, functions that only
> +access the environment are marked as MT-Safe when no other safety issue
> +applies.
I know you spelled the exception out here, but I'm wondering whether
everyone will remember this when seeing MT-Safe somewhere. Why didn't
you use a different MT-Safe tag (eg, MT-Safe-Env) for those functions,
but instead merged them with the "true" MT-Safe?
> +
> +@item @code{oncesafe}
> +@cindex oncesafe
> +
> +Functions marked with @code{oncesafe} use the internal @code{libc_once}
> +machinery or similar to initialize internal data structures.
> +
> +If a signal handler interrupts such an initializer, and calls any
> +function that also performs @code{libc_once} initialization, it will
> +deadlock if the thread library is linked in.
If "oncesafe" marks cases where a deadlock can happen, shouldn't it be
called "onceunsafe" or something like this instead? "oncesafe" sounds
like "safe even if libc_once" to me, not like "safe unless libc_once".
> +Furthermore, if an initializer is partially complete before it is
> +canceled or interrupted by a signal whose handler requires the same
> +initialization, some or all of the initialization may be performed more
> +than once, leaking resources or even result in corrupt internal data
> +structures.
> +
> +Applications that need to call functions marked with @code{oncesafe}
> +should ensure the initialization is performed before configuring signal
> +handlers or enabling cancellation, so that the AS- and AC-Safety issues
> +related with @code{libc_once} do not arise.
> +
> +
> +@item @code{1stcall}
> +@cindex 1stcall
> +
> +Functions marked with @code{1stcall} perform thread-unsafe
> +initialization when they are first called.
> +
> +In order to prevent this thread-unsafe initialization from rendering the
> +functions unsafe to call in multi-threaded programs, such functions
> +should be called at least once in single-threaded mode, i.e., before
> +additional threads are started.
> +
> +If @code{1stcall} is the only reason for a function to be marked as
> +MT-Unsafe, and the function is first called in single-threaded mode,
> +then it becomes safe to call after other threads are started.
I agree with Carlos on this (ie, it sounds as if it's better to be
listed as an exception to MT-Safe; example: MT-Safe | oncesafe |
1stcall; with the definition above, you need to add that there could be
other restrictions additionally).
> +
> +@item @code{uunguard}
> +@cindex uunguard
> +
> +Functions marked with @code{uunguard} modify non-atomically arguments or
> +global objects that other functions access without synchronization. To
> +ensure MT- and AS-Safe behavior, callers should refrain from calling
> +so-marked functions concurrently with readers of those objects. A
> +consequence of regarding modifiers of these objects as unsafe is that
> +the covered objects can be regarded as constant (subject to the
> +observation of safety constraints), so that all readers can be
> +considered safe in this regard.
AFAIU, uunguard are functions that if called, override the safety
properties of MT-Safe or AS-Safe functions? If so, I think this needs
to be said more explicitly. For example:
Functions marked with @code{uunguard} modify arguments or global objects
that other, potentially MT-Safe or AS-Safe, functions access without
synchronization. Thus, calls to so-marked functions potentially make
all concurrent calls to otherwise MT-Safe or AS-Safe functions MT-Unsafe
and/or AS-Unsafe.
(We can drop nonatomically here because it's not a sufficient
characterization ("concurrent" is better because it would be defined
over happens-before orders, which matter here).)
There should also be a link in the MT-Safe and AS-Safe paragraphs to
this paragraph here, because both depend on this exception (ie, it's
only MT-Safe if no uunguard is called concurrently).
> +
> +Unguarded users of the global locale object modified by @code{setlocale}
> +are marked with @code{glocale}.
> +
> +Unguarded users of the @code{printf} extension objects modified by
> +@code{register_printf_function} are the entire family of printf
> +functions.
> +
> +Unguarded users of file streams configured with @code{__fsetlocking} for
> +locking by the caller are the entire family of stdio functions.
> +
> +
> +@item @code{xguargs}
> +@cindex xguargs
> +
> +Functions marked with @code{xguargs} may use or modify objects passed
> +(indirectly) as arguments, without any guards to guarantee consistency.
> +To ensure MT- and AS-Safe behavior, callers must ensure that the objects
> +passed in are not accessed or modified concurrently by other threads or
> +signal handlers.
> +
> +This mark is only applied with regard to an object when the
> +@code{uunguard} mark is not applied because of the same object, and the
> +object is opaque or not intended to be modified by code outside of
> +@theglibc{}.
Wouldn't you need to apply the mark only if the object is not opaque and
might be considered to be fine to be modified outside of glibc? That
is, the inverse of what you wrote?
When specifying this more formally, I believe we need to be careful to
specify the "lifetime" of this constraint (eg, is it just for the
duration of this call?).
> The rationale is that, for such an object, there could be
> +a reasonable (but unsatisfied) expectation that the library would take
> +care of synchronization to modify the object.
> +
> +Strings, structs and other object types whose members are meant to be
> +modified by users are @emph{not} marked with @code{xguargs}.
> +User-initiated inspection and modification are already constrained by
> +the standard synchronization requirements, plus any other type- or
> +object-specific additional requirements, such as user-initiated locking
> +of streams configured to avoid internal locking. (File streams happen
> +to be opaque types, but the principle stands.) Users must satisfy these
> +requirements regardless of whether modifications are coded by users or
> +the library, so the mark would be redundant. The mark, when present,
> +clarifies that users remain responsible for satisfying any
> +synchronization requirements when calling the marked function, because
> +the library will not take care of them on its own.
> +
> +
> +@item @code{tempchwd}
> +@cindex tempchwd
> +
> +Functions marked with @code{tempchwd} may temporarily change the current
> +working directory during their execution, which may cause relative
> +pathnames to be resolved in unexpected ways in other threads or within
> +asynchronous signal or cancellation handlers.
> +
> +This is not enough of a reason to mark so-marked functions as MT-Unsafe,
> +but when this behavior is optional (e.g., @code{nftw} with
> +@code{FTW_CHDIR}), avoiding the option in multi-threaded programs may be
> +a good alternative to using full pathnames or file descriptor-relative
> +(e.g. @code{openat}) system calls.
> +
> +
> +@item @code{tempsig}
> +@cindex tempsig
> +
> +Functions marked with @code{tempsig} may temporarily install signal
> +handlers for internal purposes, which may interfere with other uses of
> +those signals.
> +
> +This makes such functions MT-Unsafe and AS-Unsafe to begin with.
Why does it make such functions MT-Unsafe in every case?
> If
> +this note appears as an AC-Safety issue, however, the problem is more
> +complex: the temporarily-installed signal handler may remain in place if
> +the thread is cancelled. Safety for all these situations can be
> +achieved by refraining from using the specific signals while calling the
> +function.
> +
> +
> +@item @code{tempterm}
> +@cindex tempterm
> +
> +Functions marked with @code{tempterm} may temporarily change the
> +terminal settings.
> +
> +This would not be enough of a reason to mark so-marked functions as
> +MT-Unsafe, but the recommended mode to modify terminal settings is to
> +call @code{tcgetattr}, modify some flags, and then call
> +@code{tcsetattr}. Functions marked with @code{tempterm} do that, so
> +they leave a window in which changes made by other threads are lost.
> +
> +It is thus advisable for applications using the terminal to avoid
> +concurrent interactions with it, more so if they expect different
> +terminal modes.
> +
> +If this mark appears as an AC-Safety note, it means the function may
> +also fail to restore the original terminal mode in case of asynchronous
> +cancellation.
> +
> +
> +@item @code{stimer}
> +@cindex stimer
> +
> +Functions marked with @code{stimer} use the @code{alarm} function or
> +similar to set a time-out for a system call or a long-running operation.
> +In a multi-threaded program, there is a risk that the time-out signal
> +will be delivered to a different thread, thus failing to interrupt the
> +intended thread.
> +
> +Keeping @code{SIGALARM} blocked on all threads, and refraining from
> +calling concurrenly functions that may set up such timers, is the only
> +way to avoid the safety issue at hand.
> +
> +
> +@end itemize
> +
> +
> +Additional safety issues that cannot be worked around by constraining
> +the program (other than by refraining from calling the affected
> +functions) are also documented with keywords, whose meaning is defined
> +as follows:
> +
> +@itemize @bullet
> +
> +@item @code{staticbuf}
> +@cindex staticbuf
> +
> +Functions annotated with @code{staticbuf} use internal objects in ways
> +that may cause concurrent calls to interfere destructively. Internal,
> +here, does not mean the objects are not exposed to callers, just that
> +they are not supplied by callers (contrast with the @code{uunguard} and
> +@code{xguargs} keywords).
> +
> +These functions are all MT-Unsafe and AS-Unsafe. However, many of them
> +offer reentrant variants for MT-Safe and, in some cases, AS-Safe use.
> +
> +In many of these cases, the static buffer is only used to hold a return
> +value; in a few of these, such as @code{tmpnam}, the use of the internal
> +buffer can be avoided by passing the buffer as an argument, which makes
> +the call MT-Safe and AS-Safe.
> +
> +
> +@item @code{asi18n}
> +@cindex asi18n
> +
> +Functions marked with @code{asi18n} use internationalization functions
> +(@code{gettext}), which brings in a number of dependencies and issues
> +yet to be documented.
> +
> +
> +@item @code{shlimb}
> +@cindex shlimb
> +
> +Functions marked with @code{shlimb} use the dynamic loader to bring in
> +additional code modules. This involves opening files, mapping them into
> +memory, allocating additional memory, resolving symbols, applying
> +relocations and more, all of this while holding the dynamic loader
> +lock.
> +
> +The non-recursive lock itself is enough for the function to be AS- and
> +AC-Unsafe, but many other issues may arise.
> +
> +
> +@item @code{fdleak}
> +@cindex fdleak
> +
> +Functions annotated with @code{fdleak} may leak file descriptors if
> +asynchronous thread cancellation interrupts their execution.
> +
> +Functions that allocate or deallocate file descriptors will generally be
> +marked as such, because even if they attempted to protect the file
> +descriptor allocation and deallocation with cleanup regions, allocating
> +a new descriptor and storing its number where the cleanup region could
> +release it cannot be performed as a single atomic operation, just like
> +releasing it and taking it out of the data structure normally
> +responsible for releasing it cannot be performed atomically, always
> +leaving a window in which the descriptor cannot be released because it
> +wasn't stored in the cleanup handler argument yet, or in which it was
> +already taken out of it before releasing it in the normal flow (we
> +cannot keep it there because, in case of cancellation, we wouldn't be
> +able to tell whether it was already released, and the same number could
> +have been already assigned to another descriptor by another thread, so
> +we couldn't just release it again).
> +
> +Such leaks could be internally avoided, with some performance penalty,
> +by temporarily disabling asynchronous thread cancellation. However,
> +since callers of allocation or deallocation functions would have to do
> +this themselves, to avoid the same sort of leak in their own layer, it
> +makes more sense for the library to assume they are taking care of it
> +than to impose a performance penalty that is redundant when the problem
> +is solved in upper layers, and insufficient when it isn't.
> +
> +This remark by itself does not cause a function to be regarded as
> +AC-Unsafe. However, cummulative effects of such leaks may pose a
> +problem for some programs. If this is the case, suspending asynchronous
> +cancellation for the duration of calls to such functions is recommended.
> +
Spelling: cummulative -> cumulative, I believe.
> +
> +@item @code{memleak}
> +@cindex memleak
> +
> +Functions annotated with @code{memleak} may leak memory if asynchronous
> +thread cancellation interrupts their execution.
> +
> +The problem is similar to that of file descriptors: there is no atomic
> +interface to allocate memory and store its address in the argument to a
> +cleanup handler, or to release it and remove its address from that
> +argument, without at least temporarily disabling asynchronous
> +cancellation, which these functions do not do.
Given that we control the allocator, I believe that it can be made
atomic (without disabling signals or reentrancy). File descriptors are
different in that if the kernel doesn't give us an atomic operation with
either idempotency guarantees or a commit/abort return value, we indeed
can't detect failures.
> +This remark does not by itself cause a function to be regarded as
> +generally AC-Unsafe. However, cummulative effects of such leaks may be
cumulative
> +severe enough for some programs that disabling asynchronous cancellation
> +for the duration of calls to such functions may be required.
> +
> +
> +@item @code{lockleak}
> +@cindex lockleak
> +
> +Functions annotated with @code{lockleak} may leak locks if asynchronous
> +thread cancellation interrupts their execution.
So this is not a leaking lock, but a non-cleaned-up acquisition of a
lock? "lockacquisitionleak" is longer, but might be more clear? Or
"lockacqleak"?
> +While the problem is similar to that of file descriptors, in that there
> +is not any atomic interface to lock and take note of the need for
> +unlocking in a cleanup, or to unlock and take note that there is no
> +longer such a need, the problem posed by lock leaks is far more serious:
> +when a file descriptor or a piece of memory is leaked, it becomes
> +inaccessible and subsequent attempts to allocate a file descriptor or
> +some memory will just use another resource. However, once a lock is
> +left taken, attempts to take that lock will block indefinitely.
> +(Recursive locks will only block other threads, and read locks will only
> +block writer threads, but the point still holds in general).
> +
> +For the reasons above, functions that leak locks are all AC-Unsafe.
> +
> +
> +@item @code{selfdeadlock}
> +@cindex selfdeadlock
> +
> +Functions marked with @code{selfdeadlock} take a non-recursive lock to
> +ensure MT-Safety while modifying data structures guarded by the lock.
> +
> +If such a function is called by a signal handler that interrupted
> +another such function that took the lock, the result is a deadlock.
> +
> +Blocking asynchronous signal delivery while calling such functions is
> +the only safe way to avoid a deadlock if any signal handler might need
> +to call them.
> +
> +
> +@item @code{asynconsist}
> +@cindex asynconsist
> +
> +Functions marked with @code{asynconsist} take a recursive lock to ensure
> +MT-Safety while accessing or modifying data structures guarded by the
> +lock.
Should this be "asyncconcist" (ie, don't drop one of the "c"'s)? I was
reading "async-on-sist" and wondering what it means...
> +If such a function is called by a signal handler that interrupted
> +another such function that took the lock, both may misbehave for
> +observing inconsistent (partially updated or cached) data structures.
> +
> +Blocking asynchronous signal delivery while calling such functions is
> +the only safe way to avoid the misbehavior that may ensue if any signal
> +handler might need to call them.
> +
> +
> +@item @code{asmalloc}
> +@cindex asmalloc
> +
> +This is a sub-case of @code{asynconsist}. Functions marked with
> +@code{asmalloc} perform memory allocation or deallocation with the
> +@code{malloc}/@code{free} family of functions.
> +
> +If heap management functions are interrupted by asynchronous signals,
> +and the signal handlers attempt to perform memory allocation or
> +deallocation of their own, they may encounter heap data structures in a
> +partially updated state, and the interrupted calls may malfunction
> +because of the changes made within the signal handler.
> +
> +The @code{asmalloc} mark implies @code{selfdeadlock} (AS-unsafe) and
> +@code{lockleak} (AC-Unsafe).
> +
> +
> +@item @code{incansist}
> +@cindex incansist
> +
> +Functions marked with @code{incansist} modify data structures in a
> +non-atomic way.
What does "incansist" stand for?