This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v6] Also use l_tls_dtor_count to decide on object unload (BZ #18657)


On Thu, 2015-07-23 at 15:54 -0400, Carlos O'Donell wrote:
> On 07/20/2015 01:31 PM, Siddhesh Poyarekar wrote:
> > v6: Juggled around the test case code a bit so that it uses tst-tls-atexit.c
> > with a couple of macros.
> > 
> > When a TLS destructor is registered, we set the DF_1_NODELETE flag to
> > signal that the object should not be destroyed.  We then clear the
> > DF_1_NODELETE flag when all destructors are called, which is wrong -
> > the flag could have been set by other means too.
> > 
> > This patch replaces this use of the flag by using l_tls_dtor_count
> > directly to determine whether it is safe to unload the object.  This
> > change has the added advantage of eliminating the lock taking when
> > calling the destructors, which could result in a deadlock.  The patch
> > also fixes the test case tst-tls-atexit - it was making an invalid
> > dlclose call, which would just return an error silently.
> > 
> > I have also added a detailed note on concurrency which also aims to
> > justify why I chose the semantics I chose for accesses to
> > l_tls_dtor_count.  Thanks to Torvald for his help in getting me
> > started on this and (literally) teaching me how to approach the
> > problem.
> > 
> > Change verified on x86_64; the test suite does not show any
> > regressions due to the patch.
> 
> This version looks awesome. Thanks for the awesome work!
> 
> OK for 2.22 with suggestions (see below).
> 
> I had lots of fun coming up with orderings between the release and acquire
> pairs and the mutex locks which I treated as release+acquire barriers
> for the purpose of the analysis.

Don't simply treat lock/unlock as fences (if that's what you meant by
"barriers").  Technically, they behave differently.  Whereas a fence can
work in combination with other atomic accesses on other memory
locations, unlock calls only synchronize with later lock calls on the
same mutex (there is a total order of all lock/unlock calls per mutex).
Thus, you can model lock/unlock as acquire/release operations on the
mutex (ie, how one would implement them), but they are not fences.
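
As a toy illustration (C11 atomics and made-up names, not glibc's
atomic_* macros), one can model a lock as an acquire operation and an
unlock as a release operation on the mutex word itself.  The
synchronizes-with edge then exists only between an unlock and a later
lock of that same word, which is not what a fence gives you:

#include <stdatomic.h>

typedef struct
{
  atomic_int locked;		/* 0 = free, 1 = held.  */
} toy_mutex_t;

static void
toy_lock (toy_mutex_t *m)
{
  int expected;
  /* Acquire MO: synchronizes only with the release store in toy_unlock
     performed by the previous holder of this same mutex.  */
  do
    expected = 0;
  while (!atomic_compare_exchange_weak_explicit (&m->locked, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed));
}

static void
toy_unlock (toy_mutex_t *m)
{
  /* Release MO on the mutex word: a later acquire on this same word sees
     the critical section's effects; accesses to unrelated locations are
     not ordered the way a fence would order them.  */
  atomic_store_explicit (&m->locked, 0, memory_order_release);
}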

> My own rough review notes for reference:
> 
> __cxa_thread_atexit_impl
>         atomic_fetch_add_relaxed l_tls_dtor_count +1
>         - dl_load_lock held.
>                 - Excludes any unloads so forbids __dl_close_worker
>                 - Does not forbid a thread being killed.

OK.

>                         - Deferred cancel happens after lock release
>                           so not concurrent with anything else.
>                           And thread eventually calls __call_tls_dtors
> 			  with release barrier.
> 	- dl_load_lock is release+acquire barrier, which allows relaxed
> 	  ordering atomic_fetch_add to l_tls_dtor_count, because after
> 	  unlock you will either see consistent MAP or you won't see
> 	  MAP at all since load/stores can't leak past the unlock.

This isn't correct.  lock/unlock will synchronize with other lock/unlock
and thus may affect ordering for l_tls_dtor_count accesses, but they
aren't fences.

My understanding of how this works based on the discussions with
Siddhesh is that due to the critical sections using dl_load_lock,
__cxa_thread_atexit_impl and __dl_close_worker are mutually exclusive
and totally ordered.  Calling __cxa_thread_atexit_impl after
__dl_close_worker isn't legal, so __dl_close_worker will observe any
increment because of synchronization via dl_load_lock (not because of
the unlock/lock being fences).

__call_tls_dtors uses an atomic read-modify-write, so via the total
modification order for l_tls_dtor_count, we get a total order for all
increments/decrements.  __dl_close_worker now also needs to "observe"
the map deref before the decrements, and there's no lock being used
there, so we need release MO on the decrements and acquire MO on the
__dl_close_worker load of l_tls_dtor_count.
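
Spelled out as a stand-alone sketch (C11 atomics and made-up names;
glibc uses its own atomic macros and the real link map, but the MO
choices match the reasoning above):

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down link map that only carries the counter.  */
struct toy_link_map
{
  atomic_size_t l_tls_dtor_count;
};

/* __cxa_thread_atexit_impl side: caller holds dl_load_lock, so this is
   totally ordered with safe_to_unload below; the increment itself can
   therefore be relaxed.  */
static void
register_dtor (struct toy_link_map *map)
{
  atomic_fetch_add_explicit (&map->l_tls_dtor_count, 1,
                             memory_order_relaxed);
}

/* __call_tls_dtors side: no lock held.  Release MO keeps the preceding
   map dereferences before the decrement.  */
static void
run_dtor (struct toy_link_map *map, void (*func) (void *), void *arg)
{
  func (arg);			/* Uses MAP's data first ...  */
  atomic_fetch_sub_explicit (&map->l_tls_dtor_count, 1,
                             memory_order_release); /* ... then decrements.  */
}

/* __dl_close_worker side: caller holds dl_load_lock.  Acquire MO pairs
   with the release decrement, so observing zero implies the map
   dereferences in run_dtor have completed and MAP can be freed.  */
static bool
safe_to_unload (struct toy_link_map *map)
{
  return atomic_load_explicit (&map->l_tls_dtor_count,
                               memory_order_acquire) == 0;
}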

> 
> __call_tls_dtors
>         atomic_fetch_add_release curr->map->l_tls_dtor_count -1
>         - write release
>         - keeps map deref before the decrement which means you never
>           access map with l_tls_dtor_count set to 0.
> 
> __dl_close_worker
>         Took dl_load_lock
>         atomic_load_acquire l_tls_dtor_count
>         - load acquire
>         - either sees l_tls_dtor_count as 0 or positive
>                 - if zero, then we know map in __call_tls_dtors was
>                   already derefed so we know it's safe to destroy map
>                 - if positive then we know map has not been dereffed yet

Minor nit: the map *may* not have been dereffed yet / may still be in
use.  Doesn't make a difference for your reasoning here, but I'm
mentioning it anyway because IMO it's important to distinguish "may not
have happened" from "hasn't happened".  If you say "hasn't happened",
you're mentally treating the map deref and the decrement as one atomic
action (ie, if you observe one you will also observe the other).
In my experience, it's useful to be aware of what one can say and can't
say and to be really precise about that, because it keeps you alert and
makes it less likely that one draws conclusions that aren't correct.

It's simple and often tempting to squash several different states in the
interleaving into bigger groups, but doing so can be misleading.
Therefore, it's useful to practice looking for and thinking about the
exact bounds (eg, lower bounds on what one actually knows).

