This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Async signal safe TLS accesses
- From: Andrew Hunter <ahh at google dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Paul Pluzhnikov <ppluzhnikov at google dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 5 Dec 2013 12:01:28 -0800
- Subject: Re: [PATCH] Async signal safe TLS accesses
- Authentication-results: sourceware.org; auth=none
- References: <1380830518-16721-1-git-send-email-ahh at google dot com> <1382477766-15770-1-git-send-email-ahh at google dot com> <Pine dot LNX dot 4 dot 64 dot 1310222145490 dot 13258 at digraph dot polyomino dot org dot uk> <CALoOobMsO6X86JFD4J7F-EL-J+xOTEOVbzH=6mwrvfCnFvw57Q at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1311052233090 dot 30260 at digraph dot polyomino dot org dot uk> <CALoOobM70-mix+=1zuDnSoK7SRqQChbL=03xBkUcFf1fyS1Mjw at mail dot gmail dot com> <CALoOobP7kdpZCZA0a7MZWCtONu81fW4H_qAWOEfpfvzxJgG_=Q at mail dot gmail dot com> <CALoOobP6rTDosadvLKhHY+deDsU-FtvyO8QX_Y4dZy716e2ATQ at mail dot gmail dot com> <1385641717 dot 3152 dot 8418 dot camel at triegel dot csb>
On Thu, Nov 28, 2013 at 4:28 AM, Torvald Riegel <triegel@redhat.com> wrote:
> I've looked at the synchronization bits in this patch. Detailed
> comments below, and one general comment:
>
> Please document the synchronization code you add in more detail. The
> C11 memory model should be the base model, even though the atomic
> operations we currently have are somewhat different.
> In particular, if using barriers / memory orders (MO) such as "_acq",
> please make sure that you document which ordering they are meant to
> achieve. (Or, if there are no such barriers or just relaxed MO, please
> document why we don't need a particular ordering.) For example, if you
> use an acquire barrier / MO, please briefly document with which release
> barriers / MO this is supposed to create happens-before relations.
> Also, the atomic operations that we currently have don't cover all the
> options in the C11 model (for example, we don't have relaxed MO
> read-modify-write ops AFAIR); thus, if you use a glibc atomic operation
> that is stronger than what you'd actually need in the C11 model, please
> document this as well so we can easily change that later on once we
> should have those atomic ops too.
>
Done, throughout (I think). v3 had several of these that got lost in
Paul's latest version due to a merge problem.
>> +#else
>> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
>> +#endif
>> + /* We've computed the new value we want, now try to install it. */
>> + ptrdiff_t val;
>> + while ((val = map->l_tls_offset) == NO_TLS_OFFSET)
>> + {
>> + atomic_compare_and_exchange_bool_acq (&map->l_tls_offset,
>> + offset,
>> + NO_TLS_OFFSET);
>> + }
>> + if (val != offset)
>> + {
>> + /* Lost a race to a TLS access in another thread. Too bad,
>> nothing
>> + we can do here. */
>> + goto fail;
>> + }
>
> It looks as if you use the loop only to reload map->l_tls_offset after
> the CAS, right? (AFAICT, the only state transition you're trying to do
> or wait for is from map->l_tls_offset == NO_TLS_OFFSET to something
> else. Our CAS is a strong CAS in the C11 terminology (ie, it won't fail
> spuriously). Thus, doing the CAS once should be sufficient.)
>
> If so, why don't you either use just an atomic_compare_exchange_val_acq,
> or an atomic_compare_exchange_val_acq with one load+if in front of it to
> avoid the CAS overhead when not needed (ie, like test-and-test-and-set)?
>
> The use of _acq MO seems inconsistent. If you don't need _acq on the
> CAS, but were using it because it's the weakest atomic op we currently
> offer, please document that. If you do need it, then you also need an
> acquire load+if in front of the CAS, if the load+if can skip the CAS;
> also, please document with which release MO this is supposed to
> synchronize in this case (e.g., does the CAS need to have
> acquire-release MO?).
>
I am in fact using acquire for lack of a nobarrier option. I changed
while to if and reread the value after CAS, as suggested.
>> + ptrdiff_t offset;
>> + while ((offset = the_map->l_tls_offset) == NO_TLS_OFFSET)
>> {
>> - __rtld_lock_lock_recursive (GL(dl_load_lock));
>> - if (__builtin_expect (the_map->l_tls_offset == NO_TLS_OFFSET,
>> 1))
>> - {
>> - the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET;
>> - __rtld_lock_unlock_recursive (GL(dl_load_lock));
>> - }
>> - else
>> + atomic_compare_and_exchange_bool_acq (&the_map->l_tls_offset,
>> + FORCED_DYNAMIC_TLS_OFFSET,
>> + NO_TLS_OFFSET);
>> + }
>> + if (offset == FORCED_DYNAMIC_TLS_OFFSET)
>> + {
>> + allocate_and_init (&dtv[GET_ADDR_MODULE], the_map);
>> + }
>
> The previous comments should apply here in the same way as above.
>
Applied an identical fix.
>> + else
>> + {
>> + void **pp = &dtv[GET_ADDR_MODULE].pointer.val;
>> + while (atomic_forced_read (*pp) == TLS_DTV_UNALLOCATED)
>> {
>> - __rtld_lock_unlock_recursive (GL(dl_load_lock));
>> - if (__builtin_expect (the_map->l_tls_offset
>> - != FORCED_DYNAMIC_TLS_OFFSET, 1))
>> - {
>> - void *p = dtv[GET_ADDR_MODULE].pointer.val;
>> - if (__builtin_expect (p == TLS_DTV_UNALLOCATED, 0))
>> - goto again;
>> -
>> - return (char *) p + GET_ADDR_OFFSET;
>> - }
>> + /* for lack of a better (safe) thing to do, just spin.
>> + Someone else (not us; it's done under a signal mask) set
>> + this map to a static TLS offset, and they'll iterate all
>> + threads to initialize it. They'll eventually set
>> + is_static=true, at which point we know they've fully
>> + completed initialization. */
>> + atomic_delay ();
>
> You could use a futex to block until *pp != TLS_DTV_UNALLOCATED. If
> this is a one-shot transition (ie, like in a latch), then this should be
> simple and would just need a suitable futex wake operation elsewhere.
>
> However, the pure spin-wait is probably fine unless we want to support
> programs with real-time priorities (ie, where a higher-prio thread could
> starve a lower-prio thread forever, for example, if the former is
> spin-waiting for a result produced by the latter). I don't know whether
> we do or do not want to support that, so I'm leaving that to others to
> decide.
>
This case can only occur (I believe) when we have:
dlopen(L1) (contains TLS variable x)
Thread T1 reads x and goes to dynamically initialize it, racing with...
Thread T2 dlopen(L2) (contains tls reloc referencing x from L1)
which is at best extremely obscure; I would prefer to avoid the
complexity of futex unless we really see this as a priority.
>> }
>> + /* make sure we've picked up their initialization of the actual
>> block. */
>> + atomic_read_barrier ();
>
> Please document where the matching release barrier or release-MO store
> is. Is it the write barrier I've commented on below?
>
Yes; done (this was in v3 but got lost temporarily.)
>> + dtv[map->l_tls_modid].pointer.is_static = true;
>> + atomic_write_barrier ();
>> + dtv[map->l_tls_modid].pointer.val = dest;
>
> Is this the matching release barrier for the acquire MOs above? If so,
> please document it. In general, if we have a barrier + atomic
> load/store combination, I would prefer if we could point out this
> combination too (to show over which route the happens-before is supposed
> to be established). In this case it's rather easy to see (well, I
> *guess* it's the final store :) , but I guess documenting it can't hurt
> either.
>
No, it's just pairing against the read barrier after the spinloop
above. (As pointed out above, the CAS's aren't actually acquire (or
don't have to be, at least.)