This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Async signal safe TLS accesses


On Thu, Nov 28, 2013 at 4:28 AM, Torvald Riegel <triegel@redhat.com> wrote:
> I've looked at the synchronization bits in this patch.  Detailed
> comments below, and one general comment:
>
> Please document the synchronization code you add in more detail.  The
> C11 memory model should be the base model, even though the atomic
> operations we currently have are somewhat different.
> In particular, if using barriers / memory orders (MO) such as "_acq",
> please make sure that you document which ordering they are meant to
> achieve.  (Or, if there are no such barriers or just relaxed MO, please
> document why we don't need a particular ordering.)  For example, if you
> use an acquire barrier / MO, please briefly document with which release
> barriers / MO this is supposed to create happens-before relations.
> Also, the atomic operations that we currently have don't cover all the
> options in the C11 model (for example, we don't have relaxed MO
> read-modify-write ops AFAIR); thus, if you use a glibc atomic operation
> that is stronger than what you'd actually need in the C11 model, please
> document this as well so we can easily change that later on once we
> should have those atomic ops too.
>

Done, throughout (I think).  v3 had several of these comments, but they
got lost in Paul's latest version due to a merge problem.

>> +#else
>> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
>> +#endif
>> +  /* We've computed the new value we want, now try to install it.  */
>> +  ptrdiff_t val;
>> +  while ((val = map->l_tls_offset) == NO_TLS_OFFSET)
>> +    {
>> +      atomic_compare_and_exchange_bool_acq (&map->l_tls_offset,
>> +                                           offset,
>> +                                           NO_TLS_OFFSET);
>> +    }
>> +  if (val != offset)
>> +    {
>> +      /* Lost a race to a TLS access in another thread.  Too bad,
>> +         nothing we can do here.  */
>> +      goto fail;
>> +    }
>
> It looks as if you use the loop only to reload map->l_tls_offset after
> the CAS, right?  (AFAICT, the only state transition you're trying to do
> or wait for is from map->l_tls_offset == NO_TLS_OFFSET to something
> else.  Our CAS is a strong CAS in the C11 terminology (ie, it won't fail
> spuriously).  Thus, doing the CAS once should be sufficient.)
>
> If so, why don't you either use just an atomic_compare_exchange_val_acq,
> or an atomic_compare_exchange_val_acq with one load+if in front of it to
> avoid the CAS overhead when not needed (ie, like test-and-test-and-set)?
>
> The use of _acq MO seems inconsistent.  If you don't need _acq on the
> CAS, but were using it because it's the weakest atomic op we currently
> offer, please document that.  If you do need it, then you also need an
> acquire load+if in front of the CAS, if the load+if can skip the CAS;
> also, please document with which release MO this is supposed to
> synchronize in this case (e.g., does the CAS need to have
> acquire-release MO?).
>

I am in fact using acquire only for lack of a no-barrier option.  I
changed the while to an if and reread the value after the CAS, as
suggested.

>> +  ptrdiff_t offset;
>> +  while ((offset = the_map->l_tls_offset) == NO_TLS_OFFSET)
>>      {
>> -      __rtld_lock_lock_recursive (GL(dl_load_lock));
>> -      if (__builtin_expect (the_map->l_tls_offset == NO_TLS_OFFSET, 1))
>> -       {
>> -         the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET;
>> -         __rtld_lock_unlock_recursive (GL(dl_load_lock));
>> -       }
>> -      else
>> +      atomic_compare_and_exchange_bool_acq (&the_map->l_tls_offset,
>> +                                           FORCED_DYNAMIC_TLS_OFFSET,
>> +                                           NO_TLS_OFFSET);
>> +    }
>> +  if (offset == FORCED_DYNAMIC_TLS_OFFSET)
>> +    {
>> +      allocate_and_init (&dtv[GET_ADDR_MODULE], the_map);
>> +    }
>
> The previous comments should apply here in the same way as above.
>

Applied an identical fix.

>> +  else
>> +    {
>> +      void **pp = &dtv[GET_ADDR_MODULE].pointer.val;
>> +      while (atomic_forced_read (*pp) == TLS_DTV_UNALLOCATED)
>>         {
>> -         __rtld_lock_unlock_recursive (GL(dl_load_lock));
>> -         if (__builtin_expect (the_map->l_tls_offset
>> -                               != FORCED_DYNAMIC_TLS_OFFSET, 1))
>> -           {
>> -             void *p = dtv[GET_ADDR_MODULE].pointer.val;
>> -             if (__builtin_expect (p == TLS_DTV_UNALLOCATED, 0))
>> -               goto again;
>> -
>> -             return (char *) p + GET_ADDR_OFFSET;
>> -           }
>> +         /* for lack of a better (safe) thing to do, just spin.
>> +           Someone else (not us; it's done under a signal mask) set
>> +           this map to a static TLS offset, and they'll iterate all
>> +           threads to initialize it.  They'll eventually set
>> +           is_static=true, at which point we know they've fully
>> +           completed initialization.  */
>> +         atomic_delay ();
>
> You could use a futex to block until *pp != TLS_DTV_UNALLOCATED.  If
> this is a one-shot transition (ie, like in a latch), then this should be
> simple and would just need a suitable futex wake operation elsewhere.
>
> However, the pure spin-wait is probably fine unless we want to support
> programs with real-time priorities (ie, where a higher-prio thread could
> starve a lower-prio thread forever, for example, if the former is
> spin-waiting for a result produced by the latter).  I don't know whether
> we do or do not want to support that, so I'm leaving that to others to
> decide.
>

This case can only occur (I believe) when we have:

dlopen(L1) (contains TLS variable x)

Thread T1 reads x and goes to dynamically initialize it, racing with...
Thread T2 calls dlopen(L2) (contains a TLS reloc referencing x from L1)

which is at best extremely obscure; I would prefer to avoid the
complexity of a futex unless we really see this case as a priority.

>>         }
>> +      /* make sure we've picked up their initialization of the actual
>> +         block.  */
>> +      atomic_read_barrier ();
>
> Please document where the matching release barrier or release-MO store
> is.  Is it the write barrier I've commented on below?
>

Yes; done.  (This was in v3 but got lost temporarily.)

>> +  dtv[map->l_tls_modid].pointer.is_static = true;
>> +  atomic_write_barrier ();
>> +  dtv[map->l_tls_modid].pointer.val = dest;
>
> Is this the matching release barrier for the acquire MOs above?  If so,
> please document it.  In general, if we have a barrier + atomic
> load/store combination, I would prefer if we could point out this
> combination too (to show over which route the happens-before is supposed
> to be established).  In this case it's rather easy to see (well, I
> *guess* it's the final store :) , but I guess documenting it can't hurt
> either.
>

No, it's just pairing with the read barrier after the spin loop above.
(As pointed out earlier, the CASes aren't actually acquire, or at least
don't have to be.)

