[PATCH v2 04/19] nptl: Do not use pthread set_tid_address as state synchronization (BZ #19951)

Thu Aug 26 14:58:14 GMT 2021

On 26/08/2021 07:41, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> The use after free described in BZ#19951 is due the use of two different
>> PD fields, 'joinid' and 'cancelhandling', to describe the thread state
>> to synchronize the calls of pthread_join(), pthread_detach(),
>> pthread_exit(), and normal thread exit.
>>
>> And any state change potentially requires to check for both fields
>> atomically to handle partial state (such as pthread_join() with a
>> cancellation handler to issue a 'joinstate' field rollback).
>>
>> This patch uses a different PD member with 4 possible states (JOINABLE,
>> DETACHED, EXITING, and EXITED) instead of pthread 'tid' field:
>>
>>   1. On pthread_create() the inital state is set either to JOINABLE or
>>      DETACHED depending of the pthread attribute used.
>>
>>   2. On pthread_detach(), a CAS is issued on the state.  If the CAS
>>      fails it means that thread is already detached (DETACHED) or is
>>      being terminated (EXITING).  For former an EINVAL is returned,
>>      while for latter pthread_detach() should be reponsible to join the
>>      thread (and deallocate any internal resources).
>>
>>   3. On pthread_create() exit phase (reached either if the thread
> 
> Suggest: “In the exit phase of the wrapper function for the thread start
> routine (reached …”

Ack.

> 
>>      function has returned, pthread_exit() has being called, or
>>      cancellation handled has been acted upon) we issue a CAS on state
>>      to set to EXITING mode.  If the thread is previously on DETACHED
>>      mode it will be the thread itself responsible to deallocate any
> 
> Suggest: “the thread itself is responsible for arranging the
> deallocation of any resource”
> 
> (detached threads cannot immediately deallocate themselves)

Ack.

> 
>>      resource, otherwise the threads needs to be joined.
> 
> “the thread[] needs to be joined”

Ack.

> 
> 
>>   4. The clear_tid_field on 'clone' call is changed to set the new
>>      'state' field on thread exit (EXITED).  This state ins only
>>      reached at thread termination.
>>
>>   4. The pthread_join() implementation is now simpler: the futex wait
>>      is done directly on thread state and there is no need to reset it
>>      in case of timeout (since the state is now set either by
>>      pthread_detach() or by the kernel on process termination).
> 
> Duplicate 4.

Ack.

> 
>> The race condition on pthread_detach is avoided with only one atomic
>> operation on PD state: once the mode is set to THREAD_STATE_DETACHED
>> it is up to thread itself to deallocate its memory (done on the exit
>> phase at pthread_create()).
> 
> See above regarding thread self-deallocation.
> 
> The design as described above looks sound to me, those are just nits.

Right, should I change this paragraph as well (it is not clear the
suggestion here).

> 
>> This change trigger an invalid C11 thread tests: it crates a thread,
>> which detaches itself, and after a timeout the creating thread checks
>> if the join fails.  The issue is once thrd_join() is called the thread
>> lifetime has already expired.  The test is changed so the sleep is done
>> by the thread itself, so the creating thread will try to join a valid
>> thread.
> 
> This could be a separate commit.

Ack.

> 
>> diff --git a/nptl/nptl-stack.h b/nptl/nptl-stack.h
>> index 8bcfde7ec5..c6b2b3078f 100644
>> --- a/nptl/nptl-stack.h
>> +++ b/nptl/nptl-stack.h
>> @@ -32,7 +32,7 @@ extern size_t __nptl_stack_cache_maxsize attribute_hidden;
>>  static inline bool
>>  __nptl_stack_in_use (struct pthread *pd)
>>  {
>> -  return pd->tid <= 0;
>> +  return atomic_load_relaxed (&pd->joinstate) == THREAD_STATE_EXITED;
>>  }
> 
> Maybe this should be an acquire load, to synchronize with the thread
> exit as communicated by the kernel?

It seems ok, I also added the comment:

  /* It synchronize with the thread exit as communicated by the kernel with
     set_tid_address set at clone().  */

> 
>> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
>> index 38f637c5cf..67e00ef007 100644
>> --- a/nptl/pthread_cancel.c
>> +++ b/nptl/pthread_cancel.c
>> @@ -63,7 +63,8 @@ __pthread_cancel (pthread_t th)
>>    volatile struct pthread *pd = (volatile struct pthread *) th;
>>  
>>    /* Make sure the descriptor is valid.  */
>> -  if (INVALID_TD_P (pd))
>> +  int state = atomic_load_acquire (&pd->joinstate);
>> +  if (state == THREAD_STATE_EXITED || state == THREAD_STATE_EXITING)
>>      /* Not a valid thread handle.  */
>>      return ESRCH;
> 
> This is fixed in patch 08/19, so it's okay.
> 
>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>> index 08e5189ad6..763e32bc3e 100644
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
>> @@ -286,7 +286,7 @@ static int create_thread (struct pthread *pd,
>> const struct
>> @@ -351,13 +351,16 @@ start_thread (void *arg)
>>           and free any resource prior return to the pthread_create caller.  */
>>        setup_failed = pd->setup_failed == 1;
>>        if (setup_failed)
>> -	pd->joinid = NULL;
>> +	pd->joinstate = THREAD_STATE_JOINABLE;
>>  
>>        /* And give it up right away.  */
>>        lll_unlock (pd->lock, LLL_PRIVATE);
>>  
>>        if (setup_failed)
>> -	goto out;
>> +	{
>> +	  pd->tid = 0;
>> +	  goto out;
>> +	}
>>      }
> 
> What's the advantage of setting pd->tid here and below in start_thread?

We don't really need to clear the tid on setup_failed case in fact, since
in this case no pthread_t will be returned to caller.  I remove it.

> 
> It destroys information that could be useful for debugging purposes,
> answering the question what the TID was before the thread exited.
> 
>> @@ -481,9 +484,20 @@ start_thread (void *arg)
>>       the breakpoint reports TD_THR_RUN state rather than TD_THR_ZOMBIE.  */
>>    atomic_bit_set (&pd->cancelhandling, EXITING_BIT);
>>  
>> -  if (__glibc_unlikely (atomic_decrement_and_test (&__nptl_nthreads)))
>> -    /* This was the last thread.  */
>> -    exit (0);
>> +
>> +  /* CONCURRENCY NOTES:
>> +
>> +     Concurrent pthread_detach() will either set state to
>> +     THREAD_STATE_DETACHED or wait the thread to terminate.  The exiting state
> 
> “wait [for] the thread to terminate”

Ack.

> 
>> +     set here is set so a pthread_join() wait until all the required cleanup
>> +     steps are done.
>> +
>> +     The 'joinstate' field will be used to determine who is responsible to
>> +     call __nptl_free_tcb below.  */
>> +
>> +  unsigned int joinstate = THREAD_STATE_JOINABLE;
>> +  atomic_compare_exchange_weak_acquire (&pd->joinstate, &joinstate,
>> +					THREAD_STATE_EXITING);
> 
> I think you need a strong CAS here.  We don't have, so you'll have to
> add a loop.

Yeah, it seems right.  I changed to:

  unsigned int prevstate; 
  while (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &prevstate,
                                                THREAD_STATE_EXITING))
    prevstate = atomic_load_relaxed (&pd->joinstate);

> 
>> diff --git a/nptl/pthread_detach.c b/nptl/pthread_detach.c
>> index ac50db9b0e..97d292cab8 100644
>> --- a/nptl/pthread_detach.c
>> +++ b/nptl/pthread_detach.c
>> @@ -26,32 +26,24 @@ ___pthread_detach (pthread_t th)
> 
>> +     For the case the thread is being terminated (THREAD_STATE_EXITING),
>> +     pthread_detach() will responsible to clean up the stack.  */
>> +
>> +  int curstate = THREAD_STATE_JOINABLE;
>> +  if (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &curstate,
>> +					     THREAD_STATE_DETACHED))
> 
> This should be a strong CAS, so this needs a loop, I think.

I changed to:

  unsigned int prevstate = atomic_load_relaxed (&pd->joinstate);
  do
    {
      if (prevstate != THREAD_STATE_JOINABLE) 
        return EINVAL;
      return __pthread_join (th, 0);
    }
  while (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &prevstate,
                                                THREAD_STATE_DETACHED));

> 
>> diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c
>> index 7303069316..428ed35101 100644
>> --- a/nptl/pthread_join_common.c
>> +++ b/nptl/pthread_join_common.c
>> @@ -22,113 +22,73 @@
>>  #include <time.h>
>>  #include <futex-internal.h>
>>  
>> -static void
>> -cleanup (void *arg)
>> +/* Check for a possible deadlock situation where the threads are waiting for
>> +   each other to finish.  Note that this is a "may" error.  To be 100% sure we
>> +   catch this error we would have to lock the data structures but it is not
>> +   necessary.  In the unlikely case that two threads are really caught in this
>> +   situation they will deadlock.  It is the programmer's problem to figure
>> +   this out.  */
>> +static inline bool
>> +check_for_deadlock (int state, struct pthread *pd)
>>  {
>> -  /* If we already changed the waiter ID, reset it.  The call cannot
>> -     fail for any reason but the thread not having done that yet so
>> -     there is no reason for a loop.  */
>>    struct pthread *self = THREAD_SELF;
>> -  atomic_compare_exchange_weak_acquire (&arg, &self, NULL);
>> +  return ((pd == self
>> +	   || (atomic_load_acquire (&self->joinstate) == THREAD_STATE_DETACHED
>> +	       && (pd->cancelhandling
>> +		   & (CANCELED_BITMASK | EXITING_BITMASK
>> +		      | TERMINATED_BITMASK)) == 0))
>> +	   && !(self->cancelstate == PTHREAD_CANCEL_ENABLE
>> +		&& (pd->cancelhandling & (CANCELED_BITMASK | EXITING_BITMASK
>> +					  | TERMINATED_BITMASK))
>> +		== CANCELED_BITMASK));
>>  }
> 
> The state argument to check_for_deadlock appears to be unused.

I removed it.

> 
>> +      /* pthread_join() is a cancellation entrypoint and we use the same
>> +         rationale for pthread_timedjoin_np().
>> +
>> +	 The kernel notifies a process which uses CLONE_CHILD_CLEARTID via
>> +	 a memory zerouing and futex wake up when the process terminates.
> 
> “memory zero[]ing and futex wake[-]up”

Ack.

> 
>>  #if __TIMESIZE == 64
>> diff --git a/nptl/pthread_tryjoin.c b/nptl/pthread_tryjoin.c
>> index fd938e8780..726f2abc68 100644
>> --- a/nptl/pthread_tryjoin.c
>> +++ b/nptl/pthread_tryjoin.c
>> @@ -22,15 +22,17 @@
>>  int
>>  __pthread_tryjoin_np (pthread_t threadid, void **thread_return)
>>  {
>> -  /* Return right away if the thread hasn't terminated yet.  */
>> -  struct pthread *pd = (struct pthread *) threadid;
>> -  if (pd->tid != 0)
>> -    return EBUSY;
>> +  /* The joinable state (THREAD_STATE_JOINABLE) is straigthforward since the
>> +     thread hasn't finished yet and trying to join might block.
>> +     Both detached (THREAD_STATE_DETACHED) and exiting (THREAD_STATE_EXITING)
>> +     might also result in a possible blocking call: a detached thread might
>> +     change its state to exiting and a exiting thread my take some time to
>> +     exit (and thus let the kernel set the state to THREAD_STATE_EXITED).  */
> 
> pthread_tryjoin_np on a thread which is THREAD_STATE_DETACHED is
> invalid, so that case doesn't matter, I think.

I changed the comment to:

  /* The joinable state (THREAD_STATE_JOINABLE) is straigthforward since the
     thread hasn't finished yet and trying to join might block.
     The exiting thread (THREAD_STATE_EXITING) also mgith result in ablocking
     call: a detached thread might change its state to exiting and a exiting
     thread my take some time to exit (and thus let the kernel set the state
     to THREAD_STATE_EXITED).  */

> 
>> +  struct pthread *pd = (struct pthread *) threadid;
>> +  return atomic_load_acquire (&pd->joinstate) != THREAD_STATE_EXITED
>> +	 ? EBUSY
>> +	 : __pthread_clockjoin_ex (threadid, thread_return, 0, NULL);
>>  }
> 
> But the code is correct, I believe.
> 
>> diff --git a/sysdeps/pthread/tst-thrd-detach.c b/sysdeps/pthread/tst-thrd-detach.c
>> index c844767748..e1906a0e10 100644
>> --- a/sysdeps/pthread/tst-thrd-detach.c
>> +++ b/sysdeps/pthread/tst-thrd-detach.c
> 
>> -  if (thrd_join (id, NULL) == thrd_success)
>> -    FAIL_EXIT1 ("thrd_join succeed where it should fail");
>> +  TEST_COMPARE (thrd_join (id, NULL), thrd_error);
> 
> This is still a user-after-free bug, right?

Indeed, I think it would be better to just remove this test.