This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] malloc: Fix list_lock/arena lock deadlock [BZ #19182]


Disclaimer: I haven't yet cross-checked whether your analysis of which
lock protects which data is complete, nor do I know the malloc
synchronization details.  If you want, I can do the cross-check, but it
probably will take more time.  The comments below are just based on what
I saw in the immediate neighborhood of your changes.

On Sun, 2015-12-06 at 13:14 +0100, Florian Weimer wrote:
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 3dab7bb..834907c 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -69,9 +69,11 @@ extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
>  static __thread mstate thread_arena attribute_tls_model_ie;
>  
>  /* Arena free list.  list_lock protects the free_list variable below,
> -   and the next_free and attached_threads members of the mstate
> -   objects.  No other (malloc) locks must be taken while list_lock is
> -   active, otherwise deadlocks may occur.  */
> +   the next_free and attached_threads members of struct malloc_state
> +   objects, and concurrent writes to the next member of these objects.
> +   (Read access to the next member is synchronized via a barrier.)
> +   When list_lock is acquired, no arena lock must be acquired, but it
> +   is permitted to acquire arena locks after list_lock.  */

What about main_arena?  In _int_new_arena, it looks like main_arena is
protected by list_lock too.

I would provide more detail on the synchronization of read-only
traversals of the list.  list_lock seems to be used to enforce mutual
exclusion between modification operations on the list (including any
reading/traversal those modifications need, so you cannot traverse
without the lock, figure out what you need to change, and only then
grab list_lock).  Synchronization between read-only traversals and the
write operations relies on atomic operations.  That is why the write
operations do not need acquire fences when they traverse the list.
The acquire fences seem to be missing in reused_arena and
ptmalloc_[un]lock_all[2], though.

(Remember that if you want to establish a synchronizes-with edge, there
must be a *pair* of release and acquire MOs -- just a release MO fence
(or atomic_write_barrier) on its own is not sufficient.)
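
To make that pairing concrete, here is a minimal, self-contained sketch
(simplified types and names, not the actual malloc code; glibc's
atomic_store_release / atomic_load_acquire macros, where available,
would play the same role as the C11 calls below).  The writer publishes
each link with a release store while holding the list lock; a lock-free
reader has to read each link with an acquire load, otherwise there is
no synchronizes-with edge and it may observe an uninitialized node:

#include <stdatomic.h>
#include <stddef.h>

struct node
{
  int data;
  struct node *_Atomic next;
};

static struct node *_Atomic list_head;

/* Writer side: callers are serialized by a list lock (not shown).
   Each link is published with a release store; this is the release
   half of the release/acquire pair.  */
static void
insert_front (struct node *n, int value)
{
  n->data = value;              /* Initialize before publishing.  */
  atomic_store_explicit (&n->next,
                         atomic_load_explicit (&list_head,
                                               memory_order_relaxed),
                         memory_order_release);
  atomic_store_explicit (&list_head, n, memory_order_release);
}

/* Reader side: traverses without taking the list lock.  Each link is
   read with an acquire load, which pairs with the writer's release
   store and guarantees the node it points to is seen initialized.
   Plain loads (or a writer using only atomic_write_barrier) would not
   establish that edge.  */
static int
sum_list (void)
{
  int sum = 0;
  for (struct node *n = atomic_load_explicit (&list_head,
                                              memory_order_acquire);
       n != NULL;
       n = atomic_load_explicit (&n->next, memory_order_acquire))
    sum += n->data;
  return sum;
}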

I would also say "atomic ops" instead of "barriers" in the docs.  We
have to use atomics there instead of plain accesses (even though we
currently don't), and once we do, we will either use atomic operations
or something like atomic_thread_fence -- that is, fences, not barriers.
This also avoids ambiguity with the higher-level synchronization
construct called a barrier.
 
>  static mutex_t list_lock = _LIBC_LOCK_INITIALIZER;
>  static size_t narenas = 1;
> @@ -790,7 +792,16 @@ _int_new_arena (size_t size)
>    mutex_init (&a->mutex);
>    (void) mutex_lock (&a->mutex);
>  
> -  (void) mutex_lock (&list_lock);
> +  /* Put the arena in the global list.  If we cannot acquire
> +     list_lock, we have to temporarily release the arena lock, to
> +     avoid deadlocks.  NB: mutex_trylock returns 0 if the lock was
> +     acquired.  */
> +  bool reacquire_arena_lock = mutex_trylock (&list_lock);
> +  if (reacquire_arena_lock)
> +    {
> +      mutex_unlock (&a->mutex);
> +      (void) mutex_lock (&list_lock);
> +    }
>  
>    detach_arena (replaced_arena);
>  
> @@ -801,6 +812,9 @@ _int_new_arena (size_t size)
>  
>    (void) mutex_unlock (&list_lock);
>  
> +  if (reacquire_arena_lock)
> +    (void) mutex_lock (&a->mutex);
> +

Why can't we simply acquire a->mutex later?  Is there a way for the new
arena to become visible to other threads before this point?  If so,
how -- and why would it then be okay to release the lock temporarily?
If not, we should add a brief comment stating that (which would also
justify acquiring a->mutex later).
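
For what it's worth, the alternative I have in mind would look roughly
like this (a sketch only, reusing the names from your patch; it assumes
the new arena really is unreachable by other threads until it has been
linked into the list -- which is exactly the question above):

static mstate
_int_new_arena (size_t size)
{
  mstate a = ...;                        /* Allocate and set up the heap/arena.  */
  mstate replaced_arena = thread_arena;

  mutex_init (&a->mutex);

  /* If no other thread can reach the new arena before it is put on
     the list, list_lock can be taken without holding a->mutex, and it
     never has to be dropped and re-acquired.  */
  (void) mutex_lock (&list_lock);
  detach_arena (replaced_arena);
  /* ... link a into the arena list / free_list ... */
  (void) mutex_unlock (&list_lock);

  /* Take the arena lock only now; the caller still gets back a locked
     arena.  */
  (void) mutex_lock (&a->mutex);
  return a;
}

If the arena can be picked up by another thread as soon as it is on the
list, this ordering obviously does not work, and a comment explaining
that would answer my question.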

>    return a;
>  }
>  
> @@ -884,11 +898,22 @@ reused_arena (mstate avoid_arena)
>  
>  out:
>    {
> +    /* Update the arena thread attachment counters.  If we cannot
> +       acquire list_lock, we have to temporarily release the arena
> +       lock, to avoid deadlocks.  NB: mutex_trylock returns 0 if the
> +       lock was acquired.  */

Why is it safe to temporarily release the lock?  Can the memory not be
reused for a different arena or something else in the meantime?  The
comment should explain that as well, or point to an existing comment
that does.

>      mstate replaced_arena = thread_arena;
> -    (void) mutex_lock (&list_lock);
> +    bool reacquire_arena_lock = mutex_trylock (&list_lock);
> +    if (reacquire_arena_lock)
> +      {
> +	(void) mutex_unlock (&result->mutex);
> +	(void) mutex_lock (&list_lock);
> +      }
>      detach_arena (replaced_arena);
>      ++result->attached_threads;
>      (void) mutex_unlock (&list_lock);
> +    if (reacquire_arena_lock)
> +      (void) mutex_lock (&result->mutex);
>    }
>  
>    LIBC_PROBE (memory_arena_reuse, 2, result, avoid_arena);

HTH for now.

