This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Remove atomic operations from malloc.c

From: Torvald Riegel <triegel at redhat dot com>
To: Leonhard Holz <leonhard dot holz at web dot de>
Cc: libc-alpha at sourceware dot org
Date: Wed, 11 Feb 2015 12:01:08 +0100
Subject: Re: [PATCH] Remove atomic operations from malloc.c
Authentication-results: sourceware.org; auth=none
References: <54DB130F dot 9070300 at web dot de>

On Wed, 2015-02-11 at 09:30 +0100, Leonhard Holz wrote:
> The fastbin path of malloc/free features atomic operations, which were probably an 
> attempt for lock-free code. But there is an ABA problem in malloc's
> 
> while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
>          != victim);
> 
> because even if *fd == victim, victim->fd might have changed in the meantime. 
> There is no easy way to fix this, an old comment mentions the need for a CAS2 
> operation with tagged pointers. Consequently malloc is right now wrapped by a lock 
> to avoid problems.

I'll have a look at the correctness side of this soon.

> Anyhow, while the atomic ops avoid a lock in free they are costly too. Especially 
> at high contention the compare_and_exchange may fail multiple times (see 
> http://blog.memsql.com/common-pitfalls-in-writing-lock-free-algorithms/ for a 
> discussion). So I measured how removing atomics and installing a lock around free 
> affects performance and it turns out (at least in my case with 2 cores), that it 
> has no effect:
> 
> Current implementation
> threads time per iteration
> 1       116.709
> 8       705.080
> 16      1394.74
> 32      2923.03
> 
> Without atomics
> threads time per iteration
> 1       112.541
> 8       715.897
> 16      1403.67
> 32      2881.30

If your machine has just two cores, then at the very least you should
measure for just two threads too; a bigger number of threads is not
putting more contention on any of the synchronization bits, there's just
some more likelihood to having to wait for a thread that isn't running.

Also, to really assess performance, this has to be benchmarked on a
machine with more cores.  Additionally, you could argue why it should
not make a difference, and if that's a compelling argument, we could
follow it instead of the benchmark (which, as Will mentions, is hard to
make representative of real-world workloads).

Follow-Ups:
- Re: [PATCH] Remove atomic operations from malloc.c
  - From: Adhemerval Zanella
- Re: [PATCH] Remove atomic operations from malloc.c
  - From: Siddhesh Poyarekar

References:
- [PATCH] Remove atomic operations from malloc.c
  - From: Leonhard Holz

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]