This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][malloc] Avoid atomics in have_fastchunks
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Carlos O'Donell <carlos at redhat dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, "dj at redhat dot com" <dj at redhat dot com>
- Cc: nd <nd at arm dot com>
- Date: Tue, 19 Sep 2017 21:11:11 +0000
- Subject: Re: [PATCH][malloc] Avoid atomics in have_fastchunks
- References: <DB6PR0801MB2053938869AF403F0D95BC5F83600@DB6PR0801MB2053.eurprd08.prod.outlook.com>,<d152332c-eded-aec7-f03b-efb4903f1670@redhat.com>
Carlos O'Donell wrote:
> It is great to see someone looking at the details of malloc at an atomic-by-
> atomic cost analysis. I know we have looked briefly at fastbins and the
> tradeoff between taking the arena lock (one atomic) and the CAS required to
> put the fastbin back in the list.
Yes, it looks like making free lock-free works fine overall. I wonder whether
malloc can do the same for the fastbin paths, since it already has to deal
with free updating the fastbins concurrently.
> You use unadorned loads and stores of the variable av->have_fastchunks, and
> this constitutes a data race which is undefined behaviour in C11.
...
> Please use relaxed MO loads and stores if that is what we need.
I'll do that.
> After you add the relaxed MO loads and stores the comment for have_fastchunks
> will need a little more explicit language about why the relaxed MO loads and
> stores are OK from a P&C perspective.
That's easy, given that multithreaded interleaving already allows all possible
combinations before memory ordering even comes into play - see my reply to
DJ for the long version...
> Does this patch change the number of times malloc_consolidate might
> be called? Do you have any figures on this? That would be a user visible
> change (and require a bug #).
The number of calls isn't fixed as things stand. I'll have a go at hacking the
malloc test to see how much variation there is and whether my patch changes it.
Btw, what is your opinion on how to add generic single-threaded optimizations
that work for all targets? Rather than doing more target-specific hacks, I'd like
to add something similar to what we did with stdio getc/putc, i.e. a high-level
check for the single-threaded case that takes a different code path (with no or
relaxed atomics and no locks in the common cases).
To give an idea of how much this helps: creating a dummy thread that does nothing
slows down x64 malloc/free by 2x (it has jumps that skip the 1-byte lock prefix...).
An alternative would be to move all the fastbin handling into the t-cache - but
then I bet it's much easier just to write a fast modern allocator...
Wilco