This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Malloc improvements
- From: "Tulio Magno Quites Machado Filho" <tuliom at linux dot vnet dot ibm dot com>
- To: Florian Weimer <fweimer at redhat dot com>, Anton Blanchard <anton at au1 dot ibm dot com>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, Siddhesh Poyarekar <sid at reserved-bit dot com>, DJ Delorie <dj at redhat dot com>, libc-alpha at sourceware dot org
- Date: Fri, 15 Jul 2016 09:54:58 -0300
- Subject: Re: Malloc improvements
- Authentication-results: sourceware.org; auth=none
- References: <20160712101010.6e6cfecb@kryten> <5a954ab2-d74c-867d-e427-ffae95389beb@redhat.com> <20160714214910.6727c439@kryten> <8b72c439-a9c3-4cfd-f9a1-f67836ea4795@redhat.com>
Florian Weimer <fweimer@redhat.com> writes:
> On 07/14/2016 01:49 PM, Anton Blanchard wrote:
>>>> It's great to see the current focus on improving malloc. One thing
>>>> that would really help POWER is reducing the number of locks and
>>>> atomics in the fast path. Right now we have 3 in the malloc
>>>> fastpath and 2 in free. These add up.
>>>
>>> Does the hook variable read count as an atomic operation in this
>>> sense?
>>
>> The read hook shouldn't be. The atomic issue I was referring to was
>> something we've been trying to solve for a while:
>>
>> https://sourceware.org/ml/libc-alpha/2014-05/msg00118.html
>
> x86_64 checks __libc_multiple_threads and avoids atomics if possible.
> Do you already do this in POWER?

This was our last try:
http://patchwork.sourceware.org/patch/11307/

In summary, Torvald said:

  We need to be consistent where we try to optimize for single-threaded
  executions. Currently, we do in catomic_* and I believe in some pieces
  of code using atomics. The atomic_* functions, even the old ones,
  should not do that.

  Eventually, we should put special cases for single-threaded executions
  into the code using atomics and not into the atomics (also see above,
  phasing out catomic_*) because avoiding concurrent algorithms altogether
  is even faster than doing something in atomics (eg, one can avoid CAS
  loops altogether if there's no other thread because the CAS will never
  fail).

  Another reason to do this is that this adds the overhead of the
  single-thread check to all atomics, even in cases where it's clear that
  the code will be used often in a multi-threaded setting.

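
To make the point about placing the single-thread check in the caller
rather than inside the atomic primitive concrete, here is a minimal sketch
in C11. It is not glibc code: the `single_threaded` flag, the `node` type,
`list_head`, and `push` are all hypothetical names chosen for illustration,
and the flag merely stands in for whatever check the real code would use.
In the single-threaded branch the CAS loop is skipped entirely, and the
check is paid only by this one caller, not by every atomic operation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag (an assumption for this sketch): true while the
   process has only one thread.  */
static bool single_threaded = true;

struct node
{
  struct node *next;
};

/* Head of a LIFO list shared between threads.  */
static _Atomic (struct node *) list_head;

/* Push N onto the list.  The single-thread special case lives here, in
   the code that uses atomics, not inside the atomic operations.  */
static void
push (struct node *n)
{
  if (single_threaded)
    {
      /* No other thread can interfere, so a plain load and store is
         enough and no CAS loop is needed.  */
      n->next = atomic_load_explicit (&list_head, memory_order_relaxed);
      atomic_store_explicit (&list_head, n, memory_order_relaxed);
      return;
    }

  /* Multi-threaded path: retry until the CAS installs N as the head.
     On failure OLD is refreshed with the current head.  */
  struct node *old = atomic_load_explicit (&list_head, memory_order_relaxed);
  do
    n->next = old;
  while (!atomic_compare_exchange_weak_explicit (&list_head, &old, n,
                                                 memory_order_release,
                                                 memory_order_relaxed));
}
```

A real implementation would also have to handle the transition from one
thread to several safely; the sketch leaves that out.
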
--
Tulio Magno