This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 1/2] Single thread optimization for malloc atomics
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Mon, 05 May 2014 12:17:12 +0200
- Subject: Re: [PATCH 1/2] Single thread optimization for malloc atomics
- Authentication-results: sourceware.org; auth=none
- References: <53610133 dot 3070908 at linux dot vnet dot ibm dot com> <1399041298 dot 32485 dot 6373 dot camel at triegel dot csb> <5363DB7D dot 5080805 at linux dot vnet dot ibm dot com>
On Fri, 2014-05-02 at 14:53 -0300, Adhemerval Zanella wrote:
> On 02-05-2014 11:34, Torvald Riegel wrote:
> > On Wed, 2014-04-30 at 10:57 -0300, Adhemerval Zanella wrote:
> >> This patch adds a single-thread optimization for malloc atomic usage:
> >> first check whether the process is single-threaded (ST) and, if so, use
> >> normal load/store instead of atomic instructions.
> >> It is a respin of my initial attempt to add the ST optimization on PowerPC.
> >> Now the malloc optimization is arch-agnostic and relies on macros already
> >> defined in GLIBC (SINGLE_THREAD_P).
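(For reference, the shape of the proposed optimization is roughly the following sketch. SINGLE_THREAD_P is the real glibc macro name, but its definition, the flag, and the function here are invented for illustration, and relaxed C11 load/store stands in for the plain accesses:)

```c
#include <stdatomic.h>

/* Illustrative only: SINGLE_THREAD_P is the real glibc macro name, but
   this definition and the flag below are stand-ins for the example.  */
static _Atomic int multiple_threads;

#define SINGLE_THREAD_P \
  (atomic_load_explicit (&multiple_threads, memory_order_relaxed) == 0)

static inline void
st_opt_add (atomic_int *mem, int value)
{
  if (SINGLE_THREAD_P)
    {
      /* Single-thread path: relaxed load/store compile to plain
         accesses, with no atomic RMW instruction and no barrier.  */
      int old = atomic_load_explicit (mem, memory_order_relaxed);
      atomic_store_explicit (mem, old + value, memory_order_relaxed);
    }
  else
    atomic_fetch_add (mem, value);
}
```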
> > I'm not convinced that this is a good change. First, I'd like to see
> > some performance numbers that show whether the change leads to a
> > non-negligible increase in performance. That should be a
> > microbenchmark, so we can track it over time.
> In fact I would prefer to make it more arch-specific and let the arch
> maintainers evaluate whether such a change is preferable. One way to do it
> is to add a new macro in bits/libc-lock.h, something like __libc_atomic_*,
> and define it as catomic_* in ./nptl/sysdeps/pthread/bits/libc-lock.h. Then
> an arch can replace the implementation if that suits it.
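(Sketched out, the indirection being suggested might look like the following; __libc_atomic_add is a hypothetical name, and a C11 atomic stands in for catomic_add in the generic default:)

```c
#include <stdatomic.h>

/* Hypothetical sketch of a __libc_atomic_* indirection: a generic
   default that simply forwards to the existing operation (here a C11
   atomic standing in for catomic_add), which an arch could redefine
   in its own bits/libc-lock.h. */
#ifndef __libc_atomic_add
# define __libc_atomic_add(mem, value) atomic_fetch_add ((mem), (value))
#endif
```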
I agree that there is arch-specific input to this optimization problem,
but that doesn't mean that the solution to it will be primarily
arch-specific.
The arch-specific input we have is:
* Whether an atomic op (probably we'll be considering just RMW ops and
CAS) is significantly slower than non-atomic code plus a branch. For
example, the difference should be small on recent x86, but I assume it's
higher on PowerPC. We'd need a microbenchmark of some sort to track the
decision we make here.
* Whether an atomic op that's MT-Safe and AS-Safe is significantly
slower than an atomic op that's just AS-Safe. That needs a
microbenchmark as well. For example, x86 currently assumes this is the
case, but it's something we should revisit I guess.
* If the former is the case, we need arch-specific atomics (i.e.,
catomic_*) to make use of that.
Thus, we'd end up with two constants and potentially some arch-specific
variants. We could differentiate the constants as necessary per user
(e.g., make one that's malloc-specific), but I'd prefer not to, and that
can be done in the future.
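(Concretely, the two constants might look like the following sketch; all names are invented for illustration, and each arch would set the values based on the microbenchmarks:)

```c
#include <stdatomic.h>

/* Hypothetical arch tunables (names invented); each arch would set them
   to 0 or 1 based on the microbenchmarks described above. */
#define __ARCH_ATOMIC_RMW_SLOW      0 /* atomic RMW much slower than a branch? */
#define __ARCH_AS_SAFE_ATOMICS_FAST 0 /* AS-Safe-only atomics notably faster?  */

static inline void
counter_inc (atomic_int *mem)
{
  if (__ARCH_ATOMIC_RMW_SLOW)
    {
      /* Arm for archs where avoiding the atomic RMW pays off; with the
         constant at 0 the compiler removes this arm as dead code.  */
      int old = atomic_load_explicit (mem, memory_order_relaxed);
      atomic_store_explicit (mem, old + 1, memory_order_relaxed);
    }
  else
    atomic_fetch_add (mem, 1);
}
```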
With that in place, we can look at the atomics users and see how they
could exploit AS-Safe or AS-Unsafe+MT-Unsafe scenarios. We'd have a
normal branch and let the compiler remove the dead code (we can use the
preprocessor too, but I think that's worse).
That allows us to really look at what we can do differently in a
sequential-execution scenario -- so going beyond just tweaking one
atomic op at a time.
If we really need such single-atomic-op optimizations in many places,
that is, if code like
  if (as_safe_atomics_faster) x += 1; else atomic_add (&x, 1);
is needed often, we can consider adding something like catomic.
We might want to revisit current uses of catomic too.
> > malloc is currently marked as Async-Signal-Unsafe, but due to locks
> > elsewhere in the implementation. The MTSafety docs even talk about what
> > would be needed to make it AS-Safe. Your patch would prevent making it
> > AS-Safe. The catomic_add calls that you replace are AS-Safe; the
> > sequential code you add is not. Someone added those catomic_* and not
> > sequential code, so this is at least an indication that there's a
> > disconnect here.
> Another point to make this change arch specific.
Why? The motivation to use catomic might have been driven by
arch-specific properties, but you want to replace it with something
that is no longer AS-Safe, and that property is not arch-specific.
> And if we ever make malloc
> functions AS-Safe in the future, the arch maintainer just needs to reevaluate
> this change (since it will either make the optimization not required or make
> it unsafe).
Whether malloc is AS-Safe or not should never be arch-specific IMO. If
we use the constants as I outlined above, the malloc implementer can
just build code for the different HW performance scenarios, and all the
archs have to do is set the flags accordingly.
We could even further reduce the implementation overhead for the AS-Safe
MT-Unsafe atomics if using the newer GCC builtins for the C11 memory
model, by adding support for a special memory order that just provides
atomicity with respect to signal handlers on the same thread.
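(Today the weakest order the builtins offer is relaxed, which still requires a fully MT-Safe RMW; the sketch below shows the gap, with the hypothetical signal-only order described in comments since it does not exist:)

```c
#include <stdatomic.h>

static atomic_int counter;

static inline void
bump (void)
{
  /* memory_order_relaxed is the weakest order C11/GCC offer today, yet
     the compiler must still emit a fully MT-Safe atomic RMW here.  */
  atomic_fetch_add_explicit (&counter, 1, memory_order_relaxed);

  /* The special order discussed above (which does not exist today)
     would let the same call request atomicity only with respect to
     signal handlers on the same thread, so archs could emit a cheaper
     plain RMW instruction instead.  */
}
```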
> > You also add macros with nondescriptive names, that each have exactly
> > one use. There's no documentation for how they differ, and you don't
> > update the comments on the MT-Safe docs.
> >> x86 probably would like an atomic.h refactor
> >> to avoid the double check for '__local_multiple_threads', but that is not
> >> the aim of this patch.
> > But your patch does introduce this, so you should take care of it as
> > well. I guess you wouldn't be happy if someone changed x86 without
> > caring about Power?
> I will drop this patch and rework it into an arch-specific one.
What do you think about the plan I outlined above?