This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 1/2] Single thread optimization for malloc atomics
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: Ondřej Bílka <neleai at seznam dot cz>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Wed, 30 Apr 2014 15:12:45 -0500
- Subject: Re: [PATCH 1/2] Single thread optimization for malloc atomics
- Authentication-results: sourceware.org; auth=none
- References: <53610133 dot 3070908 at linux dot vnet dot ibm dot com> <20140430141845 dot GA6882 at domone dot podge> <20140430160618 dot GJ26358 at brightrain dot aerifal dot cx>
- Reply-to: munroesj at us dot ibm dot com
On Wed, 2014-04-30 at 12:06 -0400, Rich Felker wrote:
> On Wed, Apr 30, 2014 at 04:18:45PM +0200, Ondřej Bílka wrote:
> > On Wed, Apr 30, 2014 at 10:57:07AM -0300, Adhemerval Zanella wrote:
> > > This patch adds a single-thread optimization for malloc atomic usage: it
> > > first checks whether the process is single-threaded (ST) and, if so, uses
> > > normal load/store instead of atomic instructions.
> > >
> > How fast is TLS on Power? If we add a per-thread cache as I suggested,
> > it would have the same performance as the single-thread case most of the
> > time, with the overhead of one TLS variable access per malloc call.
>
> Extremely fast: the TLS address is simply kept in a general-purpose
> register.
>
Depends on the TLS access model.
The General Dynamic TLS model requires a dynamic up-call to
__tls_get_addr(), so it is slow.
If you can get to the Local Exec or Initial Exec form (where the DTV
slot or TLS offset can be known at static link time), it can be a simple
inline computation.
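As we are talking about a dynamic library (libc.so) here, you have to
set this up carefully: position-independent code defaults to the General
Dynamic model, so the faster model has to be requested explicitly.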
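
A minimal sketch of the difference, assuming GCC (or a compatible
compiler) and its documented tls_model variable attribute. The variable
and function names are hypothetical and are not taken from the patch or
from glibc. When built with -fPIC for a shared object, the first
variable typically gets the General Dynamic model, while the second is
forced to Initial Exec, so its offset is loaded from the GOT and added
to the thread pointer inline:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread counters, for illustration only. */

/* No explicit model: with -fPIC the compiler normally picks General
   Dynamic, so each access may go through __tls_get_addr(). */
__thread size_t tcache_hits_default;

/* Explicitly request Initial Exec: the access becomes a short inline
   computation relative to the thread pointer. */
__thread size_t tcache_hits_ie __attribute__((tls_model("initial-exec")));

size_t bump_default(void)
{
    return ++tcache_hits_default;
}

size_t bump_initial_exec(void)
{
    return ++tcache_hits_ie;
}

/* Both variables behave identically; only the access sequence the
   compiler emits differs. */
void tls_models_selfcheck(void)
{
    size_t gd = bump_default();
    size_t ie = bump_initial_exec();
    assert(bump_default() == gd + 1);
    assert(bump_initial_exec() == ie + 1);
}
```

Comparing the output of "gcc -O2 -fPIC -S" for the two bump functions
should show the difference in the generated access sequence. Note that
Initial Exec variables consume static TLS space, which is one reason a
shared library has to use this model with care.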