This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 1/2] Single thread optimization for malloc atomics


On Wed, Apr 30, 2014 at 03:12:45PM -0500, Steven Munroe wrote:
> On Wed, 2014-04-30 at 12:06 -0400, Rich Felker wrote:
> > On Wed, Apr 30, 2014 at 04:18:45PM +0200, Ondřej Bílka wrote:
> > > On Wed, Apr 30, 2014 at 10:57:07AM -0300, Adhemerval Zanella wrote:
> > > > This patch adds a single-thread optimization for malloc atomic usage to
> > > > first check if process is single-thread (ST) and if so use normal
> > > > load/store instead of atomic instructions.
> > > > 
> > > How fast is TLS on Power? When we add a per-thread cache as I suggested,
> > > it would most of the time have the same performance as single-thread,
> > > with the overhead of one TLS variable access per malloc call.
> > 
> > Extremely fast: the TLS address is simply kept in a general-purpose
> > register.
> > 
> Depends on the TLS access model.
> 
> General Dynamic TLS Model requires a dynamic up-call to __tls_get_addr().
> So slow.
> 
> If you can get to the Local Exec or Initial Exec form (where the dtv
> slot or TLS offset can be known at static link time) it can be a simple
> inline computation.
> 
> As we are talking about a dynamic library (libc.so) here, you have to
> set this up carefully.

In malloc we already use initial exec; see the tsd_getspecific macro in
malloc/arena.c.
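
For readers unfamiliar with the access models, here is a minimal sketch of
forcing the initial-exec model on a thread-local variable with GCC's
tls_model attribute. The names arena_slot, arena_slot_set, arena_slot_get and
arena_slot_roundtrip are hypothetical and are not the glibc definitions; the
real tsd_getspecific macro may be written differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread slot standing in for malloc's arena key.
   The tls_model attribute asks the compiler for initial-exec access,
   so a read is a load at a fixed offset from the thread pointer
   rather than a call to __tls_get_addr(). */
static __thread void *arena_slot __attribute__((tls_model("initial-exec")));

static void arena_slot_set(void *p) { arena_slot = p; }
static void *arena_slot_get(void) { return arena_slot; }

/* Self-check: store a pointer, read it back, then reset the slot.
   Returns 1 when the value read matches the value stored. */
static int arena_slot_roundtrip(void)
{
    int dummy;
    assert(arena_slot_get() == NULL);   /* slot starts out (or was reset) empty */
    arena_slot_set(&dummy);
    int ok = (arena_slot_get() == (void *)&dummy);
    arena_slot_set(NULL);
    return ok;
}
```

Building this into a shared object with -fPIC and disassembling
arena_slot_get should show whether the compiler emitted an inline access or a
call, which is the difference discussed above.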

By the way, the general slowness of TLS is caused mostly by an inefficient
implementation. If you do not mind adding a pointer variable p in IE form
to each binary, then you could emulate TLS by reading p and doing two array
lookups (except for the first access, which takes a branch that calls a
slow-path setup function).
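
A rough sketch of that idea, under my own assumptions about the layout: p
points to a per-thread table indexed by module, and each entry points to that
module's block, indexed by offset. All names here (emu_p, emu_slow_path,
emu_tls_addr, EMU_MAX_MODULES) are hypothetical, and freeing the blocks at
thread exit is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { EMU_MAX_MODULES = 8 };   /* hypothetical limit for this sketch */

/* The pointer "p": one initial-exec TLS variable, so reading it is cheap. */
static __thread char **emu_p __attribute__((tls_model("initial-exec")));

/* Slow path: taken only on a thread's first access to a module's block.
   Allocates the per-thread table and the zero-filled block on demand. */
static char *emu_slow_path(size_t module, size_t block_size)
{
    if (emu_p == NULL) {
        emu_p = calloc(EMU_MAX_MODULES, sizeof *emu_p);
        assert(emu_p != NULL);
    }
    if (emu_p[module] == NULL) {
        emu_p[module] = calloc(1, block_size);
        assert(emu_p[module] != NULL);
    }
    return emu_p[module];
}

/* Fast path: read p, look up the module's block, index it by offset. */
static void *emu_tls_addr(size_t module, size_t offset, size_t block_size)
{
    assert(module < EMU_MAX_MODULES && offset < block_size);
    char **table = emu_p;
    char *block = table ? table[module] : NULL;   /* first lookup */
    if (__builtin_expect(block == NULL, 0))
        block = emu_slow_path(module, block_size);
    return &block[offset];                        /* second lookup */
}
```

After the first call for a given module on a thread, later calls never reach
emu_slow_path, so the cost is the read of emu_p plus the two indexed accesses.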

