This is the mail archive of the
mailing list for the glibc project.
Re: powerpc __tls_get_addr call optimization
- From: Alan Modra <amodra at gmail dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: Carlos O'Donell <carlos at redhat dot com>, libc-alpha at sourceware dot org
- Date: Sat, 21 Mar 2015 13:37:02 +1030
- Subject: Re: powerpc __tls_get_addr call optimization
- References: <20150318061145 dot GE24573 at bubble dot grove dot modra dot org> <5509B0D4 dot 2020903 at redhat dot com> <20150319025631 dot GC28603 at bubble dot grove dot modra dot org> <550B94FC dot 3070903 at redhat dot com> <20150320075502 dot GC26234 at bubble dot grove dot modra dot org> <20150320152712 dot GK23507 at brightrain dot aerifal dot cx>
On Fri, Mar 20, 2015 at 11:27:12AM -0400, Rich Felker wrote:
> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote:
> > On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote:
> > > On 03/18/2015 10:56 PM, Alan Modra wrote:
> > > > On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote:
> > > >> On 03/18/2015 02:11 AM, Alan Modra wrote:
> > > >>> Now that Alex's fixes for static TLS have gone in, I figure it's worth
> > > >>> revisiting an old patch of mine.
> > > >>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html
> > > >>
> > > >> I'm not against this patch, but it certainly seems like you would be
> > > >> better served by just implementing tls descriptors?
> > > >
> > > > I think this is one better than tls descriptors, because powerpc
> > > > avoids the indirect function call used by tls descriptors.
> > >
> > > You mean to say it is "faster" than tls descriptors, but at the same
> > To be honest, there isn't much difference in the optimized case where
> > static TLS is available. It boils down to an indirect call to a
> > function that loads one value vs. a direct call to a stub that loads
> > two values and compares one against zero. I think what I've
> > implemented is slightly better for PowerPC, but whether that would
> > carry over to other architectures is debatable.
> If the performance difference isn't measurable in real-world
> applications, I would think uniformity between targets would be a lot
> more valuable.
Think of my design as "TLS descriptors version 2". I take the best
features of TLS descriptors and add one trick, the special linker
stub, that allows you to omit many of the nasty details of the current
TLS descriptor design. A target that currently has TLS support but no
TLS descriptor support and follows the powerpc design:
1) won't need to implement GCC changes for TLS descriptors,
2) won't need to define new relocations,
3) won't need to implement linker support for TLS descriptors, quite a
   large effort, and
4) won't need to implement dl-tlsdesc.S and tlsdesc.c in glibc, also
   not a simple task.
Another benefit in terms of reliability (and repeatable user timing!)
is that extended TLS descriptors are not needed, so the locking and
mallocing in tlsdeschtab.h is avoided.
Admittedly, part of the reason a port is so much easier is that it
omits lazy TLS resolution. Lazy TLS is complex. What's more, the
per-target support code is non-trivial. All of tlsdesc.c and half of
dl-tlsdesc.S are lazy TLS support. I question whether the added
complexity provides commensurate benefit in real-world applications,
apart from the degenerate case of loading a shared library that is
never used. (And even then, you'd need a lot of __thread variables to
make it worthwhile.)
In fact, I wouldn't be surprised to find lazy TLS has a net negative
effect in real-world applications!
/me dons asbestos suit. :)
> I also don't see how your approach is a "direct call". The function
> being called is in a different DSO so it has to go through a pointer
> in the GOT or similar, in which case it's just as "indirect" as the
> TLSDESC call would be.
It is a direct call to the linker-provided stub, which will return
after a few instructions in the optimized case when static TLS is
available. Control is passed to __tls_get_addr_opt only when no static
TLS was available for the shared library at the time the library was
dynamically relocated, i.e. it was dlopen'ed and not enough spare
static TLS was free.
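
As a rough illustration of that control flow, here is a C model of the
stub logic. It is only a sketch: the real stub is a handful of PowerPC
instructions emitted by the linker, not C. The struct layout, the field
names, the choice of which word is tested against zero, and the
model_* function names are all assumptions made for the example, and
the slow path here is a stand-in that merely records that it was
reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-word GOT entry handed to the stub. */
struct tls_entry {
    uintptr_t module;  /* assumed: 0 marks "static TLS was available" */
    uintptr_t offset;  /* assumed: offset applied to the base address */
};

/* Counts how often the slow path is reached, for observation only. */
static int slow_path_calls;

/* Stand-in for __tls_get_addr_opt: reached only when the entry was not
   set up for static TLS.  A real implementation would look up the
   module's dynamically allocated TLS block; here the block is passed
   in by the caller. */
static void *model_tls_get_addr_opt(const struct tls_entry *e,
                                    char *dynamic_block)
{
    slow_path_calls++;
    return dynamic_block + e->offset;
}

/* Model of the linker stub: load two values, compare one against zero,
   and return immediately in the optimized case. */
static void *model_stub(const struct tls_entry *e, char *thread_pointer,
                        char *dynamic_block)
{
    assert(e != NULL);
    uintptr_t module = e->module;
    uintptr_t offset = e->offset;

    if (module == 0)
        return thread_pointer + offset;              /* fast path */
    return model_tls_get_addr_opt(e, dynamic_block); /* slow path */
}
```

The point of the model is that the caller always makes a direct call to
model_stub; the call out to another function happens only on the
branch that was not set up for static TLS at relocation time.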
Note that __tls_get_addr_opt is currently an alias for
__tls_get_addr. I believe it could be implemented as a different
function with a few more bells and whistles to provide lazy TLS
resolution, but I haven't proven that.
Australia Development Lab, IBM