This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
Re: powerpc __tls_get_addr call optimization
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Rich Felker <dalias at libc dot org>, Alan Modra <amodra at gmail dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 20 Mar 2015 11:48:34 -0400
- Subject: Re: powerpc __tls_get_addr call optimization
- References: <20150318061145 dot GE24573 at bubble dot grove dot modra dot org> <5509B0D4 dot 2020903 at redhat dot com> <20150319025631 dot GC28603 at bubble dot grove dot modra dot org> <550B94FC dot 3070903 at redhat dot com> <20150320075502 dot GC26234 at bubble dot grove dot modra dot org> <20150320152712 dot GK23507 at brightrain dot aerifal dot cx>
On 03/20/2015 11:27 AM, Rich Felker wrote:
> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote:
>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote:
>>> On 03/18/2015 10:56 PM, Alan Modra wrote:
>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote:
>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote:
>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth
>>>>>> revisiting an old patch of mine.
>>>>> I'm not against this patch, but it certainly seems like you would be
>>>>> better served by just implementing tls descriptors?
>>>> I think this is one better than tls descriptors, because powerpc
>>>> avoids the indirect function call used by tls descriptors.
>>> You mean to say it is "faster" than tls descriptors, but at the same
>> To be honest, there isn't much difference in the optimized case where
>> static TLS is available. It boils down to an indirect call to a
>> function that loads one value vs. a direct call to a stub that loads
>> two values and compares one against zero. I think what I've
>> implemented is slightly better for PowerPC, but whether that would
>> carry over to other architectures is debatable.
> If the performance difference isn't measurable in real-world
> applications, I would think uniformity between targets would be a lot
> more valuable.
> I also don't see how your approach is a "direct call". The function
> being called is in a different DSO so it has to go through a pointer
> in the GOT or similar, in which case it's just as "indirect" as the
> TLSDESC call would be.
I agree. And this was my initial inclination, but I'm not against what
Alan has implemented. As a machine maintainer he should be allowed some
leeway to argue that this implementation is "N instructions less" and
therefore must be faster, and that although such a speedup is hard to show
in a microbenchmark, it would on average result in, say, less CPU usage
over billions of cycles.
IBM has to accept that the downside to all of this is that breakage in
this area may take longer to fix, and get fewer fixes, than on those arches
already using TLSDESC.
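
For readers following the thread, the two fast paths Alan compares can be
modelled roughly in C. This is only a sketch under stated assumptions, not
the glibc or binutils code: every name below (thread_pointer, struct
tlsdesc, struct tls_index, the resolver and stub functions) is hypothetical,
and the thread pointer and TLS blocks are faked with ordinary arrays. It
shows "an indirect call to a function that loads one value" next to "a call
to a stub that loads two values and compares one against zero".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a fake static TLS block addressed through a
   fake thread pointer, and a separate block for dynamically allocated TLS. */
static char fake_tls_block[64];
static char dynamic_block[64];
static char *thread_pointer = fake_tls_block;

/* TLSDESC-style model: the descriptor holds a resolver pointer and an
   argument; the caller always makes an indirect call through it. */
struct tlsdesc {
    void *(*resolver)(const struct tlsdesc *);
    uintptr_t arg;                      /* static TLS offset in this model */
};

static void *tlsdesc_static_resolver(const struct tlsdesc *d)
{
    return thread_pointer + d->arg;     /* one load: d->arg */
}

static void *tlsdesc_lookup(const struct tlsdesc *d)
{
    assert(d->resolver != NULL);
    return d->resolver(d);              /* indirect call */
}

/* Stub-style model: the stub reads both words of the index entry and
   only falls back to the slow lookup when the first word is non-zero. */
struct tls_index {
    uintptr_t module;                   /* assumed 0 when static TLS is usable */
    uintptr_t offset;
};

static int slow_calls;                  /* counts slow-path entries */

static void *slow_tls_get_addr(const struct tls_index *ti)
{
    slow_calls++;
    return dynamic_block + ti->offset;  /* simplified: one dynamic module */
}

static void *tls_get_addr_opt_stub(const struct tls_index *ti)
{
    uintptr_t module = ti->module;      /* load 1 */
    uintptr_t offset = ti->offset;      /* load 2 */
    if (module == 0)                    /* compare against zero */
        return thread_pointer + offset; /* fast path, no further call */
    return slow_tls_get_addr(ti);       /* general case */
}
```

The model deliberately says nothing about Rich's objection: how the call
reaches the stub or the resolver across DSO boundaries (PLT, GOT) is not
represented here, so it cannot settle which approach is cheaper in practice.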