This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Consensus: Tuning runtime behaviour with environment variables.
- From: Alexandre Oliva <aoliva at redhat dot com>
- To: Rich Felker <dalias at aerifal dot cx>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 03 Jun 2013 19:41:01 -0300
- Subject: Re: Consensus: Tuning runtime behaviour with environment variables.
- References: <51A58A92 dot 4050508 at redhat dot com> <20130529055518 dot GA23030 at domone dot kolej dot mff dot cuni dot cz> <ormwraq3rx dot fsf at livre dot home> <20130601031151 dot GK20323 at brightrain dot aerifal dot cx> <ora9n9i3jc dot fsf at livre dot home> <20130602154150 dot GN20323 at brightrain dot aerifal dot cx> <ortxlgh2an dot fsf at livre dot home> <20130602215358 dot GB29800 at brightrain dot aerifal dot cx> <or38t0dslx dot fsf at livre dot home> <20130603022344 dot GD29800 at brightrain dot aerifal dot cx>
On Jun 2, 2013, Rich Felker <dalias@aerifal.cx> wrote:
> On Sun, Jun 02, 2013 at 08:02:02PM -0300, Alexandre Oliva wrote:
>> > The hot path of __tls_get_addr should be just a couple dereferences
>> > and branches which are always predicted correctly.
>>
>> For anyone who didn't know better, it would seem like you're arguing
>> that Initial Exec is pointless.
> Not pointless, but overrated. And it's not obvious to me that your
> optimization is closer to initial-exec in performance than it is to
> global-dynamic.
So, you find two branches and two loads (besides the call) “not too
much”, but a call and a return “too much”? Or do you just enjoy
pointless discussions? :-)
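The “couple of dereferences and branches” can be made concrete with a simplified C model of the global-dynamic lookup that __tls_get_addr performs. The thread's dynamic thread vector (DTV) generation is checked against the global one, then the module's block pointer is loaded and the offset is added. This is a sketch loosely modeled on the ELF TLS scheme, not glibc's source. All names (tls_index fields, dtv_entry, model_tls_get_addr, slow_path, MAX_MODULES, BLOCK_SIZE) are hypothetical, and the real function also handles dlopen/dlclose locking and static TLS, which are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model, loosely after the ELF TLS ABI; not glibc's source. */
typedef struct {
  size_t module;   /* module ID assigned by the dynamic linker */
  size_t offset;   /* variable's offset within that module's TLS block */
} tls_index;

typedef union {
  size_t generation;  /* slot 0: generation this thread's DTV reflects */
  void *block;        /* slots 1..MAX_MODULES: TLS block, NULL until allocated */
} dtv_entry;

enum { MAX_MODULES = 8, BLOCK_SIZE = 64 };

/* Would be bumped when modules with TLS are loaded; constant in this sketch. */
static size_t global_generation = 1;
/* Per-thread "dynamic thread vector", zero-initialized in every thread. */
static __thread dtv_entry thread_dtv[MAX_MODULES + 1];

/* Slow path: refresh the generation and allocate the module's block lazily. */
static void *slow_path(const tls_index *ti)
{
  assert(ti->module >= 1 && ti->module <= MAX_MODULES);
  assert(ti->offset < BLOCK_SIZE);
  thread_dtv[0].generation = global_generation;
  dtv_entry *slot = &thread_dtv[ti->module];
  if (slot->block == NULL && (slot->block = calloc(1, BLOCK_SIZE)) == NULL)
    abort();
  return (char *)slot->block + ti->offset;
}

/* Fast path: a few loads and two branches that are almost never taken. */
void *model_tls_get_addr(const tls_index *ti)
{
  if (thread_dtv[0].generation != global_generation)  /* stale DTV? */
    return slow_path(ti);
  void *block = thread_dtv[ti->module].block;         /* module's TLS block */
  if (block == NULL)                                   /* not yet allocated? */
    return slow_path(ti);
  return (char *)block + ti->offset;
}
```

In this model each thread starts with a zeroed thread_dtv, so its first access to a module takes the slow path and later accesses stay on the fast path. Whether those remaining loads and branches, plus the call and PLT hop in front of them, are cheap enough is the point under dispute.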
>> What you don't save is the cost of a naked call to a function that just
>> returns, but that's still a lot less than that plus dereferences plus
>> branches plus PLT plus frame setup plus saving and restoring
>> call-clobbered registers at the caller, don't you agree?
> For certain values of "a lot".
Comparatively, it surely is a lot.
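The alternative being defended here is the TLS-descriptor scheme (selected with GCC's -mtls-dialect=gnu2 on x86). The GOT holds a resolver plus an argument. When the variable ends up in static TLS, the resolver merely returns a precomputed offset from the thread pointer, so the access costs the “naked call to a function that just returns”. The C sketch below models only that dispatch. struct tls_desc, the two resolvers, and the thread-pointer stand-in are hypothetical names. The sketch also cannot reproduce the real ABI's key trick: a calling convention in which the resolver preserves call-clobbered registers, so the caller has nothing to save and restore.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a TLS descriptor: resolver plus argument.
   Resolvers return the variable's offset from the thread pointer,
   computed modulo 2^N in uintptr_t so the dynamic case stays well-defined. */
struct tls_desc {
  uintptr_t (*entry)(struct tls_desc *);
  uintptr_t arg;   /* offset within the static or the late-allocated block */
};

enum { BLOCK_SIZE = 128 };

/* Stands in for the static TLS area addressed via the thread pointer. */
static __thread unsigned char static_block[BLOCK_SIZE];
/* Stands in for a block allocated later, e.g. for a dlopen'ed module. */
static __thread unsigned char *dynamic_block;

static uintptr_t thread_pointer(void) { return (uintptr_t)static_block; }

/* Static TLS: the whole "lookup" is returning a precomputed constant. */
static uintptr_t resolve_static(struct tls_desc *d) { return d->arg; }

/* Dynamic TLS: allocate lazily, then express the address TP-relative. */
static uintptr_t resolve_dynamic(struct tls_desc *d)
{
  assert(d->arg < BLOCK_SIZE);
  if (dynamic_block == NULL && (dynamic_block = calloc(1, BLOCK_SIZE)) == NULL)
    abort();
  return (uintptr_t)(dynamic_block + d->arg) - thread_pointer();
}

/* What the compiler-emitted access reduces to: call the resolver, add TP. */
void *tls_desc_access(struct tls_desc *d)
{
  return (void *)(thread_pointer() + d->entry(d));
}
```

In this model the calling code is identical for both cases. The choice of resolver is made when the descriptor is filled in, which is how the scheme can approach initial-exec cost without forcing every variable into static TLS.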
>> Phrasing it another way, which of these two scenarios seem faster to
>> you?
> The one whose _measured_ performance is better.
Of course, because there are so many cases in benchmark history in which
doing a, b, c, d, e and f turned out to be faster than just a and f. Of
course you can cite several of them by heart, can't you? ;-)
> While measurement of the access itself (e.g. load from TLS and store
> to a volatile variable, surrounded by RDTSC) would be very
> interesting,
Glad you liked the paper ;-)
> what really matters is whether you can find a real-world example where
> the performance is measurably different.
I couldn't find one, not even among industry benchmarks for multi-threaded programs.
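The microbenchmark suggested above (a TLS load stored to a volatile variable, bracketed by RDTSC) might be sketched as follows. The names are hypothetical, and the code falls back to clock_gettime on non-x86 targets. RDTSC is not a serializing instruction, so a careful harness would add fences (or use RDTSCP) and inspect the generated assembly. To compare access models, tls_var would need to live in a shared object built with -fPIC, with the model chosen via -ftls-model= or -mtls-dialect=. In a non-PIC executable the compiler will normally pick local-exec regardless.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
static uint64_t ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 targets: nanoseconds instead of TSC ticks. */
static uint64_t ticks(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

__thread int tls_var = 42;   /* the TLS variable whose access is timed */
volatile int sink;           /* volatile store keeps the load from being elided */

/* One timed access; the empty asm statements are compiler-only barriers. */
static __attribute__((noinline)) uint64_t time_one_access(void)
{
  uint64_t t0 = ticks();
  __asm__ __volatile__("" ::: "memory");
  sink = tls_var;
  __asm__ __volatile__("" ::: "memory");
  return ticks() - t0;
}

/* Minimum over many trials filters out interrupts and cache misses.
   Returns UINT64_MAX if trials is 0. */
uint64_t min_access_ticks(unsigned trials)
{
  uint64_t best = UINT64_MAX;
  for (unsigned i = 0; i < trials; i++) {
    uint64_t t = time_one_access();
    if (t < best)
      best = t;
  }
  return best;
}
```

Absolute tick counts are machine-specific and include the overhead of the timer reads themselves. Only the differences between builds that vary the TLS model carry information.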
I theorize that an important reason why TLS is not used in
performance-critical plugins is that its performance is so unbearable.
One of the goals of that work was to do away with that reason to avoid
TLS in plugins. The other is that, well, computing stuff you don't have
to just because Rich Felker thinks you should is not a mandate I live by
;-)
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer