This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Consensus: Tuning runtime behaviour with environment variables.
- From: Alexandre Oliva <aoliva at redhat dot com>
- To: Rich Felker <dalias at aerifal dot cx>
- Cc: libc-alpha at sourceware dot org
- Date: Sun, 02 Jun 2013 20:02:02 -0300
- Subject: Re: Consensus: Tuning runtime behaviour with environment variables.
- References: <51A58A92 dot 4050508 at redhat dot com> <20130529055518 dot GA23030 at domone dot kolej dot mff dot cuni dot cz> <ormwraq3rx dot fsf at livre dot home> <20130601031151 dot GK20323 at brightrain dot aerifal dot cx> <ora9n9i3jc dot fsf at livre dot home> <20130602154150 dot GN20323 at brightrain dot aerifal dot cx> <ortxlgh2an dot fsf at livre dot home> <20130602215358 dot GB29800 at brightrain dot aerifal dot cx>
On Jun 2, 2013, Rich Felker <dalias@aerifal.cx> wrote:
> On Sun, Jun 02, 2013 at 02:04:32PM -0300, Alexandre Oliva wrote:
>> On Jun 2, 2013, Rich Felker <dalias@aerifal.cx> wrote:
>>
>> > Do you have any performance figures to justify this?
>>
>> Sure, in the paper.
> I read the text version
You read the spec for x86 relocations.
Saying it's a “text version” is a bit like saying that the ODF ISO
Standard is a version of LibreOffice :-D
> The hot path of __tls_get_addr should be just a couple dereferences
> and branches which are always predicted correctly.
For anyone who didn't know better, it would seem like you're arguing
that Initial Exec is pointless.
> If it's slower than that in glibc, that's a bug in glibc.
It's not, but a couple of dereferences and branches is a lot more than
nothing. With the TLS access model I proposed and implemented, you save
all of that, including the cost of going through the PLT for the call.
What you don't save is the cost of a naked call to a function that just
returns, but that's still a lot less than that plus dereferences plus
branches plus PLT plus frame setup plus saving and restoring
call-clobbered registers at the caller, don't you agree?
Phrasing it another way, which of these two scenarios seems faster to
you?
1.

foo:
  ret

foocaller:
  call foo

2.

bar:
  set up frame
  load
  compare
  branch
  load
  compare
  branch
  # fast path:
  load
  branch end
  ...
  # end
  restore frame
  ret

barcaller:
  save call-clobbered regs
  call bar
  restore call-clobbered regs if needed
If you say both of them perform the same on your computer, how do I get
one of those? :-D
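
For illustration only, here is a rough C sketch of the two scenarios.
Every name and structure layout in it is hypothetical and much simpler
than glibc's real definitions; it assumes a single thread, two module
slots, and a stub slow path. The first function stands for a resolver
that only hands back a value computed at relocation time, the second
for a lookup whose hot path still has to load, compare and branch
twice before it can return.

```c
#include <stddef.h>

/* Hypothetical, simplified structures; NOT glibc's actual layout. */
typedef struct {
    size_t module;   /* index of the module that owns the variable */
    size_t offset;   /* offset of the variable inside that module's block */
} tls_index_sketch;

typedef struct {
    size_t generation;   /* generation this vector was last synced to */
    void **slots;        /* one block pointer per module, NULL if unallocated */
} dtv_sketch;

/* Assumed state: one thread, two module slots, module 1 already allocated. */
static size_t global_generation = 1;   /* would be bumped on dlopen/dlclose */
static char module_block[64];
static void *slot_storage[2] = { NULL, module_block };
static dtv_sketch thread_dtv = { 1, slot_storage };

/* Scenario 1: nothing left to decide at run time, so the callee only
 * returns the offset that was precomputed and stored for it. */
static ptrdiff_t static_resolver_sketch(const ptrdiff_t *precomputed)
{
    return *precomputed;
}

/* Stub slow path: resync the generation and make sure the slot is set.
 * A real implementation would allocate and initialize the block here. */
static void *slow_path_sketch(const tls_index_sketch *ti)
{
    thread_dtv.generation = global_generation;
    if (thread_dtv.slots[ti->module] == NULL)
        thread_dtv.slots[ti->module] = module_block;
    return (char *)thread_dtv.slots[ti->module] + ti->offset;
}

/* Scenario 2: even the hot path performs two checks before returning. */
static void *get_addr_sketch(const tls_index_sketch *ti)
{
    dtv_sketch *dtv = &thread_dtv;

    if (dtv->generation != global_generation)   /* load, compare, branch */
        return slow_path_sketch(ti);

    void *block = dtv->slots[ti->module];       /* load */
    if (block == NULL)                          /* compare, branch */
        return slow_path_sketch(ti);

    return (char *)block + ti->offset;          /* fast path */
}
```

The sketch leaves out the PLT indirection and the caller-side register
saves, which come on top of the second function's own work.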
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer