This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: New x86-64 memcpy

From: Jakub Jelinek <jakub at redhat dot com>
To: "Menezes, Evandro" <evandro dot menezes at amd dot com>
Cc: List GLIBC <libc-alpha at sourceware dot org>, "Meissner, Michael" <michael dot meissner at amd dot com>, "H. J. Lu" <hjl at lucon dot org>
Date: Wed, 28 Feb 2007 09:59:30 +0100
Subject: Re: New x86-64 memcpy
References: <1449F58C868D8D4E9C72945771150BDF0173A222@SAUSEXMB1.amd.com> <20070227155947.GA1826@sunsite.mff.cuni.cz> <1449F58C868D8D4E9C72945771150BDF018DF6ED@SAUSEXMB1.amd.com>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Tue, Feb 27, 2007 at 04:17:47PM -0600, Menezes, Evandro wrote:
> > 1) As the L1/L2 cache sizes and the prefetchw flag are only used in the
> > libc.so version, there is no point in having those vars in _rtld_global
> > (why were they 8 bytes rather than 4 bytes, btw?).  They can very well
> > be hidden inside libc.so and therefore accessed like:
> >     movl _x86_64_l1_cache_size_half(%rip), %r8d
> > which is certainly faster than loading their address from the GOT and
> > then using a second memory load to read the actual value.  The values can
> > be initialized in a static routine with the constructor attribute.
> 
> That sounds reasonable.  However, is having the constructor attribute soon enough?

Some DSO constructors run before that, and so does the executable's
DT_PREINIT_ARRAY.  But if the variables have reasonable defaults, and as the
vast majority of programs do their real performance-sensitive work from main
rather than from DSO constructors, I think it is not a big deal.  If it turns
out to be a big problem, something can certainly be added (e.g. handle more
than one DF_1_INITFIRST object, mark libc.so that way in addition to
libpthread.so and run this from its DT_INIT).
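A minimal C sketch of point 1, assuming GCC attributes: the tuning variables get hidden visibility, so when built into a shared object they can be read with a single %rip-relative load instead of through the GOT, and a constructor overwrites compile-time defaults. The names, the default values and the query_* helpers are hypothetical stand-ins (real code would use CPUID); anything that runs before the constructor simply sees the defaults.

```c
/* Hypothetical tuning variables.  Hidden visibility means that, when this
   is linked into a shared object, accesses need no GOT indirection and can
   use a direct %rip-relative load.  The initializers are assumed defaults
   that code running before the constructor will see.  */
__attribute__ ((visibility ("hidden")))
long int x86_64_l1_cache_size_half = 32 * 1024 / 2;
__attribute__ ((visibility ("hidden")))
long int x86_64_l2_cache_size_half = 1024 * 1024 / 2;

/* Hypothetical stand-ins for CPUID-based detection: fixed sizes in bytes,
   or 0 if the size could not be determined.  */
static long int query_l1_cache_size (void) { return 64 * 1024; }
static long int query_l2_cache_size (void) { return 512 * 1024; }

/* Runs at load time, before main, but possibly after some other DSO
   constructors and the executable's DT_PREINIT_ARRAY.  */
__attribute__ ((constructor))
static void
init_cacheinfo (void)
{
  long int l1 = query_l1_cache_size ();
  long int l2 = query_l2_cache_size ();

  /* Keep the defaults if detection failed.  */
  if (l1 > 0)
    x86_64_l1_cache_size_half = l1 / 2;
  if (l2 > 0)
    x86_64_l2_cache_size_half = l2 / 2;
}
```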

> They were 8 bytes each so that they could be compared directly with the
> count in a register, without first loading the value.  Even if in memcpy
> they can be used as 4-byte variables, I have other routines that would
> benefit from them being 8 bytes long.

I haven't seen that in the last round of routines you sent, but sure, if
some var has a justification for being 64-bit, so be it.  The important
thing is just the (%rip) addressing.
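A hedged C illustration of the 64-bit argument: when a threshold variable is as wide as the size_t byte count, the comparison can typically compile to a single cmpq with a %rip-relative memory operand, whereas a 32-bit variable usually needs a zero-extending load into a register first. The thresholds, the three-way split and all names below are assumptions for illustration, not the actual routine's logic.

```c
#include <stddef.h>

/* Hypothetical 64-bit thresholds (assumed values).  Being as wide as
   size_t, each can be compared with the count without widening.  */
long int copy_l1_threshold = 32 * 1024;
long int copy_l2_threshold = 512 * 1024;

/* Hypothetical strategies a copy routine might pick between.  */
enum copy_path { COPY_SMALL, COPY_CACHED, COPY_STREAMING };

/* Choose a copy strategy from the byte count alone.  */
static enum copy_path
choose_copy_path (size_t n)
{
  if (n <= (size_t) copy_l1_threshold)
    return COPY_SMALL;
  if (n <= (size_t) copy_l2_threshold)
    return COPY_CACHED;
  return COPY_STREAMING;
}
```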

> > 3) The function didn't have CFI directives, even though it changes %rsp
> > and saves/restores call-saved registers.
> 
> I guess that using the red zone is better.  As the routine has several
> exit points to improve performance, new CFI directives would have to be
> added after each one, which complicates maintaining the code.

Even with the red zone you need some CFI directives (ones saying where
%r12/%r13/%r14 have been saved, and cfi_restore for them), but you don't
need any CFA adjustments.
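A sketch of that point, assuming the x86-64 SysV ABI, ELF and the GNU assembler, written as top-level asm in a C file so it can be called and tested. The hypothetical leaf function sum3_redzone spills the call-saved %r12/%r13 into the red zone below %rsp, so the CFA (%rsp + 8 on entry) never moves and no CFA adjustment is emitted; only .cfi_offset and .cfi_restore describe the saves. The early exit is bracketed with .cfi_remember_state/.cfi_restore_state, which is one way to keep the several exit points correct without repeating the save descriptions.

```c
#include <stdint.h>

/* Hypothetical leaf function: returns a + b + c.
   x86-64 SysV ABI, ELF and GNU as only.  */
uint64_t sum3_redzone (uint64_t a, uint64_t b, uint64_t c);

__asm__ (
  ".pushsection .text\n"
  ".globl sum3_redzone\n"
  ".type sum3_redzone, @function\n"
  "sum3_redzone:\n"
  "  .cfi_startproc\n"
  /* Spill call-saved registers into the red zone.  %rsp never changes, so
     the CFA stays at %rsp + 8; only the save slots are described
     (-8(%rsp) is CFA - 16, -16(%rsp) is CFA - 24).  */
  "  movq %r12, -8(%rsp)\n"
  "  .cfi_offset %r12, -16\n"
  "  movq %r13, -16(%rsp)\n"
  "  .cfi_offset %r13, -24\n"
  "  movq %rdi, %r12\n"            /* r12 = a */
  "  movq %rsi, %r13\n"            /* r13 = b */
  "  addq %r13, %r12\n"            /* r12 = a + b */
  "  testq %rdx, %rdx\n"
  "  jnz 1f\n"
  /* Early exit when c == 0.  Remember the "registers saved" state so the
     code after this ret can get it back.  */
  "  .cfi_remember_state\n"
  "  movq %r12, %rax\n"
  "  movq -8(%rsp), %r12\n"
  "  .cfi_restore %r12\n"
  "  movq -16(%rsp), %r13\n"
  "  .cfi_restore %r13\n"
  "  ret\n"
  "  .cfi_restore_state\n"
  "1:\n"
  "  leaq (%r12,%rdx), %rax\n"     /* rax = a + b + c */
  "  movq -8(%rsp), %r12\n"
  "  .cfi_restore %r12\n"
  "  movq -16(%rsp), %r13\n"
  "  .cfi_restore %r13\n"
  "  ret\n"
  "  .cfi_endproc\n"
  ".size sum3_redzone, .-sum3_redzone\n"
  ".popsection\n");
```

Because the function is a leaf, the 128-byte red zone below %rsp is not clobbered by signal handlers under the SysV ABI, which is what makes the spill without a %rsp adjustment safe.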

> > Also, for mempcpy, IMHO it is a bad idea to compute the result value
> > early.  I believe in all code paths the right return value is available
> > in the %rdi register, so the pushq/popq %rax would be unneeded for
> > mempcpy; instead, before each rep; retq you'd add:
> > #if MEMPCPY_P
> >     movq %rdi, %rax
> > #endif
> 
> I'll double-check that RDI always has the expected value.  Otherwise, I'll
> just use an entry in the red zone.

I believe so.  L(1{,a,b,c,d,loop}) always increment %rdi by the size they
stored into (%rdi).  All other rets are preceded by jnz L(1), which relies
on %rdi pointing after the last byte stored.
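A C analogue of the return-value argument, as a sketch only (the names are hypothetical, and a byte loop stands in for the real unrolled copy): because the copy advances the destination pointer by exactly the number of bytes it stores, the pointer it ends with is already dst + n, which is what mempcpy must return, so nothing has to be computed early or saved and restored. memcpy instead returns the original destination.

```c
#include <stddef.h>

/* Shared copy body.  Like the assembly, it advances the destination
   pointer by what it stores, so on return the pointer is just past the
   last byte written.  */
static unsigned char *
copy_forward (unsigned char *d, const unsigned char *s, size_t n)
{
  while (n-- > 0)
    *d++ = *s++;
  return d;
}

/* memcpy semantics: return the original destination.  */
void *
sketch_memcpy (void *dst, const void *src, size_t n)
{
  copy_forward (dst, src, n);
  return dst;
}

/* mempcpy semantics: return dst + n, which is simply the advanced
   pointer (the analogue of "movq %rdi, %rax" before each ret).  */
void *
sketch_mempcpy (void *dst, const void *src, size_t n)
{
  return copy_forward (dst, src, n);
}
```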

	Jakub

References:
- New x86-64 memcpy
  - From: Menezes, Evandro
- Re: New x86-64 memcpy
  - From: Jakub Jelinek
- RE: New x86-64 memcpy
  - From: Menezes, Evandro
