This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory


Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:

> On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote:
>> 	* manual/tunables.texi (Hardware Capability Tunables): Document
>> 	glibc.tune.cached_memopt.
>> 	* sysdeps/powerpc/cpu-features.c: New file.
>> 	* sysdeps/powerpc/cpu-features.h: New file.
>> 	* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
>> 	_dl_powerpc_cpu_features.
>> 	* sysdeps/powerpc/dl-tunables.list: New file.
>> 	* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
>> 	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .
>
> Comment missing.

Ooops.

>> 	* sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
>> 	use_aligned_memopt.
>
> Should this be moved to init-arch.h? (also use_cached_memopt)

Indeed.
Changed to:

	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
	(INIT_ARCH): Initialize use_aligned_memopt.
	* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED &&
	IS_IN(rtld))]: Restrict dl_platform_init availability and
	initialize CPU features used by tunables.

>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> new file mode 100644
>> index 0000000..e5b6f25
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> @@ -0,0 +1,179 @@
>> +	stxvd2x	v0,r0,r3
>> +L(dst_is_align_16):
>> +	cmpldi	cr7,r5,127
>> +	ble	cr7,L(tail_copy)
>> +	addi	r8,r5,-128
>> +	mr	r9,r12
>> +	rldicr	r8,r8,0,56
>> +	li	r11,16
>> +	srdi	r10,r8,7
>> +	addi	r0,r8,128
>> +	addi	r10,r10,1
>
> Can we directly do
> 	rldicr  r0, r5, 0, 56
> 	srdi    r10,r5,7
> instead of this sequence?
> 	addi    r8,r5,-128
> 	rldicr  r8,r8,0,56
> 	srdi    r10,r8,7
> 	addi    r0,r8,128
> 	addi    r10,r10,1

Yes.  I changed that and made more changes for clarity:
 - Replaced rldicr with clrrdi.
 - Replaced r0 with 0 where it's treated as an immediate.
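
For anyone checking the simplification, here is a rough C model of both sequences. It is only a sketch, not the code from the commit: the struct and function names are made up for illustration, and reading r0 as the byte count handled by the 128-byte loop and r10 as its iteration count is my interpretation of the surrounding code. The two forms agree only because the preceding cmpldi/ble sends lengths of 127 or less to L(tail_copy), so r5 is at least 128 here. Note that clrrdi r0,r5,7 is the extended mnemonic for rldicr r0,r5,0,56.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two values the sequences compute. */
struct setup {
    uint64_t r0;   /* assumed: length rounded down to a multiple of 128 */
    uint64_t r10;  /* assumed: number of 128-byte iterations */
};

/* Model of the original five-instruction sequence; len plays r5. */
static struct setup setup_original(uint64_t len)
{
    uint64_t r8 = len - 128;             /* addi   r8,r5,-128 */
    r8 &= ~UINT64_C(127);                /* rldicr r8,r8,0,56 */
    uint64_t r10 = r8 >> 7;              /* srdi   r10,r8,7   */
    uint64_t r0 = r8 + 128;              /* addi   r0,r8,128  */
    r10 += 1;                            /* addi   r10,r10,1  */
    return (struct setup){ r0, r10 };
}

/* Model of the two-instruction sequence suggested in review. */
static struct setup setup_suggested(uint64_t len)
{
    uint64_t r0 = len & ~UINT64_C(127);  /* rldicr r0,r5,0,56 */
    uint64_t r10 = len >> 7;             /* srdi   r10,r5,7   */
    return (struct setup){ r0, r10 };
}

/* Count lengths in [first, last] where the two models disagree.
   Requires first >= 128, matching the guard before this code. */
static uint64_t count_mismatches(uint64_t first, uint64_t last)
{
    assert(first >= 128 && first <= last);
    uint64_t bad = 0;
    for (uint64_t len = first; ; len++) {
        struct setup a = setup_original(len);
        struct setup b = setup_suggested(len);
        if (a.r0 != b.r0 || a.r10 != b.r10)
            bad++;
        if (len == last)
            break;
    }
    return bad;
}
```

Calling count_mismatches (128, 1 << 20) should report no differences. Below 128 the subtraction in the original sequence wraps around and the two models diverge, which is why the guard matters.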

Pushed as c9cd7b0ce5c5.

-- 
Tulio Magno
