This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
- From: "Tulio Magno Quites Machado Filho" <tuliom at linux dot vnet dot ibm dot com>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
- Date: Mon, 11 Dec 2017 17:48:13 -0200
- Subject: Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
- Authentication-results: sourceware.org; auth=none
- References: <87vaik8uxy.fsf@linux.vnet.ibm.com> <20171208194020.5005-1-tuliom@linux.vnet.ibm.com> <c1c65d6c-51cb-2be4-b292-cbad462e7983@linux.vnet.ibm.com>
Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:
> On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote:
>> * manual/tunables.texi (Hardware Capability Tunables): Document
>> glibc.tune.cached_memopt.
>> * sysdeps/powerpc/cpu-features.c: New file.
>> * sysdeps/powerpc/cpu-features.h: New file.
>> * sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
>> _dl_powerpc_cpu_features.
>> * sysdeps/powerpc/dl-tunables.list: New file.
>> * sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
>> * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .
>
> Comment missing.
Oops.
>> * sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
>> use_aligned_memopt.
>
> Should this be moved to init-arch.h? (also use_cached_memopt)
Indeed.
Changed to:
* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
(INIT_ARCH): Initialize use_aligned_memopt.
* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED &&
IS_IN(rtld))]: Restrict dl_platform_init availability and
initialize CPU features used by tunables.
>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> new file mode 100644
>> index 0000000..e5b6f25
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> @@ -0,0 +1,179 @@
>> + stxvd2x v0,r0,r3
>> +L(dst_is_align_16):
>> + cmpldi cr7,r5,127
>> + ble cr7,L(tail_copy)
>> + addi r8,r5,-128
>> + mr r9,r12
>> + rldicr r8,r8,0,56
>> + li r11,16
>> + srdi r10,r8,7
>> + addi r0,r8,128
>> + addi r10,r10,1
>
> Can we directly do
> rldicr r0, r5, 0, 56
> srdi r10,r5,7
> instead of this sequence?
> addi r8,r5,-128
> rldicr r8,r8,0,56
> srdi r10,r8,7
> addi r0,r8,128
> addi r10,r10,1
Yes. I changed that and made more changes for clarity:
- Replaced rldicr with clrrdi.
- Replaced r0 with 0 where it's treated as an immediate.
Pushed as c9cd7b0ce5c5.
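
For anyone reading the archive later, here is a rough C model of why the
shorter sequence is equivalent. It is only a sketch, not code from the
patch: the function names are hypothetical, r5 is assumed to hold the
length, and the equivalence relies on the preceding
"cmpldi cr7,r5,127 / ble cr7,L(tail_copy)" guard, i.e. the length is at
least 128 on this path. Note that "clrrdi rX,rY,7" is the extended
mnemonic for "rldicr rX,rY,0,56"; both clear the low 7 bits.

```c
#include <stdint.h>

/* Clear the low 7 bits, as `rldicr rX,rY,0,56` (`clrrdi rX,rY,7`) does.  */
static uint64_t
clear_low7 (uint64_t x)
{
  return x & ~UINT64_C (127);
}

/* Model of the original five-instruction sequence (hypothetical name).  */
static void
original_seq (uint64_t n, uint64_t *bytes, uint64_t *iters)
{
  uint64_t r8 = clear_low7 (n - 128); /* addi r8,r5,-128; rldicr r8,r8,0,56 */
  *iters = (r8 >> 7) + 1;             /* srdi r10,r8,7; addi r10,r10,1 */
  *bytes = r8 + 128;                  /* addi r0,r8,128 */
}

/* Model of the suggested two-instruction sequence (hypothetical name).  */
static void
short_seq (uint64_t n, uint64_t *bytes, uint64_t *iters)
{
  *bytes = clear_low7 (n);            /* rldicr r0,r5,0,56 */
  *iters = n >> 7;                    /* srdi r10,r5,7 */
}

/* Count the lengths in [lo, hi) for which the two models disagree.  */
static uint64_t
count_mismatches (uint64_t lo, uint64_t hi)
{
  uint64_t bad = 0;
  for (uint64_t n = lo; n < hi; n++)
    {
      uint64_t b1, i1, b2, i2;
      original_seq (n, &b1, &i1);
      short_seq (n, &b2, &i2);
      if (b1 != b2 || i1 != i2)
        bad++;
    }
  return bad;
}
```

Calling count_mismatches (128, some_upper_bound) should report no
disagreement. Below 128 the iteration counts differ, because the
subtraction in the original sequence wraps around, so the guard is what
makes the replacement safe.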
--
Tulio Magno