This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: PowerPC: memset optimization for POWER8/PPC64
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Mon, 21 Jul 2014 10:17:31 -0300
- Subject: Re: PowerPC: memset optimization for POWER8/PPC64
- References: <53C920CD dot 8030506 at linux dot vnet dot ibm dot com> <20140721054033 dot GA8087 at gate dot crashing dot org>
Hi Segher,
On 21-07-2014 02:40, Segher Boessenkool wrote:
> Hi,
>
> Some minor spellos... Looks fine otherwise.
>
>
>> + andi. r11,r10,r15 /* Check alignment of DST. */
> s/r15/15/
>
>> + /* Size betwen 32 and 255 bytes with constant different than 0, use
>> + doubleword store instruction to achieve best throughput. */
> s/betwen/between/
>
>> + /* Replicate set byte to quardword in VMX register. */
> s/quard/quad/
>
>> + addi 10,r10,64
> s/10/r10/
>
>> + /* Special case when value is 0 and we have a long length to deal
>> + with. Use dcbz to zero out a full cacheline of 128-bytes at a time.
>> + Before using dcbz though, we need to get the destination 128-bytes
>> + aligned. */
> s/128-bytes/128 bytes/ both times. Or "128-byte" the second time?
>
>> +L(write_LT_32):
>> + cmpldi cr6,5,8
>> + mtocrf 0x01,5
> s/5/r5/ both times.
>
>
> Segher
>
I have fixed them all, thanks.
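
For anyone following the thread without the patch at hand, the two ideas in the quoted comments (replicating the set byte across a wider word, and aligning the destination to 128 bytes before zeroing whole cache lines) can be roughly sketched in portable C. This is only an illustration under stated assumptions, not the code from the patch: the names replicate_byte, bytes_to_align and zero_long are hypothetical, the replication is shown for a doubleword rather than a VMX quadword, and a plain memset of one line stands in for dcbz.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 128  /* cache line size named in the quoted comment */

/* Hypothetical helper: copy the set byte into all 8 bytes of a doubleword,
   so that one doubleword store writes 8 bytes of the fill value.  */
static uint64_t replicate_byte(unsigned char c)
{
    return (uint64_t)c * 0x0101010101010101ULL;
}

/* Hypothetical helper: number of bytes from p up to the next
   128-byte boundary (0 if p is already aligned).  */
static size_t bytes_to_align(const void *p)
{
    return (size_t)(-(uintptr_t)p & (CACHELINE - 1));
}

/* Hypothetical sketch of the zero-fill path: clear the unaligned head,
   then whole cache lines, then the tail.  The per-line memset marks the
   spot where the assembly would issue dcbz.  */
static void *zero_long(void *dst, size_t n)
{
    unsigned char *p = dst;
    size_t head = bytes_to_align(p);

    if (head > n)
        head = n;
    memset(p, 0, head);
    p += head;
    n -= head;

    while (n >= CACHELINE) {
        memset(p, 0, CACHELINE);  /* stand-in for dcbz on an aligned line */
        p += CACHELINE;
        n -= CACHELINE;
    }

    memset(p, 0, n);  /* remaining tail shorter than one cache line */
    return dst;
}
```

The alignment step matters because dcbz clears the entire cache line containing the given address, so issuing it on an unaligned destination would also zero bytes in front of the buffer.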