This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: PowerPC: memset optimization for POWER8/PPC64
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Mon, 21 Jul 2014 00:40:33 -0500
- Subject: Re: PowerPC: memset optimization for POWER8/PPC64
- References: <53C920CD dot 8030506 at linux dot vnet dot ibm dot com>
Hi,
Some minor spellos... Looks fine otherwise.
> + andi. r11,r10,r15 /* Check alignment of DST. */
s/r15/15/
> + /* Size betwen 32 and 255 bytes with constant different than 0, use
> + doubleword store instruction to achieve best throughput. */
s/betwen/between/
> + /* Replicate set byte to quardword in VMX register. */
s/quard/quad/
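For anyone following along, the idea behind that comment can be sketched in C. This is only an illustration and is not code from the patch: the helper name replicate_byte is made up, and it fills a 64-bit doubleword rather than a 16-byte VMX register, on the assumption that the vector case applies the same replication across a wider register.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: copy the low byte of C
   into every byte of a 64-bit doubleword.  Multiplying by a constant
   with 0x01 in each byte position places one copy of the byte in each
   lane, and no carries occur because the byte is at most 0xff.  */
static uint64_t
replicate_byte (int c)
{
  return (uint64_t) (unsigned char) c * UINT64_C (0x0101010101010101);
}
```

With the value replicated once up front, the store loop can write full-width words without touching the fill byte again.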
> + addi 10,r10,64
s/10/r10/
> + /* Special case when value is 0 and we have a long length to deal
> + with. Use dcbz to zero out a full cacheline of 128-bytes at a time.
> + Before using dcbz though, we need to get the destination 128-bytes
> + aligned. */
s/128-bytes/128 bytes/ both times. Or "128-byte" the second time?
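As an aside, the alignment step that comment describes amounts to splitting the region into three parts: the bytes before the first 128-byte boundary, the whole cache lines, and whatever is left over. A rough C sketch of that arithmetic follows. The names zero_plan and plan_zero are hypothetical and do not appear in the patch, and the sketch only computes the split; it does not issue dcbz.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128	/* Cache line size stated in the quoted comment.  */

/* Hypothetical bookkeeping type, not from the patch.  */
struct zero_plan
{
  size_t head;	/* Bytes to store before DST is 128-byte aligned.  */
  size_t lines;	/* Whole cache lines that could be cleared at once.  */
  size_t tail;	/* Bytes left after the last whole cache line.  */
};

static struct zero_plan
plan_zero (uintptr_t dst, size_t len)
{
  struct zero_plan p;

  /* Distance from DST up to the next multiple of CACHE_LINE; zero if
     DST is already aligned.  */
  p.head = (size_t) (-dst & (CACHE_LINE - 1));
  if (p.head > len)
    p.head = len;

  p.lines = (len - p.head) / CACHE_LINE;
  p.tail = (len - p.head) % CACHE_LINE;
  return p;
}
```

For example, a destination of 0x1008 with a length of 1000 gives a head of 120 bytes, 6 whole lines, and a tail of 112 bytes.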
> +L(write_LT_32):
> + cmpldi cr6,5,8
> + mtocrf 0x01,5
s/5/r5/ both times.
Segher