[PATCH] powerpc64le: Optimize memset for POWER10
Raoni Fassina Firmino
raoni@linux.ibm.com
Thu Apr 29 18:40:49 GMT 2021
Thanks for the review, Lucas. Please let me know if I missed something.
On Wed, Apr 28, 2021 at 03:48:28PM -0300, Lucas A. M. Magalhaes wrote:
> > + /* After alignment, if there is 127B or less left
> s/127B/64B/
Done.
This is an awkward position/block: the comment is about the whole
block, up until the `beq L(tail_128)`, but the label 'L(aligned)' in
the middle makes it hard to understand. It truly is that, after the
alignment, if there are less than 128 bytes left it goes to the tail;
but there is an optimization to go straight to the right part of the
tail depending on the amount left.
Anyway, I rephrased it to be a single line (less obtrusive) and added
a new one for the second branch:
> > + go directly to the tail. */
> > + cmpldi r5,64
> > + blt L(tail_64)
> > +
> > + .balign 16
> > +L(aligned):
> > + srdi. r0,r5,7
> > + beq L(tail_128)
Here^, added another after L(aligned).
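For reference, the reworked hunk now reads along these lines (the
exact comment wording is illustrative and may still change):

	/* Go to the tail if there is less than 64B left
	   after the alignment.  */
	cmpldi	r5,64
	blt	L(tail_64)

	.balign	16
L(aligned):
	/* Go to the tail if there is less than 128B left
	   after the alignment.  */
	srdi.	r0,r5,7
	beq	L(tail_128)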
> > +
> > + cmpldi cr5,r5,255
> > + cmpldi cr6,r4,0
> > + crand 27,26,21
> > + bt 27,L(dcbz)
> Maybe add a comment to explain this branch.
Done.
I was counting on the comment on the label itself, but I guess it
makes sense to add a brief comment here as well, to avoid going back
and forth to understand the condition check.
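To spell out the condition: cmpldi cr5,r5,255 sets CR5[gt] when the
length is greater than 255, and cmpldi cr6,r4,0 sets CR6[eq] when the
byte to be written is zero. crand 27,26,21 ANDs bit 26 (CR6[eq]) with
bit 21 (CR5[gt]) into bit 27 (CR6[so], used here as scratch), so bt 27
takes the dcbz path only when zeroing a large buffer, where clearing
whole 128B cache blocks with dcbz pays off. The added comment is
something along these lines (wording illustrative):

	/* If the length is >= 256 and the byte to write is 0, use
	   dcbz to zero out whole cache blocks at once.  */
	cmpldi	cr5,r5,255	/* CR5[gt] = (len > 255).  */
	cmpldi	cr6,r4,0	/* CR6[eq] = (c == 0).  */
	crand	27,26,21	/* CR6[so] = CR6[eq] && CR5[gt].  */
	bt	27,L(dcbz)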
> > + .balign 16
> > +L(tail_128):
> The label tail_128 made me think that 128 bytes would be copied
> here. Maybe add a comment here.
Done.
Sorry, yes, all these "tail_*" sections are "up to", the number being
the maximum that they will write. But this one in fact handles from 64
up to 128 bytes.
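So the added comment makes the range explicit, e.g. (wording
illustrative):

	.balign	16
L(tail_128):
	/* Write from 64 up to 128 bytes.  */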
>
> > + stxv v0+32,0(r6)
> > + stxv v0+32,16(r6)
> > + stxv v0+32,32(r6)
> > + stxv v0+32,48(r6)
> > + addi r6,r6,64
> > + andi. r5,r5,63
> > + beqlr
> > +
> > + .balign 16
> > +L(tail_64):
> Maybe add a comment here to explain this section as well.
Done.
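And similarly for this one, which handles both the blt L(tail_64)
entry and the fall-through from L(tail_128) (wording illustrative):

	.balign	16
L(tail_64):
	/* Write the remaining 1~63 bytes.  */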
o/
Raoni