This is the mail archive of the mailing list for the libc-ports project.
Re: [PATCH 20/26] arm: Implement armv6t2 optimized strlen
Richard Henderson <email@example.com> writes:
> + @ r0 = start of string
> + pld [r0]
> + @ To cater to long strings, we want to search through a few
> + @ characters until we reach an aligned pointer. To cater to
> + @ small strings, we don't want to start doing word operations
> + @ immediately. The compromise is a maximum of 16 bytes less
> + @ whatever is required to end with an aligned pointer.
> + @ r3 = number of characters to search in alignment loop
> + and r3, r0, #7
> + s(mov) r1, r0 @ Save the input pointer
> + rsb r3, r3, #16
> + @ Loop until we find ...
> +1: ldrb r2, [r0], #1
> + subs r3, r3, #1 @ ... the aligment point
> + it ne
> + cmpne r2, #0 @ ... or EOS
> + bne 1b
> + @ Disambiguate the exit possibilites above
> + cmp r2, #0 @ Found EOS
> + ittt eq
> + subeq r0, r0, #1 @ Undo post-inc above
> + subeq r0, r0, r1 @ Subtract input to compute length
> + bxeq lr
> + @ So now we're aligned.
> + ldrd r2, r3, [r0], #8
> + movw ip, #0xfefe
> + pld [r0, #64]
> + movt ip, #0xfefe
> + pld [r0, #128]
> + pld [r0, #192]
> + @ Loop searching for EOS or C, 8 bytes at a time.
This comment seems to be for strchr(); strlen only searches for EOS, there is no C.
> + @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
> + @ that was originally zero and 0xff otherwise. Therefore we consider
> + @ the lsb of each byte the "found" bit, with 0 for a match.
> + .balign 16
> +2: uqadd8 r2, r2, ip @ Find EOS
> + uqadd8 r3, r3, ip
> + pld [r0, #256] @ Prefetch 4 lines ahead
> + s(and) r3, r3, r2 @ Combine the two words
> + mvns r3, r3 @ Test for any found bit true
> + it eq
> + ldrdeq r2, r3, [r0], #8
> + beq 2b
Subtracting the values from 1 (with UQSUB8) instead would give a 0
result for any non-zero input byte and a 1 for "found", i.e. the inverse
of what you have here. Testing for a match anywhere in the double-word then
becomes a single ORRS instruction. Unless I'm making some stupid mistake.
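
To sanity-check the idea, here is a small Python model of the two per-byte
saturating operations. It is only a sketch, not part of the patch: the helper
names (uqadd8, uqsub8, found_patch, found_suggested) are made up for
illustration and emulate the instructions on plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def _bytes(x):
    # Split a 32-bit value into its four bytes, least significant first.
    return [(x >> (8 * i)) & 0xFF for i in range(4)]

def _pack(bs):
    # Reassemble four bytes (least significant first) into a 32-bit value.
    out = 0
    for i, b in enumerate(bs):
        out |= b << (8 * i)
    return out

def uqadd8(a, b):
    """Per-byte unsigned saturating add, modelling UQADD8."""
    return _pack([min(x + y, 0xFF) for x, y in zip(_bytes(a), _bytes(b))])

def uqsub8(a, b):
    """Per-byte unsigned saturating subtract (a - b), modelling UQSUB8."""
    return _pack([max(x - y, 0) for x, y in zip(_bytes(a), _bytes(b))])

def found_patch(w0, w1):
    """Form used in the patch: UQADD8 with 0xfe, AND, then invert (MVNS)."""
    k = 0xFEFEFEFE
    return ~(uqadd8(w0, k) & uqadd8(w1, k)) & MASK32

def found_suggested(w0, w1):
    """Suggested form: UQSUB8 from 0x01 in each byte, then OR (ORRS)."""
    k = 0x01010101
    return uqsub8(k, w0) | uqsub8(k, w1)

if __name__ == "__main__":
    # Byte 2 of the first word is zero, so bit 16 is the found bit.
    print(hex(found_patch(0x61006364, 0x65666768)))
    print(hex(found_suggested(0x61006364, 0x65666768)))
```

In this model both forms yield the same combined found bits, with the
suggested one needing no inversion before the flags test. Note that r2 and
r3 would then already hold found bits, so the disambiguation code after the
loop would need a matching adjustment.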
> + @ Found something. Disambiguate between first and second words.
> + @ Adjust r0 to point to the word containing the match.
> + @ Adjust r2 to the found bits for the word containing the match.
> + mvns r2, r2
> + itee ne
> + subne r0, r0, #8
> + moveq r2, r3
> + subeq r0, r0, #4
> + @ Find the bit-offset of the match within the word.
> +#ifdef __ARMEL__
> + rbit r2, r2 @ For LE we need count-trailing-zeros
> + clz r2, r2
> + add r0, r0, r2, lsr #3 @ Adjust the pointer to the found byte
> + s(sub) r0, r0, r1 @ Subtract input to compute length
> + bx lr
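
For reference, the little-endian offset computation quoted above can be
modelled in a few lines of Python. This is a sketch under the assumption that
the found word has at most the lsb of each byte set; rbit32, rev32 and clz32
are hypothetical stand-ins for RBIT, REV and CLZ. The model also runs a plain
byte reversal through the same CLZ and shift for comparison.

```python
def rbit32(x):
    """Reverse the bit order of a 32-bit value, modelling RBIT."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)

def rev32(x):
    """Reverse the byte order of a 32-bit value, modelling REV."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")

def clz32(x):
    """Count leading zeros of a 32-bit value, modelling CLZ."""
    return 32 - (x & 0xFFFFFFFF).bit_length()

def first_match_rbit(found):
    # RBIT + CLZ is a count-trailing-zeros; lsr #3 turns bits into bytes.
    return clz32(rbit32(found)) >> 3

def first_match_rev(found):
    # With only bit 0 of each byte set, CLZ after REV is 8*k + 7,
    # and the shift by 3 discards the extra 7.
    return clz32(rev32(found)) >> 3

if __name__ == "__main__":
    found = 0x01010000  # matches in bytes 2 and 3; the first is byte 2
    print(first_match_rbit(found), first_match_rev(found))
```

Both variants return the index of the lowest-addressed matching byte in this
model, as long as the input obeys the lsb-only assumption.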
This code could be made to work for any ARMv6 by (conditionally)
replacing the MOVW/MOVT with some equivalent and the RBIT by REV. REV
works since only the lsb in each byte can be set, so the result of CLZ
will simply be 7 more than we want, and the 3 low-order bits are shifted