This is the mail archive of the
mailing list for the libc-ports project.
Re: [ARM] Optimised strchr and strlen
On 24 December 2011 21:01, Richard Henderson <email@example.com> wrote:
> On 12/23/2011 12:31 PM, David Gilbert wrote:
>> Sure; it's pretty much the same trick as my strlen routine.
>> OK, so I gave that a go - and the results are:
> I can't help but wonder if just the one branch in the first loop is best.
> Also, it appears one can use uqadd8 and do the aligned two words in parallel
> rather than having everything serialize on the GT flags and SEL.
> I've run this through glibc's test-strchr, but haven't gotten around to
> benchmarking it at all.  Since you've already got that set up, perhaps
> you could give it a whirl.
Here we go - your code is the green line, rth_strchr. Your uqadd8 trick is
very nice; the peak speed is a nice bit higher than my version using a set
of uadd8's and sel (you get one instruction fewer in the main loop).
The simple routine is still easily winning below 32 bytes though, and
there is still that odd notch at 16.
(I think your uqadd8 trick would be a nice improvement on my strlen
and memchr routines).
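
For anyone following along without the assembly to hand, here is a rough C
sketch of how a per-byte unsigned saturating add can flag zero bytes in a
word without going through the GE flags and sel. It is only an illustration:
the helper names (uqadd8_emul, zero_byte_flags, eos_or_char_flags) and the
0xfefefefe addend are my own assumptions, not taken from the posted
rth_strchr, which may use different constants.

```c
#include <stdint.h>

/* Portable emulation of a per-byte unsigned saturating add, i.e. what
 * ARM's UQADD8 computes on four byte lanes.  Hypothetical helper name. */
static uint32_t uqadd8_emul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = ((a >> (8 * i)) & 0xff) + ((b >> (8 * i)) & 0xff);
        if (s > 0xff)
            s = 0xff;               /* saturate instead of wrapping */
        r |= s << (8 * i);
    }
    return r;
}

/* Assumed addend 0xfe per lane: a zero byte gives 0xfe, any non-zero
 * byte saturates to 0xff.  Inverting leaves 0x01 in exactly the lanes
 * that held a zero byte and 0x00 everywhere else. */
static uint32_t zero_byte_flags(uint32_t x)
{
    return ~uqadd8_emul(x, 0xfefefefeu);
}

/* strchr needs to stop on either the terminator or the byte c.  XOR with
 * c replicated into every lane turns matching bytes into zero, so the
 * same test finds them; the two results do not depend on each other and
 * can simply be ORed together. */
static uint32_t eos_or_char_flags(uint32_t x, unsigned char c)
{
    uint32_t rep = 0x01010101u * c;
    return zero_byte_flags(x) | zero_byte_flags(x ^ rep);
}
```

Since each lane is computed independently, two aligned words can be tested
back to back and their flag words ORed before a single compare and branch.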