This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, AArch64] Add optimized strrchr

On Wed, Jan 07, 2015 at 10:47:39AM +0000, Richard Earnshaw wrote:
> Similar to the strchr implementation, attached is an implementation of
> strrchr.  Unlike the generic C version, this finds the last instance of
> a string without having to repeatedly call strchr.  This gives notable
> performance improvements when there is more than one instance of the
> target character in the string (something the benchmarks never measure,
> by the way).
> OK?
It looks mostly ok.

As in strchr I would first do unaligned check if there is no page
crossing. That looks like best way to handle short strings, otherwise
you pay penalty due misalignment causes you could write only one byte.
There is potential drawback that you sometimes fetch extra cache line
beyond string but on x64 it did less harm than misalignment.

As for benchmarks you could measure that well but problem is decide if
its improvement or not. There are several alternative ways how write
function and each is best for some sort of workloads and bad for others.

A memrchr(s, c, strlen(s)+1) that I pinged will be likely faster than this for 32kb
strings as last math is likely close to end. 
It will be slower than this when there is only one character close to

You cannot decide what is better without profiling which would side with
generic one once strings are sufficiently large.

However that does not justify using that. Problem is that you spend most
time on relatively short strings. These are dominated by additive
overhead of setting loop and branch misprediction.

Also there are some microoptimizations.

> +ENTRY(strrchr)
> +       cbz     x1, L(null_search)
> +L(null_search):
> +       b       __strchrnul

Simple improvement is use rawmemchr. With some effort you could avoid
this check when you setup masks to include terminating 0.. 

> +       add     tmp3, tmp3, #2
> +       sub     result, src_match, tmp3, lsr #1
> +       /* But if the syndrome shows no match was found, then return NULL.  */
> +       cmp     src_offset, #0
> +       csel    result, result, xzr, ne
> +
> +       ret

You do not need this check. Just initialize src_match and src_offset to
values that produce null.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]