This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/*] Optimize generic strchrnul and strchr
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: <libc-alpha at sourceware dot org>
- Date: Wed, 27 May 2015 13:35:58 +0100
- Subject: Re: [PATCH 2/*] Optimize generic strchrnul and strchr
Ondřej Bílka wrote:
> This is my generic strchr algorithm resubmitted to use skeleton.
>
> Idea to split into cases c<128 and c>128 didn't change.
Why do this?
> So comments? How this perform on different architectures?
In my view, 9 operations for a combined zero check and test for
another character is too many; it should take 5-7 operations at
most (the general form of the zero check,
(x - 0x01010101) & ~x & 0x80808080, is just 3 operations).
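
As a rough sketch (not taken from the patch), the 3-operation zero check
can be combined with a test for another character in about 7 operations,
assuming the target has an and-not instruction. The names has_zero,
has_zero_or_char and repeat_byte are hypothetical, and 32-bit words are
assumed:

```c
#include <stdint.h>

#define ONES  0x01010101u
#define HIGHS 0x80808080u

/* Replicate c into every byte; in a real loop this is computed once. */
static inline uint32_t repeat_byte(unsigned char c)
{
    return ONES * c;
}

/* Nonzero iff some byte of x is zero: subtract, and-not, and. */
static inline uint32_t has_zero(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Nonzero iff some byte of x is zero or equals the replicated byte:
   xor, 2 x (subtract, and-not), or, and. */
static inline uint32_t has_zero_or_char(uint32_t x, uint32_t cmask)
{
    uint32_t y = x ^ cmask;   /* bytes equal to c become zero in y */
    return (((x - ONES) & ~x) | ((y - ONES) & ~y)) & HIGHS;
}
```

The same expression is used for any value of c, below or above 128.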
You can optimize things further by calculating partial masks for each
of the unrolled cases, ORing them together and doing only a single test
per loop iteration rather than 4 or 8. This also avoids adding a lot of
code and branches to the inner loop, which would make the unrolling
pointless.
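
A minimal sketch of that idea, assuming 4x unrolling, 32-bit words and
a length that is a multiple of 4 words. The function name
first_word_with_zero is hypothetical and only the zero check is shown:

```c
#include <stddef.h>
#include <stdint.h>

#define ONES  0x01010101u
#define HIGHS 0x80808080u

static inline uint32_t zero_mask(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Index of the first word in w[0..n) that contains a zero byte, or n
   if there is none.  Assumes n is a multiple of 4. */
size_t first_word_with_zero(const uint32_t *w, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        uint32_t m0 = zero_mask(w[i]);
        uint32_t m1 = zero_mask(w[i + 1]);
        uint32_t m2 = zero_mask(w[i + 2]);
        uint32_t m3 = zero_mask(w[i + 3]);

        /* One test and one branch per iteration instead of four. */
        if (m0 | m1 | m2 | m3) {
            /* Slow path, taken once: find which word matched. */
            if (m0) return i;
            if (m1) return i + 1;
            if (m2) return i + 2;
            return i + 3;
        }
    }
    return n;
}
```

The per-word checks only run after the loop has already found a match,
so the hot path stays short.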
The other thing is support for big-endian. This is generally tricky:
the borrow from the subtraction can set false-positive bits in bytes
above the first zero byte, so the mask returned by the zero check
won't work even if byte-reversed.
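
To illustrate with a hedged example: in the word 0x61620100 the byte
0x01 sits just above a zero byte, and the check flags both. On
big-endian the 0x01 byte comes first in memory, so reversing the mask
still reports the wrong position. One possible workaround, shown here
only as an assumption, is to byte-reverse the input before the check.
The helper name first_zero_be is hypothetical; __builtin_bswap32 and
__builtin_ctz are GCC/Clang builtins:

```c
#include <stdint.h>

#define ONES  0x01010101u
#define HIGHS 0x80808080u

static inline uint32_t zero_mask(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Memory-order index of the first zero byte of a word that was loaded
   on a big-endian machine.  Assumes x contains a zero byte. */
static inline unsigned first_zero_be(uint32_t x)
{
    /* After the swap the first byte in memory is the least significant
       one, so the lowest set bit of the mask is always a true match. */
    uint32_t m = zero_mask(__builtin_bswap32(x));
    return (unsigned)__builtin_ctz(m) / 8;
}
```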
Finally, first_nonzero_byte should just use __builtin_ffsl (yet another
function that should be inlined by default in the generic string.h...).
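
A possible shape for this, as a sketch only: the signature of
first_nonzero_byte below is an assumption rather than the one in the
patch, and it assumes little-endian byte order and a nonzero mask.
__builtin_ffsl is the GCC/Clang builtin that returns one plus the index
of the least significant set bit:

```c
/* Index of the lowest nonzero byte of mask.  Assumes mask != 0 and
   little-endian order, where the lowest byte is first in memory. */
static inline unsigned first_nonzero_byte(unsigned long mask)
{
    return (unsigned)(__builtin_ffsl((long)mask) - 1) / 8;
}
```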
Wilco