This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH 2/*] Optimize generic strchrnul and strchr

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: 'Ondřej Bílka' <neleai at seznam dot cz>
Cc: <libc-alpha at sourceware dot org>
Date: Wed, 27 May 2015 13:35:58 +0100
Subject: Re: [PATCH 2/*] Optimize generic strchrnul and strchr
Authentication-results: sourceware.org; auth=none

Ondřej Bílka wrote:
> This is my generic strchr algorithm resubmitted to use skeleton.
>
> Idea to split into cases c<128 and c>128 didn't change.

Why do this?

> So comments? How this perform on different architectures?

In my view, using 9 operations for a combined zero check and test
for another character is too many; it should be 5-7 operations at
most (the general form of the zero check is
(x - 0x01010101) & ~x & 0x80808080, which is just 3).
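
As an illustration, here is a minimal C sketch of that zero check and
of one possible way to fold the character test into it. The names
has_zero and has_zero_or_char are hypothetical and are not taken from
the patch under review. The count of roughly seven operations for the
combined test assumes a target with an and-not instruction; other
targets need a separate NOT.

```c
#include <stdint.h>

#define ONES  UINT32_C(0x01010101)
#define HIGHS UINT32_C(0x80808080)

/* Nonzero iff some byte of x is zero: subtract, and-not, and. */
static inline uint32_t has_zero(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Nonzero iff some byte of x is zero or equals the character whose
   value is broadcast into every byte of cmask (cmask = ONES * c,
   computed once outside the loop).  The final AND with HIGHS is
   shared by both sub-tests: xor, sub, and-not, sub, and-not, or, and. */
static inline uint32_t has_zero_or_char(uint32_t x, uint32_t cmask)
{
    uint32_t y = x ^ cmask;   /* bytes equal to c become zero */
    return (((x - ONES) & ~x) | ((y - ONES) & ~y)) & HIGHS;
}
```

For example, has_zero_or_char(0x41424344, ONES * 0x43) is nonzero
because one byte equals 0x43, while the same word tested against
ONES * 0x5a gives zero.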

You can optimize things further by calculating partial masks for each
of the unrolled cases, ORing them together and only doing a single test
per loop iteration rather than 4 or 8. This also avoids adding a lot of
code and branches to the inner loop, which would make the unrolling
pointless.
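
The idea can be sketched in C roughly as follows. This is only an
illustration under stated assumptions: find_group and partial are
hypothetical names, the input is taken to be an array of already
loaded 32-bit words whose length is a multiple of four, and locating
the exact word and byte within the reported group is left to code
outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ONES  UINT32_C(0x01010101)
#define HIGHS UINT32_C(0x80808080)

/* Partial mask for one word: the final "& HIGHS" is deliberately
   left out so it can be applied once to the ORed result. */
static inline uint32_t partial(uint32_t x, uint32_t cmask)
{
    uint32_t y = x ^ cmask;   /* bytes equal to c become zero */
    return ((x - ONES) & ~x) | ((y - ONES) & ~y);
}

/* Scan w[] four words at a time with a single branch per iteration.
   Returns the index of the first word of the first group containing
   a zero byte or the byte c, or nwords if no group contains one. */
static size_t find_group(const uint32_t *w, size_t nwords, unsigned char c)
{
    uint32_t cmask = ONES * c;   /* c broadcast to every byte */

    assert(nwords % 4 == 0);     /* simplifying assumption of this sketch */
    for (size_t i = 0; i < nwords; i += 4) {
        uint32_t m = partial(w[i], cmask) | partial(w[i + 1], cmask)
                   | partial(w[i + 2], cmask) | partial(w[i + 3], cmask);
        if (m & HIGHS)           /* one test covers all four words */
            return i;
    }
    return nwords;
}
```

Once a group is reported, the per-word masks can be recomputed outside
the loop to find the exact position, so the hot path keeps one branch.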

The other thing is support for big-endian - this is generally tricky as
the mask returned by the zero check won't work even if byte-reversed.
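
A small C sketch of why this happens, under stated assumptions: the
helper names are hypothetical, __builtin_clz is the GCC/Clang builtin,
and the carry-free exact_zero_mask is a well-known alternative
formulation that is not proposed anywhere in this thread. The borrow
out of a zero byte can falsely flag a more significant 0x01 byte, and
on big-endian the more significant byte comes earlier in memory.

```c
#include <stdint.h>

#define ONES  UINT32_C(0x01010101)
#define HIGHS UINT32_C(0x80808080)
#define LOW7  UINT32_C(0x7f7f7f7f)

/* Load four bytes as a big-endian word, independent of the host. */
static inline uint32_t load_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* The 3-operation check: may also flag a 0x01 byte that sits just
   above (more significant than) a zero byte, due to the borrow. */
static inline uint32_t fast_zero_mask(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Carry-free variant: 0x80 is set in exactly the zero bytes. */
static inline uint32_t exact_zero_mask(uint32_t x)
{
    return ~(((x & LOW7) + LOW7) | x | LOW7);
}

/* Memory index of the first flagged byte of a big-endian word.
   The mask must be nonzero. */
static inline unsigned first_flag_be(uint32_t mask)
{
    return (unsigned)__builtin_clz(mask) / 8;
}
```

For the bytes 0x41 0x01 0x00 0x41, load_be gives 0x41010041. The fast
mask is 0x00808000, so the first flagged byte in memory order is index
1 (the 0x01 byte) rather than the zero at index 2. Byte-reversing the
mask only moves the false flag; it does not remove it. The exact mask
is 0x00008000, which points at index 2.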

Finally, first_nonzero_byte should just use __builtin_ffsl (yet another
function that should be inlined by default in the generic string.h...).
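
A minimal sketch of what that could look like, assuming a little-endian
word where the first byte in memory is the least significant one. The
signature shown here is a guess for illustration and is not the one in
the patch. __builtin_ffsl is the GCC/Clang builtin that returns one
plus the index of the least significant set bit, or zero for a zero
argument.

```c
#include <assert.h>

/* Index of the lowest byte of mask that has any bit set.
   The mask must be nonzero; little-endian byte order is assumed. */
static inline unsigned first_nonzero_byte(unsigned long mask)
{
    assert(mask != 0);
    return (unsigned)(__builtin_ffsl((long)mask) - 1) / 8;
}
```

With a zero-check mask such as 0x00800000 this returns 2, the index of
the first flagged byte.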

Wilco

Follow-Ups:
- Re: [PATCH 2/*] Optimize generic strchrnul and strchr
  - From: Ondřej Bílka
- Re: [PATCH 2/*] Optimize generic strchrnul and strchr
  - From: Joseph Myers
