This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] tilegx: provide optimized strnlen, strstr, and strcasestr

From: Chris Metcalf <cmetcalf at tilera dot com>
To: Ondřej Bílka <neleai at seznam dot cz>
Cc: <libc-alpha at sourceware dot org>
Date: Tue, 9 Jun 2015 15:45:13 -0400
Subject: Re: [PATCH] tilegx: provide optimized strnlen, strstr, and strcasestr
Authentication-results: sourceware.org; auth=none
Authentication-results: sourceware.org; dkim=none (message not signed) header.d=none;
References: <201410021559 dot s92Fx8CN020856 at farm-0002 dot internal dot tilera dot com> <20150603082626 dot GA7952 at domone>
Reply-to: <cmetcalf at ezchip dot com>

On 06/03/2015 04:26 AM, Ondřej Bílka wrote:

On Mon, Sep 15, 2014 at 08:10:18PM -0400, Chris Metcalf wrote:

strnlen() is based on the existing tile strlen() with length
checking added.  It speeds up by up to 5x, but on average across
the benchtest corpus by around 35%.  No regressions are seen.

strstr() does 8-byte aligned loads and compares using a 2-byte
filter on the first two bytes of the needle and then testing
the remaining bytes in needle using memcmp().  It speeds up
about 5x in the best case (for "found" needles), about 2x looking
at benchtest as a whole, with slowdowns of as much as 45%
on a few cases (including the "fail" case for 128KB search).
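
As an illustration of the 2-byte filter described above, here is a minimal
scalar sketch in portable C. It is not the tilegx code from the patch: the
name sketch_strstr is hypothetical, the filter runs one byte at a time
instead of over 8-byte aligned loads, and the tail is checked with strncmp()
rather than memcmp() so the sketch never reads past the haystack terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of a 2-byte filter; not the patch's code. */
char *sketch_strstr(const char *haystack, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return (char *)haystack;
    if (nlen == 1)
        return strchr(haystack, needle[0]);

    const unsigned char *h = (const unsigned char *)haystack;
    const unsigned char *n = (const unsigned char *)needle;

    /* Pack the first two needle bytes into a 16-bit key. */
    uint16_t key = (uint16_t)((n[0] << 8) | n[1]);

    if (h[0] == '\0')
        return NULL;

    /* Slide a 2-byte window over the haystack. */
    uint16_t window = h[0];
    for (size_t i = 1; h[i] != '\0'; i++) {
        window = (uint16_t)((window << 8) | h[i]);
        if (window != key)
            continue;
        /* Filter hit at offset i - 1: verify the rest of the needle.
           strncmp stops at the haystack's NUL, so a short tail is safe. */
        if (strncmp((const char *)h + i + 1, needle + 2, nlen - 2) == 0)
            return (char *)(h + i - 1);
    }
    return NULL;
}
```

The filter rejects most positions with one 16-bit compare, so the more
expensive tail comparison runs only on candidates that already match the
first two bytes.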

strcasestr() is based on strstr() but uses a SIMD tolower
routine to convert 8-bytes to lower case in 5 instructions.
It also uses a 2-byte filter and then strncasecmp() for the
remaining bytes.  strncasecmp() is not optimized for SIMD, so
there is further room for improvement.  However, it is still up
to 16x faster for "found" needles, averaging 2x faster on the
whole corpus of benchtests.  It does slow down by up to 35%
on a few cases, similarly to strstr().
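
To make the idea of lowercasing 8 bytes at once concrete, here is a portable
SWAR sketch on a 64-bit word. It is an assumption-laden stand-in, not the
5-instruction tilegx routine from the patch: the name swar_tolower8 is
hypothetical, it uses plain integer arithmetic instead of tile SIMD byte
compares, and it handles ASCII 'A'..'Z' only.

```c
#include <stdint.h>

/* Hypothetical illustration; ASCII-only, not the tilegx SIMD routine. */
uint64_t swar_tolower8(uint64_t w)
{
    const uint64_t ones = 0x0101010101010101ULL;

    /* Clear each byte's high bit so the per-byte adds below cannot
       carry into the neighbouring byte. */
    uint64_t low7 = w & (0x7f * ones);

    /* Bit 7 of each byte is set iff that byte is >= 'A'. */
    uint64_t ge_A = low7 + (0x80 - 'A') * ones;
    /* Bit 7 of each byte is set iff that byte is > 'Z'. */
    uint64_t gt_Z = low7 + (0x80 - 'Z' - 1) * ones;

    /* Keep bit 7 only for bytes in 'A'..'Z' whose original high bit
       was clear (i.e. plain ASCII). */
    uint64_t is_upper = ge_A & ~gt_Z & ~w & (0x80 * ones);

    /* 0x80 >> 2 == 0x20, the ASCII case bit; set it on upper-case bytes. */
    return w | (is_upper >> 2);
}
```

Because every step operates on each byte independently, the result does not
depend on the byte order of the machine.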
---
sysdeps/tile/tilegx/strcasestr.c    |  55 ++++++++
sysdeps/tile/tilegx/string-endian.h |  22 ++-
sysdeps/tile/tilegx/strnlen.c       |  58 ++++++++
sysdeps/tile/tilegx/strstr.c        | 271 ++++++++++++++++++++++++++++++++++++
4 files changed, 401 insertions(+), 5 deletions(-)
create mode 100644 sysdeps/tile/tilegx/strcasestr.c
create mode 100644 sysdeps/tile/tilegx/strnlen.c
create mode 100644 sysdeps/tile/tilegx/strstr.c

I didn't notice this thread before so didn't comment.

First, there is a bug in strcasestr: you cannot always use the vector ASCII
conversion. You would need to check for that with:

   __locale_t loc = _NL_CURRENT_LOCALE;
   struct __locale_data *ctype = loc->__locales[LC_CTYPE];
   int nonascii = ctype->values[_NL_ITEM_INDEX(_NL_CTYPE_NONASCII_CASE)].word;

But you don't need vector conversion there. Just do comparisons with
tolower(x) and toupper(x). I just realized that my strcasestr was
overcomplicated, as I assumed that many characters could have the same
tolower(x).
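
A rough sketch of that suggestion, assuming (as the paragraph above does)
that a haystack byte can match needle byte x only if it equals tolower(x) or
toupper(x) in the current locale. The name sketch_strcasestr is hypothetical
and the loop is scalar; this is not the strstr skeleton mentioned in this
thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical illustration; relies on the locale's tolower/toupper. */
char *sketch_strcasestr(const char *haystack, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return (char *)haystack;

    /* Precompute both case forms of the first needle byte once. */
    unsigned char first = (unsigned char)needle[0];
    int lo = tolower(first);
    int up = toupper(first);

    for (const char *p = haystack; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        /* Cheap filter: two compares, no per-byte case conversion. */
        if (c != lo && c != up)
            continue;
        /* Candidate: verify the remaining bytes case-insensitively. */
        if (strncasecmp(p + 1, needle + 1, nlen - 1) == 0)
            return (char *)p;
    }
    return NULL;
}
```

The haystack is never converted; only the needle byte is mapped, once,
through the locale-aware tolower() and toupper().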

The best course of action would be to wait until I merge my strstr skeleton
and you map the intrinsics. Then we could delete these. You use the same
idea; the skeleton adds many technical speedups. For example, there was a
bottleneck in checking the last character alone instead of as a digraph,
which is solved by merging the last two loads to get a full word of characters.


Thanks for spotting the bug.  I've filed it as bug 18510.  I'll
hold off trying to do a point fix until it becomes clear whether
or not your more general fixes will hit mainline.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

References:
- Re: [PATCH] tilegx: provide optimized strnlen, strstr, and strcasestr
  - From: Ondřej Bílka
