This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: strstr optimization
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org, sjmunroe at us dot ibm dot com
- Date: Tue, 12 May 2015 10:49:06 +0200
- Subject: Re: [PATCH] powerpc: strstr optimization
- Authentication-results: sourceware.org; auth=none
- References: <1429254584-124997-1-git-send-email-raji at linux dot vnet dot ibm dot com> <20150501120510 dot GB2203 at domone> <554704A1 dot 3010007 at linux dot vnet dot ibm dot com> <20150505211414 dot GA18085 at domone> <554A1703 dot 5090707 at linux dot vnet dot ibm dot com>
On Wed, May 06, 2015 at 06:58:35PM +0530, Rajalakshmi Srinivasaraghavan wrote:
>
>
> On 05/06/2015 02:44 AM, OndÅej BÃlka wrote:
> >
> >It does. AFAIR for practical strings on x64 its only around 3 times slower than
> >theoretical lower bound.
> >
> >A lower bound is strlen call as you must find terminating null when
> >needle isn't present.
> >
> >I hope that main idea was clear and you will
> >use it. Code I send is complex as it contains ten different speedups
> >that improve performance.
> >
> >
> >>In my optimization there are totally three branches.
> >>case 1# if needle len >= 8 (common case)
> >>case 2# if needle < 8 (common case)
> >>case 3# For page boundary cross haystack. (rare case)
> >>
> >Checking if its case 1 or 2 is relatively costy as in practice haystacks
> >tend to be short. Most apps that use strstr have 90% call haystack less than 64 bytes long.
> >
> >You don't need to know needle size if you know that it doesn't contain
> >first pair. Also its rare that it contains first pair of characters twice.
> strlen of needle is needed in all the cases so as to make sure
> haystack is longer than needle
> as part of initial check before proceeding to search. Once this
> check is passed, strchr is
> called to find the position of first character.
>
No, strlen is entirely unnecessary. When haystack doesn't contain first
triple of characters (and if it contains match is relatively likely)
then question if needle or haystack are larger is irrelevant.
> >
> >>For case 1, I always compare 8 characters at a time using cmpb instruction.
> >>-> perform basic strlen, strnlen, checks. Move haystack based on
> >>strchr result.
> >>-> Load first 8 bytes(taking care of aligned load)from haystack and
> >>needle and compare them.
> >>If they are same proceed to next 8 bytes of haystack and needle.
> >>Else load next doubleword from haystack.Form a doubleword with last
> >>seven characters from first load and first character from
> >>second load and compare again with needle. Proceed in a similar way.
> >>This gives good improvement.
> >>
> >Thats looks worse than naive algorithm on practical strings. Most
> >instructions you do are useless as you have mismatch at first character.
> >
> >Following function should beat it.
> >
> >char *strstr (char *s, char *n)
> >{
> > if (n[0] == 0)
> > return s;
> > while (s = strchr (s, n[0]))
> > {
> > long i;
> > for (i=1; n[i] && n[i] == s[i]; i++);
> > if (n[i] == 0)
> > return s;
> > s++;
> > }
> >}
> Calling strchr in a loop is not showing improvement when compared
> to the proposed patch.
> >
Evidence?