This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: strstr optimization
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot comcom>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- Cc: OndÅej BÃlka <neleai at seznam dot cz>, libc-alpha at sourceware dot org, sjmunroe at us dot ibm dot com
- Date: Mon, 11 May 2015 13:10:48 -0500
- Subject: Re: [PATCH] powerpc: strstr optimization
- Authentication-results: sourceware.org; auth=none
- References: <1429254584-124997-1-git-send-email-raji at linux dot vnet dot ibm dot com> <20150501120510 dot GB2203 at domone> <554704A1 dot 3010007 at linux dot vnet dot ibm dot com> <20150505211414 dot GA18085 at domone> <554A1703 dot 5090707 at linux dot vnet dot ibm dot com>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Wed, 2015-05-06 at 18:58 +0530, Rajalakshmi Srinivasaraghavan wrote:
>
> On 05/06/2015 02:44 AM, OndÅej BÃlka wrote:
> >
> > It does. AFAIR for practical strings on x64 its only around 3 times slower than
> > theoretical lower bound.
> >
> > A lower bound is strlen call as you must find terminating null when
> > needle isn't present.
> >
> > I hope that main idea was clear and you will
> > use it. Code I send is complex as it contains ten different speedups
> > that improve performance.
> >
> >
> >> In my optimization there are totally three branches.
> >> case 1# if needle len >= 8 (common case)
> >> case 2# if needle < 8 (common case)
> >> case 3# For page boundary cross haystack. (rare case)
> >>
> > Checking if its case 1 or 2 is relatively costy as in practice haystacks
> > tend to be short. Most apps that use strstr have 90% call haystack less than 64 bytes long.
> >
> > You don't need to know needle size if you know that it doesn't contain
> > first pair. Also its rare that it contains first pair of characters twice.
> strlen of needle is needed in all the cases so as to make sure haystack
> is longer than needle
> as part of initial check before proceeding to search. Once this check is
> passed, strchr is
> called to find the position of first character.
>
I think we should move on with (commit) Raji's patch as is.
While we appreciate you suggestions Ondrej, not all optimization apply
equally to all platforms.
> >
> >> For case 1, I always compare 8 characters at a time using cmpb instruction.
> >> -> perform basic strlen, strnlen, checks. Move haystack based on
> >> strchr result.
> >> -> Load first 8 bytes(taking care of aligned load)from haystack and
> >> needle and compare them.
> >> If they are same proceed to next 8 bytes of haystack and needle.
> >> Else load next doubleword from haystack.Form a doubleword with last
> >> seven characters from first load and first character from
> >> second load and compare again with needle. Proceed in a similar way.
> >> This gives good improvement.
> >>
> > Thats looks worse than naive algorithm on practical strings. Most
> > instructions you do are useless as you have mismatch at first character.
> >
> > Following function should beat it.
> >
> > char *strstr (char *s, char *n)
> > {
> > if (n[0] == 0)
> > return s;
> > while (s = strchr (s, n[0]))
> > {
> > long i;
> > for (i=1; n[i] && n[i] == s[i]; i++);
> > if (n[i] == 0)
> > return s;
> > s++;
> > }
> > }
> Calling strchr in a loop is not showing improvement when compared to
> the proposed patch.
> >
> > Or use musl strstr idea. While implementation is terrible as it in
> > practice unnecessary spends 90% time on preprocessing their loop is
> > relatively good for portable implementation.
> >
> >
>