This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v3] powerpc: strstr optimization
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
- Cc: carlos at redhat dot com, GNU C Library <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>, Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- Date: Thu, 9 Jul 2015 16:15:46 +0200
- Subject: Re: [PATCH v3] powerpc: strstr optimization
- Authentication-results: sourceware.org; auth=none
- References: <558A5642 dot 5020107 at linux dot vnet dot ibm dot com> <558A5761 dot 2000409 at linux dot vnet dot ibm dot com> <87oajpm8nc dot fsf at totoro dot br dot ibm dot com> <871tgijuri dot fsf at linux dot vnet dot ibm dot com>
On Wed, Jul 08, 2015 at 04:48:17PM -0300, Tulio Magno Quites Machado Filho wrote:
> "Tulio Magno Quites Machado Filho" <tuliom@linux.vnet.ibm.com> writes:
>
> > Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:
> >
> >> I have attached the modified version of the patch calling optimized
> >> strchr() which seems to give better results in both LE and BE.
> >> I have attached the benchresults ran on powerpc64le machine.
> >>
> >> 1) Ran using benchests/bench-strstr.c
> >> 2) Ran using ondrej proposed patches on string optimization and benchtests.
> >
> > This patch looks good to me and has been extensively reviewed in another
> > thread [1].
> >
> > I'm pushing it.
>
> Carlos,
>
> Right after I sent this message, Raji contacted me to inform she had found a
> bug in the POWER 7 implementation of strstr().
>
> The fix is very small and the new version of the patch is attached to this
> email.
>
> Old code:
> and r0, r3, r4
> cmpdi cr7, r0, 0
> beq cr7, L(retnull)
>
> New code:
> cmpdi cr7, r3, 0
> beq cr7, L(retnull)
> cmpdi cr7, r4, 0
> beq cr7, L(retnull)
>
> May I proceed with this commit?
>
No, you couldn't check unreviewed patches unless they are obvious. This patch wasn't reviewed at all. No reviewer said that he read it and
didn't found any bugs. That this contained bug clearly show that.
There was only discussion where I repeatedly asked to provide data that
shows performance of this implementation on ("aaaa..." , "aaa...b") with
2027 byte needle but it wasn't provided only related benchmarks. I also
asked for comparison with a simple generic strstr improvement that uses
strstr. I also didn't see results to see if assembly is really
necessary.
Comparison with my vectorized implementation doesn't show that as its
generic while strchr uses specialized instruction and with these you
would get much better performance, for fair comparison you would need
also better primitive that tests equality.
Then there is problem that benchmarks don't cover common performance
cases. My vectorized digraphs are used because they are more stable, If
first character has frequency 1/1000 then strchr will be optimal. Here
its smaller problem that x64 as strchr check 16 bytes at once instead 64
so we pay smaller branch misprediction penalties. But this also will be
smaller when you decrease probability to 1/16 where strchr overhead will
be more noticable.