This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] powerpc: strstr optimization


On Wed, Jul 08, 2015 at 04:48:17PM -0300, Tulio Magno Quites Machado Filho wrote:
> "Tulio Magno Quites Machado Filho" <tuliom@linux.vnet.ibm.com> writes:
> 
> > Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:
> >
> >> I have attached the modified version of the patch calling optimized
> >> strchr() which seems to give better results in both LE and BE.
> >> I have attached the benchresults ran on powerpc64le machine.
> >>
> >> 1) Ran using benchests/bench-strstr.c
> >> 2) Ran using ondrej proposed patches on string optimization and benchtests.
> >
> > This patch looks good to me and has been extensively reviewed in another
> > thread [1].
> >
> > I'm pushing it.
> 
> Carlos,
> 
> Right after I sent this message, Raji contacted me to inform she had found a
> bug in the POWER 7 implementation of strstr().
> 
> The fix is very small and the new version of the patch is attached to this
> email.
> 
> Old code:
> and	r0, r3, r4
> cmpdi	cr7, r0, 0
> beq	cr7, L(retnull)
> 
> New code:
> cmpdi	cr7, r3, 0
> beq	cr7, L(retnull)
> cmpdi	cr7, r4, 0
> beq	cr7, L(retnull)
> 
> May I proceed with this commit?
> 
No, you couldn't check unreviewed patches unless they are obvious. This patch wasn't reviewed at all. No reviewer said that he read it and
didn't found any bugs. That this contained bug clearly show that.

There was only discussion where I repeatedly asked to provide data that
shows performance of this implementation on ("aaaa..." , "aaa...b") with
2027 byte needle but it wasn't provided only related benchmarks. I also
asked for comparison with a simple generic strstr improvement that uses
strstr. I also didn't see results to see if assembly is really
necessary.

Comparison with my vectorized implementation doesn't show that as its
generic while strchr uses specialized instruction and with these you
would get much better performance, for fair comparison you would need
also better primitive that tests equality.

Then there is problem that benchmarks don't cover common performance
cases. My vectorized digraphs are used because they are more stable, If
first character has frequency 1/1000 then strchr will be optimal. Here
its smaller problem that x64 as strchr check 16 bytes at once instead 64
so we pay smaller branch misprediction penalties. But this also will be
smaller when you decrease probability to 1/16 where strchr overhead will
be more noticable.




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]