Benchmarking for 2.17

Patch: Optimize strstr, strcasestr and memmem

Description: This patch speeds up the strstr, strcasestr and memmem functions for short needle inputs by more than 2 times on i686, x86_64, MIPS and other architectures. GLIBC 2.9 introduced a new, algorithmically superior implementation of these functions. Unfortunately, the new implementation uses memchr to detect the end-of-string condition, which adds significant overhead compared to GLIBC 2.8 and earlier versions, where that check was piggy-backed on the matching procedure. As a result the new implementation heavily regressed for short needles, making strstr more than 2 times slower. This patch cures the regression and even makes the GLIBC 2.9+ implementation faster than the original version in GLIBC 2.8 and earlier.
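To illustrate what "piggy-backing" end-of-string detection on the matching procedure means, here is a minimal sketch. It is not the glibc code (glibc uses a two-way algorithm); sketch_strstr is a hypothetical name and the search is deliberately naive. The point is only that the terminating NUL is noticed while bytes are being compared, so no separate memchr-style pass over the haystack is needed.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only, not glibc's implementation.
   Naive substring search that detects the end of the haystack as a side
   effect of comparing bytes. */
static const char *sketch_strstr(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return haystack;            /* empty needle matches at the start */

    for (const char *h = haystack; *h != '\0'; h++) {
        const char *a = h;
        const char *b = needle;

        /* Stop when the needle ends or a byte differs. A NUL in the
           haystack differs from every non-NUL needle byte, so it ends
           this loop without an extra scan. */
        while (*b != '\0' && *a == *b) {
            a++;
            b++;
        }
        if (*b == '\0')
            return h;               /* whole needle matched at h */
        if (*a == '\0')
            return NULL;            /* haystack ended: no later match fits */
    }
    return NULL;
}
```

For short needles the inner loop usually exits after a byte or two, so any fixed per-call overhead (such as an up-front scan for the terminator) can dominate the running time.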

Benchmark: The case reported in http://sourceware.org/bugzilla/show_bug.cgi?id=11607 improved by a factor of 3 on a Core2 system. Beware: Core i7 and later systems will likely use the SSE4.2-optimized implementation of the strstr family of functions.

Criteria for acceptance: The patchset reduces the linear factor of the O(n) algorithm by optimizing end-of-string detection and the critical path for small needles. For details see http://sourceware.org/ml/libc-alpha/2012-05/msg01910.html .

Patch: Faster string operations for Bulldozer

Description: This patch changes the selected strlen/rawmemchr implementation to a pminub-based one. This implementation is up to three times faster than the SSE4 one that was selected previously.
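For readers unfamiliar with the pminub idea, the following is a rough sketch in C with SSE2 intrinsics (_mm_min_epu8 compiles to pminub). It is not the glibc assembly; sketch_strlen is a hypothetical name, and the 32-byte unrolling is an assumption chosen for brevity. It assumes x86 with SSE2 and a GCC/Clang compiler (for __builtin_ctz). Like the real assembly routines, it reads past the terminator inside an aligned block, which is safe with respect to page faults but outside what the C standard permits, so sanitizers may complain.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; _mm_min_epu8 maps to pminub */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only, not glibc's implementation. */
static size_t sketch_strlen(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Walk byte by byte until p is 32-byte aligned, so that the aligned
       loads below never cross a page boundary. */
    while (((uintptr_t)p & 31) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        __m128i a = _mm_load_si128((const __m128i *)p);
        __m128i b = _mm_load_si128((const __m128i *)(p + 16));

        /* A byte of min(a, b) is zero exactly when a or b holds a zero
           in that lane, so one compare covers 32 bytes. */
        __m128i m = _mm_min_epu8(a, b);
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(m, zero)) != 0) {
            /* Slow path, taken once: find which byte was zero. */
            unsigned mask_a = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(a, zero));
            unsigned mask_b = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(b, zero));
            unsigned mask = mask_a | (mask_b << 16);
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
        }
        p += 32;
    }
}
```

The benefit comes from the main loop: merging blocks with a byte-wise minimum means only one compare and one mask extraction per iteration instead of one per 16-byte block.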

Benchmark: Original results for fx10: http://kam.mff.cuni.cz/~ondra/benchmark_string/benchmark_strlen_fx10_27_9_2012.tar.bz2 , current version at http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test_r.html . Benchmark source is https://github.com/neleai/benchmark_string/ commit 4cd8443bfd

Criteria for acceptance: This patch reduces strlen running time on AMD Bulldozer by selecting the variant that the benchmark shows to be the fastest.

None: benchmarking/results_2_17 (last edited 2012-09-30 10:08:57 by neleai)