This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch] [BZ 15884] strcoll: improve performance by removing the cache


Ok, according to Siddhesh remarks I splitted the patch in two, this which removes the cache and a following one that introduces inline-ing. Formating should now be better and I tried to minimize the changed lines.

2014-09-29  Leonhard Holz  <leonhard.holz@web.de>

	[BZ #15884]
	* string/strcoll_l.c: Remove weight and rules cache.
	* benchtests/bench-strcoll.c: Benchmark for strcoll().
	* benchtests/Makefile: Likewise.
	* benchtests/strcoll-inputs/glibc_files.txt: Benchmark data.
	* benchtests/strcoll-inputs/lorem_ipsum_vietnamese.txt
	* benchtests/strcoll-inputs/lorem_ipsum_latin.txt
	* benchtests/strcoll-inputs/lorem_ipsum_arabic.txt
	* benchtests/strcoll-inputs/lorem_ipsum_l33tspeak.txt
	* benchtests/strcoll-inputs/lorem_ipsum_chinese.txt
	* benchtests/strcoll-inputs/lorem_ipsum_czech.txt
	* benchtests/strcoll-inputs/lorem_ipsum_old_english.txt
	* benchtests/strcoll-inputs/lorem_ipsum_danish.txt
	* benchtests/strcoll-inputs/lorem_ipsum_polish.txt
	* benchtests/strcoll-inputs/lorem_ipsum_french.txt
	* benchtests/strcoll-inputs/lorem_ipsum_portugese.txt
	* benchtests/strcoll-inputs/lorem_ipsum_greek.txt
	* benchtests/strcoll-inputs/lorem_ipsum_russian.txt
	* benchtests/strcoll-inputs/lorem_ipsum_hebrew.txt
	* benchtests/strcoll-inputs/lorem_ipsum_spain.txt
	* benchtests/strcoll-inputs/lorem_ipsum_hindi.txt
	* benchtests/strcoll-inputs/lorem_ipsum_swedish.txt
	* benchtests/strcoll-inputs/lorem_ipsum_hungarian.txt
	* benchtests/strcoll-inputs/lorem_ipsum_turkish.txt
	* benchtests/strcoll-inputs/lorem_ipsum_icelandic.txt
	* benchtests/strcoll-inputs/lorem_ipsum_italian.txt
	* benchtests/strcoll-inputs/lorem_ipsum_yugoslavian.txt
	* benchtests/strcoll-inputs/lorem_ipsum_japanese.txt
	* localedata/Makefile: Generate locales needed for benchtests.

The numbers are:

glibc_files.txt                     en_US.UTF-8    -46.72%
lorem_ipsum_vietnamese.txt          vi_VN.UTF-8    -36.60%
lorem_ipsum_latin.txt               en_US.UTF-8    -45.83%
lorem_ipsum_arabic.txt              ar_SA.UTF-8    -34.11%
lorem_ipsum_l33tspeak.txt           en_US.UTF-8    -46.28%
lorem_ipsum_chinese.txt             zh_CN.UTF-8    +30.95%
lorem_ipsum_czech.txt               cs_CZ.UTF-8    -36.17%
lorem_ipsum_old_english.txt         en_GB.UTF-8    -35.22%
lorem_ipsum_danish.txt              da_DK.UTF-8    -39.22%
lorem_ipsum_polish.txt              pl_PL.UTF-8    -42.62%
lorem_ipsum_french.txt              fr_FR.UTF-8    -31.09%
lorem_ipsum_portugese.txt           pt_PT.UTF-8    -30.27%
lorem_ipsum_greek.txt               el_GR.UTF-8    -32.07%
lorem_ipsum_russian.txt             ru_RU.UTF-8    -36.00%
lorem_ipsum_hebrew.txt              iw_IL.UTF-8    -41.44%
lorem_ipsum_spain.txt               es_ES.UTF-8    -35.64%
lorem_ipsum_hindi.txt               hi_IN.UTF-8    -00.17%
lorem_ipsum_swedish.txt             sv_SE.UTF-8    -38.85%
lorem_ipsum_hungarian.txt           hu_HU.UTF-8    -21.82%
lorem_ipsum_turkish.txt             tr_TR.UTF-8    -38.08%
lorem_ipsum_icelandic.txt           is_IS.UTF-8    -43.40%
lorem_ipsum_italian.txt             it_IT.UTF-8    -30.52%
lorem_ipsum_yugoslavian.txt         sr_RS.UTF-8    -36.41%
lorem_ipsum_japanese.txt            ja_JP.UTF-8    +18.00%

Chinese and japanese are a bit special as AFAIK in these languages every character is a word and the benchmark is probably comparing sentences. Also theses language complete much faster in absolute numbers, about 1e6 vs. 3e6 (new) / 5e6 (old) for alphabetic languages.

Best,
Leonhard

Am 24.09.2014 12:02, schrieb Siddhesh Poyarekar:
On Wed, Sep 24, 2014 at 10:37:21AM +0200, Leonhard Holz wrote:
Hello everybody,

this is a path that should solve bug 15884. It complains about the
performance of strcoll(). It was found out that the runtime of strcoll() is
actually bound to strlen which is needed for calculating the size of a cache
that was installed to improve the comparison performance.

The idea for this patch was that the cache is only useful in rare cases
(strings of same length and same first-level-chars) and that it would be
better to avoid memory allocation at all. To prove this I wrote a
performance test that is found in benchtests-strcoll.tar.bz2. Also
modifications in benchtests/Makefile and localedata/Makefile are necessary
to make it work.

After removing the cache the strcoll method showed the predicted behavior
(getting slightly faster) in all but the test case for hindi word sorting.
This was due the hindi text having much more equal words than the other
ones. For equal strings the performance was worse since all comparison
levels were run through and from the second level on the cache improved the
comparison performance of the original version.

Therefore I added a bytewise test via strcmp iff the first level comparison
found that both strings did match because in this case it is very likely
that equal strings are compared. This solved the problem with the hindi test
case and improved the performance of the others.

Thanks for working on this and also writing a benchmark for it.  The
general approach seems sound to me (I haven't done a deep review yet),
but there are quite a few nits that will need to be worked out, most
of them covered in the contributor checklist[1].

- There are a lot of unrelated whitespace and formatting changes in
   the patch.  Most of them seem to have been made using the GNU indent
   program, which is mostly accurate, but not completely.  Please
   review and fix them up.

- The change needs a changelog which mentions all your changes,
   including all the new files.

- Please include bench-strcoll.c in the patch as well.  It's OK if you
   post the input files in the tarball but the source needs to be
   reviewed.

- bench-strcoll.c has some code formatting issues, especially
   unnecessary braces around single line for/if blocks.

Another improvement was achieved by inlineing both static subroutines.

- Please post the inlining change separately with separate numbers for
   it.  In general we stay away from inlining functions and just let
   the compiler do its job.  However if there is a case where such
   inlining is especially useful, it needs to be accompanied with
   numbers.  So a separate patch with separate numbers for the change
   would be helpful.

- Finally, I don't know if you have signed a copyright assignment with
   the FSF for your changes.  Carlos seems to have mentioned that in
   your previous email thread, but I don't know if you've followed
   through on it since I am not an FSF maintainer.  Maybe one of the
   FSF maintainers can confirm that.

Siddhesh

[1] https://sourceware.org/glibc/wiki/Contribution%20checklist

Attachment: bench-strcoll.c
Description: Text document

Attachment: benchtests_Makefile.diff
Description: Text document

Attachment: localedata_Makefile.diff
Description: Text document

Attachment: strcoll-inputs.tar.bz2
Description: Binary data

Attachment: string_strcoll_l.c.diff
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]