This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [Patch v2] [BZ 15884] strcoll: improve performance by removing the cache

From: Leonhard Holz <leonhard dot holz at web dot de>
To: libc-alpha at sourceware dot org
Date: Wed, 22 Oct 2014 22:39:49 +0200
Subject: Re: [Patch v2] [BZ 15884] strcoll: improve performance by removing the cache
Authentication-results: sourceware.org; auth=none
References: <543D20F8 dot 9000308 at web dot de> <m28hio$g07$1 at ger dot gmane dot org>

Hi Stefan,

is this a new problem or did it occure before the patch?

Best,
Leonhard

Am 22.10.2014 17:13, schrieb Stefan Liebler:

Hi Leonhard,

great job.
If I run make xcheck on s390x, I get a failure in
string/tst-strcoll-overflow testcase. See tst-strcoll-overflow.out:
0
Expected signal 'Alarm clock' from child, got none

I measured the execution time of this testcase between 35...60s
which is smaller than the defined timeout of 300s.
Can you change the behaviour of the testcase that it is passing
if it was timed out or if it returns successfully without timeout.

Bye
Stefan

On 10/14/2014 03:11 PM, Leonhard Holz wrote:

Hello everybody,

this is a path that should solve bug 15884. It complains about the
performance of strcoll(). It was found out that the runtime of strcoll()
is actually bound to strlen which is needed for calculating the size of
a cache that was installed to improve the comparison performance.

The idea for this patch was that the cache is only useful in rare cases
(strings of same length and same first-level-chars) and that it would be
better to avoid memory allocation at all. To prove this I wrote a
performance test bench-strcoll.c with test data in
benchtests-strcoll.tar.gz. Also modifications in benchtests/Makefile and
localedata/Makefile are necessary to make it work.

After removing the cache the strcoll method showed the predicted
behavior (getting slightly faster) in all but the test case for hindi
word sorting. This was due the hindi text having much more equal words
than the other ones. For equal strings the performance was worse since
all comparison levels were run through and from the second level on the
cache improved the comparison performance of the original version.

Therefore I added a bytewise test via strcmp iff the first level
comparison found that both strings did match because in this case it is
very likely that equal strings are compared. This solved the problem
with the hindi test case and improved the performance of the others.

The output of the benchmark is:

   "strcoll": {
    "": {
     "locale": "en_US.UTF-8",
     "duration": 7.41506e+09,
     "iterations": 16,
     "mean": 4.63441e+08
    },
    "vi_VN.UTF-8": {
     "duration": 7.4185e+07,
     "iterations": 16,
     "mean": 4.63656e+06
    },
    "en_US.UTF-8": {
     "duration": 7.59759e+07,
     "iterations": 16,
     "mean": 4.7485e+06
    },
    "ar_SA.UTF-8": {
     "duration": 7.61479e+07,
     "iterations": 16,
     "mean": 4.75924e+06
    },
    "zh_CN.UTF-8": {
     "duration": 1.37535e+07,
     "iterations": 16,
     "mean": 859594
    },
    "cs_CZ.UTF-8": {
     "duration": 7.90032e+07,
     "iterations": 16,
     "mean": 4.9377e+06
    },
    "en_GB.UTF-8": {
     "duration": 7.4454e+07,
     "iterations": 16,
     "mean": 4.65338e+06
    },
    "da_DK.UTF-8": {
     "duration": 7.265e+07,
     "iterations": 16,
     "mean": 4.54062e+06
    },
    "pl_PL.UTF-8": {
     "duration": 7.28964e+07,
     "iterations": 16,
     "mean": 4.55602e+06
    },
    "fr_FR.UTF-8": {
     "duration": 7.99239e+07,
     "iterations": 16,
     "mean": 4.99524e+06
    },
    "pt_PT.UTF-8": {
     "duration": 7.78347e+07,
     "iterations": 16,
     "mean": 4.86467e+06
    },
    "el_GR.UTF-8": {
     "duration": 1.0959e+08,
     "iterations": 16,
     "mean": 6.84937e+06
    },
    "ru_RU.UTF-8": {
     "duration": 9.39324e+07,
     "iterations": 16,
     "mean": 5.87077e+06
    },
    "iw_IL.UTF-8": {
     "duration": 8.8769e+07,
     "iterations": 16,
     "mean": 5.54806e+06
    },
    "es_ES.UTF-8": {
     "duration": 8.0782e+07,
     "iterations": 16,
     "mean": 5.04888e+06
    },
    "hi_IN.UTF-8": {
     "duration": 3.66962e+09,
     "iterations": 16,
     "mean": 2.29351e+08
    },
    "sv_SE.UTF-8": {
     "duration": 7.41934e+07,
     "iterations": 16,
     "mean": 4.63709e+06
    },
    "hu_HU.UTF-8": {
     "duration": 9.04538e+07,
     "iterations": 16,
     "mean": 5.65336e+06
    },
    "tr_TR.UTF-8": {
     "duration": 7.25579e+07,
     "iterations": 16,
     "mean": 4.53487e+06
    },
    "is_IS.UTF-8": {
     "duration": 6.83783e+07,
     "iterations": 16,
     "mean": 4.27364e+06
    },
    "it_IT.UTF-8": {
     "duration": 7.50307e+07,
     "iterations": 16,
     "mean": 4.68942e+06
    },
    "sr_RS.UTF-8": {
     "duration": 8.14996e+07,
     "iterations": 16,
     "mean": 5.09373e+06
    },
    "ja_JP.UTF-8": {
     "duration": 1.5325e+07,
     "iterations": 16,
     "mean": 957814
    }
   }

That is compared to the previous version:

glibc files    -33.77%
vi_VN.UTF-8    -34.12%
en_US.UTF-8    -42.42%
ar_SA.UTF-8    -27.49%
zh_CN.UTF-8    +07.90%
cs_CZ.UTF-8    -29.67%
en_GB.UTF-8    -28.50%
da_DK.UTF-8    -36.57%
pl_PL.UTF-8    -39.31%
fr_FR.UTF-8    -28.57%
pt_PT.UTF-8    -22.82%
el_GR.UTF-8    -26.77%
ru_RU.UTF-8    -35.81%
iw_IL.UTF-8    -35.34%
es_ES.UTF-8    -34.46%
hi_IN.UTF-8    -00.38%
sv_SE.UTF-8    -36.99%
hu_HU.UTF-8    -16.35%
tr_TR.UTF-8    -27.80%
is_IS.UTF-8    -33.24%
it_IT.UTF-8    -24.39%
sr_RS.UTF-8    -37.55%
ja_JP.UTF-8    +02.84%

Best,
Leonhard


2014-10-14  Leonhard Holz  <leonhard.holz@web.de>

     [BZ #15884]
     * string/strcoll_l.c (STRCOLL): Remove weight and rules cache.
     * benchtests/bench-strcoll.c: New benchmark.
     * benchtests/Makefile (bench-string): Use it.
     * benchtests/strcoll-inputs/lorem_ipsum_ar_SA New file.
     * benchtests/strcoll-inputs/lorem_ipsum_cs_CZ Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_da_DK Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_el_GR Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_en_GB Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_en_US Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_es_ES Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_fr_FR Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_hi_IN Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_hu_HU Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_is_IS Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_it_IT Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_iw_IL Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_ja_JP Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_pl_PL Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_pt_PT Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_ru_RU Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_sr_RS Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_sv_SE Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_tr_TR Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_vi_VN Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_zh_CN Likewise.
     * localedata/Makefile (LOCALES): Generate needed locales.

Follow-Ups:
- Re: [Patch v2] [BZ 15884] strcoll: improve performance by removing the cache
  - From: Stefan Liebler

References:
- [Patch v2] [BZ 15884] strcoll: improve performance by removing the cache
  - From: Leonhard Holz
- Re: [Patch v2] [BZ 15884] strcoll: improve performance by removing the cache
  - From: Stefan Liebler

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]