This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH]: Optimization for strpbrk on PowerPC


Hi
    Agree, I am also seeing performance improvement with x86_64 approach
The advantage I am seeing with this approach is that it avoids scanning duplicate chars in the needle via hash table/dictionary in addition to loop unrolling.

Adhemerval ,
Lets push the current implementation and work on similar approach for Power7
shortly. thanks.

Regards
Vidya



On Friday 28 February 2014 02:39 AM, Adhemerval Zanella wrote:
Hi Vidya,

Ondrej suggestion is indeed worth of note and I did a quick sniff run by using it
as replacement of default one (string/strpbrk.c). With your patch applied it will
be set as __strpbrk_ppc. The benchmarks looks good, showing better performance
than power7 one:

                                         simple_strpbrk  stupid_strpbrk  __strpbrk_power7        __strpbrk_ppc
Length  512, alignment  0, rej len  0:  526.453 556.688 35.0156 212.016
Length  512, alignment  0, rej len  0:  525.781 555.922 34.3594 210.328
Length  512, alignment  0, rej len  1:  785.812 729.75  1162.66 209.031
Length  512, alignment  1, rej len  1:  786.172 728     1163.38 214.312
Length  512, alignment  0, rej len  2:  2689.11 989.25  1683.61 280.125
Length  512, alignment  2, rej len  2:  2728.47 991.875 1685.89 219.141
Length  512, alignment  0, rej len  3:  3122.05 2118.22 1894.25 178.328
Length  512, alignment  3, rej len  3:  3325.78 1242.62 1894.52 189
Length  512, alignment  0, rej len  4:  3985.55 1531.14 946.922 187.031
Length  512, alignment  4, rej len  4:  3719.88 1515.48 951.281 181.156
Length  512, alignment  0, rej len  5:  3993.5  1830.55 1250.27 189.75
Length  512, alignment  5, rej len  5:  3998.05 1809.23 1250.67 177.141
Length  512, alignment  0, rej len  6:  3973.31 3708.2  1893.2  188.812
Length  512, alignment  6, rej len  6:  3985.5  3722.45 1898.81 190.875
Length  512, alignment  0, rej len  7:  4690.47 2326.11 2623.25 186.031
Length  512, alignment  7, rej len  7:  4697.55 2323.59 2367.47 188.906
Length  512, alignment  0, rej len  8:  5269.11 2505.69 1545.86 185.75
Length  512, alignment  0, rej len  8:  5269.48 2505.09 1536.22 193.969
Length  512, alignment  0, rej len  9:  5512.75 2778.62 2509.73 187.25
Length  512, alignment  1, rej len  9:  5522.86 2778.44 2029.03 181.859
Length  512, alignment  0, rej len 10:  5890.72 3058.97 3110.77 312.438
Length  512, alignment  2, rej len 10:  5907.34 3042.25 2720.27 263.578
Length  512, alignment  0, rej len 11:  6286.42 3317.62 2945.11 187.891
Length  512, alignment  3, rej len 11:  6286.98 3275.25 2941.34 181.109
Length  512, alignment  0, rej len 12:  6633.17 3520.25 2176.12 186.109
Length  512, alignment  4, rej len 12:  6637.7  3530.88 2186.92 186.5
Length  512, alignment  0, rej len 13:  7035.5  3781.77 2654.42 188.312
Length  512, alignment  5, rej len 13:  7028.75 3815.91 2882.97 183.953
Length  512, alignment  0, rej len 14:  7410.67 7073.2  3119.92 188.344
Length  512, alignment  6, rej len 14:  7449.28 7209.66 3121.75 196.203
Length  512, alignment  0, rej len 15:  8533.19 7368.66 3529.3  187.688
Length  512, alignment  7, rej len 15:  7853.31 7347.17 3604    285.547
Length  512, alignment  0, rej len 16:  8166.94 7675.23 2770.53 187.484
Length  512, alignment  0, rej len 16:  8152.78 7695.42 2758.72 186.047
Length  512, alignment  0, rej len 17:  8654.25 7848.5  3257.39 186.078
Length  512, alignment  1, rej len 17:  8652.77 7848.34 3521    187.078
Length  512, alignment  0, rej len 18:  8931.33 5672.2  3703.25 188.906
Length  512, alignment  2, rej len 18:  8930.69 4997.8  3708.62 191.969
Length  512, alignment  0, rej len 19:  9396.97 5254.09 4202.88 193.984
Length  512, alignment  3, rej len 19:  9405.5  5253.7  4199.2  187.078
Length  512, alignment  0, rej len 20:  9989.86 5541.17 3433.25 189.906
Length  512, alignment  4, rej len 20:  9736.66 5529.2  3417.14 191.375
Length  512, alignment  0, rej len 21:  10193.5 7610    3952.86 192.484
Length  512, alignment  5, rej len 21:  10888.8 7641.55 3989.55 197.859
Length  512, alignment  0, rej len 22:  10602.6 7853.91 4420.22 191.312
Length  512, alignment  6, rej len 22:  10596.6 7846.45 4407.02 192.344
Length  512, alignment  0, rej len 23:  10935.5 8127.03 4879.16 190.75
Length  512, alignment  7, rej len 23:  11226.5 8114.55 4877.48 191.562
Length  512, alignment  0, rej len 24:  11346.8 8912.94 4041.44 192.266
Length  512, alignment  0, rej len 24:  11336.4 8375.3  4070.19 192.234
Length  512, alignment  0, rej len 25:  13179.7 8634.25 4577.47 189.547
Length  512, alignment  1, rej len 25:  11716.4 8646    4582.34 193.328
Length  512, alignment  0, rej len 26:  12086.8 8901.58 5074.72 191.625
Length  512, alignment  2, rej len 26:  12445.3 9456.22 5086.77 196.359
Length  512, alignment  0, rej len 27:  12553.8 9150.98 5531.34 194.359
Length  512, alignment  3, rej len 27:  12548   9152    5524.8  194
Length  512, alignment  0, rej len 28:  13158.4 9437.27 4818.75 191.688
Length  512, alignment  4, rej len 28:  12829.9 9422.06 4787.62 192.609
Length  512, alignment  0, rej len 29:  14055.6 9720.59 5463.33 193.734
Length  512, alignment  5, rej len 29:  13356.1 9687.08 5249.91 197.75
Length  512, alignment  0, rej len 30:  13594.3 9951.38 5778.5  193.438
Length  512, alignment  6, rej len 30:  13750.4 9997.89 5745.81 196.156
Length  512, alignment  0, rej len 31:  14082.3 10777.3 6227.55 195.297
Length  512, alignment  7, rej len 31:  14095.2 10200.6 6411.61 281
Length   32, alignment  0, rej len  4:  228.031 109.047 71.6562 45.1719
Length   32, alignment  1, rej len  4:  228.453 107.5   72.2656 47.0312
Length   64, alignment  0, rej len  4:  453.109 201.672 133.016 55.4062
Length   64, alignment  2, rej len  4:  446.625 202.656 136.797 49.2969
Length  128, alignment  0, rej len  4:  901.75  398.094 243.734 63.75
Length  128, alignment  3, rej len  4:  890.172 389     244.609 69.9219
Length  256, alignment  0, rej len  4:  1798.75 764.094 477.422 98.5781
Length  256, alignment  4, rej len  4:  1769.98 758.25  477.672 111.719
Length  512, alignment  0, rej len  4:  3594.88 1520.91 952.109 176.312
Length  512, alignment  5, rej len  4:  3536.95 1530.27 957.859 171.5
Length 1024, alignment  0, rej len  4:  10388.6 3030.94 1910    333.375
Length 1024, alignment  6, rej len  4:  7070.98 3011.75 1894.55 327
Length 2048, alignment  0, rej len  4:  14372.2 6188.47 3797.06 632.5
Length 2048, alignment  7, rej len  4:  20986.4 6026.5  3762.16 638.016
Length   64, alignment  1, rej len 10:  937.984 393.5   327.75  56.25
Length   64, alignment  2, rej len 10:  747.5   394.062 329.719 60.4062
Length   64, alignment  3, rej len 10:  748.062 404.391 331.922 56.1562
Length   64, alignment  4, rej len 10:  737.641 392.406 327.438 57.4219
Length   64, alignment  5, rej len 10:  743.75  397.328 326.609 61.5469
Length   64, alignment  6, rej len 10:  741.438 401.172 339.109 55.3438
Length   64, alignment  7, rej len 10:  750.406 397.312 328.516 56.25
Length    0, alignment  0, rej len  6:  3.54688 10.4844 10.8281 27.5
Length    1, alignment  0, rej len  6:  14.25   16.4531 16.5625 25.25
Length    2, alignment  0, rej len  6:  19.1875 21.25   21.9062 21.4375
Length    3, alignment  0, rej len  6:  30.5312 30.4375 29.2188 38.6719
Length    4, alignment  0, rej len  6:  45.6562 37.375  28.6719 35.75
Length    5, alignment  0, rej len  6:  45.2812 42.2188 31.6094 27.1094
Length    6, alignment  0, rej len  6:  53.5156 54.125  35.7031 35.5781
Length    7, alignment  0, rej len  6:  62.2031 58.5781 39.3281 36.25
Length    8, alignment  0, rej len  6:  70.1406 65      43.4531 30.6094
Length    9, alignment  0, rej len  6:  80.0156 75.2188 48.0625 35.8594
Length   10, alignment  0, rej len  6:  82.8125 81.375  50.9062 41.9219
Length   11, alignment  0, rej len  6:  95.25   85.5    53.7031 40.1406
Length   12, alignment  0, rej len  6:  103.625 94.9375 56.7031 32.3438
Length   13, alignment  0, rej len  6:  110.969 100.266 60.7031 38.25
Length   14, alignment  0, rej len  6:  117.719 115.547 65.1406 40.1875
Length   15, alignment  0, rej len  6:  124.547 115.172 68.4219 36.7031
Length   16, alignment  0, rej len  6:  136     122.594 76      40.5312
Length   17, alignment  0, rej len  6:  144     132.203 77      31.5
Length   18, alignment  0, rej len  6:  148.188 139.188 78.6094 35.1875
Length   19, alignment  0, rej len  6:  155.734 145.406 86.3594 37.4219
Length   20, alignment  0, rej len  6:  166.984 154.359 86.0938 41.25
Length   21, alignment  0, rej len  6:  172.797 161.078 93.375  38.7969
Length   22, alignment  0, rej len  6:  181.375 165.281 93.2344 43.9688
Length   23, alignment  0, rej len  6:  190.625 175.75  97.625  32.2812
Length   24, alignment  0, rej len  6:  240.922 185.094 101.422 43.1562
Length   25, alignment  0, rej len  6:  204.5   191.312 104.672 33.5
Length   26, alignment  0, rej len  6:  216.75  200.109 110.75  42.4531
Length   27, alignment  0, rej len  6:  224.766 207.25  115.109 45.2344
Length   28, alignment  0, rej len  6:  228.828 209.734 113.25  37.5938
Length   29, alignment  0, rej len  6:  235     219.5   119.828 30.2188
Length   30, alignment  0, rej len  6:  246.734 226.188 124.844 49.9375
Length   31, alignment  0, rej len  6:  254.062 237.25  127.625 45.5938
Length   32, alignment  0, rej len  6:  337.453 241.047 129.781 43.9219
Length   33, alignment  0, rej len  6:  273.594 246.109 134.406 49.0781
Length   34, alignment  0, rej len  6:  277.391 256.375 137.656 46.4531
Length   35, alignment  0, rej len  6:  368.562 264.688 140.062 52.0312
Length   36, alignment  0, rej len  6:  297.078 271.578 147.906 51.8594
Length   37, alignment  0, rej len  6:  310.828 279.156 149.734 44
Length   38, alignment  0, rej len  6:  312.656 287.641 155.703 45.5
Length   39, alignment  0, rej len  6:  393.203 287.297 154.578 47.5
Length   40, alignment  0, rej len  6:  346.25  294.469 205.375 52.6719
Length   41, alignment  0, rej len  6:  401.172 353.062 217.25  48.1562
Length   42, alignment  0, rej len  6:  341.703 309.203 167.172 47.4844
Length   43, alignment  0, rej len  6:  341.297 321.891 172.172 52.6094
Length   44, alignment  0, rej len  6:  363     329     177.25  50.0625
Length   45, alignment  0, rej len  6:  366.562 335.5   178.25  47.3594
Length   46, alignment  0, rej len  6:  373.672 342.203 182.594 53.2812
Length   47, alignment  0, rej len  6:  381.359 345.094 183.812 55.4062
Length   48, alignment  0, rej len  6:  390.016 351.938 188.516 52.8281
Length   49, alignment  0, rej len  6:  409.5   363.281 192.594 53.75
Length   50, alignment  0, rej len  6:  408.859 373.844 196.406 50
Length   51, alignment  0, rej len  6:  425.578 377.406 200.5   54.8125
Length   52, alignment  0, rej len  6:  422.281 380.438 202.406 55.4844
Length   53, alignment  0, rej len  6:  435.328 393.922 208.562 41.75
Length   54, alignment  0, rej len  6:  434.031 401.188 221.547 55.7812
Length   55, alignment  0, rej len  6:  448.094 410.297 217.5   46.8906
Length   56, alignment  0, rej len  6:  455.797 417.172 220.672 55.2812
Length   57, alignment  0, rej len  6:  461.766 415.438 220.109 57.3125
Length   58, alignment  0, rej len  6:  474.344 430.969 228.719 55.8125
Length   59, alignment  0, rej len  6:  476     437.984 230.516 58.5
Length   60, alignment  0, rej len  6:  607.422 449.484 234.281 57.0938
Length   61, alignment  0, rej len  6:  507.953 453.688 244.969 57.75
Length   62, alignment  0, rej len  6:  507.531 463.672 241.281 56.3125
Length   63, alignment  0, rej len  6:  530.281 477.938 246.547 55.8438

So I let you chose if you prefer to push current implementation and then start on
an optimized on based on Ondrej suggestion (which is how x86_64 does) or if you
want to work on a newer implementation to replace the one you just sent.

Reply the list with your decision.

On 27-02-2014 17:53, Adhemerval Zanella wrote:
On 27-02-2014 17:24, OndÅej BÃlka wrote:
On Thu, Feb 27, 2014 at 11:29:34PM +0530, R Vidya wrote:
>From 2784371b29cf426603438fb5cac2b2a1f41940bd Mon Sep 17 00:00:00 2001
From: Vidya Ranganathan <vidya@linux.vnet.ibm.com>
Date: Thu, 27 Feb 2014 09:34:10 -0500
Subject: [PATCH] Optimization for strpbrk() on ppc64 and ppc64le. I have
  attached the benchtest output to show the performance improvement.

The optimization is achieved by following techniques:
1.aligned memory access
2.loop unrolling P7 gain
3.CPU pre-fetch to avoid cache miss

These optimizations migth not be enough, a generic x64 uses standard
trick of creating array from needle and then do lookup per charecter
which should be faster when needle is say abcd...xyz. A bit optimized c
implementation of that is below, I will send patch with that.
Qualifying as 'not be enough' is misleading, since the optimization is aiming
to be better than the one PowerPC currently uses (string/strpbrk.c). But your
suggestion in indeed valid and I will ask Vidya to take a look if it is worth
for PowerPC as well. But I'd like to address it in a future thread, since
this optimization is not to change the default algorithm, but rather to optimize
it for POWER.


A main problem of optimizing strbrk is that you often spend more time in
preprocessing than in actual matches. For small constant needles a separate
implementation gets called.

I have in my todo list to cache preprocessed needle information, it
would allow match needles that consist from up to three character ranges
quite quickly but is not feasible now due of long startup time.

#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#undef strpbrk

char *
strpbrk (const char *_s, const char *n)
{
   unsigned char *s = (unsigned char *) _s;
   size_t i;
   unsigned char chars[256];
   memset (chars, 0, 256);
   for (i = 0; n[i]; i++)
     chars[(unsigned char) n[i]] = 1;
   chars[0] = 1;
   i = 0;
   while (1)
     {
       if (chars[s[i++]] == 1)
	return s[i - 1] ? s + i - 1 : NULL;

       if (chars[s[i++]] == 1)
	return s[i - 1] ? s + i - 1 : NULL;

       if (chars[s[i++]] == 1)
	return s[i - 1] ? s + i - 1 : NULL;

       if (chars[s[i++]] == 1)
	return s[i - 1] ? s + i - 1 : NULL;
     }
}



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]