Hi Vidya,
Ondrej suggestion is indeed worth of note and I did a quick sniff run by using it
as replacement of default one (string/strpbrk.c). With your patch applied it will
be set as __strpbrk_ppc. The benchmarks looks good, showing better performance
than power7 one:
simple_strpbrk stupid_strpbrk __strpbrk_power7 __strpbrk_ppc
Length 512, alignment 0, rej len 0: 526.453 556.688 35.0156 212.016
Length 512, alignment 0, rej len 0: 525.781 555.922 34.3594 210.328
Length 512, alignment 0, rej len 1: 785.812 729.75 1162.66 209.031
Length 512, alignment 1, rej len 1: 786.172 728 1163.38 214.312
Length 512, alignment 0, rej len 2: 2689.11 989.25 1683.61 280.125
Length 512, alignment 2, rej len 2: 2728.47 991.875 1685.89 219.141
Length 512, alignment 0, rej len 3: 3122.05 2118.22 1894.25 178.328
Length 512, alignment 3, rej len 3: 3325.78 1242.62 1894.52 189
Length 512, alignment 0, rej len 4: 3985.55 1531.14 946.922 187.031
Length 512, alignment 4, rej len 4: 3719.88 1515.48 951.281 181.156
Length 512, alignment 0, rej len 5: 3993.5 1830.55 1250.27 189.75
Length 512, alignment 5, rej len 5: 3998.05 1809.23 1250.67 177.141
Length 512, alignment 0, rej len 6: 3973.31 3708.2 1893.2 188.812
Length 512, alignment 6, rej len 6: 3985.5 3722.45 1898.81 190.875
Length 512, alignment 0, rej len 7: 4690.47 2326.11 2623.25 186.031
Length 512, alignment 7, rej len 7: 4697.55 2323.59 2367.47 188.906
Length 512, alignment 0, rej len 8: 5269.11 2505.69 1545.86 185.75
Length 512, alignment 0, rej len 8: 5269.48 2505.09 1536.22 193.969
Length 512, alignment 0, rej len 9: 5512.75 2778.62 2509.73 187.25
Length 512, alignment 1, rej len 9: 5522.86 2778.44 2029.03 181.859
Length 512, alignment 0, rej len 10: 5890.72 3058.97 3110.77 312.438
Length 512, alignment 2, rej len 10: 5907.34 3042.25 2720.27 263.578
Length 512, alignment 0, rej len 11: 6286.42 3317.62 2945.11 187.891
Length 512, alignment 3, rej len 11: 6286.98 3275.25 2941.34 181.109
Length 512, alignment 0, rej len 12: 6633.17 3520.25 2176.12 186.109
Length 512, alignment 4, rej len 12: 6637.7 3530.88 2186.92 186.5
Length 512, alignment 0, rej len 13: 7035.5 3781.77 2654.42 188.312
Length 512, alignment 5, rej len 13: 7028.75 3815.91 2882.97 183.953
Length 512, alignment 0, rej len 14: 7410.67 7073.2 3119.92 188.344
Length 512, alignment 6, rej len 14: 7449.28 7209.66 3121.75 196.203
Length 512, alignment 0, rej len 15: 8533.19 7368.66 3529.3 187.688
Length 512, alignment 7, rej len 15: 7853.31 7347.17 3604 285.547
Length 512, alignment 0, rej len 16: 8166.94 7675.23 2770.53 187.484
Length 512, alignment 0, rej len 16: 8152.78 7695.42 2758.72 186.047
Length 512, alignment 0, rej len 17: 8654.25 7848.5 3257.39 186.078
Length 512, alignment 1, rej len 17: 8652.77 7848.34 3521 187.078
Length 512, alignment 0, rej len 18: 8931.33 5672.2 3703.25 188.906
Length 512, alignment 2, rej len 18: 8930.69 4997.8 3708.62 191.969
Length 512, alignment 0, rej len 19: 9396.97 5254.09 4202.88 193.984
Length 512, alignment 3, rej len 19: 9405.5 5253.7 4199.2 187.078
Length 512, alignment 0, rej len 20: 9989.86 5541.17 3433.25 189.906
Length 512, alignment 4, rej len 20: 9736.66 5529.2 3417.14 191.375
Length 512, alignment 0, rej len 21: 10193.5 7610 3952.86 192.484
Length 512, alignment 5, rej len 21: 10888.8 7641.55 3989.55 197.859
Length 512, alignment 0, rej len 22: 10602.6 7853.91 4420.22 191.312
Length 512, alignment 6, rej len 22: 10596.6 7846.45 4407.02 192.344
Length 512, alignment 0, rej len 23: 10935.5 8127.03 4879.16 190.75
Length 512, alignment 7, rej len 23: 11226.5 8114.55 4877.48 191.562
Length 512, alignment 0, rej len 24: 11346.8 8912.94 4041.44 192.266
Length 512, alignment 0, rej len 24: 11336.4 8375.3 4070.19 192.234
Length 512, alignment 0, rej len 25: 13179.7 8634.25 4577.47 189.547
Length 512, alignment 1, rej len 25: 11716.4 8646 4582.34 193.328
Length 512, alignment 0, rej len 26: 12086.8 8901.58 5074.72 191.625
Length 512, alignment 2, rej len 26: 12445.3 9456.22 5086.77 196.359
Length 512, alignment 0, rej len 27: 12553.8 9150.98 5531.34 194.359
Length 512, alignment 3, rej len 27: 12548 9152 5524.8 194
Length 512, alignment 0, rej len 28: 13158.4 9437.27 4818.75 191.688
Length 512, alignment 4, rej len 28: 12829.9 9422.06 4787.62 192.609
Length 512, alignment 0, rej len 29: 14055.6 9720.59 5463.33 193.734
Length 512, alignment 5, rej len 29: 13356.1 9687.08 5249.91 197.75
Length 512, alignment 0, rej len 30: 13594.3 9951.38 5778.5 193.438
Length 512, alignment 6, rej len 30: 13750.4 9997.89 5745.81 196.156
Length 512, alignment 0, rej len 31: 14082.3 10777.3 6227.55 195.297
Length 512, alignment 7, rej len 31: 14095.2 10200.6 6411.61 281
Length 32, alignment 0, rej len 4: 228.031 109.047 71.6562 45.1719
Length 32, alignment 1, rej len 4: 228.453 107.5 72.2656 47.0312
Length 64, alignment 0, rej len 4: 453.109 201.672 133.016 55.4062
Length 64, alignment 2, rej len 4: 446.625 202.656 136.797 49.2969
Length 128, alignment 0, rej len 4: 901.75 398.094 243.734 63.75
Length 128, alignment 3, rej len 4: 890.172 389 244.609 69.9219
Length 256, alignment 0, rej len 4: 1798.75 764.094 477.422 98.5781
Length 256, alignment 4, rej len 4: 1769.98 758.25 477.672 111.719
Length 512, alignment 0, rej len 4: 3594.88 1520.91 952.109 176.312
Length 512, alignment 5, rej len 4: 3536.95 1530.27 957.859 171.5
Length 1024, alignment 0, rej len 4: 10388.6 3030.94 1910 333.375
Length 1024, alignment 6, rej len 4: 7070.98 3011.75 1894.55 327
Length 2048, alignment 0, rej len 4: 14372.2 6188.47 3797.06 632.5
Length 2048, alignment 7, rej len 4: 20986.4 6026.5 3762.16 638.016
Length 64, alignment 1, rej len 10: 937.984 393.5 327.75 56.25
Length 64, alignment 2, rej len 10: 747.5 394.062 329.719 60.4062
Length 64, alignment 3, rej len 10: 748.062 404.391 331.922 56.1562
Length 64, alignment 4, rej len 10: 737.641 392.406 327.438 57.4219
Length 64, alignment 5, rej len 10: 743.75 397.328 326.609 61.5469
Length 64, alignment 6, rej len 10: 741.438 401.172 339.109 55.3438
Length 64, alignment 7, rej len 10: 750.406 397.312 328.516 56.25
Length 0, alignment 0, rej len 6: 3.54688 10.4844 10.8281 27.5
Length 1, alignment 0, rej len 6: 14.25 16.4531 16.5625 25.25
Length 2, alignment 0, rej len 6: 19.1875 21.25 21.9062 21.4375
Length 3, alignment 0, rej len 6: 30.5312 30.4375 29.2188 38.6719
Length 4, alignment 0, rej len 6: 45.6562 37.375 28.6719 35.75
Length 5, alignment 0, rej len 6: 45.2812 42.2188 31.6094 27.1094
Length 6, alignment 0, rej len 6: 53.5156 54.125 35.7031 35.5781
Length 7, alignment 0, rej len 6: 62.2031 58.5781 39.3281 36.25
Length 8, alignment 0, rej len 6: 70.1406 65 43.4531 30.6094
Length 9, alignment 0, rej len 6: 80.0156 75.2188 48.0625 35.8594
Length 10, alignment 0, rej len 6: 82.8125 81.375 50.9062 41.9219
Length 11, alignment 0, rej len 6: 95.25 85.5 53.7031 40.1406
Length 12, alignment 0, rej len 6: 103.625 94.9375 56.7031 32.3438
Length 13, alignment 0, rej len 6: 110.969 100.266 60.7031 38.25
Length 14, alignment 0, rej len 6: 117.719 115.547 65.1406 40.1875
Length 15, alignment 0, rej len 6: 124.547 115.172 68.4219 36.7031
Length 16, alignment 0, rej len 6: 136 122.594 76 40.5312
Length 17, alignment 0, rej len 6: 144 132.203 77 31.5
Length 18, alignment 0, rej len 6: 148.188 139.188 78.6094 35.1875
Length 19, alignment 0, rej len 6: 155.734 145.406 86.3594 37.4219
Length 20, alignment 0, rej len 6: 166.984 154.359 86.0938 41.25
Length 21, alignment 0, rej len 6: 172.797 161.078 93.375 38.7969
Length 22, alignment 0, rej len 6: 181.375 165.281 93.2344 43.9688
Length 23, alignment 0, rej len 6: 190.625 175.75 97.625 32.2812
Length 24, alignment 0, rej len 6: 240.922 185.094 101.422 43.1562
Length 25, alignment 0, rej len 6: 204.5 191.312 104.672 33.5
Length 26, alignment 0, rej len 6: 216.75 200.109 110.75 42.4531
Length 27, alignment 0, rej len 6: 224.766 207.25 115.109 45.2344
Length 28, alignment 0, rej len 6: 228.828 209.734 113.25 37.5938
Length 29, alignment 0, rej len 6: 235 219.5 119.828 30.2188
Length 30, alignment 0, rej len 6: 246.734 226.188 124.844 49.9375
Length 31, alignment 0, rej len 6: 254.062 237.25 127.625 45.5938
Length 32, alignment 0, rej len 6: 337.453 241.047 129.781 43.9219
Length 33, alignment 0, rej len 6: 273.594 246.109 134.406 49.0781
Length 34, alignment 0, rej len 6: 277.391 256.375 137.656 46.4531
Length 35, alignment 0, rej len 6: 368.562 264.688 140.062 52.0312
Length 36, alignment 0, rej len 6: 297.078 271.578 147.906 51.8594
Length 37, alignment 0, rej len 6: 310.828 279.156 149.734 44
Length 38, alignment 0, rej len 6: 312.656 287.641 155.703 45.5
Length 39, alignment 0, rej len 6: 393.203 287.297 154.578 47.5
Length 40, alignment 0, rej len 6: 346.25 294.469 205.375 52.6719
Length 41, alignment 0, rej len 6: 401.172 353.062 217.25 48.1562
Length 42, alignment 0, rej len 6: 341.703 309.203 167.172 47.4844
Length 43, alignment 0, rej len 6: 341.297 321.891 172.172 52.6094
Length 44, alignment 0, rej len 6: 363 329 177.25 50.0625
Length 45, alignment 0, rej len 6: 366.562 335.5 178.25 47.3594
Length 46, alignment 0, rej len 6: 373.672 342.203 182.594 53.2812
Length 47, alignment 0, rej len 6: 381.359 345.094 183.812 55.4062
Length 48, alignment 0, rej len 6: 390.016 351.938 188.516 52.8281
Length 49, alignment 0, rej len 6: 409.5 363.281 192.594 53.75
Length 50, alignment 0, rej len 6: 408.859 373.844 196.406 50
Length 51, alignment 0, rej len 6: 425.578 377.406 200.5 54.8125
Length 52, alignment 0, rej len 6: 422.281 380.438 202.406 55.4844
Length 53, alignment 0, rej len 6: 435.328 393.922 208.562 41.75
Length 54, alignment 0, rej len 6: 434.031 401.188 221.547 55.7812
Length 55, alignment 0, rej len 6: 448.094 410.297 217.5 46.8906
Length 56, alignment 0, rej len 6: 455.797 417.172 220.672 55.2812
Length 57, alignment 0, rej len 6: 461.766 415.438 220.109 57.3125
Length 58, alignment 0, rej len 6: 474.344 430.969 228.719 55.8125
Length 59, alignment 0, rej len 6: 476 437.984 230.516 58.5
Length 60, alignment 0, rej len 6: 607.422 449.484 234.281 57.0938
Length 61, alignment 0, rej len 6: 507.953 453.688 244.969 57.75
Length 62, alignment 0, rej len 6: 507.531 463.672 241.281 56.3125
Length 63, alignment 0, rej len 6: 530.281 477.938 246.547 55.8438
So I let you chose if you prefer to push current implementation and then start on
an optimized on based on Ondrej suggestion (which is how x86_64 does) or if you
want to work on a newer implementation to replace the one you just sent.
Reply the list with your decision.
On 27-02-2014 17:53, Adhemerval Zanella wrote:
On 27-02-2014 17:24, OndÅej BÃlka wrote:
On Thu, Feb 27, 2014 at 11:29:34PM +0530, R Vidya wrote:
>From 2784371b29cf426603438fb5cac2b2a1f41940bd Mon Sep 17 00:00:00 2001
From: Vidya Ranganathan <vidya@linux.vnet.ibm.com>
Date: Thu, 27 Feb 2014 09:34:10 -0500
Subject: [PATCH] Optimization for strpbrk() on ppc64 and ppc64le. I have
attached the benchtest output to show the performance improvement.
The optimization is achieved by following techniques:
1.aligned memory access
2.loop unrolling P7 gain
3.CPU pre-fetch to avoid cache miss
These optimizations migth not be enough, a generic x64 uses standard
trick of creating array from needle and then do lookup per charecter
which should be faster when needle is say abcd...xyz. A bit optimized c
implementation of that is below, I will send patch with that.
Qualifying as 'not be enough' is misleading, since the optimization is aiming
to be better than the one PowerPC currently uses (string/strpbrk.c). But your
suggestion in indeed valid and I will ask Vidya to take a look if it is worth
for PowerPC as well. But I'd like to address it in a future thread, since
this optimization is not to change the default algorithm, but rather to optimize
it for POWER.
A main problem of optimizing strbrk is that you often spend more time in
preprocessing than in actual matches. For small constant needles a separate
implementation gets called.
I have in my todo list to cache preprocessed needle information, it
would allow match needles that consist from up to three character ranges
quite quickly but is not feasible now due of long startup time.
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#undef strpbrk
char *
strpbrk (const char *_s, const char *n)
{
unsigned char *s = (unsigned char *) _s;
size_t i;
unsigned char chars[256];
memset (chars, 0, 256);
for (i = 0; n[i]; i++)
chars[(unsigned char) n[i]] = 1;
chars[0] = 1;
i = 0;
while (1)
{
if (chars[s[i++]] == 1)
return s[i - 1] ? s + i - 1 : NULL;
if (chars[s[i++]] == 1)
return s[i - 1] ? s + i - 1 : NULL;
if (chars[s[i++]] == 1)
return s[i - 1] ? s + i - 1 : NULL;
if (chars[s[i++]] == 1)
return s[i - 1] ? s + i - 1 : NULL;
}
}