This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] Benchmark strstr hard needles
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Tue, 11 Jun 2019 10:13:05 -0300
- Subject: Re: [PATCH v2] Benchmark strstr hard needles
- References: <VI1PR0801MB2127C0E07A27296FDC38C9AB833E0@VI1PR0801MB2127.eurprd08.prod.outlook.com> <3cacddd7-308b-7579-e6f8-50f6c49778cc@linaro.org> <VI1PR0801MB21272CA7279C9A0AF5AE80CA83070@VI1PR0801MB2127.eurprd08.prod.outlook.com> <VI1PR0801MB2127A0894B0F931DF578CC8383130@VI1PR0801MB2127.eurprd08.prod.outlook.com>
On 10/06/2019 08:46, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>> +char * volatile null = NULL;
>
>> I am not seeing why exactly it requires a global non static volatile
>> variable to call do_one_test.
>
> You're right, this is not necessary in the current version (IIRC it was to avoid
> a warning), so I have removed it.
>
>> Maybe add some context from where these tests came from, from the
>> libc-alpha discussion?
>
> Sure, I've improved the comments and description to make it clearer what
> the hard needle test is doing and why. New version below.
>
> Wilco
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>
>
> [PATCH v3] Benchmark strstr hard needles
>
> v3: Improve comments/description.
>
> Benchmark needles which exhibit worst-case performance. This shows that
> basic_strstr is quadratic and thus unsuitable for large needles.
> On the other hand the Two-way and new strstr implementations are linear with
> increasing needle sizes. The slowest cases of the two implementations are
> within a factor of 2 on several different microarchitectures. Two-way is
> slowest on inputs which cause a branch mispredict on almost every character.
> The new strstr is slowest on inputs which almost match and result in many
> calls to memcmp. Thanks to Szabolcs for providing various hard needles.
>
>
> ChangeLog:
> 2019-05-21 Wilco Dijkstra <wdijkstr@arm.com>
>
> * benchtests/bench-strstr.c (test_hard_needle): New function.
>
> --
> diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c
> index 31309b24029a96de7381e1050fd89e5d26642e5f..5ec6edcf1869dd198a3326cb85f2a4463ae3417c 100644
> --- a/benchtests/bench-strstr.c
> +++ b/benchtests/bench-strstr.c
> @@ -203,6 +203,81 @@ do_test (size_t align1, size_t align2, size_t len1, size_t len2,
> putchar ('\n');
> }
>
> +/* Test needles which exhibit worst-case performance. This shows that
> + basic_strstr is quadratic and thus unsuitable for large needles.
> + On the other hand Two-way and skip table implementations are linear with
> + increasing needle sizes. The slowest cases of the two implementations are
> + within a factor of 2 on several different microarchitectures. */
> +
> +static void
> +test_hard_needle (size_t ne_len, size_t hs_len)
> +{
> + char *ne = (char *) buf1;
> + char *hs = (char *) buf2;
> +
> + /* Hard needle for strstr algorithm using skip table. This results in many
> + memcmp calls comparing most of the needle. */
> + {
> + memset (ne, 'a', ne_len);
> + ne[ne_len] = '\0';
> + ne[ne_len - 14] = 'b';
> +
> + memset (hs, 'a', hs_len);
> + for (size_t i = ne_len; i <= hs_len; i += ne_len)
> + {
> + hs[i-5] = 'b';
> + hs[i-62] = 'b';
> + }
> +
> + printf ("Length %4zd/%3zd, complex needle 1:", hs_len, ne_len);
> +
> + FOR_EACH_IMPL (impl, 0)
> + do_one_test (impl, hs, ne, NULL);
> + putchar ('\n');
> + }
> +
> + /* 2nd hard needle for strstr algorithm using skip table. This results in
> + many memcmp calls comparing most of the needle. */
> + {
> + memset (ne, 'a', ne_len);
> + ne[ne_len] = '\0';
> + ne[ne_len - 6] = 'b';
> +
> + memset (hs, 'a', hs_len);
> + for (size_t i = ne_len; i <= hs_len; i += ne_len)
> + {
> + hs[i-5] = 'b';
> + hs[i-6] = 'b';
> + }
> +
> + printf ("Length %4zd/%3zd, complex needle 2:", hs_len, ne_len);
> +
> + FOR_EACH_IMPL (impl, 0)
> + do_one_test (impl, hs, ne, NULL);
> + putchar ('\n');
> + }
> +
> + /* Hard needle for Two-way algorithm - the random input causes a large number
> + of branch mispredictions which significantly reduces performance on modern
> + micro architectures. */
> + {
> + for (int i = 0; i < hs_len; i++)
> + hs[i] = (rand () & 255) > 155 ? 'a' : 'b';
> + hs[hs_len] = 0;
> +
> + memset (ne, 'a', ne_len);
> + ne[ne_len-2] = 'b';
> + ne[0] = 'b';
> + ne[ne_len] = 0;
> +
> + printf ("Length %4zd/%3zd, complex needle 3:", hs_len, ne_len);
> +
> + FOR_EACH_IMPL (impl, 0)
> + do_one_test (impl, hs, ne, NULL);
> + putchar ('\n');
> + }
> +}
> +
> static int
> test_main (void)
> {
> @@ -227,6 +302,10 @@ test_main (void)
> do_test (14, 5, hlen, klen, 1);
> }
>
> + test_hard_needle (64, 65536);
> + test_hard_needle (256, 65536);
> + test_hard_needle (1024, 65536);
> +
> return ret;
> }
>
>
>