This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Improve bench-strstr

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: Carlos O'Donell <carlos at redhat dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
Cc: nd <nd at arm dot com>
Date: Tue, 30 Oct 2018 15:17:04 +0000
Subject: Re: [PATCH] Improve bench-strstr
References: <DB5PR08MB10308881406BC7E59B6B9D0283F30@DB5PR08MB1030.eurprd08.prod.outlook.com> <de726b48-9ed6-281c-0bdf-0f8468634511@redhat.com> <DB5PR08MB1030ADAA1A8CBE85BFD2AEDE83F30@DB5PR08MB1030.eurprd08.prod.outlook.com>,<b36544c2-a5c8-bc79-219c-3746bb36ec6b@redhat.com>

Hi Carlos,

>>> Why are we adjusting MIN_PAGE_SIZE?
>> 
>> This is needed if you want the buffers to be a bit larger. It seems to reserve
>> only two pages by default, which could mean just two 512-byte pages rather
>> than the known, fixed amount that many of the benchtests require.
>
> Could you expand on this a bit more? Why are we using something called
> MIN_PAGE_SIZE to do something entirely different?

No idea, I didn't write the benchmark infrastructure. Maybe the original idea
was to test strings close to page boundaries, but I have not seen any string
test that actually does that.

What matters is that each test is guaranteed a minimum amount of workspace,
and currently that is only the case if you do what various tests already do
and set MIN_PAGE_SIZE explicitly.
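A minimal sketch of the mechanism being discussed, assuming (as described above) that the harness reserves two system pages by default and that a test raises this floor by defining MIN_PAGE_SIZE before the shared code sees it. The helpers `bench_page_size` and `alloc_with_guard` and the value 131072 are hypothetical illustrations, not the actual glibc benchtests code.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* What a test does to guarantee its workspace (value is illustrative).  */
#define MIN_PAGE_SIZE 131072

/* Workspace size: two system pages by default, raised to MIN_PAGE_SIZE
   only if the test defined it.  With 512-byte pages and no floor, this
   would be just 1024 bytes.  */
static size_t
bench_page_size (size_t sys_page)
{
  size_t size = 2 * sys_page;
#ifdef MIN_PAGE_SIZE
  if (size < MIN_PAGE_SIZE)
    size = MIN_PAGE_SIZE;
#endif
  return size;
}

/* Map SIZE writable bytes followed by one PROT_NONE guard page, so an
   overrun past the workspace faults immediately.  SIZE must be a
   multiple of SYS_PAGE.  Returns NULL on failure.  */
static char *
alloc_with_guard (size_t size, size_t sys_page)
{
  assert (size % sys_page == 0);
  char *buf = mmap (NULL, size + sys_page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (buf == MAP_FAILED)
    return NULL;
  if (mprotect (buf + size, sys_page, PROT_NONE) != 0)
    {
      munmap (buf, size + sys_page);
      return NULL;
    }
  return buf;
}
```

Without the `#define`, the workspace silently shrinks with the page size, which is why a fixed floor has to be requested explicitly.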

> If I were to vectorize strstr using AVX2 or AVX-512, wouldn't alignment make
> a difference? Notably, you'd have to unroll loops that run at the beginning,
> before the input reaches a suitable alignment, so that vector operations can
> do needle searches in parallel.
>
> Why isn't alignment relevant in this case?

Vector instructions typically support unaligned accesses, so alignment doesn't
matter. Even if you explicitly align the search loop, that is a one-off cost at
the start of the search, quite unlike memcpy, where mismatched source and
destination alignment forces unaligned accesses for the complete copy.
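To make the "one-off" point concrete, here is a minimal portable sketch, not glibc code, in which 8-byte words tested with the well-known SWAR zero-byte trick stand in for vector registers. The hypothetical `find_byte` scans byte by byte only until the pointer is 8-byte aligned (at most 7 steps, independent of input length) and then reads aligned words. A real strstr searches for a whole needle, not a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of V is zero (classic SWAR test).  */
static uint64_t
has_zero_byte (uint64_t v)
{
  return (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
}

/* Return the index of the first C in S[0..N), or -1 if absent.  */
static ptrdiff_t
find_byte (const char *s, size_t n, char c)
{
  size_t i = 0;

  /* One-off prologue: at most 7 bytes until S + I is 8-byte aligned.  */
  while (i < n && ((uintptr_t) (s + i) & 7) != 0)
    {
      if (s[i] == c)
        return (ptrdiff_t) i;
      i++;
    }

  /* Main loop: aligned 8-byte words; XOR turns matching bytes into 0.  */
  uint64_t pattern = 0x0101010101010101ULL * (unsigned char) c;
  for (; i + 8 <= n; i += 8)
    {
      uint64_t w;
      memcpy (&w, s + i, 8);
      if (has_zero_byte (w ^ pattern))
        break; /* Match somewhere in this word; locate it below.  */
    }

  /* Tail bytes, or the word known to contain the first match.  */
  for (; i < n; i++)
    if (s[i] == c)
      return (ptrdiff_t) i;
  return -1;
}
```

Locating the match bytewise after the word test keeps the sketch independent of endianness; production code would use a bit scan instead.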

Benchmarking the existing x64 implementation (which uses SSE2) shows no
measurable performance difference at any alignment.
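One way to check this on a given machine is to time strstr with the haystack placed at each offset from a 64-byte boundary and compare the per-call times. This is a minimal sketch under arbitrary assumptions (a 4 KiB haystack of 'a' characters, the absent needle "aab", 1000 iterations, offsets 0 to 15); `time_strstr_at` is a hypothetical helper, not part of bench-strstr.c.

```c
#define _POSIX_C_SOURCE 199309L /* clock_gettime, CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define HAY_LEN 4096
#define MAX_ALIGN 16
#define ITERS 1000

/* 64-byte aligned backing store, so BUF + ALIGN has a known offset.  */
static _Alignas (64) char buf[HAY_LEN + MAX_ALIGN + 1];

/* Volatile pointer: every store of the result must actually happen.  */
static const char *volatile sink;

static double
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average nanoseconds per strstr call with the haystack starting
   ALIGN bytes past a 64-byte boundary.  */
static double
time_strstr_at (size_t align, const char *needle)
{
  assert (align < MAX_ALIGN);
  char *hay = buf + align;
  memset (hay, 'a', HAY_LEN);
  hay[HAY_LEN] = '\0';

  /* Reading the haystack pointer through a volatile stops the compiler
     from hoisting the (pure) strstr call out of the loop.  */
  const char *volatile hay_v = hay;
  double t0 = now_ns ();
  for (int i = 0; i < ITERS; i++)
    sink = strstr (hay_v, needle);
  return (now_ns () - t0) / ITERS;
}
```

Calling `time_strstr_at (a, "aab")` for `a` from 0 to 15 and comparing the values gives one alignment sweep; repeated runs are needed before treating small differences as real.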

>> In general, alignment is way overrepresented: traces show strings are often
>> aligned (far more than you'd think, due to globals, alloca and malloc overaligning),
>> and even if there is alignment sensitivity, the exact alignment doesn't matter at all
>> (beyond whether or not it's aligned). So all the combinations of alignments are
>> wasted effort and just clutter the results.
>
> Ah, this is a much more cogent argument against alignment being measured, but
> what we need here is data and comments around this particular issue. How did
> you measure this, what were your results, and can we put them into a comment
> in the sources?

I don't see the issue. My version still measures unaligned cases. However,
measuring all possible alignment combinations makes no sense, because they
can't behave any differently: you're either aligned or you're unaligned.
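The contrast can be sketched as two ways of enumerating the (haystack, needle) offsets a benchmark passes to its per-case routine. The `case_fn` callback type, the `count_call` stub and the choice of 3 as the representative unaligned offset are hypothetical; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Per-case routine a benchmark would run (hypothetical signature).  */
typedef void (*case_fn) (size_t hay_align, size_t needle_align);

/* Exhaustive: every pair of offsets in [0, MAX_ALIGN).  */
static size_t
run_all_alignments (case_fn fn, size_t max_align)
{
  size_t n = 0;
  for (size_t a1 = 0; a1 < max_align; a1++)
    for (size_t a2 = 0; a2 < max_align; a2++, n++)
      fn (a1, a2);
  return n;
}

/* Reduced: each buffer is either aligned (offset 0) or unaligned
   (one representative offset), giving four cases in total.  */
static size_t
run_aligned_unaligned (case_fn fn, size_t unaligned_off)
{
  const size_t offs[2] = { 0, unaligned_off };
  size_t n = 0;
  for (size_t i = 0; i < 2; i++)
    for (size_t j = 0; j < 2; j++, n++)
      fn (offs[i], offs[j]);
  return n;
}

/* Stub that only counts how many cases were run.  */
static size_t calls;

static void
count_call (size_t hay_align, size_t needle_align)
{
  (void) hay_align;
  (void) needle_align;
  calls++;
}
```

With 16 possible offsets, the exhaustive loop produces 256 cases per length versus 4 for the aligned/unaligned split, which is the clutter in the results referred to above.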

> My personal opinion is that we don't actually have the data to back up such
> claims, so we continue to *look* at performance over alignment as just a
> double check, but I agree that we should *not* weigh all of the tests equally.

We do have the data: unlike memcpy, strstr shows no alignment sensitivity.

Wilco

References:
- [PATCH] Improve bench-strstr
  - From: Wilco Dijkstra
- Re: [PATCH] Improve bench-strstr
  - From: Carlos O'Donell
- Re: [PATCH] Improve bench-strstr
  - From: Wilco Dijkstra
- Re: [PATCH] Improve bench-strstr
  - From: Carlos O'Donell
