[PATCH v7 2/2] Optimize bsearch() implementation for performance
Kuan-Wei Chiu
visitorckw@gmail.com
Thu Sep 12 16:00:21 GMT 2024
On Wed, Sep 11, 2024 at 06:16:32PM -0400, DJ Delorie wrote:
> Kuan-Wei Chiu <visitorckw@gmail.com> writes:
> > Optimize the bsearch() function to improve binary search performance.
> > Although the code size grew by 8 bytes, the new implementation achieves
> > a 15% reduction in execution time on my x86 machine, according to the
> > bench-bsearch benchmark results.
>
> LGTM as-is but I noted something to try that might speed it up more.
> Reviewed-by: DJ Delorie <dj@redhat.com>
Thank you for your review and suggestions!
>
> > diff --git a/bits/stdlib-bsearch.h b/bits/stdlib-bsearch.h
>
> > bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size,
> > __compar_fn_t __compar)
> > {
> > - size_t __l, __u, __idx;
> > const void *__p;
> > int __comparison;
> >
> > - __l = 0;
> > - __u = __nmemb;
> > - while (__l < __u)
> > + while (__nmemb)
> > {
> > - __idx = (__l + __u) / 2;
> > - __p = (const void *) (((const char *) __base) + (__idx * __size));
> > + __p = (const void *) (((const char *) __base) + ((__nmemb >> 1) * __size));
>
> We're replacing an add/shift with just a shift, but using the existing
> __base and __nmemb instead of the new __l and __u. Ok.
>
> After this, __p points to base[__nmemb>>1]
>
> > __comparison = (*__compar) (__key, __p);
> > - if (__comparison < 0)
> > - __u = __idx;
> > - else if (__comparison > 0)
> > - __l = __idx + 1;
> > - else
> > + if (__comparison == 0)
>
> It seems to me performance would be better if the more likely cases went
> first - checking "__comparison > 0" would, approximately 50% of the
> time, skip the "__comparison == 0" check, but not the other way around.
> Have you tried this with the ">0" test before the "==0" test, or with
> marking the "==0" test as unlikely?
>
> Keeping the >0 and <0 paths close to the top of the loop might help with
> cache hits too, depending on link alignment.
I tried adding unlikely(), but it seems that gcc generates the exact
same code (Note: I only checked on x86). As for moving the > 0 test
first, I also gave it a try, but it actually worsened the benchmark
results. I haven’t looked closely at the generated code to understand
why, but here are the benchmark results:
For the current patch:
{
"timing_type": "hp_timing",
"functions": {
"bsearch": {
"bench-variant": "default",
"results": [
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "descending",
"contained": "no",
"simple": "yes",
"timing": 110.092
},
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "descending",
"contained": "yes",
"simple": "yes",
"timing": 99.0945
},
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "ascending",
"contained": "no",
"simple": "yes",
"timing": 112.395
},
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "ascending",
"contained": "yes",
"simple": "yes",
"timing": 98.5953
}]
}
}
}
With the > 0 test moved to the front:
{
"timing_type": "hp_timing",
"functions": {
"bsearch": {
"bench-variant": "default",
"results": [
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "descending",
"contained": "no",
"simple": "yes",
"timing": 113.875
},
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "descending",
"contained": "yes",
"simple": "yes",
"timing": 101.111
},
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "ascending",
"contained": "no",
"simple": "yes",
"timing": 113.713
},
{
"array-size": 100000,
"element-size": 4,
"key-pattern": "ascending",
"contained": "yes",
"simple": "yes",
"timing": 105.429
}]
}
}
}
>
> > {
> > #if __GNUC_PREREQ(4, 6)
> > # pragma GCC diagnostic push
> > @@ -46,6 +38,12 @@ bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size,
> > # pragma GCC diagnostic pop
> > #endif
> > }
> > + if (__comparison > 0)
> > + {
> > + __base = ((const char *) __p) + __size;
> > + --__nmemb;
> > + }
> > + __nmemb >>= 1;
>
> So if comparison < 0 (key is before probe):
> __nmemb is set to the number of elements before __p which is base[__nmemb>>1]
> __base is unchanged
> - range is 0..__nmemb-1
>
> if comparison is > 0 (key is after probe):
> __base is set to the element after the probe
> If (__nmemb is odd)
> probe point is at (__nmemb-1) >> 1
> same numbers before and after probe
> subtracting one is unneeded but has no effect
>
> 0 1 2 3 4 5 6 7 8 (n == 9) (n>>1 == 4)
> P
> 0 1 2 3 4 5 6 (n == 7) (n>>1 == 3)
> P
>
> If (__nmemb is even)
> probe point is at (__nmemb) >> 1
> even numbers before but odd number after
> subtracting one is needed
>
> 0 1 2 3 4 5 6 7 (n == 8) (n>>1 == 4)
> P
> 0 1 2 3 4 5 (n == 6) (n>>1 == 3)
> P
>
> Ok.
>
> > }
> >
> > return NULL;
>
More information about the Libc-alpha
mailing list