This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 1/* v3] Generic string function optimization: Add skeleton
- From: Richard Henderson <rth at twiddle dot net>
- To: Ondřej Bílka <neleai at seznam dot cz>, libc-alpha at sourceware dot org
- Date: Thu, 28 May 2015 10:48:29 -0700
- Subject: Re: [PATCH 1/* v3] Generic string function optimization: Add skeleton
- Authentication-results: sourceware.org; auth=none
- References: <20150527060121 dot GA19105 at domone> <20150528142956 dot GA25176 at domone>
On 05/28/2015 07:29 AM, Ondřej Bílka wrote:
Here is a new version of the skeleton. I added big-endian support. This
reminded me that when I first wrote it I wanted to use operations that
don't cause a carry, then forgot about it. As that's needed only for the
first aligned load, or always on big endian, you need to supply the
expression twice; one version shouldn't cause carry propagation.
* string/common.h: New file.
* string/skeleton.h: Likewise.
I like the idea of this common header for implementing these algorithms.
Though I'd like to see it not placed in string/, but sysdeps/generic/, so that
one can provide specialized versions for different targets.
+static const unsigned long int ones = (~0UL / 255); /* 0x0101...*/
+static const unsigned long int add = 127 * (~0UL / 255);
+static const unsigned long int high_bits = 128 * (~0UL / 255);
We're still using C, not C++. These are objects requiring static allocation,
not abstract constants. Please just use #defines.
+static __always_inline
+unsigned long int
+contains_zero (unsigned long int s)
+{
+ return (s - ones) & ~s & high_bits;
+}
On Alpha or PPC, the target-specific header could use cmpbge or cmpb insns
respectively.
On armv6t2/armv7, this can be done with the vector saturating addition, uqadd8,
by adding 0xfe and then inverting. Of course, this works on other targets for
which vector insns exist, but on arm uqadd8 works on normal integer registers.
+#define CROSS_PAGE(x, n) (((uintptr_t) x) % 4096 > 4096 - n)
Certainly different targets would like to override the minimal page size.
+# ifdef FAST_FFS
+ return (ffsl (u) - 1) / 8;
+# else
Why are you stuck on ffs instead of ctz? The latter avoids all the -1 adjustments.
+ }
+ else
+ {
+#endif
...
+#if _STRING_ARCH_unaligned
+ }
+#endif
Better placement of the first #endif means you don't need the second.
r~