This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v3 04/18] Add string vectorized find and detection functions
On 11/01/2018 14:47, Paul Eggert wrote:
> On 01/10/2018 04:47 AM, Adhemerval Zanella wrote:
>> + op_t lsb = (op_t)-1 / 0xff;
>> + op_t msb = lsb << (CHAR_BIT - 1);
> This would be simpler and clearer if it were rewritten as:
>
> op_t lsb = repeat_bytes (0x01);
> op_t msb = repeat_bytes (0x80);
>
> There are several other opportunities for this kind of simplification.
Indeed, I have changed it locally.
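For illustration, here is a minimal sketch of such a helper, assuming op_t is a 64-bit unsigned word and CHAR_BIT == 8. The name repeat_bytes comes from the review above; this body is only one plausible definition, not necessarily the one in the patch.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: op_t is the 64-bit machine word used by the string code.  */
typedef uint64_t op_t;

/* Broadcast byte C into every byte of an op_t.  (op_t)-1 / 0xff is
   0x0101...01, so multiplying it by C (at most 0xff) copies C into each
   byte without any carry between bytes.  */
static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}
```

With it, lsb and msb reduce to repeat_bytes (0x01) and repeat_bytes (0x80), the same values as (op_t)-1 / 0xff and lsb << (CHAR_BIT - 1).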
>
>> +static inline op_t
>> +find_zero_eq_low (op_t x1, op_t x2)
>> +{
>> + op_t lsb = (op_t)-1 / 0xff;
>> + op_t msb = lsb << (CHAR_BIT - 1);
>> + op_t eq = x1 ^ x2;
>> + return (((x1 - lsb) & ~x1) | ((eq - lsb) & ~eq)) & msb;
>> +}
>
> How about the following simpler implementation instead? I expect it's just as fast:
>
> return find_zero_low (x1) | find_zero_low (x1 ^ x2);
>
> Similarly for find_zero_eq_all, find_zero_ne_low, find_zero_ne_all.
This looks OK to me, and the generated code is similar for at least aarch64,
powerpc64le, sparc64, and x86_64.
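A small standalone check of the suggested rewrite, assuming a 64-bit op_t and 8-bit bytes. find_zero_low is written here as the usual (x - lsb) & ~x & msb test, which is what the quoted code implies; find_zero_eq_low_orig is a hypothetical name for the patch's version.

```c
#include <stdint.h>

/* Assumption: 64-bit word, CHAR_BIT == 8.  */
typedef uint64_t op_t;

#define LSB ((op_t) -1 / 0xff)   /* 0x0101...01 */
#define MSB (LSB << 7)           /* 0x8080...80 */

/* Sets the high bit of the lowest zero byte of X.  Bytes above it may be
   flagged spuriously because of the borrow, hence the "_low" suffix.  */
static inline op_t
find_zero_low (op_t x)
{
  return (x - LSB) & ~x & MSB;
}

/* The patch's version, as quoted in the review.  */
static inline op_t
find_zero_eq_low_orig (op_t x1, op_t x2)
{
  op_t eq = x1 ^ x2;
  return (((x1 - LSB) & ~x1) | ((eq - LSB) & ~eq)) & MSB;
}

/* Suggested rewrite: a byte of interest is zero in X1 or equal in X1 and
   X2, that is, zero in X1 ^ X2.  */
static inline op_t
find_zero_eq_low (op_t x1, op_t x2)
{
  return find_zero_low (x1) | find_zero_low (x1 ^ x2);
}
```

Because & distributes over |, ((a & ~x1) | (b & ~eq)) & msb is exactly find_zero_low (x1) | find_zero_low (eq), so both forms compute the same value for every input; that is consistent with the similar code generation noted above.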
>
>> +static inline unsigned
>> +__clz (op_t x)
>> +{
>> +#if !HAVE_BUILTIN_CLZ
>> + unsigned r;
>> + op_t i;
>> +
>> + x |= x >> 1;
>> + x |= x >> 2;
>> + x |= x >> 4;
>> + x |= x >> 8;
>> + x |= x >> 16;
>> +# if __WORDSIZE == 64
>> + x |= x >> 32;
>> + i = x * 0x03F79D71B4CB0A89ull >> 58;
>> +# else
>> + i = x * 0x07C4ACDDU >> 27;
>> +# endif
>> + r = index_access (i);
>> + return r ^ (sizeof (op_t) * CHAR_BIT - 1);
>
> The Gnulib integer_length module has a faster implementation, at least for 32-bit platforms. Do we still care about 32-bit platforms? If so, you might want to take a look at it.
Do you mean the version that converts the integer to double, the one with 6
comparisons, or the naive one? Indeed, I think this fallback issues a lot of
instructions on 32 bits; the only advantage I see is that it is branchless,
which might be a gain on some architectures.
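For comparison, here is a sketch of three 32-bit clz fallbacks, assuming a nonzero uint32_t input and, for the last one, IEEE 754 binary64 doubles. clz32_debruijn follows the quoted branchless approach, with the 32-entry table taken from the well-known Bit Twiddling Hacks log2 recipe (index_access is not shown in the quote); clz32_search is a generic binary search with five comparisons; clz32_double reads the exponent of the exact double conversion. All function names are hypothetical, and none of these is Gnulib's actual integer_length code.

```c
#include <stdint.h>
#include <string.h>

/* Maps (x_smeared * 0x07C4ACDD) >> 27 to the index of the highest set bit.  */
static const unsigned char debruijn_pos32[32] = {
  0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
  8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

/* Branchless: smear the top bit down to get 2^(k+1)-1, hash it to k with a
   multiply, then convert the bit index to a leading-zero count.  X != 0.  */
static inline unsigned
clz32_debruijn (uint32_t x)
{
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  unsigned r = debruijn_pos32[(uint32_t) (x * 0x07C4ACDDU) >> 27];
  return r ^ 31;
}

/* Branching: binary search on the position of the top bit.  X != 0.  */
static inline unsigned
clz32_search (uint32_t x)
{
  unsigned n = 0;
  if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
  if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8; }
  if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4; }
  if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2; }
  if (x <= 0x7FFFFFFFu) { n += 1; }
  return n;
}

/* Float-based: every uint32_t converts to double exactly, and the biased
   exponent minus 1023 is the index of the highest set bit.  X != 0.  */
static inline unsigned
clz32_double (uint32_t x)
{
  double d = (double) x;
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  unsigned msb = (unsigned) ((bits >> 52) & 0x7FF) - 1023;
  return 31 - msb;
}
```

When HAVE_BUILTIN_CLZ holds, __builtin_clz is preferable anyway; among the fallbacks, which one is fastest likely depends on whether branches, multiplies, or integer/FP moves are cheaper on the target.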