This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v3 04/18] Add string vectorized find and detection functions



On 11/01/2018 14:47, Paul Eggert wrote:
> On 01/10/2018 04:47 AM, Adhemerval Zanella wrote:
>> +  op_t lsb = (op_t)-1 / 0xff;
>> +  op_t msb = lsb << (CHAR_BIT - 1);
> This would be simpler and clearer if it were rewritten as:
> 
>     op_t lsb = repeat_bytes (0x01);
>     op_t msb = repeat_bytes (0x80);
> 
> There are several other opportunities for this kind of simplification.

Indeed, I have changed it locally.
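
For reference, a minimal sketch of what such a helper could look like. It
assumes op_t is unsigned long and CHAR_BIT is 8; the check_masks function is
purely illustrative and not part of the patch.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: op_t is the native word type used for word-at-a-time access.  */
typedef unsigned long op_t;

/* Replicate the byte C into every byte of an op_t.  (op_t)-1 / 0xff is
   0x0101...01, so multiplying it by C copies C into each byte lane.  */
static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}

/* Illustrative self-check: the helper yields the same masks as the
   open-coded expressions from the patch.  */
static inline void
check_masks (void)
{
  op_t lsb = (op_t) -1 / 0xff;
  op_t msb = lsb << (CHAR_BIT - 1);
  assert (repeat_bytes (0x01) == lsb);
  assert (repeat_bytes (0x80) == msb);
}
```

The multiplication cannot carry between byte lanes because C is at most 0xff
and each lane of the multiplier holds exactly 1.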

> 
>> +static inline op_t
>> +find_zero_eq_low (op_t x1, op_t x2)
>> +{
>> +  op_t lsb = (op_t)-1 / 0xff;
>> +  op_t msb = lsb << (CHAR_BIT - 1);
>> +  op_t eq = x1 ^ x2;
>> +  return (((x1 - lsb) & ~x1) | ((eq - lsb) & ~eq)) & msb;
>> +}
> 
> How about the following simpler implementation instead? I expect it's just as fast:
> 
>    return find_zero_low (x1) | find_zero_low (x1 ^ x2);
> 
> Similarly for find_zero_eq_all, find_zero_ne_low, find_zero_ne_all.

I think this is OK; the generated code for at least aarch64, powerpc64le,
sparc64, and x86_64 looks similar.
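
A sketch of why the two forms agree, assuming op_t is unsigned long and that
find_zero_low is the usual (x - lsb) & ~x & msb test (its body is not quoted
in this message, so that definition is an assumption). The _orig and _simple
suffixes are made up here to keep both variants side by side.

```c
#include <assert.h>

/* Assumption: op_t is the native word type.  */
typedef unsigned long op_t;

static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}

/* Assumed definition: set the high bit of each byte of X that is zero.
   Only the lowest set indicator is reliable, because a borrow out of a
   zero byte can produce a false positive in the byte above it.  */
static inline op_t
find_zero_low (op_t x)
{
  op_t lsb = repeat_bytes (0x01);
  op_t msb = repeat_bytes (0x80);
  return (x - lsb) & ~x & msb;
}

/* Open-coded form, as in the patch.  */
static inline op_t
find_zero_eq_low_orig (op_t x1, op_t x2)
{
  op_t lsb = repeat_bytes (0x01);
  op_t msb = repeat_bytes (0x80);
  op_t eq = x1 ^ x2;
  return (((x1 - lsb) & ~x1) | ((eq - lsb) & ~eq)) & msb;
}

/* Suggested form: AND distributes over OR, so the result is
   bit-for-bit identical to the open-coded one.  */
static inline op_t
find_zero_eq_low_simple (op_t x1, op_t x2)
{
  return find_zero_low (x1) | find_zero_low (x1 ^ x2);
}
```

With x1 holding the bytes 'a', 'b', 'c', 'd' from least to most significant
and x2 = repeat_bytes ('c'), the lowest indicator bit in the result sits in
the third byte, which is where 'c' matches.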

> 
>> +static inline unsigned
>> +__clz (op_t x)
>> +{
>> +#if !HAVE_BUILTIN_CLZ
>> +  unsigned r;
>> +  op_t i;
>> +
>> +  x |= x >> 1;
>> +  x |= x >> 2;
>> +  x |= x >> 4;
>> +  x |= x >> 8;
>> +  x |= x >> 16;
>> +# if __WORDSIZE == 64
>> +  x |= x >> 32;
>> +  i = x * 0x03F79D71B4CB0A89ull >> 58;
>> +# else
>> +  i = x * 0x07C4ACDDU >> 27;
>> +# endif
>> +  r = index_access (i);
>> +  return r ^ (sizeof (op_t) * CHAR_BIT - 1);
> 
> The Gnulib integer_length module has a faster implementation, at least for 32-bit platforms. Do we still care about 32-bit platforms? If so, you might want to take a look at it.

Do you mean the version that uses a double-to-integer conversion, the one with
6 comparisons, or the naive one? Indeed, I think the version in the patch
issues a lot of instructions on 32 bits; the only advantage I see is that it
is branchless (which might be a gain on some architectures).
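
To make the trade-off concrete, here is a sketch of a comparison-based
count-leading-zeros for 32-bit values next to a naive loop. This is not
Gnulib's code; the names clz32_cmp and clz32_naive are made up for
illustration, and both require a nonzero argument, like __builtin_clz.

```c
#include <stdint.h>

/* Binary search over the position of the highest set bit: five
   comparisons, each one halving the remaining range.
   Precondition: x != 0.  */
static inline unsigned
clz32_cmp (uint32_t x)
{
  unsigned n = 0;
  if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
  if (x <= 0x00FFFFFFu) { n += 8; x <<= 8; }
  if (x <= 0x0FFFFFFFu) { n += 4; x <<= 4; }
  if (x <= 0x3FFFFFFFu) { n += 2; x <<= 2; }
  if (x <= 0x7FFFFFFFu) { n += 1; }
  return n;
}

/* Naive reference: shift left until the top bit is set.
   Precondition: x != 0, otherwise the loop never terminates.  */
static inline unsigned
clz32_naive (uint32_t x)
{
  unsigned n = 0;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}
```

A compiler may turn the comparisons into conditional moves, in which case
the comparison-based form also ends up branch-free; whether that happens
depends on the target and optimization level.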

