This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v3 04/18] Add string vectorized find and detection functions
On 12/01/2018 15:08, Joseph Myers wrote:
> On Thu, 11 Jan 2018, Paul Eggert wrote:
>
>> On 01/11/2018 10:54 AM, Adhemerval Zanella wrote:
>>>> The Gnulib integer_length module has a faster implementation, at least for
>>>> 32-bit platforms. Do we still care about 32-bit platforms? If so, you
>>>> might want to take a look at it.
>>> Do you mean the version which uses double to integer, the one with 6
>>> comparisons or the naive one?
>>
>> I meant the one that converts int to double. It can be branchless since we
>> assume the int is nonzero.
>
> Looking at glibc architectures (and architectures with recently proposed
> ports):
>
> * The following have clz patterns in GCC, unconditionally, meaning this
> glibc patch will always use __builtin_clz functions and any fallback code
> is irrelevant: aarch64 i386 ia64 powerpc tilegx x86_64. (On ia64 the
> pattern uses conversion to floating point.)
>
> * The following have clz patterns in GCC, conditionally: alpha arm m68k
> microblaze mips s390 sparc (and arc). I have not checked whether in some
> of those cases the conditions might in fact be true for every
> configuration for which glibc can be built.
>
> * The following lack clz patterns in GCC: hppa nios2 sh (and riscv).
>
> If the configuration lacking clz is also soft-float, converting int to
> double is an extremely inefficient way ending up calling the libgcc clz
> implementation (both soft-fp and fp-bit use __builtin_clz). I think
> that's sufficient reason to avoid an approach involving conversion to
> double unless an architecture has opted in to using it as an efficient
> approach on that architecture.
Thanks for the reminder about soft-float. Also, on some architectures that do
have hardware floating-point units, int to/from float conversion is a costly
operation.
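
For reference, the int-to-double approach under discussion can be sketched
roughly as below. This is only an illustration, not the Gnulib code: the name
clz32_via_double is hypothetical, and it assumes double is IEEE 754 binary64
and that the input is nonzero. The int-to-double conversion on the first line
of the body is exactly the step that is expensive on soft-float targets.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only.  Assumes X is nonzero and
   that double is IEEE 754 binary64 (52 mantissa bits, exponent bias 1023).  */
static inline int
clz32_via_double (uint32_t x)
{
  /* Every 32-bit integer is exactly representable in a binary64 double,
     so the exponent of D is the index of the highest set bit of X.  */
  double d = x;
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  /* The sign bit is zero, so the top 12 bits are just the biased exponent.  */
  int exponent = (int) (bits >> 52) - 1023;
  return 31 - exponent;
}
```

The body is branchless, which is the attraction, but it only pays off where
the conversion itself is cheap.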
Regarding the index_{first,last}_ fallback implementation, maybe a simpler
implementation that just checks the mask bits, instead of relying on fallback
leading/trailing zero bit counts, would be better. I am open to suggestions
here.
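
A minimal sketch of what I mean by checking the mask bits directly is below.
The names mask_index_first and mask_index_last are hypothetical and not taken
from the patch. It assumes the mask has the 0x80 bit set in each matching
byte, and it counts bytes by significance, which matches memory order only on
little-endian; a big-endian variant would walk the bytes in the other
direction.

```c
#include <limits.h>

/* Hypothetical helpers, for illustration only.  MASK is assumed to have
   bit 0x80 set in every matching byte and clear in every other byte.  */

/* Return the index of the least significant matching byte of MASK, or
   sizeof (MASK) if no byte matches.  */
static inline unsigned int
mask_index_first (unsigned long int mask)
{
  for (unsigned int i = 0; i < sizeof (mask); i++)
    if ((mask >> (i * CHAR_BIT)) & 0x80)
      return i;
  return sizeof (mask);
}

/* Return the index of the most significant matching byte of MASK, or
   sizeof (MASK) if no byte matches.  */
static inline unsigned int
mask_index_last (unsigned long int mask)
{
  for (unsigned int i = sizeof (mask); i-- > 0; )
    if ((mask >> (i * CHAR_BIT)) & 0x80)
      return i;
  return sizeof (mask);
}
```

This costs at most sizeof (mask) tests per word and needs neither clz/ctz nor
floating point, so it would only be used where the builtins are not expanded
inline.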
>
> (For arm, for example, clz is supported if "TARGET_32BIT && arm_arch5", so
> the only configurations without __builtin_clz expanded inline by the
> compiler are armv4t ones - which are also all soft-float, so the expansion
> using double can never make sense for arm.)
>