This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 1/* v3] Generic string function optimization: Add skeleton
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 28 May 2015 20:04:39 +0200
- Subject: Re: [PATCH 1/* v3] Generic string function optimization: Add skeleton
- Authentication-results: sourceware.org; auth=none
- References: <20150527060121 dot GA19105 at domone> <20150528142956 dot GA25176 at domone> <556754ED dot 70804 at twiddle dot net>
On Thu, May 28, 2015 at 10:48:29AM -0700, Richard Henderson wrote:
> On 05/28/2015 07:29 AM, Ondřej Bílka wrote:
> >Here is a new version of the skeleton. I added big-endian support. This
> >reminded me that when I first wrote it I wanted to use operations that
> >don't cause carry, then forgot about it. As that's needed only for the
> >first aligned load, or always on big endian, you need to supply the
> >expression twice. One version shouldn't cause carry propagation.
> >
> > * string/common.h: New file.
> > * string/skeleton.h: Likewise.
>
> I like the idea of this common header for implementing these
> algorithms. Though I'd like to see it not placed in string/, but
> sysdeps/generic/, so that one can provide specialized versions for
> different targets.
>
Will do.
> >+static __always_inline
> >+unsigned long int
> >+contains_zero (unsigned long int s)
> >+{
> >+ return (s - ones) & ~s & high_bits;
> >+}
>
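To make the carry issue concrete, here is a minimal sketch comparing the quoted contains_zero with a carry-free variant. The definitions of ones and high_bits are assumptions (common.h is not shown in this message), and contains_zero_nocarry and check_contains_zero are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumed masks: 0x01 and 0x80 replicated into every byte of a word.  */
static const unsigned long int ones = ~0UL / 0xff;
static const unsigned long int high_bits = (~0UL / 0xff) * 0x80;

/* Version quoted above.  The subtraction can borrow across bytes, so only
   the lowest set marker bit is guaranteed to mark a real zero byte.  */
static inline unsigned long int
contains_zero (unsigned long int s)
{
  return (s - ones) & ~s & high_bits;
}

/* Hypothetical carry-free variant.  (b & 0x7f) + 0x7f never overflows a
   byte, so bit 7 of each result byte is set iff that byte of s is zero.  */
static inline unsigned long int
contains_zero_nocarry (unsigned long int s)
{
  unsigned long int low7 = ~high_bits;
  return ~(((s & low7) + low7) | s) & high_bits;
}

/* Byte 0 is 0x00 and byte 1 is 0x01; all other bytes are 0x41.  */
void
check_contains_zero (void)
{
  unsigned long int s = ((ones * 0x41) & ~0xffffUL) | 0x0100UL;
  /* The borrow out of byte 0 makes byte 1 look like a zero byte.  */
  assert (contains_zero (s) == 0x8080UL);
  /* The carry-free form marks only the real zero byte.  */
  assert (contains_zero_nocarry (s) == 0x80UL);
}
```

On little endian the extra marker is harmless when only the lowest set bit is used, which matches the remark that the carry-free form matters for the first aligned load or on big endian.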
> On Alpha or PPC, the target-specific header could use cmpbge or cmpb
> insns respectively.
>
> On armv6t2/armv7, this can be done with the vector saturating
> addition, uqadd8, by adding 0xfe and then inverting. Of course,
> this works on other targets for which vector insns exist, but on arm
> uqadd8 works on normal integer registers.
>
> >+# ifdef FAST_FFS
> >+ return (ffsl (u) - 1) / 8;
> >+# else
>
> Why are you stuck on ffs instead of ctz? The latter avoids all the -1 adjustments.
>
Both comments are correct. We should keep a generic implementation and
surround these functions with #ifdef so that architectures can supply
their own versions.
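A rough sketch of how that could look, assuming GCC or Clang builtins and a little-endian word. The macro HAVE_ARCH_FIRST_NONZERO_BYTE and both function names are hypothetical; they are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical hook: a target header would define this macro and provide
   its own first_nonzero_byte (for example using cmpbge or cmpb) before
   this point.  Otherwise the generic version below is used.  */
#ifndef HAVE_ARCH_FIRST_NONZERO_BYTE
static inline unsigned int
first_nonzero_byte (unsigned long int u)
{
  /* u must be nonzero: __builtin_ctzl is undefined for 0.  */
  return __builtin_ctzl (u) / 8;
}
#endif

/* The ffs-based form from the quoted hunk, kept for comparison.  It needs
   the -1 adjustment because ffs numbers bits starting from 1.  */
static inline unsigned int
first_nonzero_byte_ffs (unsigned long int u)
{
  return (__builtin_ffsl (u) - 1) / 8;
}
```

For any nonzero mask the two forms return the same byte index; the ctz form simply drops the adjustment.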