This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] String optimization workflow for architectures.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: libc-alpha at sourceware dot org
- Cc: Richard Henderson <rth at twiddle dot net>, Joseph Myers <joseph at codesourcery dot com>, Wilco <wdijkstr at arm dot com>
- Date: Sun, 31 May 2015 20:36:33 +0200
- Subject: Re: [RFC] String optimization workflow for architectures.
- References: <20150529190952 dot GA23952 at domone> <20150531140506 dot GA5543 at domone>
I also forgot to mention the different implementations of builtins. These
also need to be selected by benchmarking, so we are in the same situation
as with a tunable that has multiple values.
Here an architecture maintainer could supply a custom builtin, but it may
be available only on some architectures, there could be several
alternatives, or the instruction may be too slow. There are also several
alternative ways to implement the generic builtins.
So there should be some system to test these.
I would like to keep the system that I use: for each builtin we would make
a directory sysdeps/generic/builtin where each file contains one
implementation. An arch maintainer would make a builtin directory in his
sysdeps.
Then we would first run a benchmark that enumerates the files in
sysdeps/generic/builtin and sysdeps/arch/builtin and creates a symlink to
the builtin that should be used.
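A minimal sketch of the selection step, assuming the benchmark has already
produced one timing per implementation file. The struct candidate type and
the pick_fastest name are hypothetical and are not part of glibc; the
returned path would then be the target of the symlink (for example via
POSIX symlink()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one implementation file and its measured time.  */
struct candidate
{
  const char *path;  /* File holding one implementation.  */
  double seconds;    /* Benchmark time for it; lower is better.  */
};

/* Return the path of the fastest of the N candidates in C.  N must be
   nonzero.  On a tie the earlier entry wins, so the order of the table
   decides whether a generic or an arch file is preferred.  */
static const char *
pick_fastest (const struct candidate *c, size_t n)
{
  assert (n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (c[i].seconds < c[best].seconds)
      best = i;
  return c[best].path;
}
```

The table would be filled by walking both directories, so an arch file
competes with the generic ones on equal terms and wins only if it measures
faster.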
As an example question for primitives: right now I don't know whether
broadcasting a byte is faster when done by:
x * 0x0101010101010101
or
x |= x << 8
x |= x << 16
x |= x << 32
Also, for first_nonzero byte there are questions like how fast clz is,
and how to exploit that you use only the highest bits in bytes, and these
could vary based on the cpu.
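The two broadcast variants and one possible first_nonzero byte primitive
can be written out as below so that a benchmark can compare them. This is
only a sketch: the function names and the time_broadcast helper are
hypothetical, a 64-bit word is assumed, and first_nonzero_byte assumes
little-endian byte order and uses the GCC/Clang __builtin_ctzll where
available (a big-endian version would use clz instead).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Variant 1: multiply by a constant with 0x01 in every byte.  */
static uint64_t
broadcast_mul (unsigned char c)
{
  return (uint64_t) c * 0x0101010101010101ULL;
}

/* Variant 2: shift-and-or doubling.  */
static uint64_t
broadcast_shift (unsigned char c)
{
  uint64_t x = c;
  x |= x << 8;
  x |= x << 16;
  x |= x << 32;
  return x;
}

/* Index of the first nonzero byte of X, counting from the least
   significant byte (little-endian order).  X must be nonzero.  */
static unsigned int
first_nonzero_byte (uint64_t x)
{
  assert (x != 0);
#if defined __GNUC__
  return (unsigned int) __builtin_ctzll (x) / 8;
#else
  /* Portable fallback: scan one byte at a time.  */
  unsigned int i = 0;
  while ((x & 0xff) == 0)
    {
      x >>= 8;
      i++;
    }
  return i;
#endif
}

typedef uint64_t (*broadcast_fn) (unsigned char);

/* Crude timing of ITERATIONS dependent calls to FN, in CPU seconds.  */
static double
time_broadcast (broadcast_fn fn, long iterations)
{
  uint64_t acc = 0;
  clock_t start = clock ();
  for (long i = 0; i < iterations; i++)
    acc += fn ((unsigned char) (i + acc));
  clock_t end = clock ();
  volatile uint64_t sink = acc;  /* Keep the result live.  */
  (void) sink;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_broadcast (broadcast_mul, n) with
time_broadcast (broadcast_shift, n) on the target cpu gives a first rough
answer; a real benchmark would need to stop the compiler from inlining
both variants into the same code.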
Comments?