This is the mail archive of the
mailing list for the glibc project.
Re: [RFC] Statistics of non-ASCII characters in strings
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
- Date: Tue, 23 Dec 2014 16:20:49 +0100
- Subject: Re: [RFC] Statistics of non-ASCII characters in strings
- Authentication-results: sourceware.org; auth=none
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54997DBF dot 6070305 at redhat dot com>
On Tue, Dec 23, 2014 at 03:35:43PM +0100, Florian Weimer wrote:
> On 12/22/2014 03:46 PM, Wilco Dijkstra wrote:
> >Does anyone have statistics of how often strings contain non-ASCII characters? I'm asking because
> >it's feasible to make many string functions faster if they are predominantly ASCII by using a
> >different check for the null byte.
> Why can't you do the equivalent of
> X = ((X & 0x80) >> 1) | (X & 0x7F);
> before the new check? Does this lengthen the dependency chain too much?
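
For reference, here is a minimal sketch of what the folding step quoted above could look like when applied to a whole word instead of a single byte. The 64-bit word size and the helper names (has_zero_exact, has_zero_ascii, fold_high_bit, has_zero_folded) are assumptions made for illustration; this is not glibc code.

```c
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Exact check: nonzero iff some byte of x is zero, for any byte values. */
static int has_zero_exact(uint64_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* Cheaper check: drops the "& ~x" term.  Only reliable when every byte
   is <= 0x7f; bytes >= 0x81 produce false positives. */
static int has_zero_ascii(uint64_t x)
{
    return ((x - ONES) & HIGHS) != 0;
}

/* Word-wide form of X = ((X & 0x80) >> 1) | (X & 0x7F): moves bit 7 of
   each byte into bit 6, so every byte ends up <= 0x7f and is zero only
   if the original byte was zero. */
static uint64_t fold_high_bit(uint64_t x)
{
    return ((x & HIGHS) >> 1) | (x & ~HIGHS);
}

/* Fold first, then use the cheaper check. */
static int has_zero_folded(uint64_t x)
{
    return has_zero_ascii(fold_high_bit(x));
}
```

After folding, the cheaper check no longer reports false positives for non-ASCII bytes, but the extra AND, shift and OR sit on the dependency chain in front of the check, which is the cost the question above is asking about.
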
When the string is short and you do not enter the loop, it's best to determine
these exactly. For longer strings you get considerable savings even by skipping one