This is the mail archive of the mailing list for the glibc project.
Re: [RFC] Statistics of non-ASCII characters in strings
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 24 Dec 2014 14:08:34 +0100
- Subject: Re: [RFC] Statistics of non-ASCII characters in strings
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com>
On Mon, Dec 22, 2014 at 02:46:24PM -0000, Wilco Dijkstra wrote:
> Does anyone have statistics of how often strings contain non-ASCII characters? I'm asking because
> it's feasible to make many string functions faster if they are predominantly ASCII by using a
> different check for the null byte. So if say 80-90% of strings in strcpy/strlen are ASCII then it
> would be well worth optimizing for it.
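Wilco does not say which null-byte check he has in mind. One well-known candidate, assumed here, drops the "& ~v" term from the classic word-at-a-time zero-byte test. The cheaper test never misses a NUL byte, but it can also fire on words that contain bytes >= 0x80. On pure ASCII it is exact, and on other text it produces false positives that must be re-checked. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ONES  0x0101010101010101ULL /* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL /* 0x80 in every byte */

/* Exact test: true iff some byte of v is 0x00.  */
static inline bool
has_zero_exact (uint64_t v)
{
  return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Cheaper test (one operation fewer): never misses a 0x00 byte, but may
   also return true when v contains non-ASCII bytes (>= 0x80), so a hit
   has to be confirmed with has_zero_exact.  */
static inline bool
has_zero_or_nonascii (uint64_t v)
{
  return ((v - ONES) & HIGHS) != 0;
}
```

In a string loop, the cheap test runs on every word and the exact test runs only on a hit. A hit that the exact test rejects is a false positive, which happens only on non-ASCII data.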
I just realized that you do not have to worry about this, since you could
use runtime profiling with zero overhead in the ASCII case.
For that you need to add a PLT-rewriting function to the dynamic linker;
without it, the overhead is a few cycles per call to check that a
variable is zero.
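The PLT-rewriting idea can be approximated in user space with a function pointer. A call through the PLT already jumps indirectly via a GOT slot, so repointing that slot switches implementations without adding a flag test to every call. The sketch below is only an analogue of that mechanism and is not how ld.so works. The names (strlen_slot, strlen_ascii, dispatched_strlen) are hypothetical, and the "ASCII" routine is a plain byte loop standing in for a tuned one.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*strlen_fn) (const char *);

static size_t strlen_ascii (const char *s);

/* Hypothetical stand-in for the GOT slot a PLT stub jumps through:
   every call is already indirect, so repointing it adds no per-call work.  */
static strlen_fn strlen_slot = strlen_ascii;

/* General routine, correct for any byte values.  */
static size_t
strlen_nonascii (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}

/* Placeholder for an ASCII-tuned routine.  On the first non-ASCII byte it
   repoints the slot (the "rewrite") and finishes with the general routine;
   a real version would switch only when non-ASCII input is frequent.  */
static size_t
strlen_ascii (const char *s)
{
  for (const char *p = s;; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (c == '\0')
        return (size_t) (p - s);
      if (c >= 0x80)
        {
          strlen_slot = strlen_nonascii;
          return (size_t) (p - s) + strlen_nonascii (p);
        }
    }
}

/* Entry point callers use: one indirect call, no flag test.  */
size_t
dispatched_strlen (const char *s)
{
  return strlen_slot (s);
}
```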
You can use the following pattern. You need a fast way to get the time,
and you must tune the threshold. If getting the time is slow, increase
the number of false positives between successive checks.
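size_t strlen (const char *s)
{
  if (nonascii)
    return strlen_nonascii (s);
  /* ... ASCII-only fast path; on each false positive: */
  if (++false_positives % 128 == 0)
    {
      time_t cur = gettime ();     /* needs a cheap clock */
      if (cur - prev < threshold)  /* 128 false positives came fast */
        nonascii = true;
      prev = cur;
    }
  return strlen_nonascii (s);
}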
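Here is the pattern above filled out as a self-contained sketch. Several choices are assumptions: CLOCK_MONOTONIC nanoseconds as the clock, a 1 ms threshold, and a check every 128 false positives. The function is named adaptive_strlen so it does not clash with the C library's strlen. A per-byte compare stands in for the word-at-a-time ASCII check, because reading whole words past the terminator is only safe inside libc's own aligned-load guarantees. The profiling state is plain globals, which is fine single-threaded but would need atomics in a real library.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical tuning knobs; real values would need measuring.  */
#define CHECK_INTERVAL 128        /* false positives between clock reads */
#define THRESHOLD_NS   1000000ULL /* 1 ms */

/* Process-wide profiling state (not thread-safe in this sketch).  */
static bool nonascii;             /* once set, stays set */
static uint64_t false_positives;
static uint64_t prev_ns;

static uint64_t
gettime_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* General routine, correct for any byte values.  */
static size_t
strlen_nonascii (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}

/* Called once per call that hits a false positive.  Every CHECK_INTERVAL
   such calls it reads the clock; if they all arrived within THRESHOLD_NS,
   non-ASCII input is common, so switch permanently to the general path.  */
static void
record_false_positive (void)
{
  if (++false_positives % CHECK_INTERVAL == 0)
    {
      uint64_t cur = gettime_ns ();
      if (cur - prev_ns < THRESHOLD_NS)
        nonascii = true;
      prev_ns = cur;
    }
}

size_t
adaptive_strlen (const char *s)
{
  if (nonascii)
    return strlen_nonascii (s);
  for (const char *p = s;; p++)
    {
      unsigned char c = (unsigned char) *p;
      /* One unsigned compare flags NUL (c - 1 wraps to 0xff) and every
         byte >= 0x80; it stands in for a word-at-a-time ASCII check.  */
      if ((unsigned char) (c - 1) >= 0x7f)
        {
          if (c == '\0')
            return (size_t) (p - s);
          record_false_positive ();
          return (size_t) (p - s) + strlen_nonascii (p);
        }
    }
}
```

Pure ASCII input never touches the clock. The cost of profiling falls only on calls that already took the slower path. As in the original pattern, the switch is a one-way latch.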