This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RE: [RFC] Statistics of non-ASCII characters in strings
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: <ams at gnu dot org>
- Cc: <libc-alpha at sourceware dot org>
- Date: Mon, 22 Dec 2014 17:59:50 -0000
- Subject: RE: [RFC] Statistics of non-ASCII characters in strings
- Authentication-results: sourceware.org; auth=none
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <E1Y34Yu-0004LC-KH at fencepost dot gnu dot org>
> Alfred M. Szmidt wrote:
> Does anyone have statistics of how often strings contain non-ASCII
> characters? I'm asking because it's feasible to make many string
> functions faster if they are predominantly ASCII by using a
> different check for the null byte. So if say 80-90% of strings in
> strcpy/strlen are ASCII then it would be well worth optimizing for
> it.
>
> Not the whole world is ASCII...
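For context, one plausible reading of the "different check for the null byte" in the quoted text is sketched below. This is an assumption, not something spelled out in the thread: the function names and the 64-bit word size are illustrative only. The general word-at-a-time test needs an extra `& ~x` term to ignore bytes that already have the top bit set; if every byte in the word is ASCII, that term can be dropped.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL  /* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL  /* 0x80 in every byte */

/* General test: nonzero iff some byte of x is 0x00. */
static uint64_t has_zero_general(uint64_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Cheaper test (hypothetical name): exact only when every byte of x
   is below 0x80. If a byte of 0x81 or above is present it may report
   a zero byte that is not there, but it never misses a real one, so a
   caller can fall back to the general test on a hit. */
static uint64_t has_zero_ascii(uint64_t x)
{
    return (x - ONES) & HIGHS;
}
```

For a word of pure ASCII text both functions agree; for a word containing UTF-8 lead or continuation bytes the cheaper test can give a false positive, which is why the fraction of ASCII-only strings matters.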
Of course, but few people use native-language directory names or identifiers in their code, so I
bet most strings will still be pure ASCII. Anyway, I was hoping for something more concrete, such as:
what is the likelihood that a block of N characters contains a non-ASCII character if the previous
block did?
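To make the question concrete, here is a minimal sketch of how that statistic could be collected over a buffer. The struct and function names are hypothetical, "non-ASCII" is taken to mean any byte with the top bit set, and a trailing partial block is ignored; none of this comes from glibc.

```c
#include <stddef.h>

struct block_stats {
    size_t prev_non_ascii;  /* blocks whose predecessor had a non-ASCII byte */
    size_t both_non_ascii;  /* of those, blocks that also contain one */
};

/* Returns 1 if any of the n bytes at p has the top bit set. */
static int block_has_non_ascii(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] & 0x80)
            return 1;
    return 0;
}

/* Splits s[0..len) into full blocks of n bytes and counts how often a
   block following a non-ASCII block is itself non-ASCII. */
static struct block_stats count_blocks(const char *s, size_t len, size_t n)
{
    struct block_stats st = {0, 0};
    int prev = 0;

    if (n == 0)
        return st;
    for (size_t off = 0; off + n <= len; off += n) {
        int cur = block_has_non_ascii((const unsigned char *)s + off, n);
        if (off > 0 && prev) {
            st.prev_non_ascii++;
            st.both_non_ascii += (size_t)cur;
        }
        prev = cur;
    }
    return st;
}
```

The ratio both_non_ascii / prev_non_ascii, accumulated over a representative set of strings, estimates the conditional likelihood asked about above.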
Wilco