This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

RE: [RFC] Statistics of non-ASCII characters in strings

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: <ams at gnu dot org>
Cc: <libc-alpha at sourceware dot org>
Date: Mon, 22 Dec 2014 17:59:50 -0000
Subject: RE: [RFC] Statistics of non-ASCII characters in strings
Authentication-results: sourceware.org; auth=none
References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <E1Y34Yu-0004LC-KH at fencepost dot gnu dot org>

> Alfred M. Szmidt wrote:
>    Does anyone have statistics of how often strings contain non-ASCII
>    characters? I'm asking because it's feasible to make many string
>    functions faster if they are predominantly ASCII by using a
>    different check for the null byte. So if say 80-90% of strings in
>    strcpy/strlen are ASCII then it would be well worth optimizing for
>    it.
> 
> Not the whole world is ASCII...
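
For readers unfamiliar with the idea in the quoted text, here is a minimal sketch of one well-known word-at-a-time null-byte test and a cheaper variant that is only valid when every byte in the word is ASCII. The message does not say which check was actually intended, so this is an assumption for illustration; the names has_zero, has_non_ascii and has_zero_ascii are hypothetical and are not glibc functions.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* General test: nonzero iff at least one byte of w is zero. */
static inline uint64_t has_zero(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Nonzero iff at least one byte of w has its top bit set (non-ASCII). */
static inline uint64_t has_non_ascii(uint64_t w)
{
    return w & HIGHS;
}

/* Cheaper test, valid only if every byte of w is below 0x80.
   With the top bits known to be clear, the "& ~w" term is redundant:
   only a borrow out of a zero byte can set a top bit.  */
static inline uint64_t has_zero_ascii(uint64_t w)
{
    assert(has_non_ascii(w) == 0);   /* precondition: pure ASCII word */
    return (w - ONES) & HIGHS;
}
```

The variant saves one operation per word, but a caller must fall back to the general test whenever has_non_ascii reports a set top bit, which is why the proportion of ASCII strings matters.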

Of course, but few people use native-language directory names or identifiers in their code, etc., so
I bet most strings will still be pure ASCII. Anyway, I was hoping for something more concrete, such
as: what is the likelihood that a block of N characters contains a non-ASCII character if the
previous block did?
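
To make the question concrete, here is a rough sketch of how that statistic could be gathered over a sample buffer. It is not an existing glibc tool; the struct and function names are hypothetical, and it assumes fixed-size, non-overlapping blocks with any partial trailing block ignored.

```c
#include <assert.h>
#include <stddef.h>

struct block_stats {
    size_t pairs;           /* adjacent block pairs examined */
    size_t prev_non_ascii;  /* pairs whose first block has a non-ASCII byte */
    size_t both_non_ascii;  /* pairs where both blocks have one */
};

/* Return 1 if any of the n bytes at p has its top bit set. */
static int block_has_non_ascii(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] & 0x80)
            return 1;
    return 0;
}

/* Walk the buffer in blocks of `block` bytes and count adjacent pairs. */
static struct block_stats count_blocks(const unsigned char *s, size_t len,
                                       size_t block)
{
    struct block_stats st = { 0, 0, 0 };
    assert(block > 0);
    size_t nblocks = len / block;    /* partial tail block is ignored */
    int prev = 0;
    for (size_t b = 0; b < nblocks; b++) {
        int cur = block_has_non_ascii(s + b * block, block);
        if (b > 0) {
            st.pairs++;
            if (prev) {
                st.prev_non_ascii++;
                if (cur)
                    st.both_non_ascii++;
            }
        }
        prev = cur;
    }
    return st;
}

/* Estimate P(block is non-ASCII | previous block was non-ASCII). */
static double conditional_probability(struct block_stats st)
{
    if (st.prev_non_ascii == 0)
        return 0.0;                  /* no evidence either way */
    return (double)st.both_non_ascii / (double)st.prev_non_ascii;
}
```

Running count_blocks over real inputs (file names, source code, message catalogs) with N set to the word or vector size of interest would give the kind of number asked for here.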

Wilco

References:
- [RFC] Statistics of non-ASCII characters in strings
  - From: Wilco Dijkstra
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Alfred M. Szmidt
