This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC] Statistics of non-ASCII characters in strings

From: Florian Weimer <fweimer at redhat dot com>
To: "Carlos O'Donell" <carlos at redhat dot com>, Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
Date: Tue, 23 Dec 2014 17:33:04 +0100
Subject: Re: [RFC] Statistics of non-ASCII characters in strings
Authentication-results: sourceware.org; auth=none
References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54998EA5 dot 3020606 at redhat dot com>

On 12/23/2014 04:47 PM, Carlos O'Donell wrote:
> On 12/22/2014 09:46 AM, Wilco Dijkstra wrote:
>> Does anyone have statistics of how often strings contain non-ASCII
>> characters? I'm asking because it's feasible to make many string
>> functions faster if they are predominantly ASCII by using a different
>> check for the null byte. So if say 80-90% of strings in strcpy/strlen
>> are ASCII then it would be well worth optimizing for it.
>
> I don't know that anyone has this data.

The OpenJDK folks are collecting somewhat similar data as part of this
project:


  <http://openjdk.java.net/jeps/8054307>

The question is slightly different (how many strings exist which contain
non-ASCII characters, and how many of them are not even ISO-8859-1?).
Even though the application behavior under consideration is less dynamic
(you can get that from a heap dump), it's difficult to obtain such data.
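
For readers unfamiliar with the "different check for the null byte"
mentioned in the original question, here is a minimal sketch of one
plausible reading. It is not taken from glibc or from any patch in this
thread. It contrasts the well-known exact word-at-a-time zero-byte test
with a cheaper variant that drops one operation but can also fire on
bytes >= 0x81, so it only pays off when such bytes are rare. The names
ONES, HIGHS, load8, has_nul_exact, maybe_has_nul and demo_checks are
hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Load 8 bytes without alignment or aliasing problems. */
static uint64_t load8(const void *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Exact test: returns 1 if and only if some byte of w is 0x00. */
static int has_nul_exact(uint64_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Cheaper test (one operation fewer): never misses a zero byte, but
   may also return 1 when a byte >= 0x81 is present.  For pure ASCII
   input (all bytes <= 0x7F) it agrees with the exact test. */
static int maybe_has_nul(uint64_t w)
{
    return ((w - ONES) & HIGHS) != 0;
}

/* Small self-check of the properties claimed above. */
void demo_checks(void)
{
    /* ASCII, no NUL in the first 8 bytes: both tests say "no". */
    assert(!has_nul_exact(load8("abcdefgh")));
    assert(!maybe_has_nul(load8("abcdefgh")));

    /* ASCII with an embedded NUL: both tests say "yes". */
    assert(has_nul_exact(load8("abc\0defg")));
    assert(maybe_has_nul(load8("abc\0defg")));

    /* UTF-8 bytes 0xC3 0xA9, no NUL: the cheap test gives a false
       positive, so a real routine would need the exact test as a
       fallback on this path. */
    assert(!has_nul_exact(load8("caf\xc3\xa9!!!")));
    assert(maybe_has_nul(load8("caf\xc3\xa9!!!")));
}
```

Under these assumptions, a strlen-style loop could run the cheap test on
each word and fall back to the exact test only when it fires. Whether
that is a net win depends on exactly the statistic asked about above:
how often non-ASCII bytes force the fallback.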


--
Florian Weimer / Red Hat Product Security

References:
- [RFC] Statistics of non-ASCII characters in strings
  - From: Wilco Dijkstra
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Carlos O'Donell
