This is the mail archive of the
mailing list for the glibc project.
Re: [RFC] Statistics of non-ASCII characters in strings
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: Wilco Dijkstra <wdijkstr at arm dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 23 Dec 2014 08:50:34 -0800
- Subject: Re: [RFC] Statistics of non-ASCII characters in strings
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54998EA5 dot 3020606 at redhat dot com>
On Tue, Dec 23, 2014 at 7:47 AM, Carlos O'Donell <email@example.com> wrote:
> On 12/22/2014 09:46 AM, Wilco Dijkstra wrote:
>> Does anyone have statistics of how often strings contain non-ASCII
>> characters? I'm asking because it's feasible to make many string
>> functions faster if they are predominantly ASCII by using a different
>> check for the null byte. So if say 80-90% of strings in strcpy/strlen
>> are ASCII then it would be well worth optimizing for it.
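
A minimal sketch of the kind of cheaper null-byte check the question may be referring to. This is an assumption, not code from glibc or from this thread. The usual word-at-a-time zero test is `(x - ONES) & ~x & HIGHS`; dropping the `& ~x` term saves an operation but is exact only when every byte is below 0x80. The names has_zero_exact, maybe_zero_ascii and find_nul are hypothetical, and find_nul takes an explicit length so the sketch never reads past its buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Exact test: returns 1 iff some byte of x is zero, for any input. */
static int has_zero_exact(uint64_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* Cheaper test with one operation fewer. It never misses a zero byte,
   but bytes in the range 0x81..0xff can make it report 1 when no zero
   byte is present, so it is exact only for pure-ASCII words. */
static int maybe_zero_ascii(uint64_t x)
{
    return ((x - ONES) & HIGHS) != 0;
}

/* Hypothetical helper: index of the first NUL in buf[0..len), or len
   if there is none. The cheap test filters each 8-byte word; the exact
   test runs only when the cheap one fires. */
static size_t find_nul(const char *buf, size_t len)
{
    size_t i = 0;

    while (i + sizeof(uint64_t) <= len) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);   /* avoids unaligned-access issues */
        if (maybe_zero_ascii(w) && has_zero_exact(w))
            break;                       /* a NUL is somewhere in this word */
        i += sizeof w;
    }
    /* Locate the NUL inside the word, or scan the short tail. */
    while (i < len && buf[i] != '\0')
        i++;
    return i;
}
```

For mostly-ASCII data the cheap test settles nearly every word by itself. A word containing non-ASCII bytes may trigger the exact test unnecessarily, which is why the share of ASCII strings matters for whether this pays off.
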
> I don't know that anyone has this data.
>
> However, it brings us to a discussion on whole system benchmarking and
> data gathering.
>
> Your particular question is about the average workload, for which there
> is no real consensus yet. Note that Ondrej has posted patches for a whole
> system benchmarking framework based on his LD_PRELOAD libraries. I think
> that or a systemtap-based framework are sensible solutions. I don't care
> which goes forward really, but with such a path forward we might start
> getting users to run the whole system benchmark in data-gathering mode
> with a global LD_PRELOAD and provide us with raw or aggregate data.
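
As an illustration of the aggregate data such a run could report, here is a small sketch of a counter that classifies each string as pure ASCII or not. It is not taken from Ondrej's patches or from any existing framework. The names str_stats, record_string and ascii_percent are hypothetical, and how the hook gets called (an interposed wrapper, a systemtap probe, or something else) is left open.

```c
#include <stddef.h>

/* Hypothetical aggregate kept by a data-gathering hook. */
struct str_stats {
    unsigned long total;       /* strings seen so far */
    unsigned long ascii_only;  /* strings with no byte >= 0x80 */
};

/* Returns 1 if every byte before the terminating NUL is below 0x80. */
static int is_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            return 0;
    return 1;
}

/* Called once per string argument observed by the hook. */
static void record_string(struct str_stats *st, const char *s)
{
    st->total++;
    if (is_ascii(s))
        st->ascii_only++;
}

/* Percentage of observed strings that were pure ASCII (0 if none seen). */
static double ascii_percent(const struct str_stats *st)
{
    if (st->total == 0)
        return 0.0;
    return 100.0 * (double)st->ascii_only / (double)st->total;
}
```

A per-function instance of this structure (one for strlen, one for strcpy, and so on) would be enough to answer the 80-90% question for a given workload.
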
You can use LD_AUDIT to collect such information on your