This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC] Statistics of non-ASCII characters in strings

From: "Carlos O'Donell" <carlos at redhat dot com>
To: "H.J. Lu" <hjl dot tools at gmail dot com>
Cc: Wilco Dijkstra <wdijkstr at arm dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 23 Dec 2014 13:11:00 -0500
Subject: Re: [RFC] Statistics of non-ASCII characters in strings
References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54998EA5 dot 3020606 at redhat dot com> <CAMe9rOogs+LDys9h=mcaFy0Q=ND28Fmqj2rB_JfyG217F1wEYQ at mail dot gmail dot com>

On 12/23/2014 11:50 AM, H.J. Lu wrote:
> On Tue, Dec 23, 2014 at 7:47 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 12/22/2014 09:46 AM, Wilco Dijkstra wrote:
>>> Does anyone have statistics on how often strings contain non-ASCII
>>> characters? I'm asking because it's feasible to make many string
>>> functions faster, if strings are predominantly ASCII, by using a different
>>> check for the null byte. So if, say, 80-90% of strings in strcpy/strlen
>>> are ASCII, then it would be well worth optimizing for it.
>>
>> I don't know that anyone has this data.
>>
>> However, it brings us to a discussion on whole system benchmarking and
>> data gathering.
>>
>> Your particular question is about the average workload, for which there
>> is no real consensus yet. Note that Ondrej has posted patches for a whole
>> system benchmarking framework based on his LD_PRELOAD libraries. I think
>> either that or a systemtap-based framework is a sensible solution. I don't
>> really care which goes forward, but with either path we might start
>> getting users to run the whole system benchmark in data-gathering mode
>> with a global LD_PRELOAD and provide us with raw or aggregate data.
>>
> 
> You can use LD_AUDIT to collect such information on your
> system.

Agreed, that is another way to do it.

Keep in mind that this will be run by non-experts, so we need a lot more
user-friendly packaging around the bits we deliver to help them collect
data and return it to us.
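A minimal sketch of the kind of data-gathering shim discussed in this thread, assuming a plain LD_PRELOAD library rather than Ondrej's framework, a systemtap script, or an LD_AUDIT module (none of which is reproduced here). The file name `ascii_stats.c` and the helper `scan` are hypothetical. The shim interposes on `strlen`, classifies each string as ASCII-only or not, and prints aggregate counts to stderr when the process exits.

```c
/* ascii_stats.c: hypothetical LD_PRELOAD shim, not part of glibc or any
   existing benchmarking framework. It counts how many strings passed to
   strlen contain at least one byte >= 0x80.
   Build: gcc -O2 -fPIC -shared -fno-builtin -o ascii_stats.so ascii_stats.c
   Run:   LD_PRELOAD=./ascii_stats.so some-program */
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

static atomic_ulong n_ascii = 0;      /* strings whose bytes are all < 0x80 */
static atomic_ulong n_non_ascii = 0;  /* strings with some byte >= 0x80 */

/* Measure s and set *non_ascii if any byte has its top bit set. */
static size_t scan(const char *s, int *non_ascii)
{
    const unsigned char *p = (const unsigned char *) s;
    unsigned char seen = 0;
    size_t n = 0;

    while (p[n] != 0) {
        seen |= p[n];   /* accumulate bits; bit 7 set means non-ASCII */
        n++;
    }
    *non_ascii = (seen & 0x80) != 0;
    return n;
}

/* When preloaded, this definition takes precedence over the C library's
   strlen for calls that go through dynamic symbol lookup. */
size_t strlen(const char *s)
{
    int non_ascii;
    size_t n = scan(s, &non_ascii);

    atomic_fetch_add_explicit(non_ascii ? &n_non_ascii : &n_ascii, 1,
                              memory_order_relaxed);
    return n;
}

/* Print the aggregate counts once, at process exit. */
__attribute__((destructor))
static void report(void)
{
    unsigned long a = atomic_load(&n_ascii);
    unsigned long b = atomic_load(&n_non_ascii);

    fprintf(stderr, "strlen calls: %lu ASCII-only, %lu non-ASCII\n", a, b);
}
```

Calls that the compiler constant-folds or inlines, and calls made inside glibc itself through its internal aliases, typically bypass an interposed `strlen`, so a shim like this sees only part of the real traffic.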

Cheers,
Carlos.

References:
- [RFC] Statistics of non-ASCII characters in strings
  - From: Wilco Dijkstra
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Carlos O'Donell
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: H.J. Lu
