This is the mail archive of the mailing list for the glibc project.

Re: [RFC] Statistics of non-ASCII characters in strings

On 12/23/2014 11:50 AM, H.J. Lu wrote:
> On Tue, Dec 23, 2014 at 7:47 AM, Carlos O'Donell wrote:
>> On 12/22/2014 09:46 AM, Wilco Dijkstra wrote:
>>> Does anyone have statistics of how often strings contain non-ASCII
>>> characters? I'm asking because it's feasible to make many string
>>> functions faster if they are predominantly ASCII by using a different
>>> check for the null byte. So if say 80-90% of strings in strcpy/strlen
>>> are ASCII then it would be well worth optimizing for it.
>> I don't know that anyone has this data.
>> However, it brings us to a discussion on whole system benchmarking and
>> data gathering.
>> Your particular question is about the average workload, for which there
>> is no real consensus yet. Note that Ondrej has posted patches for a whole
>> system benchmarking framework based on his LD_PRELOAD libraries. I think
>> that or a systemtap-based framework are sensible solutions. I don't care
>> which goes forward really, but with such a path forward we might start
>> getting users to run the whole system benchmark in data-gathering mode
>> with a global LD_PRELOAD and provide us with raw or aggregate data.
> You can use LD_AUDIT to collect such information on your
> system.

Agreed, that is another way to do it.

Keep in mind this will be run by non-experts, so we need a lot more
fluffy stuff around the bits we deliver to help them collect data
and return it to us.
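
For readers following the thread, here is a rough sketch of what a
"different check for the null byte" could look like, plus the kind of
per-string statistic Wilco asked about. This is only one plausible
reading of the idea, not code from glibc or from any posted patch. All
names (has_zero_general, has_zero_ascii, load_word, ascii_stats) are
hypothetical. The general word-at-a-time test needs an extra "& ~x"
term. If every byte is known to be ASCII (high bit clear), that term
can be dropped, at the cost of false positives on non-ASCII bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* General test: nonzero if and only if some byte of x is 0x00. */
static inline uint64_t has_zero_general(uint64_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Cheaper test that is exact only when every byte of x is below 0x80.
   A byte with the high bit set can make it report a zero byte that is
   not there, so a caller must re-check with the general test on a hit. */
static inline uint64_t has_zero_ascii(uint64_t x)
{
    return (x - ONES) & HIGHS;
}

/* Load 8 bytes into a word without alignment or aliasing problems. */
static inline uint64_t load_word(const unsigned char bytes[8])
{
    uint64_t w;
    memcpy(&w, bytes, sizeof w);
    return w;
}

/* Hypothetical aggregate a data-gathering shim might keep. */
struct ascii_stats {
    size_t total;       /* strings seen */
    size_t ascii_only;  /* strings with no byte >= 0x80 */
};

/* Record one NUL-terminated string in the statistics. */
static inline void ascii_stats_record(struct ascii_stats *st, const char *s)
{
    assert(st != NULL && s != NULL);
    int ascii = 1;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; p++) {
        if (*p & 0x80) {
            ascii = 0;
            break;
        }
    }
    st->total++;
    st->ascii_only += (size_t)ascii;
}
```

In a string function the cheap test would presumably run on each
aligned word, falling back to the general test (or a bytewise scan)
whenever it fires. Whether that wins depends on how often the fallback
is taken, which is exactly what the ascii_only / total ratio from real
workloads would tell us.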

