This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [RFC] Statistics of non-ASCII characters in strings

On Tue, Dec 23, 2014 at 7:47 AM, Carlos O'Donell wrote:
> On 12/22/2014 09:46 AM, Wilco Dijkstra wrote:
>> Does anyone have statistics of how often strings contain non-ASCII
>> characters? I'm asking because it's feasible to make many string
>> functions faster if they are predominantly ASCII by using a different
>> check for the null byte. So if, say, 80-90% of strings in strcpy/strlen
>> are ASCII then it would be well worth optimizing for it.
> I don't know that anyone has this data.
> However, it brings us to a discussion on whole system benchmarking and
> data gathering.
> Your particular question is about the average workload, for which there
> is no real consensus yet. Note that Ondrej has posted patches for a whole
> system benchmarking framework based on his LD_PRELOAD libraries. I think
> that or a systemtap-based framework are sensible solutions. I don't care
> which goes forward really, but with such a path forward we might start
> getting users to run the whole system benchmark in data-gathering mode
> with a global LD_PRELOAD and provide us with raw or aggregate data.
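
For readers following the thread: the "different check for the null byte" is not spelled out in the original question. One well-known candidate, shown here only as an illustrative sketch and not as glibc's actual code, is the word-at-a-time zero-byte test. The exact test needs an extra AND-NOT per word; dropping it gives a cheaper test that is still exact whenever every byte in the word is ASCII, and that otherwise needs a fallback to the exact test. The function and macro names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names; a sketch of the technique, not glibc's implementation. */
#define ONES  0x0101010101010101ULL   /* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL   /* 0x80 in every byte */

/* Exact test: nonzero if and only if some byte of x is 0x00. */
static inline uint64_t has_zero_exact(uint64_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Cheaper test (one operation fewer): nonzero if some byte of x is 0x00,
   but it may also be nonzero when a byte is 0x81 or above.  If it returns
   zero, the word definitely contains no null byte. */
static inline uint64_t has_zero_or_nonascii(uint64_t x)
{
    return (x - ONES) & HIGHS;
}

/* Fast path for mostly-ASCII data: run the cheap test first and only pay
   for the exact test when the cheap one fires. */
static inline int word_has_null(uint64_t x)
{
    if (has_zero_or_nonascii(x) == 0)
        return 0;                      /* common case for ASCII text */
    return has_zero_exact(x) != 0;     /* null byte or non-ASCII byte */
}
```

Whether the fast path pays off depends on how often the fallback fires, which is why the share of non-ASCII strings in real workloads matters.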

You can use LD_AUDIT to collect such information on your

