This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC] Statistics of non-ASCII characters in strings

From: Rich Felker <dalias at libc dot org>
To: libc-alpha at sourceware dot org
Date: Mon, 22 Dec 2014 14:48:47 -0500
Subject: Re: [RFC] Statistics of non-ASCII characters in strings
Authentication-results: sourceware.org; auth=none
References: <001401d01df6$0f7cc5a0$2e7650e0$ at com>

On Mon, Dec 22, 2014 at 02:46:24PM -0000, Wilco Dijkstra wrote:
> Does anyone have statistics of how often strings contain non-ASCII characters? I'm asking because
> it's feasible to make many string functions faster if they are predominantly ASCII by using a
> different check for the null byte. So if say 80-90% of strings in strcpy/strlen are ASCII then it
> would be well worth optimizing for it.

Before even bothering to research this, I think you should have numbers
on how much faster it would make these functions. I don't think the
difference is noteworthy.
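
For context, here is a minimal sketch of what a "different check for the
null byte" might look like in a word-at-a-time loop. The proposal above
does not say which check is meant, so this is an assumption: the
well-known exact zero-byte test next to a cheaper variant that drops one
operation and is only reliable when every byte in the word is ASCII. The
names has_zero_exact, has_zero_ascii, ONES and HIGHS are hypothetical
and not taken from glibc.

```c
#include <stdint.h>

/* Hypothetical constants: 0x01 and 0x80 replicated into every byte. */
#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Exact test: returns nonzero iff at least one byte of x is zero,
   whatever the other bytes contain. */
static int has_zero_exact(uint64_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* Cheaper test (assumed to be the kind of check under discussion):
   omits the "& ~x" term. It never misses a zero byte, but it also
   fires for words containing a byte >= 0x81, so a caller would have
   to fall back to the exact test whenever it reports a hit. */
static int has_zero_ascii(uint64_t x)
{
    return ((x - ONES) & HIGHS) != 0;
}
```

The saving is a single AND-NOT per word, and any word holding non-ASCII
bytes pays for the fallback, which is why measurements matter before
anything else.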

In any case, I think it would be a regression for programs processing
large volumes of non-English text to become slower just because
someone thought it would be clever to optimize for ASCII only...

Rich

