This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] Statistics of non-ASCII characters in strings


On Mon, Dec 22, 2014 at 02:46:24PM -0000, Wilco Dijkstra wrote:
> Does anyone have statistics of how often strings contain non-ASCII characters? I'm asking because
> it's feasible to make many string functions faster if they are predominantly ASCII by using a
> different check for the null byte. So if say 80-90% of strings in strcpy/strlen are ASCII then it
> would be well worth optimizing for it.
> 
I just realized that you do not have to worry about this, as you could
use runtime profiling, which has zero overhead in the ASCII case.

For zero overhead you need to add a PLT-rewriting function to the dynamic
linker. Without that, the overhead is a few cycles per call to check that
a variable is zero.
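
As a rough user-space analogue of the PLT idea (not actual PLT rewriting,
which needs support in the dynamic linker), the per-call flag test can be
replaced by one store to a function pointer. All names below
(strlen_impl, strlen_ascii, strlen_generic, switch_to_generic, my_strlen)
are hypothetical, and the ASCII variant is only a placeholder body:

```c
#include <stddef.h>

static size_t strlen_ascii(const char *s);
static size_t strlen_generic(const char *s);

/* Stand-in for the PLT slot: every call goes through this pointer. */
static size_t (*strlen_impl)(const char *) = strlen_ascii;

/* Exact variant: tests only for the terminating null byte. */
static size_t strlen_generic(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Placeholder for an ASCII-tuned variant; here it simply defers to the
   exact one so the sketch stays correct for any input. */
static size_t strlen_ascii(const char *s)
{
    return strlen_generic(s);
}

/* Called once by the profiling code when non-ASCII data looks common.
   Later calls pay no flag check at all. */
void switch_to_generic(void)
{
    strlen_impl = strlen_generic;
}

size_t my_strlen(const char *s)
{
    return strlen_impl(s);
}
```

An indirect call is still not free, so this only approximates what
rewriting the PLT entry itself would give.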

You can use the pattern below. It needs a fast way to get the time and a
tuned threshold; if getting the time is slow, you need to increase the
number of false positives between successive checks.


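bool nonascii;           /* set once non-ASCII strings look common */
time_t prev;             /* time of the previous check */
long false_positives;    /* fast-path hits that were not a null byte */

size_t
strlen (const char *s)
{
  if (__libc_unlikely (nonascii))
    return strlen_nonascii (s);

  ...
     /* The ASCII-only null check fired on a non-ASCII byte.  */
     if (false_positive)
       {
         /* Read the clock only once per 128 false positives.  */
         if (false_positives % 128 == 0)
           {
             time_t cur = gettime ();
             /* 128 false positives in less than threshold: switch.  */
             if (cur - prev < threshold)
               nonascii = true;
             else
               prev = cur;
           }
         false_positives++;
       }
}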
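
A self-contained sketch of the same pattern, under several assumptions
that are not from this thread: the cheap check is modelled one byte at a
time (true for 0 and for any byte >= 0x80) so the code stays within ISO C,
clock() stands in for whatever fast time source is available, and
CHECK_INTERVAL, THRESHOLD, profiled_strlen and the helper names are
hypothetical. It switches when 128 false positives arrive within one
threshold window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical tuning knobs, not values from the thread. */
#define CHECK_INTERVAL 128                    /* false positives per clock read */
#define THRESHOLD ((clock_t) CLOCKS_PER_SEC)  /* window that counts as "frequent" */

static bool nonascii;                 /* set once non-ASCII input looks common */
static clock_t prev;                  /* time of the previous rate check */
static unsigned long false_positives; /* cheap-check hits that were not null */

/* Exact path: tests only for the terminating null byte. */
static size_t strlen_nonascii(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Byte-wide model of the cheap test: true for 0 and for bytes >= 0x80. */
static bool maybe_null(unsigned char c)
{
    return (unsigned char)(c - 1) >= 0x7f;
}

/* Count a false positive; read the clock once per CHECK_INTERVAL hits. */
static void note_false_positive(void)
{
    false_positives++;
    if (false_positives % CHECK_INTERVAL != 0)
        return;
    clock_t cur = clock();
    if (cur - prev < THRESHOLD)
        nonascii = true;   /* CHECK_INTERVAL hits inside one window */
    else
        prev = cur;        /* hits are rare: start a new window */
}

size_t profiled_strlen(const char *s)
{
    if (nonascii)
        return strlen_nonascii(s);

    const unsigned char *p = (const unsigned char *)s;
    for (;;) {
        if (maybe_null(*p)) {
            if (*p == 0)
                return (size_t)(p - (const unsigned char *)s);
            note_false_positive();  /* high-bit byte, not the terminator */
        }
        p++;
    }
}
```

Pure-ASCII input never reaches note_false_positive(), so the only extra
cost on that path is the test of nonascii at entry.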

