This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] Improve generic strcspn performance

Adhemerval Zanella Netto - Jan. 8, 2016, 8:05 p.m. wrote:
> > +  if (reject[0] == '\0')
> > +    return strlen (str);
> > +  if (reject[1] == '\0')
> > +    return __strchrnul (str, reject [0]) - str;
> I am not sure how often strcspn is used with empty or one char argument to
> validate this optimization in specific since it adds more branch cases for
> more general inputs.

An empty string is extremely unlikely, however one- and two-character strings seem
to occur frequently (grep the GLIBC sources for str(c)spn/strpbrk). My goal was
to get rid of the odd inlines in the headers and enable the generic C implementation
to beat the special cases by a good margin. Compared to the overhead of
initializing the table, these extra checks cost very little (and once you check
for a single-character string, you also need to check for an empty string, since
reject[1] may only be read when reject[0] is not the terminator).
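
To make the ordering of the checks concrete, here is a minimal, self-contained
sketch of the idea. It is not the code from the patch: sketch_strcspn and
sketch_strchrnul are hypothetical names, and sketch_strchrnul is a plain-C
stand-in for the internal __strchrnul used in the quoted diff.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __strchrnul: return a pointer to the first
   occurrence of C in S, or to the terminating NUL if C does not occur.  */
static const char *
sketch_strchrnul (const char *s, int c)
{
  while (*s != '\0' && *s != (char) c)
    s++;
  return s;
}

/* Illustrative sketch only (an assumption, not the glibc implementation).  */
size_t
sketch_strcspn (const char *str, const char *reject)
{
  /* The empty check must come first: reject[1] may only be read once we
     know reject[0] is not the terminator.  */
  if (reject[0] == '\0')
    return strlen (str);
  if (reject[1] == '\0')
    return (size_t) (sketch_strchrnul (str, reject[0]) - str);

  /* General case: build a 256-entry lookup table of rejected bytes.  */
  unsigned char table[256];
  memset (table, 0, sizeof table);
  for (const unsigned char *r = (const unsigned char *) reject; *r != '\0'; r++)
    table[*r] = 1;
  table[0] = 1;   /* The terminating NUL always stops the scan.  */

  const unsigned char *s = (const unsigned char *) str;
  while (!table[*s])
    s++;
  return (size_t) (s - (const unsigned char *) str);
}
```

With this sketch, a call such as sketch_strcspn ("hello world", "o") takes the
single-character path and never touches the table, while a reject set of two or
more characters pays for the table initialization once.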

> > -  return count;
> > +  /* Use multiple small memsets to enable inlining on most targets.  */
> > +  p = memset (table, 0, 64);
> > +  memset (p + 64, 0, 64);
> > +  memset (p + 128, 0, 64);
> > +  memset (p + 192, 0, 64);
> It is unfortunate we need to use this to force inline instead to let the
> compiler handle it directly (and also simplifying the code by using
> c99 initializers).  I noted x86_64 does no inline, although aarch64 and
> powerpc64le calls memset.  How bad is avoiding this explicit calls now
> and work on compiler side to detect this aligned memset?

Yes, but unfortunately inlining of memset is essential to get reasonable
performance on small sizes. E.g. for sizes 30-60, the overhead of not inlining
is 25-30% on Cortex-A57.

We could maybe add a --param max-inline-memset=N option to a future GCC for
building GLIBC (or just these files), however this doesn't help when GLIBC is
built using any current GCC version.

Another possibility might be to write a loop with stores of size_t and build with
a huge value for max-completely-peeled-insns. Or just give up and use macros
to write out all stores explicitly...
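
For comparison, the two table-clearing variants discussed above can be sketched
as follows. This is only an illustration under stated assumptions: the names
sketch_table, clear_table_memsets, clear_table_words and table_is_clear are
hypothetical, the union is one way to guarantee size_t alignment for the word
stores, and whether a given compiler inlines the memsets or fully peels the loop
depends on the target and on the options used.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 256

/* Hypothetical table type: the union keeps the bytes suitably aligned
   for size_t stores.  */
typedef union
{
  size_t words[TABLE_SIZE / sizeof (size_t)];
  unsigned char bytes[TABLE_SIZE];
} sketch_table;

/* Variant from the quoted patch: four fixed-size 64-byte memsets, small
   enough that many targets can expand them inline.  */
static inline void
clear_table_memsets (unsigned char *table)
{
  unsigned char *p = memset (table, 0, 64);
  memset (p + 64, 0, 64);
  memset (p + 128, 0, 64);
  memset (p + 192, 0, 64);
}

/* Alternative: a loop of size_t stores.  Turning this into straight-line
   code relies on the compiler peeling the loop completely.  */
static inline void
clear_table_words (sketch_table *t)
{
  for (size_t i = 0; i < TABLE_SIZE / sizeof (size_t); i++)
    t->words[i] = 0;
}

/* Helper for checking the result: nonzero if every byte is zero.  */
static inline int
table_is_clear (const unsigned char *table)
{
  for (size_t i = 0; i < TABLE_SIZE; i++)
    if (table[i] != 0)
      return 0;
  return 1;
}
```

Both variants leave the table in the same state; the difference is only in what
the compiler is likely to generate for each of them.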

