This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Speed up fnmatch.


On Thu, Oct 17, 2013 at 03:38:10PM -0400, Rich Felker wrote:
> On Thu, Oct 17, 2013 at 05:45:36PM +0200, OndÅej BÃlka wrote:
> > ping
> > On Wed, Oct 09, 2013 at 07:43:59PM +0200, OndÅej BÃlka wrote:
> > > Hi, fnmatch on multibyte string is slow for several reasons.
> > > Main reason is that we spend ages converting strings to wide
> > > strings.
> > > 
> > > Most of time this overhead is not needed as user does a simple search
> > > with no special characters. 
> > > 
> > > For utf8 these could be checked by strstr as there is only one way how
> > > encode ascii-only sequence. I could generalize this to any encoding for
> > > which this is true.
> > > 
> > > A strstr will quickly determine that there is no match because there is
> > > no match with start.
> > > 
> > > I excluded all special characters from documentation and it passes test.
> > > 
> > > OK to commit?
> 
> I don't think so without significant further review.
> 
> > > 
> > > 	* posix/fnmatch.c: Optimize likely case.
> > > 
> > > diff --git a/posix/fnmatch.c b/posix/fnmatch.c
> > > index 0f26a2e..27aec2e 100644
> > > --- a/posix/fnmatch.c
> > > +++ b/posix/fnmatch.c
> > > @@ -329,7 +329,37 @@ fnmatch (pattern, string, flags)
> > >       int flags;
> > >  {
> > >  # if HANDLE_MULTIBYTE
> > > -  if (__builtin_expect (MB_CUR_MAX, 1) != 1)
> > > +  int multibyte_needed = 0;
> > > +  if (MB_CUR_MAX != 1)
> > > +    {
> > > +      char *encoding = nl_langinfo (CODESET);
> > > +      if (strcmp (encoding, "UTF8"))
> 
> The CODESET string is "UTF-8", not "UTF8", so you were not testing
> whatever codepath you intended to take place for UTF-8...
>
Ok, this is placeholder as check that really should be there is if
in encoding a ascii sequence cannot be written in alternative way.

> > > +  if (!multibyte_needed)
> > > +    {
> > > +      char start[16];
> > > +      size_t i;
> > > +      for (i = 0; i < 8 && normal[(unsigned char) pattern[i]]; i++)
> > > +        start[i] = pattern[i];
> > > +      start[i] = 0;
> > > +      char *string2 = strstr (string, start);
> > > +      if (!string2)
> > > +        return FNM_NOMATCH;
> 
> While it's better than the current approach of unnecessary conversion,
> I think strstr is still highly suboptimal in some corner cases. This
> code should all eventually be rewritten, but for now your solution is
> probably better than nothing. It does need more review for correctness
> though.
> 
There could be corner cases but cases that happen most of time come
first. Note that fnmatch is mostly used on filenames which are small so
lengthy preprocessing may not pay itself.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]