This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH v2] Improve fnmatch performance.


Hi, this is revised improvement of fnmatch. It improves queries of
locate command around three times for UTF locale on x64 by finding start
of pattern with fast strstr.

It uses fast locale type check as introduced in strdiff.

How to synchronize this with gnulib? Only implementation specific detail
is utf8 detection.

There are several possible improvements. Main idea is reject strings as
fast as possible. If string is matched then likely you will do some
expensive operation with it.

This for simplicity just tries to find starting quarted of strstr.
it doesn't apply on patterns that start with special character.
Natural extension would be parse pattern, find quartet sequence that
must occur and do same strstr test. Second would be use whole characters
in casefold strstr with nonascii UTF.

Passes test, OK to commit this?

	* posix/fnmatch.c (fnmatch): Improve performance.

diff --git a/posix/fnmatch.c b/posix/fnmatch.c
index a707847..7152055 100644
--- a/posix/fnmatch.c
+++ b/posix/fnmatch.c
@@ -333,7 +333,48 @@ fnmatch (pattern, string, flags)
      int flags;
 {
 # if HANDLE_MULTIBYTE
-  if (__builtin_expect (MB_CUR_MAX, 1) != 1)
+
+  struct __locale_data *current = _NL_CURRENT_LOCALE->__locales[LC_COLLATE];
+  uint_fast32_t encoding =
+    current->values[_NL_ITEM_INDEX (_NL_COLLATE_ENCODING_TYPE)].word;
+
+  /* ASCII with \+/.*?[{(@! excluded.  */
+  static unsigned char normal[256] = {
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+ };
+
+  if (encoding == !__cet_other)
+    {
+      char start[16];
+      char *string2;
+      size_t i;
+      for (i = 0; i < 4 && normal[(unsigned char) pattern[i]]; i++)
+        start[i] = pattern[i];
+      start[i] = 0;
+      if (flags & FNM_CASEFOLD)
+        string2 = strcasestr (string, start);
+      else  
+        string2 = strstr (string, start);
+      if (!string2)
+        return FNM_NOMATCH;
+    }
+
+  if (MB_CUR_MAX != 1)
     {
       mbstate_t ps;
       size_t n;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]