This is the mail archive of the mailing list for the glibc project.


Regex performance improvements

I have obtained a 10-15% performance improvement in GNU regex.  It comes
from two pieces of work:

- allowing gapped buffer support to be removed conditionally.  To limit
  the changes, re_match_2 and re_search_2 are still available, but raise
  an error if size1 != 0 && size2 != 0.  A few macros are changed, and a
  few tests in re_match_2_internal are simplified if SINGLE_STRING is
  defined.  I did this mostly for GNU grep, but it applies to most other
  users of regex, like sed or awk; it improves performance by 5%.

- using GNU CC first-class labels (labels as values, i.e. computed
  gotos) if available.  This is a very small patch (apart from
  reindenting some code) which also eliminates some jumps-to-jumps that
  compilers usually leave as they are (so a few cycles can be squeezed
  out even without GNU CC).  Basically, every break in the main
  re_match_2_internal switch statement is changed to a macro, which in
  turn expands like this:

      GNU CC available               |        GNU CC not available
    if (p > pend)                    |          if (p > pend)
      goto end_of_pattern;           |            goto end_of_pattern;
    else                             |          else
      goto *execute_command[...];    |            goto execute_command;

  (end_of_pattern is the piece of code inside the big "if (p > pend)",
  right after the "for (;;)", in re_match_2_internal; execute_command
  is either an array of labels, or a label before the switch statement
  if not using GNU CC).

  The performance improvement from this change is 5% without GNU CC
  labels, 10% with GNU CC labels.
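
To make the dispatch macro above concrete, here is a minimal sketch of the
same idea on a toy matcher.  It is not the regex.c patch: the opcodes,
the NEXT and CASE macros, USE_LABEL_VALUES and toy_match are all
hypothetical names invented for this illustration.  Only the shape follows
the description (a table of labels with GNU CC, a single label before the
switch otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy matcher; not the real regex opcodes. */
enum { OP_CHAR, OP_ANY, OP_SUCCEED, OP_COUNT };

#ifdef __GNUC__
# define USE_LABEL_VALUES 1   /* assumption: any GNU C compatible compiler */
#endif

#ifdef USE_LABEL_VALUES
/* Each "case" becomes a plain label; NEXT jumps through the label table. */
# define CASE(op) L_##op
# define NEXT \
  do { if (p == pend) goto end_of_pattern; \
       goto *execute_command[*p++]; } while (0)
#else
/* Portable fallback: jump straight back to the switch, not to a break. */
# define CASE(op) case op
# define NEXT \
  do { if (p == pend) goto end_of_pattern; \
       goto execute_command; } while (0)
#endif

/* Returns 1 if the program in [p, pend) matches a prefix of s, else 0.
   The program is assumed well formed: every opcode is below OP_COUNT and
   OP_CHAR is always followed by its operand byte.  */
int
toy_match (const unsigned char *p, const unsigned char *pend, const char *s)
{
#ifdef USE_LABEL_VALUES
  /* Entries must be in the same order as the enum above. */
  static void *const execute_command[OP_COUNT] =
    { &&L_OP_CHAR, &&L_OP_ANY, &&L_OP_SUCCEED };
#endif

  assert (p != NULL && p <= pend);
  NEXT;                         /* initial dispatch */

#ifndef USE_LABEL_VALUES
execute_command:
  switch (*p++)
#endif
    {
    CASE (OP_CHAR):             /* match one literal character */
      if (*s == '\0' || (unsigned char) *s != *p)
        return 0;
      p++, s++;
      NEXT;

    CASE (OP_ANY):              /* match any single character */
      if (*s == '\0')
        return 0;
      s++;
      NEXT;

    CASE (OP_SUCCEED):          /* explicit early success */
      return 1;
    }
  return 0;                     /* unknown opcode (switch build only) */

end_of_pattern:
  return 1;                     /* ran off the end of the program */
}
```

In both builds each opcode handler ends by dispatching the next opcode
itself instead of breaking out of the switch and looping back, which is
where the removed jumps-to-jumps come from.  With label values the
out-of-range check that a switch normally performs also disappears, so the
sketch relies on the program being well formed.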

Tonight I can post the patch for scrutiny.

|_  _  _  ___
|_)(_)| )  ,'
--------- '-._.
