This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Regex performance improvements
- To: libc-alpha at sources dot redhat dot com
- Subject: Regex performance improvements
- From: Paolo Bonzini <bonzini at pc-amo3 dot elet dot polimi dot it>
- Date: Tue, 8 May 2001 11:56:18 +0200 (CEST)
- cc: alainm at gnu dot org
- Reply-To: bonzini at gnu dot org
I have obtained a 10-15% performance improvement in GNU regex. It comes
from two pieces of work:
- allowing one to conditionally remove gapped buffer support. To limit
the changes, re_match_2 and re_search_2 are still available, but raise
an error if size1 != 0 && size2 != 0. A few macros are changed, and a
few tests in re_match_2_internal are simplified if SINGLE_STRING is
defined. I did this mostly for GNU grep, but it applies to most other
users of regex, like sed or awk; it improves performance by 5%.
- using GNU CC first-class labels (labels as values, i.e. computed
gotos) if available. This is a very small patch (apart from reindenting
some code) which also eliminates some jumps-to-jumps that compilers
usually leave as they are (so a few cycles can be squeezed out even
without GNU CC). Basically, every break in the main re_match_2_internal
switch statement is changed to a macro, which in turn expands like this:
    GNU CC available                   | GNU CC not available
    -----------------------------------+------------------------------------
    if (p > pend)                      | if (p > pend)
      goto end_of_pattern;             |   goto end_of_pattern;
    else                               | else
      goto *execute_command[...];      |   goto execute_command;
(end_of_pattern is the piece of code inside the big "if (p > pend)",
right after the "for (;;)", in re_match_2_internal; execute_command
is either an array of labels, or a label before the switch statement
if not using GNU CC).
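As a rough illustration of the dispatch scheme described above (not the
actual patch), here is a toy interpreter that uses the same two macro
expansions. The opcodes, the NEXT macro name, the run() function and the
"p >= pend" end test are all made up for this sketch; only the names
end_of_pattern and execute_command come from the description above. The
GNU CC variant assumes every byte in the program is a valid opcode.

```c
#include <stddef.h>

/* Hypothetical opcodes for this sketch; not the regex opcodes. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_COUNT };

#if defined(__GNUC__)
/* GNU C: jump straight to the handler of the next opcode. */
# define NEXT \
    do { if (p >= pend) goto end_of_pattern; \
         else goto *execute_command[*p++]; } while (0)
#else
/* Portable C: jump back to the single label before the switch. */
# define NEXT \
    do { if (p >= pend) goto end_of_pattern; \
         else goto execute_command; } while (0)
#endif

/* Run a program of len opcodes and return the accumulator. */
static int run(const unsigned char *p, size_t len)
{
    const unsigned char *pend = p + len;
    int acc = 0;
#if defined(__GNUC__)
    /* One label address per opcode, indexed by opcode value. */
    static void *const execute_command[OP_COUNT] = {
        &&do_inc, &&do_dec, &&do_double
    };
    NEXT;
do_inc:    acc++;    NEXT;
do_dec:    acc--;    NEXT;
do_double: acc *= 2; NEXT;
#else
    NEXT;
execute_command:
    switch (*p++) {
    case OP_INC:    acc++;    NEXT;
    case OP_DEC:    acc--;    NEXT;
    case OP_DOUBLE: acc *= 2; NEXT;
    default:        return -1;   /* unknown opcode */
    }
#endif
end_of_pattern:
    return acc;
}
```

In both variants a handler never returns to the top of a loop just to be
redirected again; with GNU CC it also skips the switch's range check and
table jump, which is presumably where the extra gain comes from.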
The performance improvement from this change is 5% without GNU CC
labels and 10% with GNU CC labels.
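For the first change, the idea can be sketched roughly as follows. This
is not code from regex.c: count_char() is a hypothetical function that
only shows how a SINGLE_STRING build can reject a gapped text once on
entry and then drop the per-character test for which side of the gap an
offset falls on. The -1 error return is likewise an assumption of the
sketch.

```c
#include <stddef.h>

/* Count occurrences of c in the text formed by string1 followed by
   string2.  Returns -1 if the text is gapped and SINGLE_STRING is
   defined.  Hypothetical helper, for illustration only. */
static long count_char(const char *string1, size_t size1,
                       const char *string2, size_t size2, char c)
{
    long n = 0;
    size_t i;
#ifdef SINGLE_STRING
    const char *s;
    size_t size;

    /* Checked once on entry: only one piece may be non-empty. */
    if (size1 != 0 && size2 != 0)
        return -1;
    s = size1 != 0 ? string1 : string2;
    size = size1 + size2;
    for (i = 0; i < size; i++)
        n += (s[i] == c);
#else
    for (i = 0; i < size1 + size2; i++) {
        /* Every access must decide which side of the gap it is on. */
        char ch = i < size1 ? string1[i] : string2[i - size1];
        n += (ch == c);
    }
#endif
    return n;
}
```

Callers that always pass a single string, as grep, sed and awk typically
do, see the same results either way; only the inner loop gets simpler.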
Tonight I can post the patch for scrutiny.
--
|_ _ _ ___
|_)(_)| ) ,'
--------- '-._.