Another RFC: regex in libiberty
Zack Weinberg
zackw@stanford.edu
Fri Jun 8 09:59:00 GMT 2001
On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
>
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking. It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings. For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.
I think the null characters are a red herring. I looked into GNU
regex's performance last year, in the context of GCC's fixincludes
program. On a platform that has mostly-okay headers, fixincludes spends
most of its time matching regular expressions.
The regex.c that came with GDB 4.18, which I think is the one that got
spread around widely, had a bug in its implementation of the POSIX
regcomp/regexec interface, which caused a major performance hit. That
bug has been fixed in GNU libc for a long time. When I replaced
fixincludes' copy of regex.c with a more recent version from glibc,
fixincludes was sped up by a factor of nine. That same bug affects
Sed 3.02. Swap its bundled regex.c for the one from glibc 2.2.x and
I bet you'll see better performance.
There's some discussion in these messages:
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html
The relevant fix is in there, too, if you want to pull it out and
apply it.
I did some benchmarking of fixincludes with Spencer's regex library
as well. IIRC, it was about the same as the fixed GNU regex.c.
--
zw This is, no doubt, the rational strategy; quite possibly the
only one that will work. But it ignores the exigencies of
the tenure system and is therefore impractical.
-- Jerry Fodor, _The Mind Doesn't Work That Way_