[PATCH] speedup matching of ^$ and cleanup some code in regex
Paolo Bonzini
bonzini@gnu.org
Thu Apr 24 23:11:00 GMT 2008
Regex matching has an optimization whereby a regex anchored to the
beginning of the buffer is tried at only one position. Other anchors are
resolved with the fastmap, but this one allows further optimization and
is special-cased. However, because of a bug in create_cd_newstate, ^$ was
mistakenly treated as an unanchored regex, and re_search_internal tried
to match it at every position.
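To make the cost difference concrete, here is a minimal toy model of the
search loop. It is not glibc code: struct toy_pattern, toy_match_at and
toy_search are hypothetical names, and the "pattern" is just a literal plus
a flag saying whether it is anchored to the start of the buffer. It only
shows how many start positions get tried in each case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not the glibc data structures. */
struct toy_pattern {
    int anchored_at_buffer_start;  /* nonzero: pattern can only match at offset 0 */
    const char *literal;           /* text that must appear at the match position */
};

/* Try to match at exactly one start position (start <= strlen(buf)). */
static int toy_match_at(const struct toy_pattern *p, const char *buf, size_t start)
{
    size_t n = strlen(p->literal);
    return strlen(buf + start) >= n && memcmp(buf + start, p->literal, n) == 0;
}

/* Search buf and store the number of start positions tried in *attempts.
   Returns the match offset, or -1 if there is no match. */
static long toy_search(const struct toy_pattern *p, const char *buf, size_t *attempts)
{
    size_t len = strlen(buf);
    /* An anchored pattern is tried once; anything else at every offset. */
    size_t last = p->anchored_at_buffer_start ? 0 : len;

    *attempts = 0;
    for (size_t start = 0; start <= last; start++) {
        ++*attempts;
        if (toy_match_at(p, buf, start))
            return (long) start;
    }
    return -1;
}
```

In this model, a failing search costs one attempt when the anchored flag is
set and strlen(buf) + 1 attempts when it is not, which is the kind of
extra work the misclassification of ^$ caused.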
In fact, the bug is (almost) fixed by this hunk:
@@ -1682,8 +1680,6 @@ create_cd_newstate (const re_dfa_t *dfa,
         newstate->halt = 1;
       else if (type == OP_BACK_REF)
         newstate->has_backref = 1;
-      else if (type == ANCHOR)
-        constraint = node->opr.ctx_type;
       if (constraint)
         {
However, some complications in building the NFA prevent this hunk alone
from fixing the problem. Therefore, this patch cleans up the handling of
anchors so that tests on type == ANCHOR are no longer necessary. When
creating the NFA (calc_first), I move opr.ctx_type to the constraint
field of re_token_t, and from then on I always look at that field
unconditionally, without special-casing ANCHORs. This also allows some
simplification of duplicate_node_closure.
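The idea of the cleanup can be sketched as follows. This is a simplified
illustration under assumptions, not the patch itself: struct toy_token,
toy_normalize and toy_state_has_constraint are hypothetical stand-ins, and
only the fields mentioned above (type, ctx_type, constraint) are modeled.

```c
#include <stddef.h>

/* Illustrative stand-ins; the real enum and struct have many more members. */
enum toy_type { TOY_CHARACTER, TOY_ANCHOR, TOY_END_OF_RE };

struct toy_token {
    enum toy_type type;
    unsigned int ctx_type;    /* stands in for opr.ctx_type, meaningful on anchors */
    unsigned int constraint;  /* context constraint read when building states */
};

/* Done once while the NFA is built: move the anchor's context into the
   generic constraint field. */
static void toy_normalize(struct toy_token *tok)
{
    if (tok->type == TOY_ANCHOR)
        tok->constraint = tok->ctx_type;
}

/* Later consumers read constraint unconditionally, with no test on
   type == TOY_ANCHOR. Returns 1 if any node carries a constraint. */
static int toy_state_has_constraint(const struct toy_token *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i].constraint != 0)
            return 1;
    return 0;
}
```

With the type test confined to one place, code that builds states from node
sets has a single field to consult, which is why the special cases can go.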
Patch at http://sourceware.org/bugzilla/attachment.cgi?id=2690&action=view
Paolo