[PATCH] More regex fixes and testcases

Paolo Bonzini paolo.bonzini@polimi.it
Mon Oct 6 10:17:00 GMT 2003


The first of these patches fixes the infinite loop on a regex
like ()\1*\1*.  While doing this, I decided to rewrite the
check_dst_limits_calc_pos routine, which had some IMHO needlessly
complicated expressions.  The patch introduces no new testsuite
failures, including in the sed testsuite.
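A minimal regression check for the looping pattern can be written against the POSIX `regcomp`/`regexec` interface. This is a sketch that assumes glibc, which accepts backreferences inside extended regular expressions as a GNU extension; the helper name `empty_backref_loop_ok` is hypothetical. On a library with the bug, the `regexec` call simply never returns.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: compile ()\1*\1* as an ERE and run it against the
   empty string.  With the infinite-loop bug regexec never returns; on a
   fixed library the whole match and group 1 are both empty at offset 0.
   Assumes glibc, where backreferences in an ERE are a GNU extension.  */
bool empty_backref_loop_ok(void)
{
    regex_t re;
    regmatch_t m[2];
    bool ok;

    if (regcomp(&re, "()\\1*\\1*", REG_EXTENDED) != 0)
        return false;

    ok = regexec(&re, "", 2, m, 0) == 0
         && m[0].rm_so == 0 && m[0].rm_eo == 0
         && m[1].rm_so == 0 && m[1].rm_eo == 0;

    regfree(&re);
    return ok;
}
```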

I still have some XFAILs in the sed testsuite that look easy to
fix but are not entirely so.  The problem is in matching empty
backreferences, which are not put in the epsilon closures and
hence are sifted away by sift_states_backward.  However, the
easy fix of adding OP_BACK_REF to the epsilon closures causes
some problems in the handling of the fail stack.

        ()(b)\1c\2     fails to match the whole of `bcb'
        (b())\2\1      fails to match a<bb>bbc
        (bb())\2\1     fails to match a<bbbb>c with a sigsegv
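To make the closure problem concrete, here is a toy model. It is not glibc's representation: the node kinds, `struct node` and `eps_closure` are all hypothetical. The closure follows epsilon nodes and, only when `follow_empty_backrefs` is set, also steps across a backreference whose group captured the empty string. In a real matcher that emptiness is only known at match time; here it is a fixed per-node flag for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy NFA node kinds (illustrative only, not glibc's token types).  */
enum node_kind { NODE_CHAR, NODE_EPSILON, NODE_BACKREF, NODE_END };

struct node {
    enum node_kind kind;
    int next[2];          /* successor indices, -1 if unused */
    bool empty_capture;   /* NODE_BACKREF only: referenced group matched "" */
};

#define MAX_NODES 32

/* Depth-first epsilon closure of START: IN_CLOSURE[i] becomes true for every
   node reachable from START without consuming input.  Each node is pushed at
   most once, so the stack never holds more than N entries.  */
static void eps_closure(const struct node *nfa, size_t n, int start,
                        bool follow_empty_backrefs, bool in_closure[])
{
    int stack[MAX_NODES];
    size_t sp = 0;

    assert(n <= MAX_NODES);
    assert(start >= 0 && (size_t) start < n);

    for (size_t i = 0; i < n; i++)
        in_closure[i] = false;

    in_closure[start] = true;
    stack[sp++] = start;

    while (sp > 0) {
        const struct node *nd = &nfa[stack[--sp]];
        /* Only epsilon nodes (and, optionally, empty backreferences)
           lead to further nodes without consuming a character.  */
        bool passes = nd->kind == NODE_EPSILON
                      || (follow_empty_backrefs && nd->kind == NODE_BACKREF
                          && nd->empty_capture);
        if (!passes)
            continue;
        for (int k = 0; k < 2; k++) {
            int s = nd->next[k];
            if (s >= 0 && !in_closure[s]) {
                in_closure[s] = true;
                stack[sp++] = s;
            }
        }
    }
}
```

For a four-node chain start → `\1` → `c` → end in which group 1 captured the empty string, the closure of the start node stops at the backreference node when `follow_empty_backrefs` is false, and also reaches the `c` node when it is true. The fail-stack interaction described above is deliberately not modelled.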

The first patch also adds test cases for the fail-stack handling
bug that my first attempt at fixing these introduced.  The second
patch adds the failing test cases above; I made it a separate
patch because you may not like XFAILs.
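The three failing cases can be expressed as a small table-driven check in the style of a regex testsuite. This sketch assumes a glibc-style `regcomp` that accepts backreferences in EREs; `struct backref_case`, `backref_cases` and `run_backref_case` are hypothetical names, and the expected offsets are the matches described above together with the group offsets they imply (`a<bb>bbc` and `a<bbbb>c` both denote the subject `abbbbc`).

```c
#include <regex.h>
#include <stddef.h>

/* One case: an ERE, a subject string, and the expected rm_so/rm_eo pairs
   for the whole match (index 0) and groups 1 and 2.  Hypothetical layout.  */
struct backref_case {
    const char *pattern;
    const char *subject;
    regoff_t expect[3][2];
};

static const struct backref_case backref_cases[] = {
    /* ()(b)\1c\2 should match the whole of "bcb".  */
    { "()(b)\\1c\\2", "bcb",    { { 0, 3 }, { 0, 0 }, { 0, 1 } } },
    /* (b())\2\1 should match a<bb>bbc.  */
    { "(b())\\2\\1",  "abbbbc", { { 1, 3 }, { 1, 2 }, { 2, 2 } } },
    /* (bb())\2\1 should match a<bbbb>c.  */
    { "(bb())\\2\\1", "abbbbc", { { 1, 5 }, { 1, 3 }, { 3, 3 } } },
};

/* Run case I; return 1 if it compiles, matches, and every offset agrees
   with the table, 0 otherwise.  */
static int run_backref_case(size_t i)
{
    const struct backref_case *c = &backref_cases[i];
    regex_t re;
    regmatch_t m[3];
    int ok = 1;

    if (regcomp(&re, c->pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, c->subject, 3, m, 0) != 0)
        ok = 0;
    else
        for (int g = 0; g < 3; g++)
            if (m[g].rm_so != c->expect[g][0] || m[g].rm_eo != c->expect[g][1])
                ok = 0;
    regfree(&re);
    return ok;
}
```

Checking the group offsets, not just the overall match, matters here because an empty group that is mishandled can still leave the whole-match span looking plausible.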

Time permitting, I have a set of patches in mind to:
1) add debugging code to print out the DFA
2) mark some functions as pure
3) add more comments to the functions (most of the time,
   there is a comment at the call site but not at the head
   of the function).
4) simplify/hoist some expressions, as I did in
   check_dst_limits_calc_pos
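As an illustration of item 2, a function that reads memory but has no side effects can carry GCC's `pure` attribute, which lets the compiler merge repeated calls or hoist them out of loops that do not modify the memory it reads. The function below is a hypothetical example, not glibc code; `struct limits` and `pos_relative_to_limits` are invented names.

```c
/* Hypothetical half-open range [lo, hi), standing in for subexpression
   limits.  Not a glibc type.  */
struct limits { int lo, hi; };

/* Classify POS relative to *LIM: -1 before, 0 inside, 1 at or after the
   end.  It reads *LIM but writes nothing, so `pure' (rather than the
   stricter `const', which forbids reading memory) is the right attribute.  */
__attribute__ ((pure))
static int pos_relative_to_limits(const struct limits *lim, int pos)
{
    if (pos < lim->lo)
        return -1;
    if (pos >= lim->hi)
        return 1;
    return 0;
}
```

Inside glibc itself the same attribute is usually spelled with the `__attribute_pure__` macro from `<sys/cdefs.h>`.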

Would these be welcome, and if so, in what order?

Paolo

-------------- next part --------------
Name: fix-repeated-empty-backref.patch
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20031006/519b05c4/attachment.ksh>
-------------- next part --------------
Name: empty-backref-xfails.patch
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20031006/519b05c4/attachment-0001.ksh>


More information about the Libc-alpha mailing list