This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



[PATCH] fix the last sed testsuite XFAIL


This patch fixes a corner case in the POSIX specification that the matcher
does not handle correctly.

The corner case is exposed by the regex (A?){3,6} matched against the string
"AAAA".  Here, the register must report the fourth "A", and not the empty
string, because a braced expression must repeat as few times as possible when
the extra repetitions would match only the empty string.  In other words,
from the fourth repetition on, (A?) must not be matched against an empty
string.
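
For anyone who wants to look at the register directly, here is a minimal
sketch using only the standard POSIX regcomp/regexec interface.  The helper
name match_group1 is made up for this illustration and is not part of the
patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): compile PATTERN as a
   POSIX extended regex, match it against SUBJECT, and store the whole
   match in out[0] and the first subexpression register in out[1].
   Returns 0 on a match, nonzero on a compile error or no match.  */
int
match_group1 (const char *pattern, const char *subject, regmatch_t out[2])
{
  regex_t re;
  int rc;

  assert (out != NULL);

  rc = regcomp (&re, pattern, REG_EXTENDED);
  if (rc != 0)
    return rc;

  /* Ask for two registers: the whole match and group 1.  */
  rc = regexec (&re, subject, 2, out, 0);
  regfree (&re);
  return rc;
}
```

Calling match_group1 ("(A?){3,6}", "AAAA", m) should leave the whole match in
m[0] as offsets 0 to 4.  Under the reading of POSIX described above, m[1]
should cover offsets 3 to 4 (the fourth "A"); a matcher that keeps a later
empty iteration would report 4 to 4 instead.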

The fix that the patch implements works backwards: it keeps the empty
matches and rejects them when setting the registers.  Simply checking whether
the register was already set fails on "AA", where the third match is
compulsory and the register must be set to the empty string.  Instead, since
parse_dup_op converts the above regex to (A?)(A?)(A?)(A?)?(A?)?(A?)?, I mark
the fake (A?)? nodes specially when they are duplicated, by setting the new
OPT_SUBEXP flag.
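
To make the shape of that expansion concrete, here is a rough sketch that
builds it as a string.  This is only an illustration: the real parse_dup_op
works on parse-tree nodes, not on text, and the helper name expand_dup is
hypothetical.  The copies that get a trailing "?" are the ones that would
carry OPT_SUBEXP.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration (not glibc code): write into BUF the textual
   expansion of (RE){MIN,MAX}, i.e. MIN compulsory copies of (RE) followed
   by MAX - MIN optional copies (RE)?.  Returns 0 on success, -1 on bad
   arguments or if BUF is too small.  */
int
expand_dup (const char *re, int min, int max, char *buf, size_t size)
{
  size_t used = 0;
  int i;

  if (min < 0 || max < min || size == 0)
    return -1;

  buf[0] = '\0';
  for (i = 0; i < max; i++)
    {
      /* Copies beyond the first MIN are the optional ones.  */
      const char *suffix = i < min ? "" : "?";
      int n = snprintf (buf + used, size - used, "(%s)%s", re, suffix);

      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
    }
  return 0;
}
```

With re = "A?", min = 3 and max = 6 this produces the string
(A?)(A?)(A?)(A?)?(A?)?(A?)? quoted above.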

When a node is marked OPT_SUBEXP, update_regs treats it specially.  A marked
OP_OPEN_SUBEXP must follow a (not necessarily marked) OP_CLOSE_SUBEXP, so we
know that the start of a marked subexpression is the same as the end of the
previous occurrence of the subexpression.  So update_regs does nothing for
a marked OP_OPEN_SUBEXP, delaying the update until the OP_CLOSE_SUBEXP is
found.  Upon a marked OP_CLOSE_SUBEXP, we check for an empty match and
discard it; otherwise we copy the old rm_eo into rm_so and set rm_eo to the
current position.
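
The register rule can be sketched on a flattened list of open/close events.
The types and the function replay_events below are simplified stand-ins
invented for this illustration, not the glibc data structures, and the sketch
assumes "empty match" means the close position equals the previous rm_eo.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for OP_OPEN_SUBEXP / OP_CLOSE_SUBEXP
   nodes; these are not the real glibc types.  */
enum ev_type { EV_OPEN, EV_CLOSE };

struct ev
{
  enum ev_type type;
  int opt;                      /* nonzero if the node is marked OPT_SUBEXP */
  int pos;                      /* string offset at which the node is seen */
};

struct reg { int so; int eo; }; /* plays the role of rm_so / rm_eo */

/* Replay N events for one subexpression and return its final register.  */
struct reg
replay_events (const struct ev *evs, size_t n)
{
  struct reg r = { -1, -1 };
  size_t i;

  for (i = 0; i < n; i++)
    {
      const struct ev *e = &evs[i];

      if (e->type == EV_OPEN)
        {
          /* A marked open does nothing; the update waits for the close.  */
          if (!e->opt)
            {
              r.so = e->pos;
              r.eo = -1;
            }
        }
      else if (!e->opt)
        r.eo = e->pos;          /* ordinary close */
      else if (e->pos != r.eo)
        {
          /* Marked close, non-empty match: the old end becomes the new
             start.  An empty match (pos == r.eo) is discarded.  */
          r.so = r.eo;
          r.eo = e->pos;
        }
    }
  return r;
}
```

For "AAAA" the three compulsory copies close at offsets 1, 2 and 3, the first
marked copy closes at 4 and the remaining two are empty, so the register ends
up as 3 to 4.  For "AA" the third compulsory copy is empty at offset 2 and
all marked copies are discarded, leaving 2 to 2.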

This patch is made against my other patch, but should apply cleanly even
without it.  I'll send a testcase on Monday.  Interestingly, Perl and PCRE
behave like the current glibc implementation, so this may trigger a spurious
failure in the new PCRE-based tests that Jakub wrote.

Thanks very much,

Paolo

2003-12-13 Paolo Bonzini <bonzini@gnu.org>

        * posix/regex_internal.h (re_token_t): Add the
        OPT_SUBEXP bitfield.
        * posix/regcomp.c (duplicate_tree_1): Extract out
        of duplicate_tree.
        (duplicate_tree): Add new FL_OPT parameter.
        (parse_dup_op): Pass it when compiling (RE){M,} with M > 0
        or (RE){M,N} with N > M > 0.
        * posix/regexec.c (update_regs): Honor OPT_SUBEXP.

Attachment: regex-fix-repeated-empty-regex.patch
Description: Binary data

