This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug regex/52] Repeated and nested subexpressions (reproducible in most other engines)
- From: "bonzini at gnu dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Thu, 04 Jul 2013 07:21:56 +0000
- Subject: [Bug regex/52] Repeated and nested subexpressions (reproducible in most other engines)
- Auto-submitted: auto-generated
- References: <bug-52-131 at http dot sourceware dot org/bugzilla/>
http://sourceware.org/bugzilla/show_bug.cgi?id=52
--- Comment #11 from Paolo Bonzini <bonzini at gnu dot org> ---
The justification for the "suspended" state is that this would be very
complicated to fix and wouldn't really add anything to the quality of the
implementation.
Even if the "(a(b)*)*" case would not be hard to fix, I'm not sure we can say
the same of the backreference testcase in the RH bug ('(a(b)*)*\2' matched
against 'abab') or the more complicated '(a(b)*)*x\1\2' matched against
'abaxa'.
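For comparison with other engines, the testcases above can be probed with a short script. This is only a sketch: it uses Python's `re` module, a backtracking matcher that is not POSIX and is not glibc's regexec, and the helper name `probe` is made up for illustration. The input 'aba' for the first pattern is an assumption, since no input is given for that case here.

```python
import re

def probe(pattern, text):
    """Return (whole_match, groups) for the leftmost match, or None if nothing matches.

    Hypothetical helper: it only reports what this engine captured,
    it does not decide which answer is the correct one.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.groups()

# Patterns quoted in this comment; the input 'aba' for the first is assumed.
CASES = [
    (r"(a(b)*)*", "aba"),
    (r"(a(b)*)*\2", "abab"),
    (r"(a(b)*)*x\1\2", "abaxa"),
]

for pattern, text in CASES:
    # The interesting part is what group 2 holds after the last
    # iteration of the outer group, and what \2 then matches.
    print(repr(pattern), repr(text), "->", probe(pattern, text))
```

Running the same three cases against another matcher and comparing the reported groups shows where the engines disagree.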
The RH bugzilla opened by Eric doesn't really add anything to the urgency
of this bug; Fedora bugs that also exist upstream can be closed liberally, and
that's what I did.
The grep bug on Savannah (https://savannah.gnu.org/bugs/?37737) might add
something, but it is not clear if the user actually encountered it in a
real-world use case. The same "bug" is present in nearly every regular
expression matcher, and I would suggest that the Austin group gives more leeway
to implementations. For example, the following rules could work:
- it is undefined _which_ occurrence of the sub-RE is captured by a
parenthesized group (and matched in backreferences) if the subgroup, or any of
its parents, is quantified with + * {};
- the backreference should match the empty string only if the
corresponding sub-RE can be empty, or if the corresponding parenthesized group,
or any of its parents, is quantified with * or {0,...}.
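The second rule can be read as a simple check over the nesting of parenthesized groups. The following is a minimal sketch under stated assumptions: the `Group` class and `backref_may_match_empty` are hypothetical names for illustration, not part of glibc or any regex library, and a `?` quantifier is treated as `{0,1}`, i.e. as a lower bound of 0.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Group:
    """Hypothetical node for one parenthesized group in a parsed RE."""
    body_can_be_empty: bool            # can the sub-RE inside the parens match ""?
    min_repeat: int = 1                # lower bound of the group's quantifier (1 if none)
    parent: Optional["Group"] = None   # enclosing parenthesized group, if any

def backref_may_match_empty(group: Group) -> bool:
    """Apply the proposed rule: a backreference to `group` may match the
    empty string only if the sub-RE can be empty, or if the group or any
    of its parents is quantified with a lower bound of 0 (* or {0,...})."""
    if group.body_can_be_empty:
        return True
    node = group
    while node is not None:
        if node.min_repeat == 0:
            return True
        node = node.parent
    return False

# '(a(b)*)*': both groups are quantified with *, so \1 and \2 may be empty.
outer = Group(body_can_be_empty=False, min_repeat=0)
inner = Group(body_can_be_empty=False, min_repeat=0, parent=outer)

# '(a(b))': no quantifier and no empty sub-RE, so \2 may not be empty.
plain_outer = Group(body_can_be_empty=False)
plain_inner = Group(body_can_be_empty=False, parent=plain_outer)
```

With these definitions, `backref_may_match_empty(inner)` is true and `backref_may_match_empty(plain_inner)` is false. The first rule leaves the captured occurrence undefined, so it needs no check of this kind.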