Bug 52 - Repeated and nested subexpressions (reproducible in most other engines)
: Repeated and nested subexpressions (reproducible in most other engines)
Status: SUSPENDED
Product: glibc
Classification: Unclassified
Component: regex
: unspecified
: P3 minor
: ---
Assigned To: GOTO Masanori
:
:
:
:
  Show dependency treegraph
 
Reported: 2004-03-01 15:48 UTC by Paolo Bonzini
Modified: 2012-03-08 04:36 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Patch taken from bug comments. (1.31 KB, patch)
2006-02-21 16:28 UTC, Dwayne Grant McConnell
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paolo Bonzini 2004-03-01 15:48:08 UTC
Most engines, including glibc's have problems with the following regexes:

(b(c)|d(e))*    match against bcde  -->  \1=de, \3=e, but erroneously \2=c
(a(b)*)*        match against aba   -->  \1=a, but erroneously \2=b

This is just a nit, because most other regex matching engines, either DFA-based 
or syntax-directed, have problems with this kind of regex.

Note that while the second can be fixed by looking ahead of a OP_DUP_ASTERISK 
and preventively clearing an OP_OPEN_SUBEXP to which the OP_DUP_ASTERISK 
epsilon transits, the first is much harder and requires storing or building 
a "tree" of how the subexpressions nest (which would fix the second one as 
well).

Paolo
Comment 1 Paolo Bonzini 2004-11-13 12:55:29 UTC
I doubt this will be fixed.
Comment 2 Dwayne Grant McConnell 2006-02-21 15:54:26 UTC
Do you have a small testcase demonstrating this problem?
Comment 3 Dwayne Grant McConnell 2006-02-21 16:28:01 UTC
Created attachment 875 [details]
Patch taken from bug comments.
Comment 4 Dwayne Grant McConnell 2006-02-21 16:29:11 UTC
Sorry. That attachment was meant for bug 14.
Comment 5 Petr Baudis 2010-06-01 02:06:41 UTC
no feedback to testcase query (echo aba | sed -e 's/\(a\(b\)*\)*/\2/' does
produce b, but it's not clear to me what should it print instead)
Comment 6 Paolo Bonzini 2010-06-01 06:26:33 UTC
echo aba | sed -e 's/\(a\(b\)*\)*/\2/'

should print nothing because the final match of \1 is "a" and \2 must be a substring of \1.

Reopening...
Comment 7 Paolo Bonzini 2010-06-01 06:27:16 UTC
... and leaving suspended.