Sources Bugzilla – Full Text Bug Listing
|Summary:||Repeated and nested subexpressions (reproducible in most other engines)|
|Product:||glibc||Reporter:||Paolo Bonzini <bonzini>|
|Component:||regex||Assignee:||GOTO Masanori <gotom>|
|Attachments:||Patch taken from bug comments.|
Description Paolo Bonzini 2004-03-01 15:48:08 UTC
Most engines, including glibc's have problems with the following regexes: (b(c)|d(e))* match against bcde --> \1=de, \3=e, but erroneously \2=c (a(b)*)* match against aba --> \1=a, but erroneously \2=b This is just a nit, because most other regex matching engines, either DFA-based or syntax-directed, have problems with this kind of regex. Note that while the second can be fixed by looking ahead of a OP_DUP_ASTERISK and preventively clearing an OP_OPEN_SUBEXP to which the OP_DUP_ASTERISK epsilon transits, the first is much harder and requires storing or building a "tree" of how the subexpressions nest (which would fix the second one as well). Paolo
Comment 1 Paolo Bonzini 2004-11-13 12:55:29 UTC
I doubt this will be fixed.
Comment 2 Dwayne Grant McConnell 2006-02-21 15:54:26 UTC
Do you have a small testcase demonstrating this problem?
Comment 3 Dwayne Grant McConnell 2006-02-21 16:28:01 UTC
Created attachment 875 [details] Patch taken from bug comments.
Comment 4 Dwayne Grant McConnell 2006-02-21 16:29:11 UTC
Sorry. That attachment was meant for bug 14.
Comment 5 Petr Baudis 2010-06-01 02:06:41 UTC
no feedback to testcase query (echo aba | sed -e 's/\(a\(b\)*\)*/\2/' does produce b, but it's not clear to me what should it print instead)
Comment 6 Paolo Bonzini 2010-06-01 06:26:33 UTC
echo aba | sed -e 's/\(a\(b\)*\)*/\2/' should print nothing because the final match of \1 is "a" and \2 must be a substring of \1. Reopening...
Comment 7 Paolo Bonzini 2010-06-01 06:27:16 UTC
... and leaving suspended.