<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://sourceware.org/bugzilla/bugzilla.dtd">

<bugzilla version="4.0.10"
          urlbase="http://sourceware.org/bugzilla/"
          
          maintainer="overseers@sourceware.org"
>

    <bug>
          <bug_id>52</bug_id>
          
          <creation_ts>2004-03-01 15:48:00 +0000</creation_ts>
          <short_desc>Repeated and nested subexpressions (reproducible in most other engines)</short_desc>
          <delta_ts>2012-03-08 04:36:24 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>glibc</product>
          <component>regex</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>SUSPENDED</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>minor</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Paolo Bonzini">bonzini</reporter>
          <assigned_to name="GOTO Masanori">gotom</assigned_to>
          <cc>carlos</cc>
          <cc>glibc-bugs</cc>
          <cf_gcchost></cf_gcchost>
          <cf_gcctarget></cf_gcctarget>
          <cf_gccbuild></cf_gccbuild>
          <long_desc isprivate="0">
            <commentid>125</commentid>
            <who name="Paolo Bonzini">bonzini</who>
            <bug_when>2004-03-01 15:48:08 +0000</bug_when>
            <thetext>Most engines, including glibc&apos;s, have problems with the following regexes:

(b(c)|d(e))*    match against bcde  --&gt;  \1=de, \3=e, but erroneously \2=c
(a(b)*)*        match against aba   --&gt;  \1=a, but erroneously \2=b

This is just a nit, because most other regex matching engines, either DFA-based 
or syntax-directed, have problems with this kind of regex.

Note that while the second can be fixed by looking ahead at an OP_DUP_ASTERISK
and preemptively clearing any OP_OPEN_SUBEXP that the OP_DUP_ASTERISK reaches
through an epsilon transition, the first is much harder: it requires storing or
building a &quot;tree&quot; of how the subexpressions nest (which would fix the
second one as well).

Paolo</thetext>
          </long_desc>
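          <!--
A minimal C sketch of a small testcase for the first regex in this report. It assumes only the POSIX <regex.h> interface (regcomp, regexec and regfree with REG_EXTENDED); the helper name ere_match is hypothetical. The expected offsets in the code comment follow the POSIX regexec rule that a subexpression nested inside group j is reported within the substring reported for group j, which is why group 2 should be unset once the last iteration of the star matched "de".

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as a POSIX extended regular
   expression, match it against SUBJECT and store up to NMATCH submatch
   offsets in M (M[0] is the whole match).  Returns 0 on a match and
   nonzero if compilation or matching fails.

   Expected per POSIX for "(b(c)|d(e))*" against "bcde":
     m[0] = [0,4) "bcde"        m[1] = [2,4) "de"
     m[2] unset (rm_so == -1)   m[3] = [3,4) "e"
   This report says glibc instead returns m[2] = [1,2) "c".  */
static int ere_match(const char *pattern, const char *subject,
                     regmatch_t *m, size_t nmatch)
{
  regex_t re;
  int rc;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return 1;
  rc = regexec(&re, subject, nmatch, m, 0);
  /* The offsets already copied into M remain valid after regfree.  */
  regfree(&re);
  return rc;
}
```

Usage: declare regmatch_t m[4], call ere_match("(b(c)|d(e))*", "bcde", m, 4) and inspect m[2].rm_so. According to the report, an affected glibc gives 1, while the POSIX rule asks for -1. The same helper applies to the second regex, "(a(b)*)*" against "aba", where m[2] should likewise be unset because m[1] covers only the final "a".
          -->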
          <long_desc isprivate="0">
            <commentid>2080</commentid>
            <who name="Paolo Bonzini">bonzini</who>
            <bug_when>2004-11-13 12:55:29 +0000</bug_when>
            <thetext>I doubt this will be fixed.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <commentid>8562</commentid>
            <who name="Dwayne Grant McConnell">decimal</who>
            <bug_when>2006-02-21 15:54:26 +0000</bug_when>
            <thetext>Do you have a small testcase demonstrating this problem?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <commentid>8565</commentid>
              <attachid>875</attachid>
            <who name="Dwayne Grant McConnell">decimal</who>
            <bug_when>2006-02-21 16:28:01 +0000</bug_when>
            <thetext>Created attachment 875
Patch taken from bug comments.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <commentid>8566</commentid>
            <who name="Dwayne Grant McConnell">decimal</who>
            <bug_when>2006-02-21 16:29:11 +0000</bug_when>
            <thetext>Sorry. That attachment was meant for bug 14.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <commentid>43407</commentid>
            <who name="Petr Baudis">pasky</who>
            <bug_when>2010-06-01 02:06:41 +0000</bug_when>
            <thetext>No feedback to the testcase query. (echo aba | sed -e &apos;s/\(a\(b\)*\)*/\2/&apos; does
produce b, but it&apos;s not clear to me what it should print instead.)</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <commentid>43442</commentid>
            <who name="Paolo Bonzini">bonzini</who>
            <bug_when>2010-06-01 06:26:33 +0000</bug_when>
            <thetext>echo aba | sed -e &apos;s/\(a\(b\)*\)*/\2/&apos;

should print nothing because the final match of \1 is &quot;a&quot; and \2 must be a substring of \1.

Reopening...</thetext>
          </long_desc>
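          <!--
The sed testcase discussed above can be approximated through the POSIX API. This sketch compiles the pattern as a POSIX basic regular expression (cflags 0) and copies out the text of one group, roughly what sed substitutes for \N in the replacement; it does not reproduce sed's own code path, and the helper name bre_group_text is hypothetical. Group 1 should come back as "a" (the last iteration). For group 2, the nesting rule stated above means the result should be the empty string, whereas the report says glibc yields "b".

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular
   expression, match it against SUBJECT and copy the text of group N
   into OUT (OUTSIZE bytes).  A group that did not participate in the
   match (rm_so == -1) yields "".  Returns 0 on success, nonzero if the
   pattern fails to compile or match, N is out of range, or OUT is too
   small.  */
static int bre_group_text(const char *pattern, const char *subject,
                          size_t n, char *out, size_t outsize)
{
  regmatch_t m[10];
  regex_t re;
  int rc;

  if (n >= sizeof m / sizeof m[0] || outsize == 0)
    return 1;
  if (regcomp(&re, pattern, 0) != 0)
    return 1;
  rc = regexec(&re, subject, sizeof m / sizeof m[0], m, 0);
  regfree(&re);
  if (rc != 0)
    return 1;

  out[0] = '\0';
  if (m[n].rm_so != -1)
    {
      size_t len = (size_t) (m[n].rm_eo - m[n].rm_so);
      if (len >= outsize)
        return 1;
      memcpy(out, subject + m[n].rm_so, len);
      out[len] = '\0';
    }
  return 0;
}
```

In C source the sed pattern needs doubled backslashes: bre_group_text("\\(a\\(b\\)*\\)*", "aba", 2, buf, sizeof buf) should leave buf empty under the POSIX rule, and per this report an affected glibc leaves "b" instead.
          -->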
          <long_desc isprivate="0">
            <commentid>43443</commentid>
            <who name="Paolo Bonzini">bonzini</who>
            <bug_when>2010-06-01 06:27:16 +0000</bug_when>
            <thetext>... and leaving suspended.</thetext>
          </long_desc>
      
          <attachment
              isobsolete="1"
              ispatch="1"
              isprivate="0"
              isurl="0"
          >
            <attachid>875</attachid>
            <date>2006-02-21 16:28:00 +0000</date>
            <delta_ts>2006-02-21 16:29:48 +0000</delta_ts>
            <desc>Patch taken from bug comments.</desc>
            <filename>glibc-bug-14.patch</filename>
            <type>text/plain</type>
            <size>1343</size>
            <attacher>decimal</attacher>
            
              <data encoding="base64">LS0tIGdldGRlbnRzLmMub3JpZwkyMDAzLTEyLTA0IDE5OjExOjM4LjAwMDAwMDAwMCArMDIwMAor
KysgZ2V0ZGVudHMuYwkyMDAzLTEyLTA0IDE5OjM4OjA2LjAwMDAwMDAwMCArMDIwMApAQCAtMTE3
LDcgKzExNyw3IEBACiAgICAgICBzaXplX3Qga2J5dGVzID0gbmJ5dGVzOwogICAgICAgaWYgKG9m
ZnNldG9mIChESVJFTlRfVFlQRSwgZF9uYW1lKQogCSAgPCBvZmZzZXRvZiAoc3RydWN0IGtlcm5l
bF9kaXJlbnQ2NCwgZF9uYW1lKQotCSAgJiYgbmJ5dGVzIDw9IHNpemVvZiAoRElSRU5UX1RZUEUp
KQorCSAgJiYgbmJ5dGVzIDw9IHNpemVvZiAoa2VybmVsX2RpcmVudDY0KSkKIAl7CiAJICBrYnl0
ZXMgPSBuYnl0ZXMgKyBvZmZzZXRvZiAoc3RydWN0IGtlcm5lbF9kaXJlbnQ2NCwgZF9uYW1lKQog
CQkgICAtIG9mZnNldG9mIChESVJFTlRfVFlQRSwgZF9uYW1lKTsKQEAgLTE3NSw4ICsxNzUsNyBA
QAogCSAgICAgIG91dHAtPnUuZF9vZmYgPSBkX29mZjsKIAkgICAgICBpZiAoKHNpemVvZiAob3V0
cC0+dS5kX2lubykgIT0gc2l6ZW9mIChpbnAtPmsuZF9pbm8pCiAJCSAgICYmIG91dHAtPnUuZF9p
bm8gIT0gZF9pbm8pCi0JCSAgfHwgKHNpemVvZiAob3V0cC0+dS5kX29mZikgIT0gc2l6ZW9mIChp
bnAtPmsuZF9vZmYpCi0JCSAgICAgICYmIG91dHAtPnUuZF9vZmYgIT0gZF9vZmYpKQorCQkgICkK
IAkJewogCQkgIC8qIE92ZXJmbG93LiAgSWYgdGhlcmUgd2FzIGF0IGxlYXN0IG9uZSBlbnRyeQog
CQkgICAgIGJlZm9yZSB0aGlzIG9uZSwgcmV0dXJuIHRoZW0gd2l0aG91dCBlcnJvciwKQEAgLTE5
MCw3ICsxODksMTAgQEAKIAkJICByZXR1cm4gLTE7CiAJCX0KIAotCSAgICAgIGxhc3Rfb2Zmc2V0
ID0gZF9vZmY7CisJICAgICAgaWYoIGxhc3Rfb2Zmc2V0ID09IC0xICkKKwkJbGFzdF9vZmZzZXQg
PSAwOworCSAgICAgIGxhc3Rfb2Zmc2V0ICs9IG9sZF9yZWNsZW47CisKIAkgICAgICBvdXRwLT51
LmRfcmVjbGVuID0gbmV3X3JlY2xlbjsKIAkgICAgICBvdXRwLT51LmRfdHlwZSA9IGRfdHlwZTsK
IApAQCAtMjEzLDYgKzIxNSw3IEBACiAgICAgY29uc3Qgc2l6ZV90IHNpemVfZGlmZiA9IChvZmZz
ZXRvZiAoRElSRU5UX1RZUEUsIGRfbmFtZSkKIAkJCSAgICAgIC0gb2Zmc2V0b2YgKHN0cnVjdCBr
ZXJuZWxfZGlyZW50LCBkX25hbWUpKTsKIAorICAgIC8qIGJ1Zz8gKG5ieXRlcyBtaWdodCBiZSBz
bWFsbGVyIHRoYW4gcmlnaHQgc2lkZSBvZiBtaW51cykgKi8KICAgICByZWRfbmJ5dGVzID0gTUlO
IChuYnl0ZXMKIAkJICAgICAgLSAoKG5ieXRlcyAvIChvZmZzZXRvZiAoRElSRU5UX1RZUEUsIGRf
bmFtZSkgKyAxNCkpCiAJCQkgKiBzaXplX2RpZmYpLAo=
</data>

          </attachment>
      

    </bug>

</bugzilla>