Created attachment 12452 [details]
test program

The attached test program just takes a regex pattern and a string at the command line.

$ gcc -o regex regex.c
$ ./regex 0+ 0
pattern: 0+
string: 0
regex matched
$ ./regex 0++ 0
pattern: 0++
string: 0
regex matched
$ ./regex 0++++++++++++++++++++++++++++++++++ 0
pattern: 0++++++++++++++++++++++++++++++++++
string: 0
<hangs consuming all system memory>

I'm not even sure what consecutive + operators are supposed to mean, so I don't know why "0++" accepts "0". I tested this against the bionic/NetBSD regex implementation, where compilation of "0++" fails with REG_BADRPT, which makes more sense.
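The attachment itself is not reproduced in this thread. As a rough idea of what such a test does, here is a minimal sketch built on the POSIX regcomp()/regexec() interface. The helper name try_match and its return convention are assumptions made for illustration; this is not the attached regex.c.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not the attached regex.c): compile `pattern` with the
   given regcomp() flags and try it against `string`.
   Returns 0 on a match, REG_NOMATCH on no match, or the regcomp() error
   code (for example REG_BADRPT) if compilation fails. */
int try_match(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    char errbuf[128];
    int rc;

    assert(pattern != NULL && string != NULL);
    printf("pattern: %s\nstring: %s\n", pattern, string);

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    rc = regcomp(&re, pattern, cflags | REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &re, errbuf, sizeof errbuf);
        printf("regcomp failed: %s\n", errbuf);
        return rc;
    }

    rc = regexec(&re, string, 0, NULL, 0);
    puts(rc == 0 ? "regex matched" : "regex did not match");
    regfree(&re);
    return rc;
}
```

Calling try_match(pattern, string, REG_EXTENDED) would correspond to the ERE runs above, and passing 0 as the flags selects BRE. What a pattern such as "0++" does is implementation-dependent, and on an affected glibc a long run of + can exhaust memory, so it is safer to try those patterns under a memory limit (for example ulimit -v).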
RE_CONTEXT_INVALID_DUP is only part of RE_SYNTAX_POSIX_BASIC, but not RE_SYNTAX_POSIX_EXTENDED.
POSIX says that multiple adjacent duplication symbols produce undefined results (both BRE and ERE).
Created attachment 12453 [details] Same as regex.c, but BRE syntax
> RE_CONTEXT_INVALID_DUP is only part of RE_SYNTAX_POSIX_BASIC, but not RE_SYNTAX_POSIX_EXTENDED

The same behavior is reproducible with basic syntax. See regex-basic.c attached.

$ gcc -o regex-basic regex-basic.c
$ ./regex-basic 0\\+ 0
pattern: 0\+
string: 0
regex matched
$ ./regex-basic 0\\+\\+ 0
pattern: 0\+\+
string: 0
regex matched
$ ./regex-basic 0\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+ 0
pattern: 0\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+
string: 0
<hangs consuming all system memory>
> POSIX says that multiple adjacent duplication symbols produce undefined results (both BRE and ERE).

Thanks for pointing this out. I was unaware. So do you think this bug should be closed as INVALID or WONTFIX, or is there value in investigating the excessive memory consumption and/or rejecting on compilation like bionic does?
0\+ is not a valid BRE.
Really? The regex.h comments suggest otherwise.

#define RE_SYNTAX_POSIX_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM | RE_CONTEXT_INVALID_DUP)

/* If this bit is not set, then + and ? are operators, and \+ and \? are
   literals.
   If set, then \+ and \? are operators and + and ? are literals. */
# define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)
Dup of 20095 *** This bug has been marked as a duplicate of bug 20095 ***