Bug 25814 - Consecutive + operators accepted but have no effect except consuming more memory
Summary: Consecutive + operators accepted but have no effect except consuming more memory
Status: RESOLVED DUPLICATE of bug 20095
Alias: None
Product: glibc
Classification: Unclassified
Component: regex
Version: 2.27
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-12 04:06 UTC by David Mendenhall
Modified: 2020-07-31 09:03 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
test program (365 bytes, text/x-csrc)
2020-04-12 04:06 UTC, David Mendenhall
Same as regex.c, but BRE syntax (348 bytes, text/x-csrc)
2020-04-13 02:58 UTC, David Mendenhall

Description David Mendenhall 2020-04-12 04:06:37 UTC
Created attachment 12452
test program

The attached test program just takes a regex pattern and a string at the command line.

$ gcc -o regex regex.c
$ ./regex 0+ 0
pattern: 0+
string: 0

regex matched
$ ./regex 0++ 0
pattern: 0++
string: 0

regex matched
$ ./regex 0++++++++++++++++++++++++++++++++++ 0
pattern: 0++++++++++++++++++++++++++++++++++
string: 0
<hangs consuming all system memory>

I'm not even sure what consecutive + operators are supposed to mean, so I don't know why "0++" accepts "0".

I tested this against the bionic/NetBSD regex implementation and compilation of "0++" fails with REG_BADRPT, which makes more sense.
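
The attachment itself is not reproduced in this report. As a rough sketch only, a test program of this kind could be built around a helper like the one below, assuming it uses the POSIX regcomp()/regexec() interface with REG_EXTENDED. The names try_match and report are hypothetical and are not taken from regex.c.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the attachment: compile `pattern`
   with the given cflags and try to match it against `string`.
   Returns 1 on match, 0 on no match, -1 if regcomp() rejects the pattern. */
int try_match(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && string != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    rc = regcomp(&re, pattern, cflags | REG_NOSUB);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return -1;
    }

    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Prints the pattern and string first, in the same shape as the
   transcript above, then the outcome of try_match(). */
int report(const char *pattern, const char *string, int cflags)
{
    int r;

    printf("pattern: %s\nstring: %s\n\n", pattern, string);
    r = try_match(pattern, string, cflags);
    puts(r == 1 ? "regex matched"
         : r == 0 ? "regex did not match"
                  : "regex did not compile");
    return r;
}
```

A main() that calls report(argv[1], argv[2], REG_EXTENDED) would give the command-line behavior described above; passing 0 instead of REG_EXTENDED selects BRE syntax.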
Comment 1 Andreas Schwab 2020-04-12 11:55:53 UTC
RE_CONTEXT_INVALID_DUP is only part of RE_SYNTAX_POSIX_BASIC, but not RE_SYNTAX_POSIX_EXTENDED.
Comment 2 Andreas Schwab 2020-04-12 12:15:23 UTC
POSIX says that multiple adjacent duplication symbols produce undefined results (both BRE and ERE).
Comment 3 David Mendenhall 2020-04-13 02:58:18 UTC
Created attachment 12453
Same as regex.c, but BRE syntax
Comment 4 David Mendenhall 2020-04-13 02:59:57 UTC
> RE_CONTEXT_INVALID_DUP is only part of RE_SYNTAX_POSIX_BASIC, but not RE_SYNTAX_POSIX_EXTENDED

The same behavior is reproducible with basic syntax. See regex-basic.c attached.

$ gcc -o regex-basic regex-basic.c
$ ./regex-basic 0\\+ 0
pattern: 0\+
string: 0

regex matched
$ ./regex-basic 0\\+\\+ 0
pattern: 0\+\+
string: 0

regex matched
./regex-basic 0\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+\\+ 0
pattern: 0\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+
string: 0
<hangs consuming all system memory>
Comment 5 David Mendenhall 2020-04-13 03:02:13 UTC
> POSIX says that multiple adjacent duplication symbols produce undefined results (both BRE and ERE).

Thanks for pointing this out. I was unaware.

So do you think this bug should be closed as INVALID or WONTFIX, or is there value in investigating the excessive memory consumption and/or rejecting on compilation like bionic does?
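
For illustration, an application that wants to avoid the undefined case regardless of what the library does could pre-screen ERE patterns before calling regcomp(). The sketch below is one possible approach under stated assumptions: it covers ERE syntax only, treats *, +, ? and {...} as duplication symbols, and skips backslash escapes and bracket expressions. The function names are hypothetical, and this is not how bionic, NetBSD, or glibc implement their checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Skip a bracket expression starting at p[i] == '['. Returns the index
   just past the closing ']' (or the index of the NUL if unterminated). */
static size_t skip_bracket(const char *p, size_t i)
{
    i++;                          /* step past '[' */
    if (p[i] == '^')
        i++;
    if (p[i] == ']')
        i++;                      /* a leading ']' is a literal */
    while (p[i] != '\0' && p[i] != ']') {
        if (p[i] == '[' &&
            (p[i + 1] == ':' || p[i + 1] == '.' || p[i + 1] == '=')) {
            /* [:class:], [.coll.] or [=equiv=]: skip to its terminator */
            char kind = p[i + 1];
            i += 2;
            while (p[i] != '\0' && !(p[i] == kind && p[i + 1] == ']'))
                i++;
            if (p[i] != '\0')
                i += 2;
        } else {
            i++;
        }
    }
    return p[i] == ']' ? i + 1 : i;
}

/* Hypothetical pre-check: true if the ERE contains two duplication
   symbols directly next to each other, e.g. "0++" or "a{2}{3}". */
bool ere_has_adjacent_dups(const char *p)
{
    bool prev_dup = false;
    size_t i = 0;

    while (p[i] != '\0') {
        bool is_dup = false;

        switch (p[i]) {
        case '\\':                /* escaped character is a literal */
            i += (p[i + 1] != '\0') ? 2 : 1;
            break;
        case '[':
            i = skip_bracket(p, i);
            break;
        case '*': case '+': case '?':
            is_dup = true;
            i++;
            break;
        case '{':                 /* interval expression {m,n} */
            is_dup = true;
            while (p[i] != '\0' && p[i] != '}')
                i++;
            if (p[i] == '}')
                i++;
            break;
        default:
            i++;
            break;
        }

        if (is_dup && prev_dup)
            return true;
        prev_dup = is_dup;
    }
    return false;
}
```

A caller could reject the pattern with its own error when ere_has_adjacent_dups() returns true, and only pass it to regcomp() otherwise. A BRE variant would need the roles of + and \+ swapped, which this sketch does not attempt.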
Comment 6 Andreas Schwab 2020-04-13 11:05:14 UTC
0\+ is not a valid BRE.
Comment 7 David Mendenhall 2020-04-13 15:35:27 UTC
Really? The regex.h comments suggest otherwise.

#define RE_SYNTAX_POSIX_BASIC                                           \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM | RE_CONTEXT_INVALID_DUP)

/* If this bit is not set, then + and ? are operators, and \+ and \? are
     literals.
   If set, then \+ and \? are operators and + and ? are literals.  */
# define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)
Comment 8 David Mendenhall 2020-04-15 15:53:56 UTC
Duplicate of bug 20095.

*** This bug has been marked as a duplicate of bug 20095 ***