Bug 20095 - parse_dup_op duplicates the tree exponentially when using repeated +
Summary: parse_dup_op duplicates the tree exponentially when using repeated +
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: regex (show other bugs)
Version: 2.24
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 25814 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-05-14 01:08 UTC by Eduardo Bustamante
Modified: 2023-08-24 22:43 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eduardo Bustamante 2016-05-14 01:08:41 UTC
For every repeated + in an extended regex, parse_dup_op seems to duplicate the parse tree.

dualbus@hp:~/v$ ulimit -a | grep cpu
cpu time               (seconds, -t) 1
dualbus@hp:~/v$ grep -E '.++++++++++++++++++++++++++++++++' <<< .
Killed

This seems to be special to +, since * doesn't behave that way.

My guess is that:

.+ is expanded to ..*

So

.+++ is expanded to ........*

And so on. Is this documented somewhere?
Comment 1 Eduardo Bustamante 2016-05-14 01:10:49 UTC
This seems to be related to bug 17150
Comment 2 David Mendenhall 2020-04-15 15:53:56 UTC
*** Bug 25814 has been marked as a duplicate of this bug. ***
Comment 3 Jonathan Wakely 2023-08-24 15:14:35 UTC
This behaviour can rapidly exhaust memory (Bug 25814, Bug 28864, Bug 20095
, Bug 29642), which seems unhelpful when ".++" is not even a valid regex. POSIX says it's undefined:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_06

Why doesn't regcomp just fail to compile it with REG_BADRPT?

Similarly for ".**" etc.

GNU grep seems to have tests for these that expect BADRPT:
https://git.savannah.gnu.org/cgit/grep.git/tree/tests/tests?h=v3.11#n234
Comment 4 Adhemerval Zanella 2023-08-24 20:21:35 UTC
I am not sure who exactly GNU grep handles this since it also uses gnulib regex code. Is this code really being tested by GNU grep? I deleted the fil, make check, and it seems not to affect the test's outcome.

I also tried to sync with gnulib master, but it also does not help.
Comment 5 Jonathan Wakely 2023-08-24 20:28:47 UTC
(In reply to Adhemerval Zanella from comment #4)
> Is this code really being tested by GNU grep?

No idea - I grepped the grep code for cases of '++' and found those tests, I don't know if they're actually run or not.

FWIW, Solaris 2.11 and AIX 7.3 both have the same behaviour for "a++"

jwakely@gcc-solaris11:~$ /usr/xpg4/bin/grep -E  a++ <<< a
a

$ /usr/bin/grep -E  a++ <<< a
a


So maybe POSIX says it's undefined to allow for this traditional/common behaviour.

Glibc's support for it seems poor though, given the memory exhaustion problems.