20095 – parse_dup_op duplicates the tree exponentially when using repeated +

Bug 20095 - parse_dup_op duplicates the tree exponentially when using repeated +

Summary: parse_dup_op duplicates the tree exponentially when using repeated +

Status:	NEW

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	regex (show other bugs)
Version:	2.24

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (1):	25814 (view as bug list)
Depends on:
Blocks:

Reported:	2016-05-14 01:08 UTC by Eduardo Bustamante
Modified:	2023-08-24 22:43 UTC (History)
CC List:	6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Eduardo Bustamante 2016-05-14 01:08:41 UTC

For every repeated + in an extended regex, parse_dup_op seems to duplicate the parse tree.

dualbus@hp:~/v$ ulimit -a | grep cpu
cpu time               (seconds, -t) 1
dualbus@hp:~/v$ grep -E '.++++++++++++++++++++++++++++++++' <<< .
Killed

This seems to be special to +, since * doesn't behave that way.

My guess is that:

.+ is expanded to ..*

So

.+++ is expanded to ........*

And so on. Is this documented somewhere?

Comment 1 Eduardo Bustamante 2016-05-14 01:10:49 UTC

This seems to be related to bug 17150

Comment 2 David Mendenhall 2020-04-15 15:53:56 UTC

*** Bug 25814 has been marked as a duplicate of this bug. ***

Comment 3 Jonathan Wakely 2023-08-24 15:14:35 UTC

This behaviour can rapidly exhaust memory (Bug 25814, Bug 28864, Bug 20095
, Bug 29642), which seems unhelpful when ".++" is not even a valid regex. POSIX says it's undefined:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_06

Why doesn't regcomp just fail to compile it with REG_BADRPT?

Similarly for ".**" etc.

GNU grep seems to have tests for these that expect BADRPT:
https://git.savannah.gnu.org/cgit/grep.git/tree/tests/tests?h=v3.11#n234

Comment 4 Adhemerval Zanella 2023-08-24 20:21:35 UTC

I am not sure who exactly GNU grep handles this since it also uses gnulib regex code. Is this code really being tested by GNU grep? I deleted the fil, make check, and it seems not to affect the test's outcome.

I also tried to sync with gnulib master, but it also does not help.

Comment 5 Jonathan Wakely 2023-08-24 20:28:47 UTC

(In reply to Adhemerval Zanella from comment #4)
> Is this code really being tested by GNU grep?

No idea - I grepped the grep code for cases of '++' and found those tests, I don't know if they're actually run or not.

FWIW, Solaris 2.11 and AIX 7.3 both have the same behaviour for "a++"

jwakely@gcc-solaris11:~$ /usr/xpg4/bin/grep -E  a++ <<< a
a

$ /usr/bin/grep -E  a++ <<< a
a


So maybe POSIX says it's undefined to allow for this traditional/common behaviour.

Glibc's support for it seems poor though, given the memory exhaustion problems.