$ echo 87654321 | \ grep -E -e '^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1$' Segmentation fault Will work on a C reproducer soon.
Minimized testcases (same regex): $ echo 8 | grep -E -e "$regex" 8 # >>> okay $ echo 87 | grep -E -e "$regex" Segmentation fault $ echo 88 | grep -E -e "$regex" 88 # >>> okay $ echo 887 | grep -E -e "$regex" Segmentation fault Also, everything I tried to feed that is of length 9 or higher and should not match, gives either a false positive or a segfault: $ echo 987654321 | grep -E -e "$regex" 887654321 $ echo 484635532 | grep -E -e "$regex" 484635532 $ echo 0123454321 | grep -E -e "$regex" Segmentation fault $ echo 0000123454321 | grep -E -e "$regex" Segmentation fault
I ran into what appears to be the same bug independently and came up with a simpler (all-C) reproducer; please see Bug#17356. I tried to merge the two bug reports via the web interface, but failed to do so.
*** Bug 17356 has been marked as a duplicate of this bug. ***
This bug causes GNU coreutils Bug#22793 "grep -E assertion failure with back references"; see <https://bugs.gnu.org/22793>. I'm adding comments to both bug reports so that the connection between the two bugs is clearer. Although this bug's current assignee is Paolo Bonzini (the original reporter), I think Paolo is pretty busy doing other stuff. Is someone else available to work on regex bugs? I suspect the fix for this bug will not be trivial.
Created attachment 9758 [details] C code to reproduce the bug I attached a slightly-simpler C-language reproducer for the bug, derived from the attachment in Bug#17356. If I compile and run this program, it outputs "a.out: regexec.c:1375: pop_fail_stack: Assertion `num >= 0' failed." and then aborts.
Created attachment 10674 [details] This test case silently returns the wrong answer Following up on a 'grep' bug report here: https://debbugs.gnu.org/29613 attached is a seemingly-related test case which illustrates a bug that causes 'grep' to quietly return the wrong answer instead of dumping core. This test case should exit successfully, but because of the bug regexec returns 0 so the test case exits with status 1. I compiled and ran it on Fedora 27 x86-64 with "gcc regbug.c; ./a.out".
Another test case for this bug can be found here: https://debbugs.gnu.org/32806
Another test case for this bug can be found here: https://debbugs.gnu.org/34238
The master branch has been updated by Andreas Schwab <schwab@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fc141ea78ee3d87c67b18488827fe2d89c9343e7 commit fc141ea78ee3d87c67b18488827fe2d89c9343e7 Author: Andreas Schwab <schwab@suse.de> Date: Wed Oct 30 10:38:36 2019 +0100 Fix array bounds violation in regex matcher (bug 25149) If the regex has more subexpressions than the number of elements allocated in the regmatch_t array passed to regexec then proceed_next_node may access the regmatch_t array outside its bounds. No testcase added because even without this bug it would then crash in pop_fail_stack which is bug 11053.
Created attachment 13204 [details] regex: fix undefined backref behavior I am attaching a proposed patch for this longstanding bug. I plan to email this to libc-alpha shortly.
Did the patch ever get sent to libc-alpha?
(In reply to Michael Hudson-Doyle from comment #11) > Did the patch ever get sent to libc-alpha? Unfortunately I never got around to it. Someone else can shepherd it if it's urgent; otherwise I suppose it can wait until someone gets around to syncing Gnulib with glibc.
The master branch has been updated by Paul Eggert <eggert@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0b5ca7c3e551e5502f3be3b06453324fe8604e82 commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82 Author: Paul Eggert <eggert@cs.ucla.edu> Date: Tue Sep 21 07:47:45 2021 -0700 regex: copy back from Gnulib Copy regex-related files back from Gnulib, to fix a problem with static checking of regex calls noted by Martin Sebor. This merges the following changes: * New macro __attribute_nonnull__ in misc/sys/cdefs.h, for use later when copying other files back from Gnulib. * Use __GNULIB_CDEFS instead of __GLIBC__ when deciding whether to include bits/wordsize.h etc. * Avoid duplicate entries in epsilon closure table. * New regex.h macro _REGEX_NELTS to let regexec say that its pmatch arg should contain nmatch elts. Use that for regexec, instead of __attr_access (which is incorrect). * New regex.h macro _Attr_access_ which is like __attr_access except portable to non-glibc platforms. * Add some DEBUG_ASSERTs to pacify gcc -fanalyzer and to catch recently-fixed performance bugs if they recur. * Add Gnulib-specific stuff to port the dynarray- and lock-using parts of regex code to non-glibc platforms. * Fix glibc bug 11053. * Avoid some undefined behavior when popping an empty fail stack.
(In reply to cvs-commit@gcc.gnu.org from comment #13) > The master branch has been updated by Paul Eggert <eggert@sourceware.org>: > > https://sourceware.org/git/gitweb.cgi?p=glibc.git; > h=0b5ca7c3e551e5502f3be3b06453324fe8604e82 > > commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82 > Author: Paul Eggert <eggert@cs.ucla.edu> > Date: Tue Sep 21 07:47:45 2021 -0700 [...] > * Fix glibc bug 11053. What is the status of this bug? The comment says that it is fixed, and I could check on an Ubuntu 22.04.1 LTS machine with libc6 2.35-0ubuntu3.1 that regbug.c and rebug2.c no longer fail, but the result is still incorrect with the grep example from Debian bug 884075: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075 vinc17@gcc92:~$ echo 11111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $? 11111111111 0
On 9/5/22 18:06, vincent-srcware at vinc17 dot net wrote: > > What is the status of this bug? The comment says that it is fixed, and I could > check on an Ubuntu 22.04.1 LTS machine with libc6 2.35-0ubuntu3.1 that regbug.c > and rebug2.c no longer fail, but the result is still incorrect with the grep > example from Debian bug 884075: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075 > > vinc17@gcc92:~$ echo 11111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $? > 11111111111 > 0 > It looks like my comment <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075#27> was incorrect, in that the two bugs are different bugs. glibc bug 11053 is fixed, but Debian bug 884075 is not fixed. Perhaps a better match for Debian bug 884075 is glibc bug 10844. It's not an important bug. However, if you have time to fix it please feel free to send in a fix.
(In reply to eggert from comment #15) > glibc bug 11053 is fixed, Shouldn't this bug be resolved as fixed, then? > but Debian bug 884075 is not fixed. Perhaps a better match for > Debian bug 884075 is glibc bug 10844. It seems different. With Debian bug 884075, the "|^1?$" part is important (it yields the incorrect output, even though this part isn't matched), and there is nothing like that in glibc bug 10844: vinc17@gcc92:~$ echo 11111111111 | grep --color=auto -E '^(11+)\1+$|^1?$' 11111111111 vinc17@gcc92:~$ echo 11111111111 | grep --color=auto -E '^(11+)\1+$' vinc17@gcc92:~$ Note that for the first command, nothing is colored in "11111111111", i.e. the line is output as appeared to be matched, but no matches are shown by colors. As a comparison, with ten 1s instead of eleven, the line is output and the ten 1s are colored.
This could be simplified a bit: vinc17@gcc92:~$ echo 11111111111 | grep --color=auto -E '^(11+)\1+$|^$' 11111111111 (nothing colored).
(In reply to Vincent Lefèvre from comment #16) > (In reply to eggert from comment #15) > > glibc bug 11053 is fixed, > > Shouldn't this bug be resolved as fixed, then? OK, done. > > Perhaps a better match for > > Debian bug 884075 is glibc bug 10844. > > It seems different. In that case it might be better to file a new glibc bug report, one that corresponds more closely to Debian bug 884075.
Sorry, actually both regbug.c and rebug2.c fail as they return the exit status 1 (with my usual configuration, my prompt shows any non-zero exit status, but this is not the case of the machine on which I had done the test, so that I missed the failure initially): vinc17@gcc92:~$ ./regbug vinc17@gcc92:~$ echo $? 1 vinc17@gcc92:~$ ./rebug2 vinc17@gcc92:~$ echo $? 1 However, in the test from Paolo Bonzini's bug report (comment 0), grep no longer crashes (while it still crashes with glibc 2.34, which does not have the fix). regbug.c is derived from the attachment in Bug#17356 (as said in comment 5). I've tested this original testcase: with glibc 2.34 on x86_64, it crashes (segmentation fault); with glibc 2.35 on riscv64 (host gcc92), it outputs "no match (incorrect)". So it seems that the fix mentioned in comment 13 fixed the crashes (which was the initial bug report), but not the misbehavior. Now, with these new details, is it still OK to regard this bug as fixed and that the misbehavior (rebug.c from Bug#17356; regbug.c and rebug2.c from this bug) is actually a new bug?
(In reply to Vincent Lefèvre from comment #19) > regbug.c is derived from the attachment in Bug#17356 (as said in comment 5). > I've tested this original testcase: with glibc 2.34 on x86_64, it crashes > (segmentation fault); with glibc 2.35 on riscv64 (host gcc92), it outputs > "no match (incorrect)". > > So it seems that the fix mentioned in comment 13 fixed the crashes (which > was the initial bug report), but not the misbehavior. OK, so in that case how about if we update Bug#17356 by (1) saying it is no longer a duplicate of Bug#11053 (as we've fixed the latter but not the former), and (2) reopening Bug#17536? If I understand you correctly, that would match the symptoms you describe.
(In reply to eggert from comment #20) > OK, so in that case how about if we update Bug#17356 by (1) saying it is no > longer a duplicate of Bug#11053 (as we've fixed the latter but not the > former), and (2) reopening Bug#17536? If I understand you correctly, that > would match the symptoms you describe. Yes, I think that this is the best solution.
(In reply to Vincent Lefèvre from comment #21) > (In reply to eggert from comment #20) > > OK, so in that case how about if we update Bug#17356 by (1) saying it is no > > longer a duplicate of Bug#11053 (as we've fixed the latter but not the > > former), and (2) reopening Bug#17536? If I understand you correctly, that > > would match the symptoms you describe. > > Yes, I think that this is the best solution. OK, done.
What about attachment 10674 [details] ("This test case silently returns the wrong answer"), with the pattern "^(11+)\\1+$|^1?$" and the string "1111111111111"? Should it be regarded as part of Bug#17356 or another bug? This case seems quite different from Bug#10844 and Bug#17356. Unless the intent is to group all the bugs about regexp involving backreferences giving a wrong answer[*] (in which case Bug#10844 and Bug#17356 should be regarded as duplicates to each other), I think that this should be a new bug. [*] as opposed to a crash like in this bug 11053.
Sure, feel free to file it as a new bug.
(In reply to eggert from comment #24) > Sure, feel free to file it as a new bug. Bug 29560.
The release/2.34/master branch has been updated by Florian Weimer <fw@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=86a701a20479dfbc23540b3143fd5b28660a2447 commit 86a701a20479dfbc23540b3143fd5b28660a2447 Author: Paul Eggert <eggert@cs.ucla.edu> Date: Tue Sep 21 07:47:45 2021 -0700 regex: copy back from Gnulib Copy regex-related files back from Gnulib, to fix a problem with static checking of regex calls noted by Martin Sebor. This merges the following changes: * New macro __attribute_nonnull__ in misc/sys/cdefs.h, for use later when copying other files back from Gnulib. * Use __GNULIB_CDEFS instead of __GLIBC__ when deciding whether to include bits/wordsize.h etc. * Avoid duplicate entries in epsilon closure table. * New regex.h macro _REGEX_NELTS to let regexec say that its pmatch arg should contain nmatch elts. Use that for regexec, instead of __attr_access (which is incorrect). * New regex.h macro _Attr_access_ which is like __attr_access except portable to non-glibc platforms. * Add some DEBUG_ASSERTs to pacify gcc -fanalyzer and to catch recently-fixed performance bugs if they recur. * Add Gnulib-specific stuff to port the dynarray- and lock-using parts of regex code to non-glibc platforms. * Fix glibc bug 11053. * Avoid some undefined behavior when popping an empty fail stack. (cherry picked from commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82)