This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: question about regex
- From: Paul Eggert <eggert at cs dot ucla dot edu>
- To: Tim Rühsen <tim dot ruehsen at gmx dot de>, liqingqing <liqingqing3 at huawei dot com>
- Cc: libc-alpha at sourceware dot org, Florian Weimer <fweimer at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>, Hushiyuan <hushiyuan at huawei dot com>, Liusirui <liusirui at huawei dot com>
- Date: Thu, 2 Jan 2020 14:14:27 -0800
- Subject: Re: question about regex
- References: <fe3eef9c-740e-9b83-c472-59f8e2353b86@huawei.com> <e92f791d-7448-17ce-2820-a968770791b6@huawei.com> <fe84ed3d-6bbc-ea87-4e50-93e11736b005@gmx.de>
On 1/2/20 8:16 AM, Tim Rühsen wrote:
> Meanwhile grep (or libc) seems to exit gracefully:
Yes, there's no core dump if the operating system supports stack
overflow detection that grep can use. The problem occurs only on OSes
that don't do that, or on apps that don't try to detect stack overflow
and simply dump core (or worse).
On 1/2/20 2:54 AM, liqingqing wrote:
> do we have any plan or good ways to fix up the bug as below
The best way would be to fix bug#24269, i.e., fix the glibc regex code
so that it doesn't blow the stack. If you could write a patch for this
bug (something that doesn't hurt performance for ordinary regexps), that
would be welcome.
For that particular test case, a workaround is to run grep on an OS
that provides stack overflow detection that grep can use.
PS. The next version of the grep manual is planned to nearly wash its
hands of the matter. Here's the current draft:
----
Back-references can greatly slow down matching, as they can generate
exponentially many matching possibilities that can consume both time
and memory to explore. Also, the POSIX specification for
back-references is at times unclear. Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
bugs has often been low-priority: for example, as of 2020 the
@url{https://sourceware.org/bugzilla/,GNU C library bug database}
contained back-reference bugs
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=52,,52},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=10844,,10844},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=11053,,11053},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=24269,,24269}
and @url{https://sourceware.org/bugzilla/show_bug.cgi?id=25322,,25322},
with little sign of forthcoming fixes. Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.