This is the mail archive of the
mailing list for the Cygwin project.
Re: [1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Christopher Faylor wrote:
> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>> (on a 2nd machine).
>> I've seen nasty behavior with grep that isn't cygwin specific. Try
>> "pcregrep" and see if you have the same issue.
>> I found it to be about ~100 times faster under _some_ searches though
>> 2-3x is more typical. The gnu re-parser isn't real efficient under
>> If you find a big difference, you might also want to report it to the
>> firstname.lastname@example.org mailing list, but last time I did, they told me
>> "that's the way it is" due to some posix conformance thing...
> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
> suggest that this isn't a grep problem.

This is likely to be triggered by the transition to UTF-8 as a default
charset. The same problem is observed on Linux, with grep as well as
That's why I have changed most of my shell scripts to use something like
LC_ALL=C grep or LC_ALL=C sed
where possible. Please try this.
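
For illustration, here is a minimal sketch of that workaround, assuming a POSIX shell. The sample file and its contents are made up for the example and are not from the original report. Setting LC_ALL=C on the command line applies only to that one grep invocation, so the rest of the script keeps its normal locale.

```shell
#!/bin/sh
# Build a small, hypothetical test file (1000 lines, each containing "match").
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i match"
    i=$((i + 1))
done > "$sample"

# Count matching lines under whatever locale is currently in effect.
count_default=$(grep -c 'match' "$sample")

# Count matching lines with the C locale forced for this one command only.
count_c=$(LC_ALL=C grep -c 'match' "$sample")

echo "default locale: $count_default, C locale: $count_c"
rm -f "$sample"
```

To compare speed on a real file, prefix each grep command with "time". Note that under LC_ALL=C grep treats the input as single bytes, so patterns that contain non-ASCII characters may match differently than under a UTF-8 locale.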
The problem *is* with grep (and sed), however, because there is no good
reason that UTF-8 should give us a penalty of being 100 times slower on
most search operations; this is just poor programming of grep and sed.
Problem reports: http://cygwin.com/problems.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple