This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: [1.7] BUG - GREP slows to a crawl with large number of matches on a single file

On Fri, Nov 06, 2009 at 02:09:59PM +0100, Thomas Wolff wrote:
>Christopher Faylor wrote:
>> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>>> aputerguy wrote:
>>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>>> (on a 2nd machine).
>>> I've seen nasty behavior with grep that isn't cygwin specific.  Try
>>> "pcregrep" and see if you have the same issue.
>>> I found it to be about 100 times faster under _some_ searches, though
>>> 2-3x is more typical.  The GNU regex parser isn't very efficient under
>>> some circumstances.
>>> If you find a big difference, you might also want to report it to the
>>> mailing list, but last time I did, they told me
>>> "that's the way it is" due to some posix conformance thing...
>> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
>> suggest that this isn't a grep problem.
>This is likely to be triggered by the transition to UTF-8 as a default 
>charset. The same problem is observed on Linux, with grep as well as 
>with sed.
>That's why I have changed most of my shell scripts to use something like
>LC_ALL=C grep or LC_ALL=C sed
>where possible. Please try this.

Thanks for catching this.  I'll hold off on trying the test case until I
hear a report about running the same test with LC_ALL=C.
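A minimal sketch of the comparison requested above, assuming bash (as shipped with Cygwin) and GNU grep. The file name `test.txt`, the pattern `match`, and the generated data are hypothetical placeholders, not the original poster's test case; pass your own file and pattern as arguments to reproduce the real report. It times the same search once under the current locale (per the thread, UTF-8 based by default in Cygwin 1.7) and once with `LC_ALL=C`.

```shell
#!/bin/bash
# Hypothetical placeholders: substitute the file and pattern from your own test.
FILE=${1:-test.txt}
PATTERN=${2:-match}

# If the file does not exist, generate sample data: 100,000 matching lines
# interleaved with 100,000 non-matching filler lines (about 3MB, smaller
# than the 20MB file in the report; raise the loop count to scale it up).
if [ ! -f "$FILE" ]; then
  awk 'BEGIN { for (i = 0; i < 100000; i++) { print "match line " i; print "filler line " i } }' > "$FILE"
fi

# Run 1: whatever locale the environment currently sets.
echo "locale: ${LC_ALL:-${LANG:-unset}}"
time grep -c "$PATTERN" "$FILE"

# Run 2: force the C locale for this one command only.
echo "locale: C"
time env LC_ALL=C grep -c "$PATTERN" "$FILE"
```

The `-c` option only counts matching lines, so terminal output does not dominate the timing; drop it (and redirect to `/dev/null`) to exercise the case of printing every match. Both runs should report the same count; only the elapsed time is expected to differ if the locale is the cause.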


