This is the mail archive of the cygwin mailing list for the Cygwin project.

RE: [1.7] BUG - GREP slows to a crawl with large number of matches on a single file

Thomas Wolff wrote:
> Christopher Faylor wrote:
>>> aputerguy wrote:

>>>> Running grep on a 20MB file with ~100,000 matches takes an
>>>> incredible almost 8 minutes under Cygwin 1.7 while taking just 0.2
>>>> seconds under Cygwin 1.5 (on a 2nd machine).
>> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
>> suggest that this isn't a grep problem.
> This is likely to be triggered by the transition to UTF-8 as a
> default charset. The same problem is observed on Linux, with grep as
> well as with sed.  
> That's why I have changed most of my shell scripts to use something
> like LC_ALL=C grep or LC_ALL=C sed where possible. Please try this. 

I don't have Cygwin 1.5 for comparison, but the testcase provided does show grep taking a long time on *my* Cygwin 1.7.
And LC_ALL=C didn't seem to help:

$ time grep dog testfile | wc
 100000  900000 4500000

real    3m28.229s
user    3m26.951s
sys     0m0.170s

$ LC_ALL=C time grep dog testfile | wc
207.26user 0.06system 3:28.32elapsed 99%CPU (0avgtext+0avgdata 278784maxresident)k
0inputs+0outputs (1091major+0minor)pagefaults 0swaps
 100000  900000 4500000

$ time LC_ALL=C grep dog testfile | wc
 100000  900000 4500000

real    3m24.265s
user    3m24.124s
sys     0m0.202s

(Well... it doesn't help very *much*, anyway; only a few seconds.)
I don't have the *latest* 1.7:

$ uname -a
CYGWIN_NT-5.1 gldlkcooper 1.7.0(0.214/5/3) 2009-10-03 14:33 i686 Cygwin
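
A note on the two LC_ALL=C invocations above: in `LC_ALL=C time grep ...` the assignment comes first, so the shell runs the external time program (hence the different output format) and grep inherits LC_ALL from it; in `time LC_ALL=C grep ...` the shell's own time keyword does the timing. Either way grep should end up running in the C locale. The following is only a minimal sketch for checking that, assuming a POSIX-style shell with grep, seq and mktemp available; the small generated file is a hypothetical stand-in, not the 20MB testfile from this thread.

```shell
#!/bin/sh
# Build a small stand-in test file (hypothetical content, NOT the
# thread's original testfile): 1000 matching and 1000 non-matching lines.
testfile=$(mktemp)
for i in $(seq 1 1000); do
    printf 'the quick brown fox jumps over the lazy dog\n'
    printf 'a line with no match\n'
done > "$testfile"

# An assignment placed before a command goes into that command's
# environment and is inherited by whatever that command runs in turn.
inherited=$(LC_ALL=C sh -c 'printf "%s" "$LC_ALL"')
echo "child process sees LC_ALL=$inherited"

# Same plain-ASCII pattern under the current locale and under the C
# locale; the match counts should agree.
count_default=$(grep -c dog "$testfile")
count_c=$(LC_ALL=C grep -c dog "$testfile")
echo "default locale: $count_default matches"
echo "C locale:       $count_c matches"

rm -f "$testfile"
```

To compare timings rather than counts, the two grep calls can be prefixed with bash's time keyword, as in the `time LC_ALL=C grep dog testfile` run shown earlier, with the output sent to /dev/null or wc.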

Karl Cooper
