This is the mail archive of the cygwin mailing list for the Cygwin project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

Hash: SHA1

According to Corinna Vinschen on 11/6/2009 6:51 AM:
>> The problem *is* with grep (and sed), however, because there is no
>> good reason that UTF-8 should give us a penalty of being 100times
>> slower on most search operations, this is just poor programming of
>> grep and sed.
> The penalty on Linux is much smaller, about 15-20%.  It looks like
> grep is calling malloc for every input line if MB_CUR_MAX is > 1.
> Then it evaluates for each byte in the line whether the byte is a
> single byte or the start of a multibyte sequence using mbrtowc on
> every charatcer on the input line.  Then, for each potential match,
> it checks if it's the start byte of a multibyte sequence and ignores
> all other matches.  Eventually, it calls free, and the game starts
> over for the next line.

Adding bug-grep, since this slowdown caused by additional mallocs is
definitely the sign of a poor algorithm that could be improved by reusing
existing buffers.

- --
Don't work too hard, make some time for fun as well!

Eric Blake   
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at
Comment: Using GnuPG with Mozilla -


Problem reports:
Unsubscribe info:

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]