This is the mail archive of the
cygwin
mailing list for the Cygwin project.
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Corinna Vinschen on 11/6/2009 6:51 AM:
>> The problem *is* with grep (and sed), however, because there is no
>> good reason that UTF-8 should give us a penalty of being 100times
>> slower on most search operations, this is just poor programming of
>> grep and sed.
>
> The penalty on Linux is much smaller, about 15-20%. It looks like
> grep is calling malloc for every input line if MB_CUR_MAX is > 1.
> Then it evaluates for each byte in the line whether the byte is a
> single byte or the start of a multibyte sequence using mbrtowc on
> every charatcer on the input line. Then, for each potential match,
> it checks if it's the start byte of a multibyte sequence and ignores
> all other matches. Eventually, it calls free, and the game starts
> over for the next line.
Adding bug-grep, since this slowdown caused by additional mallocs is
definitely the sign of a poor algorithm that could be improved by reusing
existing buffers.
- --
Don't work too hard, make some time for fun as well!
Eric Blake ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkr0KxUACgkQ84KuGfSFAYCOCACgvjz2v65vK8DIcGg6zfnLQgcT
tfQAmwbpWbriBJSv0rjYobYgsh4KXOiZ
=B3nZ
-----END PGP SIGNATURE-----
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple