BUG: grep (GNU grep) 2.24

Eric Blake eblake@redhat.com
Mon Mar 21 16:30:00 GMT 2016

On 03/21/2016 07:40 AM, Gordon Grimes wrote:
> Hi,
> I had generated a FILE by simply doing a 'find' on a directory and used grep to cull the results.  It wasn't working so I repeated and tried the following trivial 'grep':
> % wc -l FILE
> 48786
> % grep . FILE
> 2240
> Very wrong. 

Umm, grep doesn't output counts unless you use 'grep -c'.  Also, one of
the big changes in recent grep is more efficient handling of encoding
errors.  Remember that the regular expression '.' is only supposed to
match valid characters, and that an encoding error can cause grep to
quit checking.  So in all likelihood, your problem stems from FILE
containing an encoding error in your current locale.

But, as others have already pointed out, you didn't post a simple
reproducible example for us to confirm, nor tell us what locale you are
using, nor tell us whether you have tried LC_ALL=C to see if forcing a
single-byte locale with no encoding errors cleans up the problem.
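
For anyone wanting to run that check, here is a minimal sketch.
count_matches is a hypothetical helper name (not something grep ships),
and FILE stands for the reporter's file.  It counts the lines that '.'
matches under a given locale, so the inherited locale and LC_ALL=C can
be compared side by side.

```shell
# count_matches is a hypothetical helper, not part of grep.
# Usage: count_matches LOCALE FILE
# Prints how many lines of FILE the regex '.' matches under LOCALE.
count_matches() {
    LC_ALL=$1 grep -c . "$2"
}

# Show the locale-related variables grep would inherit.
printf 'LC_ALL=%s LANG=%s\n' "${LC_ALL-unset}" "${LANG-unset}"

# FILE is a placeholder for the file being searched.
if [ -f FILE ]; then
    grep -c . FILE           # count under the inherited locale
    count_matches C FILE     # count with every byte treated as a character
fi
```

If the second count comes out higher, that would suggest some lines
contain nothing that is a valid character in the inherited locale.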

So, as the grep maintainer, I'm awaiting proof that there is a problem
(or confirmation that the bug is on your end, and not in grep) before I
worry about putting out another build of grep.  Something like this is
the kind of simple reproducible example I mean:

$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 wc -l
$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 src/grep -c .
$ printf 'a\n\x80\nc\n' | LC_ALL=C src/grep -c .

Note how wc counts \n characters, regardless of encoding errors
elsewhere, while grep -c skips the \x80 line because it contains nothing
but encoding errors in UTF-8.
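
To track down which lines are responsible, one option is the sketch
below.  bad_lines is a hypothetical helper name, and it assumes a grep
that supports -a, -x, -v and -n (GNU grep does).  It prints the lines
that '.*' cannot match in full under the given locale, which is where
encoding errors would show up.

```shell
# bad_lines is a hypothetical helper, not part of grep.
# Usage: bad_lines LOCALE FILE
# Prints, with line numbers, the lines of FILE that are not entirely
# valid text in LOCALE.
#   -n  prefix each printed line with its line number
#   -a  treat the input as text even if grep would call it binary
#   -x  require the pattern to match the whole line
#   -v  invert: print the lines that '.*' could NOT match in full
bad_lines() {
    LC_ALL=$1 grep -naxv '.*' "$2"
}

# Example (FILE and the locale name are placeholders):
#   bad_lines en_US.UTF-8 FILE
```

Under LC_ALL=C this should print nothing, since every byte counts as a
valid character there; the exit status is then 1, as usual for grep
when no line is selected.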

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
