This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: BUG: grep (GNU grep) 2.24

On 03/21/2016 07:40 AM, Gordon Grimes wrote:
> Hi,
> I had generated a FILE by simply doing a 'find' on a directory and used grep to cull the results.  It wasn't working, so I repeated it and tried the following trivial 'grep':
> % wc -l FILE
> 48786
> % grep . FILE
> 2240
> Very wrong. 

Umm, grep doesn't output counts unless you use 'grep -c'.  Also, one of
the big changes in recent grep is more efficient handling of encoding
errors.  Remember that the regular expression '.' is only supposed to
match valid characters, and that an encoding error can cause grep to
quit checking.  So in all likelihood, your problem stems from the fact
that FILE contains an encoding error in your current locale.

But, as others have already pointed out, you didn't post a simple
reproducible example for us to confirm, nor tell us what locale you are
using, nor tell us whether you have tried LC_ALL=C to see if forcing a
single-byte locale with no encoding errors cleans up the problem.
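
For anyone wanting to gather that information in one go, here is a
minimal sketch.  The helper names count_matches and report_counts are
made up for illustration and are not part of grep; the sketch only
assumes a POSIX shell plus the usual wc, tr, and grep.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): print the number of lines on
# which '.' matches when grep runs under the given locale.
count_matches() {
    # $1 = locale name, $2 = file
    LC_ALL=$1 grep -c . "$2"
}

# Hypothetical helper: show the newline count next to the grep counts
# for the current locale and for the C locale.
report_counts() {
    # $1 = file
    # tr strips the padding that some wc implementations add.
    printf 'newlines (wc -l):     %s\n' "$(wc -l < "$1" | tr -d ' ')"
    printf 'grep -c . (current):  %s\n' "$(grep -c . "$1")"
    printf 'grep -c . (LC_ALL=C): %s\n' "$(count_matches C "$1")"
}
```

Running 'locale' and then 'report_counts FILE' gives the locale in
effect plus the numbers to include in a report.  Keep in mind that '.'
never matches an empty line, so blank lines also lower the grep counts
relative to wc in every locale.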

So, as the grep maintainer, I'm awaiting proof that there is a problem
(or confirmation that the bug is on your end, and not in grep) before I
worry about putting out another build of grep.  Something like this is
the sort of simple example I mean:

$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 wc -l
$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 src/grep -c .
$ printf 'a\n\x80\nc\n' | LC_ALL=C src/grep -c .

Note how wc counts \n characters, regardless of encoding errors
elsewhere, while grep -c skips the \x80 line because it contains nothing
but encoding errors in UTF-8.
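
To track down which lines of a larger file are affected, one commonly
used approach is to ask grep for the lines that '.*' cannot match in
full.  The sketch below is an illustration only; find_bad_lines is a
hypothetical helper name, and the locale passed to it must actually be
installed on the system for the result to mean anything.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line that is not
# made up entirely of valid characters in the given locale.
#   -a  treat the input as text even if grep would consider it binary
#   -x  require the pattern to match the whole line
#   -v  invert, so only lines that '.*' cannot fully match are printed
#   -n  prefix each printed line with its line number
find_bad_lines() {
    # $1 = locale name, $2 = file
    LC_ALL=$1 grep -naxv '.*' "$2"
}
```

For example, 'find_bad_lines en_US.UTF-8 FILE' should list lines such
as the \x80 one above, while 'find_bad_lines C FILE' should list
nothing, since every byte counts as a valid character in the C locale.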

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library
