1.3.18: BUG: Piping DOS files to grep (v2.5) doesn't work properly

Stacey Sheldon ssheldon@catena.com
Thu Jan 16 06:31:00 GMT 2003


Mailing list search didn't find this, nor does it appear
in the FAQ... hopefully this isn't old news to all of you.

Files read from a pipe are treated differently by grep
than files read directly.  This results in some unexpected
(by me) behaviour when using grep on files which use
DOS line ends (cr/nl).  This looks like a bug to me.

I'd expect the following commands to have equivalent
results:

  grep myregex blah
  grep myregex < blah
  cat blah | grep myregex

They are equivalent when the regular file blah uses
Unix line ends, but they differ for a file blahdos which
uses DOS line ends.  It appears to me as though grep
is treating its input as binary when reading from a pipe,
but correctly using "undossify_input()" in other cases.
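
For anyone who wants to check this on their own installation, the
comparison of the three invocations can be scripted.  This is only a
rough sketch: the function name compare_grep_inputs and its temporary
file names are made up for illustration, and it assumes mktemp and cmp
are available.

```shell
# compare_grep_inputs REGEX FILE   (hypothetical helper, not part of grep)
# Runs grep three ways on FILE and reports whether the three outputs
# are byte-identical.  Plain sh has no "local", so the variables are global.
compare_grep_inputs() {
    regex=$1
    file=$2
    tmp=$(mktemp -d) || return 2

    grep -e "$regex" "$file"       > "$tmp/arg"    # file named as an argument
    grep -e "$regex" < "$file"     > "$tmp/stdin"  # file redirected to stdin
    cat "$file" | grep -e "$regex" > "$tmp/pipe"   # file read through a pipe

    if cmp -s "$tmp/arg" "$tmp/stdin" && cmp -s "$tmp/arg" "$tmp/pipe"; then
        echo "same"
        status=0
    else
        echo "different"
        status=1
    fi
    rm -rf "$tmp"
    return $status
}
```

Calling it as compare_grep_inputs 'Test$' blahdos should print
"different" on a setup that shows the behaviour described in this
report, and "same" wherever the three forms agree.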

Here is an example.  I've created two files, blah (nl line-end)
and blahdos (cr/nl line-end).

   $ cat blah
   foobarTest
   $ od -Ax -a blah
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ od -Ax -a blahdos
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c
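
The two files can be recreated with printf, which writes the line
ends exactly as given.  This is a sketch rather than the commands
originally used; the scratch directory variable workdir is an
assumption, and any writable directory would do.

```shell
# Recreate the two test files in a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
printf 'foobarTest\n'   > "$workdir/blah"      # Unix line end: nl
printf 'foobarTest\r\n' > "$workdir/blahdos"   # DOS line end: cr nl

# The byte counts should be 11 and 12, matching the final od offsets
# 00000b and 00000c shown above.
wc -c < "$workdir/blah"
wc -c < "$workdir/blahdos"
```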

Both files should match the regex 'Test$' in all cases,
but grep fails on blahdos when it is read through a pipe:

   $ cat blahdos | grep 'Test$'
   $

And here's why (note the -v to invert the match so we have
something to look at):

   $ cat blahdos | grep -v 'Test$' | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c

There's still a cr/nl on the output which wouldn't be there if
grep had interpreted its input as having DOS line ends.  Here's
what a successful grep of the UNIX line end file looks like:

   $ cat blah | grep 'Test$' | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b

In fact, if I read the blahdos file in any other way except through
a pipe, it successfully matches (note the stripped out cr on the output):

   $ grep 'Test$' blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ grep 'Test$' < blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b

Just in case you might think that this has something to do with cat
(I did), here's the output of cat for each file:

   $ cat blah | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ cat blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c

Using head instead of cat gives the same results as well, just to 
completely remove cat from the picture.
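
Until the pipe case is sorted out, one possible stopgap is to strip
the carriage returns before grep sees them.  This is a workaround
sketch, not a fix for grep itself; it assumes tr is available and that
the data contains no carriage returns that need to be preserved.

```shell
# Delete every cr byte from the stream so that '$' anchors right after "Test".
# printf stands in for "cat blahdos" to keep the example self-contained.
printf 'foobarTest\r\n' | tr -d '\r' | grep 'Test$'
```

With the cr removed, the line ends in "Test" followed by nl, so the
pattern matches and the line is printed.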

I'm currently running these versions of tools on win2k:
  cygwin     1.3.18-1
  textutils  2.0.21 (cat, od, head)
  grep       2.5
  bash       2.05b.0(8)-release

I also tried this out with cygwin 1.3.17-1 with identical results.

If you need any further information, please cc me directly since I
don't read the mailing lists very often.

Stacey.



