Grepping Unicode files?

Nellis, Kenneth Kenneth.Nellis@xerox.com
Thu May 14 16:32:00 GMT 2015


> Does Cygwin’s grep support Unicode files? The output from a SQL Server SQL
> Agent job is a Unicode file, i.e. if you look at it in a hex editor every
> other character is 00 because each character is taking up two bytes. The
> filename itself is fine, it’s the contents that is Unicode. I can’t get
> grep to work on it, either with or without -a.
> 
> This may not be a Cygwin-specific question, but I haven’t been able to
> find anything after several Google searches, including the archives, and
> neither --help nor the man page for grep references Unicode.
> 
> By default I have neither LC_ALL nor LC_COLLATE set.
> 
> A pointer to a better search or a website that explains this would be
> great, or if it can’t currently be done, that’s OK, too.
> 
> Thanks for your help!

If you don't have iconv, install the libiconv package.

Then, if what you're searching for is in the ASCII character set,
the following should work:

iconv -f UTF-16 -t UTF-8 {your file} | grep {your RE}
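
To try this without touching the real job output, here is a rough sketch that builds a small UTF-16 file and greps it both ways. The sample text and the temporary file are made up for illustration, and it assumes the file is little-endian with a byte-order mark, which is what Windows tools usually write. If your file has no byte-order mark, you may need to name the byte order explicitly (for example -f UTF-16LE).

```shell
#!/bin/sh
# Hypothetical stand-in for the SQL Agent job output file.
sample=$(mktemp)

# Write a byte-order mark (FF FE), then two lines encoded as UTF-16LE.
printf '\377\376' > "$sample"
printf 'Job succeeded\nJob failed\n' | iconv -f UTF-8 -t UTF-16LE >> "$sample"

# Plain grep sees a NUL byte between every letter, so the pattern never matches.
plain=$(grep -a -c 'failed' "$sample")

# Converting to UTF-8 first lets grep match as usual.
converted=$(iconv -f UTF-16 -t UTF-8 "$sample" | grep 'failed')

echo "matches without iconv: $plain"
echo "match with iconv: $converted"

rm -f "$sample"
```

The first count shows why grep alone comes up empty, with or without -a, and the second line shows the match once the text has gone through iconv.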

--Ken Nellis

