sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

Michael Moser michael.moser@sunrise.ch
Mon Mar 30 12:36:00 GMT 2009


I need to mangle a file containing "8-bit ASCII" characters (i.e. the
file also contains characters in the upper 8-bit range, namely a few
umlauts as well as some French accented characters).

Strangely enough, the sed version that came as part of Cygwin emits the
result of the mangling using 16-bit characters (I believe those are
Unicode-16 characters, but I am not sure. The hex editor shows every
second byte as 00, except for the first two bytes, which read FF FE).
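
For reference, a leading FF FE followed by text whose every second byte
is 00 matches the layout of UTF-16 little-endian with a byte order mark.
The following is only a minimal sketch of how that pattern can be checked
from the shell without a hex editor. The sample file is synthetic and
built here for illustration; it is not the actual output from the post.

```shell
# Build a synthetic sample: FF FE, then 'A' and 'B' each followed by 00.
# (Illustrative data only, not the real sed output.)
sample=$(mktemp)
printf '\377\376A\000B\000' > "$sample"

# Dump only the first two bytes as hex and drop the whitespace od adds.
bom=$(od -An -tx1 -N2 "$sample" | tr -d ' \n')
echo "first two bytes: $bom"

rm -f "$sample"
```

If the first two bytes print as fffe, the file starts with a UTF-16
little-endian byte order mark.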

Alas, this makes the next program in the chain throw up and die.

How can one suppress this conversion? I found no option or flag to
tell sed to stay with 8-bit characters.

Just in case: I need this only to strip some trailing blanks, convert
tabs to spaces, etc. The conversion doesn't need to do anything with
those characters that have the 8th bit set (except that it needs to
maintain them as is).
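
The transformation described above could be sketched roughly as follows.
This is an illustration under stated assumptions, not a confirmed fix for
the 16-bit output: it assumes that each tab should become exactly one
space, and it sets LC_ALL=C so that sed is asked to treat the input as
plain bytes. The sample line, with a Latin-1 u-umlaut (byte FC), is made
up for the example.

```shell
# Synthetic input: "a", tab, "b", byte 0xFC (u-umlaut in Latin-1),
# then trailing blanks (two spaces and a tab) before the newline.
input=$(mktemp)
printf 'a\tb\374  \t\n' > "$input"

# A literal tab character, so the sed expressions stay portable.
tab=$(printf '\t')

# First expression strips trailing spaces and tabs; the second replaces
# every remaining tab with a single space (an assumption, see above).
# od shows the resulting bytes so the 8-bit character can be verified.
result=$(LC_ALL=C sed -e "s/[ $tab]*\$//" -e "s/$tab/ /g" "$input" \
         | od -An -tx1 | tr -d ' \n')
echo "output bytes: $result"

rm -f "$input"
```

The byte FC should appear unchanged in the dump, with no 00 bytes and no
FF FE prefix, if the output stayed in 8-bit form.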

Michael

