This is the mail archive of the cygwin mailing list for the Cygwin project.


sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

I need to mangle a file containing "8-bit ASCII" characters (i.e. the
file also contains characters in the upper 8-bit range, namely a few
umlauts as well as some French accented characters).

Strangely enough, the SED version that came as part of Cygwin emits the
result of the mangling using 16-bit characters. (I believe those are
Unicode-16 characters, but I am not sure. The hex editor shows every
second byte as 00, except for the first two bytes, which read FF FE.)
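
For illustration, the byte pattern described above can be reproduced with
a short Python sketch. It assumes the 16-bit output is little-endian
UTF-16 with a byte-order mark, which matches the FF FE prefix. The
function names and the sample text are hypothetical and not part of sed
or Cygwin.

```python
import codecs


def as_utf16le_with_bom(text: str) -> bytes:
    """Encode text as little-endian UTF-16, prefixed with the FF FE mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def has_utf16le_bom(data: bytes) -> bool:
    """True if the data starts with the two bytes FF FE."""
    return data.startswith(codecs.BOM_UTF16_LE)


def high_bytes_all_zero(data: bytes) -> bool:
    """True if every second byte after the optional FF FE mark is 00."""
    payload = data[2:] if has_utf16le_bom(data) else data
    return all(b == 0 for b in payload[1::2])


# Hypothetical sample: "a" followed by a u-umlaut (0xFC in Latin-1).
sample = as_utf16le_with_bom("a\u00fc")
print(sample.hex(" "))  # ff fe 61 00 fc 00
```

Characters from the upper 8-bit range keep their value in the first byte
of each pair, which is why a hex editor shows 00 in every second position.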

Alas, this makes the next program in the chain throw up and die.

How can one suppress this conversion? I found no option or flag to
tell SED to stay with 8-bit characters.

Just in case: I need this only to strip some trailing blanks and
convert tabs to spaces, etc. The conversion doesn't need to do
anything with those characters that have the 8th bit set (except that
it needs to maintain them as is).
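
To make the intended transformation concrete, here is a minimal sketch in
Python that works on raw bytes, so bytes with the 8th bit set pass through
unchanged. It is not a sed option and does not explain the sed behaviour.
The function name clean_bytes and the tab width of 8 are assumptions, as
is the reading of "convert tabs to spaces" as expansion to tab stops.

```python
def clean_bytes(data: bytes, tabsize: int = 8) -> bytes:
    """Expand tabs and strip trailing blanks line by line, byte-wise.

    No decoding takes place, so bytes >= 0x80 are never interpreted
    or altered. A trailing CR (DOS line ending) is preserved.
    """
    cleaned = []
    for line in data.split(b"\n"):
        # Set a trailing carriage return aside so it is not lost.
        cr = b"\r" if line.endswith(b"\r") else b""
        body = line[:-1] if cr else line
        # expandtabs counts one column per byte, which suits 8-bit text.
        body = body.expandtabs(tabsize).rstrip(b" ")
        cleaned.append(body + cr)
    return b"\n".join(cleaned)
```

It could be used as a filter by reading sys.stdin.buffer and writing the
result to sys.stdout.buffer, which keeps the whole chain in binary mode.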

