This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Need help with multibyte UTF-8 characters
On 2017-12-11 16:36, Thomas Taylor wrote:
> Thank you for your advice on setting my locale to en_US.UTF-8. Unfortunately,
> Cygwin still seems to have trouble displaying some three-byte UTF-8 encoded
> characters correctly. For example, see the following snippet from a "sed"
> file. This file attempts to convert XML-encoded filenames to UTF-8. As you can
> see, it converts one- and two-byte encodings correctly, but fails on some
> three-byte encodings (the en dash, the em dash, and the ellipsis, all of which
> are displayed as a filled-in rectangle):
Going back to first principles: what encoding is your script saved in, and what
locale is it run under?
What characters are in your script?
$ wc -lwmc ...
What does vim say for that script:
:set enc? tenc? fenc? fencs? eol? bomb?
What does locale report for the environment sed runs in:
$ locale
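
A minimal sketch of how the bytes themselves could be checked, assuming a POSIX
shell with od available. hexbytes is a hypothetical helper name used only for
illustration, not a Cygwin tool. If the script really is UTF-8, the en dash, em
dash, and ellipsis should each show up as three bytes beginning with e2 80:

```shell
# Hypothetical helper: print the bytes of its argument as space-separated hex.
hexbytes() {
    # od prints padded hex columns; word splitting via set -- normalizes spacing.
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

# Octal escapes are used so this example does not depend on how the
# example itself is encoded on disk.
en_dash=$(printf '\342\200\223')    # U+2013
em_dash=$(printf '\342\200\224')    # U+2014
ellipsis=$(printf '\342\200\246')   # U+2026

hexbytes "$en_dash"     # e2 80 93
hexbytes "$em_dash"     # e2 80 94
hexbytes "$ellipsis"    # e2 80 a6
```

Dumping the script file with od -An -tx1 and looking for those sequences would
show whether the file holds the expected bytes. If it does, the filled-in
rectangles more likely point at the locale or the terminal font than at the
script's encoding, though that is a guess until the questions above are
answered.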
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada