Removing ^X in paths

Corinna Vinschen
Thu Feb 3 08:53:01 GMT 2022

On Feb  2 21:12, Dennis Heimbigner wrote:
> I am using 64bit.
> And it has nothing to do with misreading characters.
> The ^X is described in this document:
> There you will see this text:
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
> There is no obvious good reason to continue this convention.
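The CAN-escape scheme quoted above can be sketched in Python as a round trip. This is a minimal illustration, not Cygwin's actual implementation. It assumes a single-byte target charset (ISO-8859-1), assumes filenames never contain a literal CAN byte, and starts from a Python str rather than raw UTF-16. The function names to_charset/from_charset are hypothetical.

```python
CAN = 0x18  # ASCII CAN (Control-X), the escape byte from the quoted text


def to_charset(name: str, charset: str = "iso-8859-1") -> bytes:
    """Encode a filename, escaping characters the charset cannot represent.

    Assumes a single-byte charset; not Cygwin's real code.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Non-convertible character: emit CAN, then its UTF-8 bytes.
            out.append(CAN)
            out += ch.encode("utf-8")
    return bytes(out)


def _utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence, derived from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4


def from_charset(data: bytes, charset: str = "iso-8859-1") -> str:
    """Reverse to_charset: CAN means 'the next bytes are one UTF-8 char'."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] == CAN:
            n = _utf8_len(data[i + 1])
            chars.append(data[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            # Ordinary byte: decode it with the locale charset.
            chars.append(bytes([data[i]]).decode(charset))
            i += 1
    return "".join(chars)


name = "r\u00e9sum\u00e9 \u20ac.txt"  # the euro sign is not in ISO-8859-1
encoded = to_charset(name)
print(encoded)                         # b'r\xe9sum\xe9 \x18\xe2\x82\xac.txt'
print(from_charset(encoded) == name)   # True
```

The accented letters map directly to ISO-8859-1 bytes, while the euro sign becomes the "ugly" CAN-prefixed UTF-8 sequence. The round trip still restores the original name. With a UTF-8 locale no character is ever non-convertible, so the escape never appears.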

You're probably using a non-UTF-8 locale, e.g., LANG=en_US using
ISO-8859-1 as charset.  See the output of `locale -av' to learn what
charset your locale uses.  Either way, converting the UTF-16 filenames
to a non-UTF charset is not lossless.  That's what the ASCII CAN stuff
is for.  If you want to avoid that, use a UTF-8 locale, e.g.


More information about the Cygwin mailing list