Removing ^X in paths
Corinna Vinschen
corinna-cygwin@cygwin.com
Thu Feb 3 08:53:01 GMT 2022
On Feb 2 21:12, Dennis Heimbigner wrote:
> I am using 64bit.
> And it has nothing to do with misreading characters.
>
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
>
> There you will see this text:
>
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
>
> There is no obvious good reason to continue this convention.
You're probably using a non-UTF-8 locale, e.g., LANG=en_US, which uses
ISO-8859-1 as its charset. See the output of `locale -av' to learn which
charset your locale uses. Either way, converting the UTF-16 filenames
to a non-UTF charset is not lossless, because some characters have no
representation in the target charset. That's what the ASCII CAN sequence
is for. If you want to avoid it, use a UTF-8 locale, e.g., en_US.UTF-8.
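
To illustrate the scheme the quoted documentation describes, here is a rough Python sketch. It is not Cygwin's actual code (that lives in the C/C++ sources of the DLL); the names encode_name, decode_name and utf8_len are made up for this example. It also assumes the filename itself contains no literal 0x18 character and that the bytes after a CAN are well-formed UTF-8.

```python
CAN = 0x18  # ASCII CAN (Control-X), used as the escape marker


def encode_name(name: str, charset: str = "iso-8859-1") -> bytes:
    """Convert a filename to `charset`, escaping non-convertible characters.

    Hypothetical helper: each character that `charset` cannot represent is
    written as CAN followed by the character's UTF-8 bytes.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(CAN)
            out += ch.encode("utf-8")
    return bytes(out)


def utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence, derived from its lead byte."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1


def decode_name(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Reverse encode_name: skip each CAN and read one UTF-8 character."""
    out = []
    run = bytearray()  # bytes that are in the regular charset
    i = 0
    while i < len(raw):
        if raw[i] == CAN and i + 1 < len(raw):
            # Flush what we collected so far, then decode the escaped char.
            out.append(run.decode(charset))
            run.clear()
            n = utf8_len(raw[i + 1])
            out.append(raw[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            run.append(raw[i])
            i += 1
    out.append(run.decode(charset))
    return "".join(out)


name = "caf\u00e9_\u20ac.txt"   # e-acute fits ISO-8859-1, the euro sign does not
raw = encode_name(name)          # the euro sign becomes CAN + its UTF-8 bytes
assert decode_name(raw) == name  # the round trip is symmetric
```

With "utf-8" passed as the charset, every character is convertible, so encode_name never emits a CAN byte. That is the effect of switching to a UTF-8 locale.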
Corinna