/bin/ls -l cannot handle printable Unicode characters outside the BMP ...
Cedric Blancher
cedric.blancher@gmail.com
Sat Nov 23 11:21:56 GMT 2024
On Sat, 23 Nov 2024 at 11:44, Cedric Blancher <cedric.blancher@gmail.com> wrote:
>
> Good morning!
>
> /bin/ls -l cannot handle printable Unicode characters outside the BMP
>
> Example using '𝒯'
> bash -c 'printf "\U0001D4AF\n"' # MATHEMATICAL SCRIPT CAPITAL T
> (yes, our mathematicians want to use THAT as file name)
>
> On Linux:
> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
> ls -la
> total 8
> -rw-r--r-- 1 ced staden 0 Nov 23 11:29 ööööööö
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯𝒯
>
> On Cygwin:
> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
> $ ls -la
> -rw-r--r-- 1 ced staden 0 Nov 23 11:29 ööööööö
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257'
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257\360\235\222\257'
>
> Looks like the Cygwin locale has a problem with non-BMP chars.
find(1) is even worse:
$ find .
.
./ööööööö
./????
./x??x
The Microsoft Explorer GUI shows the file names correctly, so IMO this
is not a Windows or Win32 API problem.
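
For reference, the escaped names in the Cygwin ls output above can be decoded by hand to check that the bytes on disk are intact. This is only a minimal sketch using plain POSIX shell arithmetic. The variable names (b1..b4, cp, hi, lo) are made up for illustration. The UTF-16 surrogate pair at the end is my assumption about where the problem might be (wchar_t is 16 bits wide on Cygwin), not something confirmed in this thread.

```shell
#!/bin/sh
# The four bytes from the escaped name ''$'\360\235\222\257' (octal).
b1=$((0360))
b2=$((0235))
b3=$((0222))
b4=$((0257))

# Decode a 4-byte UTF-8 sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
cp=$(( ((b1 & 0x07) << 18) | ((b2 & 0x3F) << 12) | ((b3 & 0x3F) << 6) | (b4 & 0x3F) ))
cp_hex=$(printf '%04X' "$cp")
echo "code point: U+$cp_hex"

# Anything above U+FFFF lies outside the BMP and needs two
# 16-bit units (a surrogate pair) when stored as UTF-16.
if [ "$cp" -gt 65535 ]; then
    v=$(( cp - 0x10000 ))
    hi=$(printf '%04X' $(( 0xD800 + (v >> 10) )))
    lo=$(printf '%04X' $(( 0xDC00 + (v & 0x3FF) )))
    echo "outside the BMP; UTF-16 surrogate pair: $hi $lo"
fi
```

This prints U+1D4AF and the pair D835 DCAF, so the file name on disk is the correct UTF-8 encoding of '𝒯'. Only the decision whether the character is printable appears to go wrong.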
Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur