/bin/ls -l cannot handle printable Unicode characters outside the BMP ...
Christian Franke
Christian.Franke@t-online.de
Sat Nov 23 14:01:47 GMT 2024
Cedric Blancher via Cygwin wrote:
> On Sat, 23 Nov 2024 at 11:44, Cedric Blancher <cedric.blancher@gmail.com> wrote:
>> Good morning!
>>
>> /bin/ls -l cannot handle printable Unicode characters outside the BMP
>>
>> Example using '𝒯'
>> bash -c 'printf "\U0001D4AF\n"' # MATHEMATICAL SCRIPT CAPITAL T
>> (yes, our mathematicians want to use THAT as file name)
>>
>> On Linux:
>> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
>> ls -la
>> total 8
>> -rw-r--r-- 1 ced staden 0 Nov 23 11:29 ööööööö
>> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯
>> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯𝒯
>>
>> On Cygwin:
>> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
>> $ ls -la
>> -rw-r--r-- 1 ced staden 0 Nov 23 11:29 ööööööö
>> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257'
>> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257\360\235\222\257'
>>
>> Looks like the Cygwin locale has a problem with non-BMP chars.
> find(1) is even worse:
> $ find .
> .
> ./ööööööö
> ./????
> ./x??x
>
> The Microsoft Explorer GUI shows the file names correctly, so IMO this
> is not a Windows or Win32 API problem.
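For reference, here is a small Python sketch. Python is only used for illustration; nothing in Cygwin or coreutils runs it. It decodes the octal escapes in the Cygwin `ls -la` output above. They are exactly the four UTF-8 bytes of U+1D4AF, which suggests the name reaches `ls` intact and that the quoting comes from the printability check. The same four bytes may also account for the four `?` that `find` prints for `./????`. Windows stores file names as UTF-16, and Cygwin's `wchar_t` is 16 bits, so on that side this character needs a surrogate pair (U+D835 U+DCAF). Whether that pair is what trips up the locale code is an assumption, not something verified here.

```python
# Illustration only: how U+1D4AF (MATHEMATICAL SCRIPT CAPITAL T) is encoded.
ch = "\U0001D4AF"  # '𝒯'

# UTF-8: four bytes. Render them the way ls quoted them: $'\360\235\222\257'
utf8 = ch.encode("utf-8")
octal = "".join(f"\\{b:03o}" for b in utf8)
print(utf8.hex(" "))  # f0 9d 92 af
print(octal)          # \360\235\222\257

# UTF-16: the code point is outside the BMP, so it becomes a surrogate pair
# (two 16-bit units, which is what a 16-bit wchar_t has to hold).
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([f"U+{u:04X}" for u in units])  # ['U+D835', 'U+DCAF']

# The same pair computed by hand from the code point.
cp = ord(ch) - 0x10000
high, low = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
assert [high, low] == units
```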
A slightly different file name problem, which may or may not be related:
https://sourceware.org/pipermail/cygwin/2024-September/256451.html
--
Regards,
Christian