readdir() returns inaccessible name if file was created with invalid UTF-8

Thomas Wolff towo@towo.net
Sun Sep 15 18:15:48 GMT 2024


On 15.09.2024 at 19:47, Christian Franke via Cygwin wrote:
> If a file name contains an invalid (truncated) UTF-8 sequence, open()
> does not refuse to create the file. Later readdir() returns a
> different name which could not be used to access the file.
>
> Testcase with U+1F321 (Thermometer):
>
> $ uname -r
> 3.5.4-1.x86_64
>
> $ printf $'\U0001F321' | od -A none -t x1
>  f0 9f 8c a1
>
> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>
> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>
> $ touch 'file3-'$'\xf0\x9f\x8c'
>
> $ ls -1
> ls: cannot access 'file2-.?ext': No such file or directory
> ls: cannot access 'file3-': No such file or directory
> 'file1-'$'\360\237\214\241''.ext'
> file2-.?ext
> file3-
I can't reproduce this.
While the file names get mangled, all resulting file names are valid and
are listed:
In file2 the truncated sequence is turned into U+17B3, but it swaps
places with the dot.
In file3 the same sequence is simply dropped.
$ ls -1|cat
file1-🌡.ext
file2-.ឳext
file3-

However, ls file2* fails, as does ls *.
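
To double-check the byte values involved, here is a minimal Python sketch
(Python 3.8 or later). It only redoes the encoding arithmetic from the
test case and says nothing about what Cygwin does internally; the helper
name is_valid_utf8 is made up for illustration.

```python
# Encodings of U+1F321 (Thermometer), as used in the test case.
thermometer = "\U0001F321"

utf8 = thermometer.encode("utf-8")
print(utf8.hex(" "))  # f0 9f 8c a1

# UTF-16 code units (big-endian, so the hex reads naturally).
utf16 = thermometer.encode("utf-16-be")
units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(units)  # ['d83c', 'df21']


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# file1 carries the complete sequence, file2 and file3 a truncated one.
print(is_valid_utf8(b"file1-" + utf8 + b".ext"))      # True
print(is_valid_utf8(b"file2-" + utf8[:3] + b".ext"))  # False
print(is_valid_utf8(b"file3-" + utf8[:3]))            # False

# U+17B3, the character that shows up in the mangled file2 name.
print("\u17b3".encode("utf-8").hex(" "))  # e1 9e b3
```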

>
>
> Name mapping according to "fhandler_disk_file::readdir" strace lines:
>
> "file1-\xF0\x9F\x8C\xA1.ext" -(open)-> L"file1-\xD83C\xDF21.ext"
> -(readdir)->
> "file1-\xF0\x9F\x8C\xA1.ext"
>
> "file2-\xF0\x9f\x8C.ext" -(open)-> L"file2-\xD83C\xF02Eext" -(readdir)->
> "file2-.\xE1\x9E\xB3ext"
>
> "file3-\xF0\x9F\x8C" -(open)-> L"file3-\xD83C\xF000" -(readdir)->
> "file3-"
>
> Issue found because 'stress-ng --filename ...' could not cleanup its
> temp directory.
>
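
To look for the symptom without going through ls or shell globbing, one
could list a directory and call lstat() on every returned name. The
following is only a sketch under that assumption; the function name
inaccessible_names is made up, and byte paths are used so that names
which are not valid UTF-8 are passed through unchanged.

```python
import os


def inaccessible_names(directory: bytes) -> list:
    """Return the names listed in directory that lstat() cannot find."""
    broken = []
    for name in os.listdir(directory):  # bytes in, bytes out
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            # The directory listing returned a name that cannot be
            # used to access the file.
            broken.append(name)
    return broken


# Report such names in the current directory, if there are any.
for name in inaccessible_names(b"."):
    print(name)
```

An empty result means that every listed name could be accessed.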


