readdir() returns inaccessible name if file was created with invalid UTF-8
Christian Franke
Christian.Franke@t-online.de
Wed Jun 25 14:59:04 GMT 2025
On Sun, 15 Sep 2024 19:47:11 +0200, Christian Franke wrote:
> If a file name contains an invalid (truncated) UTF-8 sequence, open()
> does not refuse to create the file. Later readdir() returns a
> different name which could not be used to access the file.
>
> Testcase with U+1F321 (Thermometer):
>
> $ uname -r
> 3.5.4-1.x86_64
>
> $ printf $'\U0001F321' | od -A none -t x1
> f0 9f 8c a1
>
> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>
> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>
> $ touch 'file3-'$'\xf0\x9f\x8c'
>
> $ ls -1
> ls: cannot access 'file2-.?ext': No such file or directory
> ls: cannot access 'file3-': No such file or directory
> 'file1-'$'\360\237\214\241''.ext'
> file2-.?ext
> file3-
>
>
> Name mapping according to "fhandler_disk_file::readdir" strace lines:
>
> "file1-\xF0\x9F\x8C\xA1.ext" -(open)-> L"file1-\xD83C\xDF21.ext"
> -(readdir)->
> "file1-\xF0\x9F\x8C\xA1.ext"
>
> "file2-\xF0\x9f\x8C.ext" -(open)-> L"file2-\xD83C\xF02Eext" -(readdir)->
> "file2-.\xE1\x9E\xB3ext"
>
> "file3-\xF0\x9F\x8C" -(open)-> L"file3-\xD83C\xF000" -(readdir)->
> "file3-"
>
> Issue found because 'stress-ng --filename ...' could not cleanup its
> temp directory.
>
A closer look many months later with Cygwin 3.7.0-0.137.g756669312c97 and
the current upstream version of stress-ng reveals a related problem which
is possibly more serious:
In cases like file3-... above, the converted Windows path ends with
0xF000. This suggests that this is an accidental conversion of the
terminating null to the 0xF0xx range.
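To illustrate the hypothesis, here is a rough Python model. It is only a
sketch that reproduces the three open() mappings from the strace lines
quoted above; it is not Cygwin's conversion code. The function name
model_open and the rule "a byte that cannot continue the sequence becomes
0xF000 | byte" are assumptions made for illustration.

```python
def is_cont(b: int) -> bool:
    """True for a UTF-8 continuation byte (10xxxxxx)."""
    return 0x80 <= b <= 0xBF


def model_open(name: bytes) -> list[int]:
    """Hypothetical model of the name conversion; returns UTF-16 code units.

    Only ASCII and 4-byte lead bytes are decoded. Overlong or out-of-range
    forms are not rejected, and 2- and 3-byte sequences are not decoded.
    """
    data = name + b"\x00"          # C string including its terminating NUL
    units: list[int] = []
    i = 0
    while data[i] != 0:
        b = data[i]
        if b < 0x80:
            units.append(b)
            i += 1
        elif 0xF0 <= b <= 0xF4 and is_cont(data[i + 1]) and is_cont(data[i + 2]):
            # The first three bytes already determine the high surrogate.
            top = ((b & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12) \
                | ((data[i + 2] & 0x3F) << 6)
            units.append(0xD800 + ((top - 0x10000) >> 10))
            nxt = data[i + 3]
            if is_cont(nxt):
                cp = top | (nxt & 0x3F)
                units.append(0xDC00 + ((cp - 0x10000) & 0x3FF))
            else:
                # Assumed behavior: the unexpected byte is consumed and
                # mapped into the 0xF0xx range, even if it is the NUL.
                units.append(0xF000 | nxt)
                if nxt == 0:
                    break          # the terminator itself was swallowed
            i += 4
        else:
            # Anything else is mapped bytewise in this sketch.
            units.append(0xF000 | b)
            i += 1
    return units


for n in (b"file2-\xf0\x9f\x8c.ext", b"file3-\xf0\x9f\x8c"):
    print(" ".join(f"{u:04X}" for u in model_open(n)))
```

In this model the terminator is consumed at the point where the loop
breaks. If the real conversion kept reading past that point, that would
be one possible explanation for the garbage seen after the 0xF000.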
In some cases, the created Windows file name has random garbage after
the 0xF000. Then even Cygwin is not able to access or unlink the file
after creation.
Fortunately only in very rare cases, the created Windows file is not
even accessible from the Win32 layer, because its name looks like
L"file3-\xD83C\xF000garbage."
or
L"file3-\xD83C\xF000garbage "
Such a name is invalid at the Win32 layer due to the trailing '.' or
space. A tool which removes the file via the Nt*() layer is then required.
I could not come up with a reproducible testcase, sorry.
'stress-ng --filename 1' succeeds, but may silently leave temp files
behind. The next stress-ng release will report an error if unlink() of
such a file fails.
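To check a directory for such leftovers by hand, something like the
following Python sketch could be used. It is not what stress-ng does; the
function name inaccessible_entries is made up for this example. It lists
the directory with byte names, so the names are exactly what readdir()
returned, and reports every entry that lstat() cannot access.

```python
import os


def inaccessible_entries(directory: bytes = b".") -> list[bytes]:
    """Return names reported by the directory listing that lstat() rejects."""
    bad = []
    for name in os.listdir(directory):      # bytes in, bytes out
        try:
            os.lstat(os.path.join(directory, name))
        except OSError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    for name in inaccessible_entries():
        print("inaccessible:", name)
```

lstat() is used instead of stat() so that dangling symlinks are not
reported as false positives.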
Caution: Files created that way may not be removable with "onboard"
tools, see above.
--
Regards,
Christian