readdir() returns inaccessible name if file was created with invalid UTF-8
Jeremy Drake
cygwin@jdrake.com
Thu Sep 19 17:18:50 GMT 2024
On Thu, 19 Sep 2024, Brian Inglis via Cygwin wrote:
> On 2024-09-19 07:27, Christian Franke via Cygwin wrote:
> >
> >
> > Yes, but Cygwin does not provide consistent forward/reverse UTF-8 <-> UTF-16
> > mappings.
>
> Surrogate halves are invalid for UTF-8 encoding; they should first be
> encoded as a valid UTF-16 code point.
> The encoder should just fail if it encounters any invalid sequence!
> Handling surrogates or other invalid values as anything other than invalid
> turns the encoding into what has been called WTF-8, where W may be for Windows! ;^>
This may be necessary, though, in order to round-trip any name that is
valid on NTFS, which does not require file names to be well-formed UTF-16
and so permits unpaired surrogates. In my opinion, rm -rf not failing in
the face of potentially maliciously named files/directories is more
important than strictly adhering to a standard that says 'fail if you see
these values'.
https://cygwin.com/pipermail/cygwin/2024-June/256111.html
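To make the WTF-8 trade-off above concrete, here is a minimal Python sketch. It is not Cygwin's conversion code; the function names wtf8_encode, wtf8_decode and is_strict_utf8 are hypothetical. It maps UTF-16 code units (as NTFS stores names) to WTF-8-style bytes: a valid surrogate pair becomes ordinary 4-byte UTF-8, while an unpaired surrogate becomes a 3-byte sequence that a strict UTF-8 decoder rejects. The decoder is deliberately lenient (it does not reject overlong forms) and only illustrates the round trip.

```python
def _encode_code_point(cp: int) -> bytes:
    """Generalized UTF-8: same bit layout as UTF-8, but surrogate values are allowed."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def wtf8_encode(units: list[int]) -> bytes:
    """Encode UTF-16 code units as WTF-8 (hypothetical helper).

    A high surrogate followed by a low surrogate is combined into one
    supplementary code point, emitted exactly as UTF-8 would. Any
    unpaired surrogate is encoded on its own as a 3-byte sequence.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            cp = u  # BMP character or unpaired surrogate
            i += 1
        out += _encode_code_point(cp)
    return bytes(out)


def wtf8_decode(data: bytes) -> list[int]:
    """Decode WTF-8 bytes back to UTF-16 code units (lenient sketch).

    Checks lead bytes, continuation bytes, truncation and the U+10FFFF
    limit, but does not reject overlong forms.
    """
    units: list[int] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, length = b, 1
        elif b >> 5 == 0b110:
            cp, length = b & 0x1F, 2
        elif b >> 4 == 0b1110:
            cp, length = b & 0x0F, 3
        elif b >> 3 == 0b11110:
            cp, length = b & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b2 in data[i + 1:i + length]:
            if b2 & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte after offset {i}")
            cp = (cp << 6) | (b2 & 0x3F)
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")
        i += length
        if cp >= 0x10000:
            # Split a supplementary code point back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)  # may be an unpaired surrogate
    return units


def is_strict_utf8(data: bytes) -> bool:
    """True if data is valid UTF-8; Python's strict decoder rejects encoded surrogates."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A name containing an unpaired high surrogate, which NTFS permits.
name = [0x61, 0xD800, 0x62]  # 'a', lone U+D800, 'b'
encoded = wtf8_encode(name)
print(encoded)                       # b'a\xed\xa0\x80b'
print(wtf8_decode(encoded) == name)  # True
print(is_strict_utf8(encoded))       # False
```

With a mapping like this, a name handed out by readdir() converts back to the same UTF-16 units when it is passed to unlink(), which is the round-trip property argued for above; the price is that such names are not valid UTF-8.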