With bad UTF-8, cygwin can create files it can't read
Corinna Vinschen
corinna-cygwin@cygwin.com
Wed Apr 1 16:10:00 GMT 2015
On Apr 1 15:34, Corinna Vinschen wrote:
> Hi Stuart,
>
> On Mar 30 13:04, Corinna Vinschen wrote:
> > On Mar 25 14:34, Kyzer wrote:
> > > Hello,
> > >
> > > I've found that if you use cygwin to create a file with badly-encoded
> > > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > > subsequently accept.
> > >
> > > * create a file using filename with hex bytes F4 8F BF BF
> > > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > > * attempting to open or unlink the filename F4 8F BF BF succeeds
> >
> > Thanks for the testcase. I'll have a look later this week (I hope).
>
> Wow. Just wow. You found a long-standing bug in the wctomb conversion
> from UTF-16 to UTF-8.
>
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
>
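To illustrate the surrogate pair representation described above, here is
a minimal sketch of the standard UTF-16 arithmetic in C. The function
names to_surrogates and from_surrogates are made up for this example and
are not taken from the Cygwin or newlib sources.

```c
#include <assert.h>
#include <stdint.h>

/* Split a code point above U+FFFF into a UTF-16 surrogate pair.
   Hypothetical helper, for illustration only. */
void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    /* Only code points beyond the base plane need a pair. */
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                              /* leaves a 20-bit value */
    *high = (uint16_t)(0xD800 | (cp >> 10));    /* top 10 bits    */
    *low  = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* bottom 10 bits */
}

/* Recombine a valid surrogate pair into the original code point.
   The caller is assumed to have checked both ranges already. */
uint32_t from_surrogates(uint16_t high, uint16_t low)
{
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10)
                      | (uint32_t)(low - 0xDC00));
}
```

For the code point 0x10ffff, which is what the UTF-8 bytes f4 8f bf bf
encode, to_surrogates() produces the pair dbff dfff. Note that the low
half is exactly 0xdfff, the last value of the low surrogate range.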
> While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff
> worked fine, the conversion back to UTF-8 has a subtle bug. There's
> a test for a lone high surrogate in the underlying conversion
> function. This tests the next UTF-16 value like this:
>
> if (wchar < 0xdc00 || wchar >= 0xdfff)
> /* Handle lone high surrogate */
>
> Notice the >= 0xdfff? That should have been > 0xdfff. Duh. This
> bug is only a bit over 5 years old...
>
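To make the off-by-one concrete, the two predicates below contrast the
test as quoted above with the corrected comparison. This is only a
sketch: the function names are hypothetical, and the code isolates the
comparison rather than reproducing the actual conversion function.

```c
#include <stdbool.h>
#include <stdint.h>

/* The test as quoted: true means "the next value is not a low
   surrogate, so the preceding high surrogate is a lone one".
   0xdfff is wrongly reported as not being a low surrogate. */
bool not_low_surrogate_buggy(uint16_t wchar)
{
    return wchar < 0xdc00 || wchar >= 0xdfff;
}

/* Corrected comparison: every value in 0xdc00..0xdfff, including
   0xdfff itself, counts as a low surrogate. */
bool not_low_surrogate_fixed(uint16_t wchar)
{
    return wchar < 0xdc00 || wchar > 0xdfff;
}
```

For the value 0xdfff the first predicate returns true and the second
returns false. For every other 16-bit value the two agree, which would
explain why only pairs ending in 0xdfff were affected.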
> Fixed in the git repo. I'll regenerate today's fool..., erm,
> today's developer snapshot on https://cygwin.com/snapshots/ later today.
Snapshot is up. Please give it a try.
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat