This is the mail archive of the cygwin mailing list for the Cygwin project.

[Fwd: [1.7] wcwidth failing configure tests]

Forwarded to newlib.

----- Forwarded message from Eric Blake -----
> Date: Tue, 12 May 2009 16:02:04 +0000 (UTC)
> From: Eric Blake
> Subject:  [1.7] wcwidth failing configure tests
> To: cygwin AT cygwin DOT com
> I noticed this failure in various configure scripts (findutils, coreutils, ...):
> checking whether wcwidth works reasonably in UTF-8 locales... no
> I've reduced it to a STC:
> #include <locale.h>
> #include <wchar.h>
> int main ()
> {
>   int i = 0;
>   if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL)
>     {
>       if (wcwidth (0x0301) > 0)
>         i |= 1;
>       if (wcwidth (0x200B) > 0)
>         i |= 2;
>     }
>   return i;
> }
> The return value should be 0 but is coming back as 3; 0x0301 is a combining 
> mark which should occupy no space on its own, and 0x200b is a 0-width space, 
> according to Unicode 5.1 (and earlier, to some extent).  And that probably 
> means that other places within wcwidth() are broken.
----- End forwarded message -----

wcwidth returns 1 if iswprint returns true.  A quick debugging session
showed that the entire range 0x0300..0x034f is marked as printable in
the u3 array in libc/ctype/utf8print.h.  The characters in that range
are combining characters, which are printable but have zero width.

200b..200d are all zero-width characters, but all three are also
marked as printable.

Scanning the Unicode 5.1 standard, I see a few more characters which
are printable but have zero width:

fe20..fe23 (not sure about them.  Each of them is one half of a full
	    combining char which doesn't make sense alone, afaics)
and a couple of musical symbols in the 0x1d1xx range

How can we fix this problem?  Should we hardcode a check for the above
character values in wcwidth?
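
To make the "hardcode a check" option concrete, here is a minimal sketch,
not the actual newlib code.  The names sketch_wcwidth, is_zero_width and
zw_ranges are hypothetical, and the table only holds the ranges mentioned
in this mail, so it is certainly incomplete.  The fallback mirrors the
behaviour described above (1 if iswprint returns true) and ignores
double-width characters.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical table: only the zero-width ranges named in this mail.
   The musical symbols around 0x1d1xx are left out because they do not
   fit into a 16 bit wchar_t.  */
struct zw_range
{
  unsigned long first;
  unsigned long last;
};

static const struct zw_range zw_ranges[] = {
  { 0x0300, 0x034f },	/* combining diacritical marks */
  { 0x200b, 0x200d },	/* zero width space, non-joiner, joiner */
  { 0xfe20, 0xfe23 },	/* combining half marks (unsure, see above) */
};

/* Return 1 if C falls into one of the hardcoded zero-width ranges.  */
static int
is_zero_width (unsigned long c)
{
  size_t i;

  for (i = 0; i < sizeof zw_ranges / sizeof zw_ranges[0]; i++)
    if (c >= zw_ranges[i].first && c <= zw_ranges[i].last)
      return 1;
  return 0;
}

/* Sketch of a wcwidth which consults the table before iswprint.  */
int
sketch_wcwidth (wchar_t wc)
{
  if (wc == L'\0')
    return 0;
  if (is_zero_width ((unsigned long) wc))
    return 0;
  return iswprint ((wint_t) wc) ? 1 : -1;
}
```

With this, both calls from Eric's test case, sketch_wcwidth (0x0301) and
sketch_wcwidth (0x200B), return 0 regardless of what the print tables
say.  The drawback is that the table has to be maintained by hand.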

And here's another question.  The utf8*.h files claim they have been
generated from the unicode.txt file of the Unicode 3.2 standard.  Do we
have the script which generated the utf8*.h files?  Can we regenerate
the files to match the current Unicode 5.1 standard?
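
I don't know what the original generator looked like, so the following is
only a sketch of one step a regeneration could use.  It assumes the input
uses the semicolon-separated UnicodeData.txt layout (code point, name,
general category, ...) and it assumes that the categories Mn, Me and Cf
are the ones to treat as zero width.  The function name parse_zero_width
is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one line in UnicodeData.txt layout.  On success store the code
   point in *CP, store 1 in *ZERO_WIDTH if the general category is Mn,
   Me or Cf (0 otherwise), and return 1.  Return 0 on a malformed line.  */
int
parse_zero_width (const char *line, unsigned long *cp, int *zero_width)
{
  char *end;
  const char *p;

  /* Field 0: code point in hex.  */
  *cp = strtoul (line, &end, 16);
  if (end == line || *end != ';')
    return 0;

  /* Field 1: character name, skipped.  */
  p = strchr (end + 1, ';');
  if (p == NULL)
    return 0;
  p++;

  /* Field 2: two-letter general category.  */
  if (p[0] == '\0' || p[1] == '\0' || p[2] != ';')
    return 0;

  *zero_width = (strncmp (p, "Mn", 2) == 0
		 || strncmp (p, "Me", 2) == 0
		 || strncmp (p, "Cf", 2) == 0);
  return 1;
}
```

A generator could run this over every line of the data file and emit the
collected code points as a range table in the style of the utf8*.h files.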


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
