[RFC] Refresh iswblank and iswspace (was Re: Update wctype functions to Unicode 5.2?)

Corinna Vinschen vinschen@redhat.com
Mon Feb 15 11:23:00 GMT 2010


On Feb 13 15:38, Corinna Vinschen wrote:
> For a start, here are patches to iswblank and iswspace [...]
> Ok to apply?

Nope.

I checked against the definition of iswspace and iswblank on Linux, and
the important factor is that space and blank characters must not be
non-breaking space characters.  That excludes U+2007 and U+202f again.
It also excludes U+00a0.  Consequently, when calling iswspace(0xa0) or
iswblank(0xa0) on Linux (don't try that in the "C" locale!), the
non-breaking space U+00a0 is reported as neither a space nor a blank
character.  I also reverted the formatting change to keep the patch
simple.
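
For anyone who wants to repeat the Linux check, here is a minimal sketch.
The helper name probe_iswspace and the locale name "en_US.UTF-8" are
assumptions made for illustration only; the locale may not be installed
on a given system, in which case the helper returns -1.  It also assumes
a libc where wide characters are Unicode code points, as on glibc.

```c
#include <locale.h>
#include <wctype.h>

/* Hypothetical helper, not part of newlib or glibc.
   Switches LC_CTYPE to LOCALE_NAME and asks iswspace about C.
   Returns 1 if C is a space, 0 if it is not, and -1 if the
   locale could not be selected.  Note that this changes the
   process-wide LC_CTYPE setting and does not restore it.  */
static int
probe_iswspace (const char *locale_name, wint_t c)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  return iswspace (c) != 0;
}
```

Calling probe_iswspace ("en_US.UTF-8", 0x00a0) on a Linux system like
the one described above should yield 0, matching the observation that
U+00a0 is not a space character there.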


Corinna


 	* libc/ctype/iswblank.c (iswblank): Remove Unicode characters
	U+00A0 and U+200B.  Add Unicode character U+180E.  Add comment
	to explain how to generate from Unicode data file.
	* libc/ctype/iswspace.c (iswspace): Ditto.
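
The comment added by the patch describes the selection rule as: general
category "Zs", not marked with decomposition type "noBreak".  A rough
sketch of that filter, applied to one line of UnicodeData.txt at a time,
could look like this.  The function names ucd_field and
ucd_line_is_breaking_space are hypothetical, and this is not necessarily
how the list in the patch was produced.  It assumes the usual
semicolon-separated layout with the general category in field 2 and the
decomposition in field 5 (counting from 0).

```c
#include <stddef.h>
#include <string.h>

/* Copy the IDX-th (0-based) ';'-separated field of LINE into BUF.
   Empty fields are preserved, which is why strtok is not used.
   Returns 0 on success, -1 if the field is missing or too long.  */
static int
ucd_field (const char *line, int idx, char *buf, size_t buflen)
{
  const char *p = line;
  size_t len;

  while (idx-- > 0)
    {
      p = strchr (p, ';');
      if (p == NULL)
        return -1;
      ++p;
    }
  len = strcspn (p, ";\r\n");
  if (len >= buflen)
    return -1;
  memcpy (buf, p, len);
  buf[len] = '\0';
  return 0;
}

/* Return 1 if LINE describes a character of general category "Zs"
   whose decomposition field is not tagged "<noBreak>", else 0.  */
static int
ucd_line_is_breaking_space (const char *line)
{
  char cat[8];
  char decomp[256];

  if (ucd_field (line, 2, cat, sizeof cat) != 0
      || ucd_field (line, 5, decomp, sizeof decomp) != 0)
    return 0;
  if (strcmp (cat, "Zs") != 0)
    return 0;
  return strncmp (decomp, "<noBreak>", 9) != 0;
}
```

Feeding every line of the data file through
ucd_line_is_breaking_space and printing field 0 of the matching lines
gives the "Zs" part of the list; the control characters named in the
comment still have to be added by hand.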


Index: libc/ctype/iswblank.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/iswblank.c,v
retrieving revision 1.8
diff -u -p -r1.8 iswblank.c
--- libc/ctype/iswblank.c	24 Aug 2009 16:59:35 -0000	1.8
+++ libc/ctype/iswblank.c	15 Feb 2010 11:20:47 -0000
@@ -67,10 +67,13 @@ _DEFUN(iswblank,(c), wint_t c)
 {
 #ifdef _MB_CAPABLE
   c = _jp2uc (c);
+  /* Based on Unicode 5.2.  Control char 09, plus all characters
+     from general category "Zs", which are not marked as decomposition
+     type "noBreak". */
   return (c == 0x0009 || c == 0x0020 ||
-	  c == 0x00A0 || c == 0x1680 ||
+	  c == 0x1680 || c == 0x180e ||
 	  (c >= 0x2000 && c <= 0x2006) ||
-	  (c >= 0x2008 && c <= 0x200b) ||
+	  (c >= 0x2008 && c <= 0x200a) ||
 	  c == 0x205f || c == 0x3000);
 #else
   return (c < 0x100 ? isblank (c) : 0);
Index: libc/ctype/iswspace.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/iswspace.c,v
retrieving revision 1.8
diff -u -p -r1.8 iswspace.c
--- libc/ctype/iswspace.c	24 Aug 2009 16:59:35 -0000	1.8
+++ libc/ctype/iswspace.c	15 Feb 2010 11:20:47 -0000
@@ -67,10 +67,13 @@ _DEFUN(iswspace,(c), wint_t c)
 {
 #ifdef _MB_CAPABLE
   c = _jp2uc (c);
+  /* Based on Unicode 5.2.  Control chars 09-0D, plus all characters
+     from general category "Zs", which are not marked as decomposition
+     type "noBreak". */
   return ((c >= 0x0009 && c <= 0x000d) || c == 0x0020 ||
-	  c == 0x00A0 || c == 0x1680 ||
+	  c == 0x1680 || c == 0x180e ||
 	  (c >= 0x2000 && c <= 0x2006) ||
-	  (c >= 0x2008 && c <= 0x200b) ||
+	  (c >= 0x2008 && c <= 0x200a) ||
 	  c == 0x2028 || c == 0x2029 ||
 	  c == 0x205f || c == 0x3000);
 #else


-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat


