This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



RE: [RFC] Refresh iswblank and iswspace (was Re: Update wctype functions to Unicode 5.2?)


Both C99 and POSIX say that blank is only space and horizontal tab in
the C/POSIX locale.  So strictly speaking, it would appear that a
locale check would be needed.  (On the other hand, it could possibly be
considered a mistake in the standards to make such a statement,
ignoring any definitions provided with the extended character set.)
 
This observation, however, is not directly related to the patch under
consideration; it points to an existing flaw.

Craig
 
See C99 7.25.2.1.3 The iswblank function, and
http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html,
under "blank".
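
For illustration only, the locale check mentioned above might be sketched
roughly as follows.  The helper names ctype_locale_is_c and
iswblank_locale_checked are hypothetical and not part of newlib; the
sketch assumes that setlocale(LC_CTYPE, NULL) reports "C" or "POSIX" for
the C/POSIX locale, which is the usual behaviour but not something this
thread confirms for every target.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: nonzero when the current LC_CTYPE locale is
   named "C" or "POSIX".  Querying with a NULL locale argument does
   not change the locale. */
static int ctype_locale_is_c(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);

    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical wrapper: in the C/POSIX locale only space and
   horizontal tab count as blank, as C99 and POSIX require; in any
   other locale the decision is left to the library's iswblank. */
static int iswblank_locale_checked(wint_t c)
{
    if (ctype_locale_is_c())
        return c == 0x0020 || c == 0x0009;
    return iswblank(c);
}
```

A program that never calls setlocale starts in the "C" locale, so in
that case the wrapper would accept only U+0020 and U+0009.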

-----Original Message-----
From: newlib-owner@sourceware.org [mailto:newlib-owner@sourceware.org]
On Behalf Of Corinna Vinschen
Sent: Monday, February 15, 2010 6:24 AM
To: newlib@sourceware.org
Subject: Re: [RFC] Refresh iswblank and iswspace (was Re: Update wctype
functions to Unicode 5.2?)

On Feb 13 15:38, Corinna Vinschen wrote:
> For a start, here are patches to iswblank and iswspace [...]
> Ok to apply?

Nope.

I checked against the definitions of iswspace and iswblank on Linux, and
the important factor is that spaces and blanks must not be non-breaking
space characters.  That excludes U+2007 and U+202f again.  It also
excludes U+00a0; consequently, when calling iswspace(0xa0) or
iswblank(0xa0) on Linux (don't try that in the "C" locale!), the
non-breaking space U+00a0 is not a space or blank character.  I reverted
the formatting change as well to keep the patch simple.
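
A minimal sketch of how such a check could be reproduced is shown here.
The helpers probe_space_class and print_probe are hypothetical names
introduced for illustration; the sketch assumes a platform where wchar_t
holds Unicode code points (as on Linux with glibc), and the results
depend on the libc and the locale in effect, so no output is claimed.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: bit 0 is set when iswspace(c) is true,
   bit 1 is set when iswblank(c) is true, in the current locale. */
static int probe_space_class(wint_t c)
{
    return (iswspace(c) ? 1 : 0) | (iswblank(c) ? 2 : 0);
}

/* Hypothetical helper: print the classification of one code point. */
static void print_probe(wint_t c)
{
    int r = probe_space_class(c);

    printf("U+%04lX: iswspace=%d iswblank=%d\n",
           (unsigned long) c, r & 1, (r >> 1) & 1);
}
```

To compare against Linux, one would call setlocale(LC_CTYPE, "") with a
UTF-8 locale set in the environment first, then call print_probe for
0x00a0, 0x2007 and 0x202f.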


Corinna


 	* libc/ctype/iswblank.c (iswblank): Remove Unicode characters
	U+00A0 and U+200B.  Add Unicode character U+180E.  Add comment
	to explain how to generate from Unicode data file.
	* libc/ctype/iswspace.c (iswspace): Ditto.


Index: libc/ctype/iswblank.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/iswblank.c,v
retrieving revision 1.8
diff -u -p -r1.8 iswblank.c
--- libc/ctype/iswblank.c	24 Aug 2009 16:59:35 -0000	1.8
+++ libc/ctype/iswblank.c	15 Feb 2010 11:20:47 -0000
@@ -67,10 +67,13 @@ _DEFUN(iswblank,(c), wint_t c)
 {
 #ifdef _MB_CAPABLE
   c = _jp2uc (c);
+  /* Based on Unicode 5.2.  Control char 09, plus all characters
+     from general category "Zs", which are not marked as decomposition
+     type "noBreak". */
   return (c == 0x0009 || c == 0x0020 ||
-	  c == 0x00A0 || c == 0x1680 ||
+	  c == 0x1680 || c == 0x180e ||
 	  (c >= 0x2000 && c <= 0x2006) ||
-	  (c >= 0x2008 && c <= 0x200b) ||
+	  (c >= 0x2008 && c <= 0x200a) ||
 	  c == 0x205f || c == 0x3000);
 #else
   return (c < 0x100 ? isblank (c) : 0);
Index: libc/ctype/iswspace.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/iswspace.c,v
retrieving revision 1.8
diff -u -p -r1.8 iswspace.c
--- libc/ctype/iswspace.c	24 Aug 2009 16:59:35 -0000	1.8
+++ libc/ctype/iswspace.c	15 Feb 2010 11:20:47 -0000
@@ -67,10 +67,13 @@ _DEFUN(iswspace,(c), wint_t c)
 {
 #ifdef _MB_CAPABLE
   c = _jp2uc (c);
+  /* Based on Unicode 5.2.  Control chars 09-0D, plus all characters
+     from general category "Zs", which are not marked as decomposition
+     type "noBreak". */
   return ((c >= 0x0009 && c <= 0x000d) || c == 0x0020 ||
-	  c == 0x00A0 || c == 0x1680 ||
+	  c == 0x1680 || c == 0x180e ||
 	  (c >= 0x2000 && c <= 0x2006) ||
-	  (c >= 0x2008 && c <= 0x200b) ||
+	  (c >= 0x2008 && c <= 0x200a) ||
 	  c == 0x2028 || c == 0x2029 ||
 	  c == 0x205f || c == 0x3000);
 #else
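
As a reading aid, the two patched tables can be restated as standalone
predicates.  This is only a sketch: ucs_is_blank and ucs_is_space are
hypothetical names, the functions take an already-converted Unicode code
point (the _jp2uc step is left out), and the ranges are copied from the
diff above rather than regenerated from the Unicode data file.

```c
#include <stdint.h>

/* Hypothetical mirror of the patched iswblank table: TAB plus the
   "Zs" characters that are not decomposition type "noBreak". */
static int ucs_is_blank(uint32_t c)
{
    return (c == 0x0009 || c == 0x0020 ||
            c == 0x1680 || c == 0x180e ||
            (c >= 0x2000 && c <= 0x2006) ||
            (c >= 0x2008 && c <= 0x200a) ||
            c == 0x205f || c == 0x3000);
}

/* Hypothetical mirror of the patched iswspace table: everything in
   the blank table, plus control chars 0A-0D and U+2028/U+2029. */
static int ucs_is_space(uint32_t c)
{
    return ((c >= 0x000a && c <= 0x000d) ||
            c == 0x2028 || c == 0x2029 ||
            ucs_is_blank(c));
}
```

With these ranges, U+00A0, U+2007, U+202F and U+200B are rejected by
both predicates, while U+180E is accepted by both.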


-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

