[RFC] Refresh iswblank and iswspace (was Re: Update wctype functions to Unicode 5.2?)
Corinna Vinschen
vinschen@redhat.com
Mon Feb 15 11:23:00 GMT 2010
On Feb 13 15:38, Corinna Vinschen wrote:
> For a start, here are patches to iswblank and iswspace [...]
> Ok to apply?
Nope.
I checked against the definition of iswspace and iswblank on Linux, and
the important factor is that spaces and blanks must not be non-breaking
space characters. That excludes U+2007 and U+202f again. It also
excludes U+00a0: consequently, when calling iswspace(0xa0) or
iswblank(0xa0) on Linux (don't try that in the "C" locale!), the
non-breaking space U+00a0 is neither a space nor a blank character. I
reverted the formatting change as well to keep the patch simple.
Corinna
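
As a rough illustration of the rule described above, here is a minimal
standalone sketch of the blank test as it looks after the patch. The
names sketch_iswblank and sketch_self_check are hypothetical and not part
of newlib, and the sketch assumes the argument is already a Unicode code
point, so it skips the _jp2uc conversion that the real function performs
first.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical standalone copy of the post-patch iswblank test.
   Assumes c is already a Unicode code point (no _jp2uc step). */
int
sketch_iswblank (wint_t c)
{
  return (c == 0x0009 || c == 0x0020 ||
          c == 0x1680 || c == 0x180e ||
          (c >= 0x2000 && c <= 0x2006) ||
          (c >= 0x2008 && c <= 0x200a) ||
          c == 0x205f || c == 0x3000);
}

/* Spot checks for the code points discussed in this mail. */
void
sketch_self_check (void)
{
  assert (sketch_iswblank (0x0009));   /* TAB */
  assert (sketch_iswblank (0x0020));   /* SPACE */
  assert (sketch_iswblank (0x180e));   /* newly added */
  assert (!sketch_iswblank (0x00a0));  /* NO-BREAK SPACE */
  assert (!sketch_iswblank (0x2007));  /* FIGURE SPACE */
  assert (!sketch_iswblank (0x202f));  /* NARROW NO-BREAK SPACE */
  assert (!sketch_iswblank (0x200b));  /* ZERO WIDTH SPACE */
}
```

With this sketch, sketch_iswblank (0xa0) returns 0 while
sketch_iswblank (0x20) returns nonzero, which matches the behaviour
described for Linux above.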
* libc/ctype/iswblank.c (iswblank): Remove Unicode characters
U+00A0 and U+200B. Add Unicode character U+180E. Add comment
to explain how to generate from Unicode data file.
* libc/ctype/iswspace.c (iswspace): Ditto.
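
The selection rule named in the ChangeLog entry (general category "Zs",
not decomposition type "noBreak") could be checked per record of the
Unicode data file roughly as follows. This is only a sketch, not the
procedure actually used to produce the patch: it assumes the standard
semicolon-separated UnicodeData.txt layout, with the general category in
field 2 and the decomposition in field 5 (counting from 0), and the
helper names field_start and sketch_is_breaking_zs are hypothetical. The
control characters, and U+2028 and U+2029 in iswspace, are not in "Zs"
and would have to be listed separately.

```c
#include <string.h>

/* Return a pointer to the start of field number `index' (0-based) in a
   semicolon-separated record, or NULL if the record is too short. */
static const char *
field_start (const char *line, int index)
{
  while (index-- > 0)
    {
      line = strchr (line, ';');
      if (line == NULL)
        return NULL;
      ++line;                   /* step past the ';' */
    }
  return line;
}

/* Nonzero if the record has general category "Zs" and its
   decomposition field does not start with "<noBreak>". */
int
sketch_is_breaking_zs (const char *line)
{
  const char *cat = field_start (line, 2);
  const char *decomp = field_start (line, 5);

  if (cat == NULL || decomp == NULL)
    return 0;
  if (strncmp (cat, "Zs;", 3) != 0)
    return 0;
  return strncmp (decomp, "<noBreak>", 9) != 0;
}
```

Fields may be empty, which is why the sketch walks the separators with
strchr instead of using strtok, which would skip empty fields.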
Index: libc/ctype/iswblank.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/iswblank.c,v
retrieving revision 1.8
diff -u -p -r1.8 iswblank.c
--- libc/ctype/iswblank.c 24 Aug 2009 16:59:35 -0000 1.8
+++ libc/ctype/iswblank.c 15 Feb 2010 11:20:47 -0000
@@ -67,10 +67,13 @@ _DEFUN(iswblank,(c), wint_t c)
{
#ifdef _MB_CAPABLE
c = _jp2uc (c);
+ /* Based on Unicode 5.2. Control char 09, plus all characters
+ from general category "Zs", which are not marked as decomposition
+ type "noBreak". */
return (c == 0x0009 || c == 0x0020 ||
- c == 0x00A0 || c == 0x1680 ||
+ c == 0x1680 || c == 0x180e ||
(c >= 0x2000 && c <= 0x2006) ||
- (c >= 0x2008 && c <= 0x200b) ||
+ (c >= 0x2008 && c <= 0x200a) ||
c == 0x205f || c == 0x3000);
#else
return (c < 0x100 ? isblank (c) : 0);
Index: libc/ctype/iswspace.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/iswspace.c,v
retrieving revision 1.8
diff -u -p -r1.8 iswspace.c
--- libc/ctype/iswspace.c 24 Aug 2009 16:59:35 -0000 1.8
+++ libc/ctype/iswspace.c 15 Feb 2010 11:20:47 -0000
@@ -67,10 +67,13 @@ _DEFUN(iswspace,(c), wint_t c)
{
#ifdef _MB_CAPABLE
c = _jp2uc (c);
+ /* Based on Unicode 5.2. Control chars 09-0D, plus all characters
+ from general category "Zs", which are not marked as decomposition
+ type "noBreak". */
return ((c >= 0x0009 && c <= 0x000d) || c == 0x0020 ||
- c == 0x00A0 || c == 0x1680 ||
+ c == 0x1680 || c == 0x180e ||
(c >= 0x2000 && c <= 0x2006) ||
- (c >= 0x2008 && c <= 0x200b) ||
+ (c >= 0x2008 && c <= 0x200a) ||
c == 0x2028 || c == 0x2029 ||
c == 0x205f || c == 0x3000);
#else
--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat