[RFC] Refresh iswblank and iswspace (was Re: Update wctype functions to Unicode 5.2?)

Jeff Johnston jjohnstn@redhat.com
Tue Feb 16 18:49:00 GMT 2010


On 15/02/10 06:23 AM, Corinna Vinschen wrote:
> On Feb 13 15:38, Corinna Vinschen wrote:
>> For a start, here are patches to iswblank and iswspace [...]
>> Ok to apply?
>
> Nope.
>
> I checked against the definition of iswspace and iswblank on Linux, and
> the important factor is that spaces and blanks must not be non-breaking
> space characters.  That excludes U+2007 and U+202F again.  It also
> excludes U+00A0; consequently, when calling iswspace(0xa0) or
> iswblank(0xa0) on Linux (don't try that in the "C" locale!), the
> non-breaking space U+00A0 is neither a space nor a blank character.  I
> reverted the formatting change as well to keep the patch simple.
>
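The selection rule described above (general category "Zs", minus anything whose decomposition type is "noBreak") is mechanical enough to derive from the Unicode Character Database file UnicodeData.txt, as the new code comments in the patch suggest. A minimal sketch in C follows. It is not the script used to produce the patch, and the names zs_without_nobreak and print_candidates are hypothetical; it assumes the standard ';'-separated record layout, with the code point in field 0, the general category in field 2 and the decomposition mapping in field 5.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decide whether one UnicodeData.txt record belongs to
   the "Zs without <noBreak>" set.  On success the code point is stored in
   *cp and true is returned. */
bool zs_without_nobreak(const char *line, unsigned long *cp)
{
    const char *field[6];
    const char *p = line;

    /* Find the start of fields 0..5 (';'-separated). */
    for (int i = 0; i < 6; i++) {
        field[i] = p;
        if (i < 5) {
            p = strchr(p, ';');
            if (p == NULL)
                return false;          /* malformed or truncated record */
            p++;                       /* skip the ';' */
        }
    }

    /* General category must be exactly "Zs" (the field ends at ';'). */
    if (strncmp(field[2], "Zs;", 3) != 0)
        return false;

    /* Reject decomposition type <noBreak> (U+00A0, U+2007, U+202F). */
    if (strncmp(field[5], "<noBreak>", 9) == 0)
        return false;

    *cp = strtoul(field[0], NULL, 16);  /* hex digits stop at the ';' */
    return true;
}

/* Hypothetical driver: print every matching code point read from `in`,
   e.g. a FILE * opened on UnicodeData.txt.  The control characters
   (09 for iswblank, 09-0D for iswspace) and, for iswspace, U+2028 and
   U+2029 are not in "Zs" and must be added separately. */
void print_candidates(FILE *in)
{
    char line[512];                    /* assumed large enough for one record */
    unsigned long cp;

    while (fgets(line, sizeof line, in) != NULL)
        if (zs_without_nobreak(line, &cp))
            printf("U+%04lX\n", cp);
}
```

Fed the Unicode 5.2 data file, this should list U+0020, U+1680, U+180E, U+2000..U+2006, U+2008..U+200A, U+205F and U+3000. Later data files give a different answer for U+180E, which moved from "Zs" to "Cf" in Unicode 6.3.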

Ok.  Go ahead if you haven't already.

-- Jeff J.

>
> Corinna
>
>
>   	* libc/ctype/iswblank.c (iswblank): Remove Unicode characters
> 	U+00A0 and U+200B.  Add Unicode character U+180E.  Add comment
> 	to explain how to generate from Unicode data file.
> 	* libc/ctype/iswspace.c (iswspace): Ditto.
>
>
> Index: libc/ctype/iswblank.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/ctype/iswblank.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 iswblank.c
> --- libc/ctype/iswblank.c	24 Aug 2009 16:59:35 -0000	1.8
> +++ libc/ctype/iswblank.c	15 Feb 2010 11:20:47 -0000
> @@ -67,10 +67,13 @@ _DEFUN(iswblank,(c), wint_t c)
>   {
>   #ifdef _MB_CAPABLE
>     c = _jp2uc (c);
> +  /* Based on Unicode 5.2.  Control char 09, plus all characters
> +     from general category "Zs" which are not marked as decomposition
> +     type "noBreak". */
>     return (c == 0x0009 || c == 0x0020 ||
> -	  c == 0x00A0 || c == 0x1680 ||
> +	  c == 0x1680 || c == 0x180e ||
>   	  (c >= 0x2000 && c <= 0x2006) ||
> -	  (c >= 0x2008 && c <= 0x200b) ||
> +	  (c >= 0x2008 && c <= 0x200a) ||
>   	  c == 0x205f || c == 0x3000);
>   #else
>     return (c < 0x100 ? isblank (c) : 0);
> Index: libc/ctype/iswspace.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/ctype/iswspace.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 iswspace.c
> --- libc/ctype/iswspace.c	24 Aug 2009 16:59:35 -0000	1.8
> +++ libc/ctype/iswspace.c	15 Feb 2010 11:20:47 -0000
> @@ -67,10 +67,13 @@ _DEFUN(iswspace,(c), wint_t c)
>   {
>   #ifdef _MB_CAPABLE
>     c = _jp2uc (c);
> +  /* Based on Unicode 5.2.  Control chars 09-0D, the separators U+2028
> +     and U+2029, plus all characters from general category "Zs" which
> +     are not marked as decomposition type "noBreak". */
>     return ((c >= 0x0009 && c <= 0x000d) || c == 0x0020 ||
> -	  c == 0x00A0 || c == 0x1680 ||
> +	  c == 0x1680 || c == 0x180e ||
>   	  (c >= 0x2000 && c <= 0x2006) ||
> -	  (c >= 0x2008 && c <= 0x200b) ||
> +	  (c >= 0x2008 && c <= 0x200a) ||
>   	  c == 0x2028 || c == 0x2029 ||
>   	  c == 0x205f || c == 0x3000);
>   #else
>
>
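For experimenting outside newlib, the two patched predicates can be restated as plain C functions over Unicode code points. This is a sketch, assuming the argument has already been mapped to Unicode (the job _jp2uc does in the patch); the names uc_isblank and uc_isspace are hypothetical, and defining the space set as "blank plus LF..CR, U+2028 and U+2029" is a restructuring chosen for brevity, not how newlib writes it. The resulting sets are the same as in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Blank: TAB plus the Unicode 5.2 "Zs" characters that have no
   <noBreak> decomposition (so U+00A0, U+2007 and U+202F are absent). */
bool uc_isblank(uint32_t c)
{
    return c == 0x0009 || c == 0x0020 ||
           c == 0x1680 || c == 0x180e ||
           (c >= 0x2000 && c <= 0x2006) ||
           (c >= 0x2008 && c <= 0x200a) ||
           c == 0x205f || c == 0x3000;
}

/* Space: every blank, plus LF, VT, FF, CR and the line and paragraph
   separators U+2028 and U+2029. */
bool uc_isspace(uint32_t c)
{
    return uc_isblank(c) ||
           (c >= 0x000a && c <= 0x000d) ||
           c == 0x2028 || c == 0x2029;
}
```

With these definitions uc_isblank(0x00a0) and uc_isspace(0x00a0) are both false, matching the Linux behaviour described in the thread, while U+180E is accepted by both and U+200B by neither.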



More information about the Newlib mailing list