[RFC] Refresh iswblank and iswspace (was Re: Update wctype functions to Unicode 5.2?)
Jeff Johnston
jjohnstn@redhat.com
Tue Feb 16 18:49:00 GMT 2010
On 15/02/10 06:23 AM, Corinna Vinschen wrote:
> On Feb 13 15:38, Corinna Vinschen wrote:
>> For a start, here are patches to iswblank and iswspace [...]
>> Ok to apply?
>
> Nope.
>
> I checked against the definition of iswspace and iswblank on Linux, and
> the important factor is that spaces and blanks must not be non-breaking
> space characters. That excludes U+2007 and U+202f again. It also
> excludes U+00a0; consequently, when calling iswspace(0xa0) or
> iswblank(0xa0) on Linux (don't try that in the "C" locale!), the
> non-breaking space U+00a0 is not reported as a space or blank
> character. I reverted the formatting change as well to keep the patch
> simple.
>
Ok. Go ahead if you haven't already.
-- Jeff J.
>
> Corinna
>
>
> * libc/ctype/iswblank.c (iswblank): Remove Unicode characters
> U+00A0 and U+200B. Add Unicode character U+180E. Add comment
> to explain how to generate from Unicode data file.
> * libc/ctype/iswspace.c (iswspace): Ditto.
>
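The generation rule referred to in the ChangeLog entry (general category
"Zs", minus entries whose decomposition type is "noBreak") could be
sketched roughly as below. This is only an illustration and not the
procedure actually used for the patch: field_start(), is_breaking_zs()
and zs_self_check() are hypothetical names, and the sample records are
assumed to follow the semicolon-separated UnicodeData.txt layout, with
the general category in field 2 and the decomposition in field 5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the start of field n (0-based) of a
   semicolon-separated record, or NULL if the record is too short. */
static const char *field_start(const char *rec, int n)
{
    while (n > 0) {
        rec = strchr(rec, ';');
        if (rec == NULL)
            return NULL;
        rec++;              /* step past the ';' */
        n--;
    }
    return rec;
}

/* Nonzero if the record has general category "Zs" and its decomposition
   field does not start with the "<noBreak>" tag. */
static int is_breaking_zs(const char *rec)
{
    const char *cat = field_start(rec, 2);
    const char *decomp = field_start(rec, 5);

    if (cat == NULL || decomp == NULL)
        return 0;
    if (strncmp(cat, "Zs;", 3) != 0)
        return 0;
    return strncmp(decomp, "<noBreak>", 9) != 0;
}

/* Illustrative records in UnicodeData.txt layout (assumed to match
   Unicode 5.2; check the real data file before relying on them). */
void zs_self_check(void)
{
    assert(is_breaking_zs("0020;SPACE;Zs;0;WS;;;;;N;;;;;"));
    assert(is_breaking_zs("180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;"));
    assert(!is_breaking_zs("00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;"));
    assert(!is_breaking_zs("200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;"));
}
```

Feeding every line of the data file through is_breaking_zs() and
collecting the code points that pass should give the "Zs" part of the
list; the control characters (09 for iswblank, 09-0D for iswspace) are
added by hand.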
>
> Index: libc/ctype/iswblank.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/ctype/iswblank.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 iswblank.c
> --- libc/ctype/iswblank.c 24 Aug 2009 16:59:35 -0000 1.8
> +++ libc/ctype/iswblank.c 15 Feb 2010 11:20:47 -0000
> @@ -67,10 +67,13 @@ _DEFUN(iswblank,(c), wint_t c)
> {
> #ifdef _MB_CAPABLE
> c = _jp2uc (c);
> + /* Based on Unicode 5.2. Control char 09, plus all characters
> + from general category "Zs", which are not marked as decomposition
> + type "noBreak". */
> return (c == 0x0009 || c == 0x0020 ||
> - c == 0x00A0 || c == 0x1680 ||
> + c == 0x1680 || c == 0x180e ||
> (c >= 0x2000 && c <= 0x2006) ||
> - (c >= 0x2008 && c <= 0x200b) ||
> + (c >= 0x2008 && c <= 0x200a) ||
> c == 0x205f || c == 0x3000);
> #else
> return (c < 0x100 ? isblank (c) : 0);
> Index: libc/ctype/iswspace.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/ctype/iswspace.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 iswspace.c
> --- libc/ctype/iswspace.c 24 Aug 2009 16:59:35 -0000 1.8
> +++ libc/ctype/iswspace.c 15 Feb 2010 11:20:47 -0000
> @@ -67,10 +67,13 @@ _DEFUN(iswspace,(c), wint_t c)
> {
> #ifdef _MB_CAPABLE
> c = _jp2uc (c);
> + /* Based on Unicode 5.2. Control chars 09-0D, plus all characters
> + from general category "Zs", which are not marked as decomposition
> + type "noBreak". */
> return ((c >= 0x0009 && c <= 0x000d) || c == 0x0020 ||
> - c == 0x00A0 || c == 0x1680 ||
> + c == 0x1680 || c == 0x180e ||
> (c >= 0x2000 && c <= 0x2006) ||
> - (c >= 0x2008 && c <= 0x200b) ||
> + (c >= 0x2008 && c <= 0x200a) ||
> c == 0x2028 || c == 0x2029 ||
> c == 0x205f || c == 0x3000);
> #else
>
>
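For anyone who wants to check the resulting character sets without
building newlib, the two patched conditions can be copied into a small
standalone file. The sketch below does that under hypothetical names
(sketch_iswblank, sketch_iswspace, sketch_self_check); it leaves out the
_jp2uc() conversion and the non-_MB_CAPABLE fallback, so it assumes the
argument is already a Unicode code point.

```c
#include <assert.h>
#include <wchar.h>   /* wint_t */

/* Standalone copy of the patched iswblank() condition
   (hypothetical name, not a newlib symbol). */
int sketch_iswblank(wint_t c)
{
    return (c == 0x0009 || c == 0x0020 ||
            c == 0x1680 || c == 0x180e ||
            (c >= 0x2000 && c <= 0x2006) ||
            (c >= 0x2008 && c <= 0x200a) ||
            c == 0x205f || c == 0x3000);
}

/* Standalone copy of the patched iswspace() condition
   (hypothetical name, not a newlib symbol). */
int sketch_iswspace(wint_t c)
{
    return ((c >= 0x0009 && c <= 0x000d) || c == 0x0020 ||
            c == 0x1680 || c == 0x180e ||
            (c >= 0x2000 && c <= 0x2006) ||
            (c >= 0x2008 && c <= 0x200a) ||
            c == 0x2028 || c == 0x2029 ||
            c == 0x205f || c == 0x3000);
}

/* Spot checks for the characters discussed in this thread. */
void sketch_self_check(void)
{
    /* Non-breaking spaces are neither blank nor space. */
    assert(!sketch_iswblank(0x00a0) && !sketch_iswspace(0x00a0));
    assert(!sketch_iswblank(0x2007) && !sketch_iswspace(0x2007));
    assert(!sketch_iswblank(0x202f) && !sketch_iswspace(0x202f));
    /* U+200B was removed, U+180E was added. */
    assert(!sketch_iswblank(0x200b) && !sketch_iswspace(0x200b));
    assert(sketch_iswblank(0x180e) && sketch_iswspace(0x180e));
    /* Line and paragraph separators are spaces but not blanks. */
    assert(sketch_iswspace(0x2028) && !sketch_iswblank(0x2028));
    assert(sketch_iswspace(0x2029) && !sketch_iswblank(0x2029));
}
```

Calling sketch_self_check() from a test program exercises exactly the
cases argued about above: the three noBreak characters and U+200B are
rejected, while U+180E is accepted by both functions.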