[PATCH/RFA] Fix ctype table and isblank

Corinna Vinschen vinschen@redhat.com
Wed Apr 8 14:58:00 GMT 2009


On Apr  7 17:54, Wizards' Guild wrote:
> Corinna,
> 
> One of the things that happened here is the addition of _B to the tab
> character in the default ctype table. (Oddly, it has NOT been added
> throughout the new NLS code pages.)

It has.  The TAB character is part of _CTYPE_DATA_0_127 which in turn
is part of all new ctype tables.

If you're looking for _B in the new tables, you'll see it used for the
NBSP character in ISO-8859-x and CP12xx (0xa0), as well as for NBSP in
OEM codepages (0xff).

>  This was done so that isblank (and
> by proxy, the wrapper form of iswblank) could be implemented as a flag
> test rather than as a hardcoded { 0x09 0x20 } test. The PROBLEM is
> that _B traditionally exists for the sole benefit of isprint. So, if
> _B is used such that
> 
> #define isblank(c)      ((__ctype_ptr__)[(unsigned)((c)+1)]&_B)
> 
> returns the correct thing, it is guaranteed that
> 
> #define isprint(c)      ((__ctype_ptr__)[(unsigned)((c)+1)]&(_P|_U|_L|_N|_B))
> 
> returns nonzero for tab characters, which is wrong.
> 
> I don't see any clever way to reconcile this behavior; we are out of
> bits in the ctype table.

Yes, that's most unfortunate.  On other systems the table is wider
than 8 bits, so they can have their own flags for alpha, graph and
printable.  Right now, all alpha characters in the extended character
class tables are either uppercase or lowercase.  That even holds for
languages whose characters have no upper/lowercase distinction; for
those I used _L in the absence of a bit for "alpha". (*)

>  For "C" locale, the old hardcoded test is
> guaranteed to work. For anything else, maybe reverse the wrapper and
> resort to the full blown iswblank check?

We could also remove _B from the TAB character again and
change the isblank test to

  #define isblank(c)      (((__ctype_ptr__)[(unsigned)((c)+1)]&_B) \
                           || ((c) == '\t'))

Same for the isblank function.  Jeff?
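
As a rough sketch of that idea, here is a standalone model with TAB left
without _B and the tab test hardcoded in isblank.  Everything with a
DEMO_/demo_ prefix is hypothetical and only exists so the example cannot
clash with the real <ctype.h>; the bit values are assumed to mirror the
current 8-bit flags, and demo_flags() is a stand-in for the table lookup
restricted to 7-bit ASCII.

```c
#include <assert.h>

/* Hypothetical flag names; values assumed to match the 8-bit ctype flags. */
enum {
  DEMO_U = 0x01, DEMO_L = 0x02, DEMO_N = 0x04, DEMO_S = 0x08,
  DEMO_P = 0x10, DEMO_C = 0x20, DEMO_X = 0x40, DEMO_B = 0x80
};

/* Stand-in for the ctype table lookup, 7-bit ASCII only.
   Note that TAB gets _C|_S here, but NOT _B. */
static unsigned char demo_flags(int c)
{
  assert(c >= 0 && c <= 127);
  if (c == ' ')
    return DEMO_S | DEMO_B;
  if (c == '\t' || (c >= '\n' && c <= '\r'))
    return DEMO_C | DEMO_S;
  if (c < 0x20 || c == 0x7f)
    return DEMO_C;
  if (c >= '0' && c <= '9')
    return DEMO_N;
  if (c >= 'A' && c <= 'Z')
    return DEMO_U | (c <= 'F' ? DEMO_X : 0);
  if (c >= 'a' && c <= 'z')
    return DEMO_L | (c <= 'f' ? DEMO_X : 0);
  return DEMO_P;                      /* remaining ASCII is punctuation */
}

/* Flag test plus hardcoded TAB, as in the proposed macro. */
static int demo_isblank(int c)
{
  return (demo_flags(c) & DEMO_B) || c == '\t';
}

/* Same flag set as the quoted isprint macro. */
static int demo_isprint(int c)
{
  return (demo_flags(c) & (DEMO_P | DEMO_U | DEMO_L | DEMO_N | DEMO_B)) != 0;
}
```

With this arrangement demo_isblank('\t') is nonzero while
demo_isprint('\t') is zero, and the space character still satisfies
both.  One thing to keep in mind for the macro form is that it
evaluates its argument twice, which the function form avoids.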


Corinna

(*) What we could remove is the _X flag, which could easily be
    hardcoded.  That would give us space for a single new flag.
    I would vote for "alpha", but "printable" is almost as important.
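
    For illustration, a hardcoded hex-digit test could look roughly
    like the following.  The name demo_isxdigit is hypothetical, and
    the sketch assumes an ASCII-compatible charset in which the hex
    digits are the same in every locale.

```c
/* Hypothetical hardcoded test that needs no _X bit in the ctype table.
   Any value outside the three ranges, including EOF, yields 0. */
static int demo_isxdigit(int c)
{
  return (c >= '0' && c <= '9')
      || (c >= 'A' && c <= 'F')
      || (c >= 'a' && c <= 'f');
}
```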

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat


