[PATCH/RFA] Fix ctype table and isblank

Corinna Vinschen vinschen@redhat.com
Wed Apr 8 18:11:00 GMT 2009


On Apr  8 09:13, Wizards' Guild wrote:
> Yes, we really should have an "alpha" flag AND a "blank" flag with the
> new semantics.

An "alpha" and "printable" flag.  We already have "blank", so I don't
understand what you're trying to accomplish with a new one.

> The obvious and common approach would be to widen the
> table entries. I'm not a big fan of this because it bloats the
> small-footprint systems. This is maybe why it hasn't been done
> already?

Keep in mind that until a couple of days ago newlib didn't even *have*
extended charset support.  The problems we're discussing are a direct
result of having more than just ASCII tables.

The problem is smaller targets.  Maybe it would be a feasible approach
to stick to one single 8 bit table (for ASCII) in case of small targets
and to provide everything as 16 bit tables for targets which don't care
about the few extra KBytes.
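
Roughly what I have in mind, as a sketch only.  SK_SMALL_TARGET, the
sk_ names and the table sizes are made up for illustration and are not
existing newlib configuration or code:

```c
#include <stddef.h>
#include <stdint.h>

/* SK_SMALL_TARGET is a hypothetical configuration macro. */
#ifdef SK_SMALL_TARGET
typedef uint8_t sk_ctype_t;     /* one 8 bit table, ASCII only */
#define SK_TABLE_SIZE 128
#else
typedef uint16_t sk_ctype_t;    /* 16 bit entries, room for more flags */
#define SK_TABLE_SIZE 256
#endif

/* Contents omitted; only the footprint matters for this sketch. */
static sk_ctype_t sk_table[SK_TABLE_SIZE];

/* Size of one table in bytes, to compare both configurations. */
static size_t sk_table_bytes(void)
{
  return sizeof(sk_table);
}
```

The classification macros would use sk_ctype_t throughout, so only the
build configuration decides which width a target pays for.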

And widening the tables introduces a new problem for Cygwin, which has
to keep some of the "old" stuff to maintain backward compatibility with
applications built under an older version.  It would have to maintain
the old ASCII ctype table and copy over the data from the new tables in
a trickier way.  We would have to keep the meaning of the current bit
values and only add new flags in the upper byte so that Cygwin can deal
with that for older apps.  New apps would immediately profit from the
new tables, of course.
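
A minimal sketch of that constraint.  The flag names and values here
are invented for illustration; the only point is that the low byte
keeps its current meaning and anything new lives above it:

```c
#include <stdint.h>

/* Low byte: existing flags, values unchanged (illustrative constants).
   Upper byte: new flags, which are hypothetical here. */
#define SK_U      0x0001   /* upper case, as before */
#define SK_L      0x0002   /* lower case, as before */
#define SK_ALPHA  0x0100   /* hypothetical new flag */
#define SK_PRINT  0x0200   /* hypothetical new flag */

/* Build the 8 bit table an old binary expects from the wide table
   by dropping the upper byte of every entry. */
static void sk_fill_legacy(unsigned char *legacy, const uint16_t *wide, int n)
{
  int i;

  for (i = 0; i < n; i++)
    legacy[i] = (unsigned char) (wide[i] & 0xff);
}
```

As long as no existing flag changes its value, old applications see
exactly the bits they were compiled against.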

However, *if* we really do that, now would be the time.  Cygwin is on
the verge of a new major release and since the extended ctype support is
only available starting with this new release, we could use the
opportunity to widen the character class tables now.

> Another approach would be to keep the "C" locale table and macros, but
> if extended charsets are supported just convert everything to UNICODE
> and handle it there.

Oh, please no...

> Some functions, such as tolower, are already
> using this approach.

Only for chars > 0x80.  But that was meant as a temporary solution.

>  I don't care for the "half hardcoded" variation of
> isblank; seems like an accident waiting to happen.

It's exactly what we need for an *immediate* fix.  _B covers all types of
spaces (SPC, NBSP), and TAB is handled separately so that it isn't caught
by isprint().
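
To illustrate the idea, here is a simplified sketch.  The table, the
flag value and the sk_ names are stand-ins and not the actual newlib
definitions, and the print test only shows the part that depends on
the blank flag:

```c
/* Hypothetical flag value and table, for illustration only. */
#define SK_B 0x80   /* blank class: SPC, NBSP */

static const unsigned char sk_ctype[256] = {
  [0x20] = SK_B,    /* SPC */
  [0xa0] = SK_B     /* NBSP in ISO-8859-1 */
};

/* Print test: accepts whatever carries SK_B (other classes omitted). */
static int sk_isprint(int c)
{
  return (sk_ctype[(unsigned char) c] & SK_B) != 0;
}

/* Blank test: TAB is checked by value, so it never needs SK_B in the
   table and the print test keeps rejecting it. */
static int sk_isblank(int c)
{
  return (sk_ctype[(unsigned char) c] & SK_B) != 0 || c == '\t';
}
```

If TAB carried the flag in the table instead, the print test above
would start accepting it, which is what the explicit comparison avoids.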

> Both _N and _X are locale-invariant, making them good candidates for
> removal from the ctype table. If we wanted to recover ONE flag, I'd
> take _N rather than _X. In this case add _X to the digits and use _X
> in isprint, isgraph, and isalnum. It is possible to implement isdigit
> as a standard macro with single evaluation:
> 
> #define	isdigit(c) ((unsigned)((c)-'0')<=9)
> 
> Hard choices...

I don't think we have much of a choice.  Either we stick to the current
approach and my isblank() fix, or we widen the tables.


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat


