X11R7.5 and C.UTF-8

Eric Blake ebb9@byu.net
Fri Dec 4 04:30:00 GMT 2009

Thomas Dickey <dickey <at> his.com> writes:

> > This means that characters 0..127 have to be treated as ASCII, but

No, it means that portable characters and control characters must be < 128.  
ASCII has this property, but so do EBCDIC and UTF-8.  The C locale also 
implies that you can manipulate bytes >= 128 in the naive manner, so long as 
you don't care about the characters encoded in those bytes.  And what do you 
know - ASCII, EBCDIC, and UTF-8 all meet this property, too.

> > beyond that an implementation can do what it wants. And on Cygwin 1.7,
> > plain "C" actually does imply UTF-8, which happily is
> > backward-compatible with ASCII.
> That's an interpretation that so far hasn't been blessed by the standards
> people.  Any discussion of this topic should mention that, as a caveat.

Actually, the standards people HAVE spoken - and they agreed with our 
interpretation.  POSIX was INTENTIONALLY written so that a UTF-8 encoding is 
valid for the C locale, for the same reason that an EBCDIC encoding is valid 
for the C locale.  These emails from the Austin Group (the folks that write 
POSIX) are telling:



But they also admitted that more work is still needed in POSIX to codify this 
intent clearly (for example, that control characters must be single bytes 
< 128).

Eric Blake


More information about the Cygwin-xfree mailing list