"C" character set (again)

Eric Blake ebb9@byu.net
Tue Dec 29 13:25:00 GMT 2009

According to Andy Koppe on 12/28/2009 11:54 PM:
> Following the "printf treats differently a string constant and a
> character array" issue at
> http://cygwin.com/ml/cygwin/2009-12/msg01009.html, I'm wondering again
> whether the "C" locale shouldn't go back to using ASCII rather than
> UTF-8, to avoid surprises like that and also to fit with many people's
> expectation that "C" means ASCII. I think that would save us a bunch
> of trouble and pointless legal/religious discussions about the C
> locale.

Bytes with the 8th bit set are not portable in the C locale, regardless of
whether that locale uses ASCII or UTF-8 encoding.  Yes, we will have to
field complaints from users with non-portable programs.  But I don't think
we have to change back to ASCII - we are doing those users a service by
making them fix their portability bugs.

On the other hand, I wonder if it may be possible to special-case the
C.UTF-8 locale to treat invalid byte sequences as pseudo-characters, such
that we can achieve 8-bit transparency in character contexts such as
printf rather than failing with EILSEQ.  But such special-casing should be
reserved for C.UTF-8; locales like en_US.UTF-8 should still fail with
EILSEQ on invalid sequences.
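
To make the pseudo-character idea concrete, here is a minimal sketch of a
lenient decoder built on the standard mbrtowc.  It is not how Cygwin or
newlib do or would implement this.  The helper name decode_lenient is
hypothetical, and the choice of mapping each undecodable byte b to
0xDC00 + b (the scheme Python's "surrogateescape" error handler uses) is
purely my assumption.  A strict locale such as en_US.UTF-8 would instead
stop at the first failure and report EILSEQ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not a Cygwin or newlib API.
 * Decodes n bytes of s in the current locale into out (at most cap
 * wide characters) and returns how many wide characters were stored.
 * Bytes that mbrtowc rejects are not treated as errors: each one
 * becomes a pseudo-character 0xDC00 + byte (an assumed mapping). */
size_t decode_lenient(const char *s, size_t n, wchar_t *out, size_t cap)
{
    mbstate_t st;
    size_t i = 0, count = 0;

    assert(s != NULL && out != NULL);
    memset(&st, 0, sizeof st);

    while (i < n && count < cap) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s + i, n - i, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* Invalid (EILSEQ) or truncated sequence: pass the
             * offending byte through as a pseudo-character. */
            wc = (wchar_t)(0xDC00 + (unsigned char)s[i]);
            /* The state is unspecified after an error, so reset it. */
            memset(&st, 0, sizeof st);
            r = 1;
        } else if (r == 0) {
            /* A decoded null character still consumes one byte in
             * the encodings discussed here. */
            r = 1;
        }
        out[count++] = wc;
        i += r;
    }
    return count;
}
```

With this approach, a stray byte such as 0xff in the middle of a string
costs one pseudo-character rather than aborting the whole conversion, which
is the 8-bit transparency I have in mind for printf.  Whether the
pseudo-characters can be converted back to the original bytes on output is
a separate question that this sketch does not address.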

Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@byu.net

More information about the Cygwin-developers mailing list