default codepage

Corinna Vinschen corinna-cygwin@cygwin.com
Tue Jun 23 15:04:00 GMT 2009


On Jun 23 16:06, Corinna Vinschen wrote:
> On Jun 23 15:45, Thomas Wolff wrote:
> > Corinna Vinschen wrote:
> > > On Jun 22 16:48, Thomas Wolff wrote:
> > > > Since the latest locale-related changes, the default codepage after 
> > > > starting cygwin _without_ explicit setting (of a locale variable) 
> > > > seems to have changed from CP1252 ("Windows ANSI") to ISO 8859-1 ("Latin 1").
> > > > Was this change on purpose?
> > > 
> > > There was no such change at all.  The default codepage is still the
> > > default ANSI codepage on your system.  The internal conversion from
> > > Windows functions to the POSIX multibyte environment and vice versa
> > > uses UTF-8, though, so that all existing filenames have a valid 
> > > representation even when using characters not available in your
> > > current codepage.
> > If I do the following:
> > * Open cmd console window.
> > * Go into cygwin 1.7 directory.
> > * Call cygwin.bat.
> > * In cygwin, "cat" a file with all 8 bit characters from U+20 to U+FF.
> > Then there are no printable characters in the range U+80...U+9F 
> > (the difference between ISO 8859-1 and Windows "Western" CP1252).
> > 
> No.  The difference between UTF-8 and CP1252.  0x80-0x9f are not
> valid codepoints in UTF-8 and the Cygwin console is using UTF-8 by
> default as well.

Hang on, I'm talking nonsense.  The console does not use UTF-8 by
default, rather it just uses ASCII.

I tested this myself and now I understand what you mean.  The console
seems to use ISO-8859-1, but actually it doesn't.  What happens is this:
The console I/O functions are using UTF-16 under the hood, so each
incoming character is converted to Unicode.  The ASCII->Unicode
conversion treats all incoming bytes literally.  Since the Unicode
values from 0x80 to 0xff are derived from the ISO-8859-1 table, you
actually see ISO-8859-1 by default on the console.
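
A minimal sketch of the effect described above, in Python rather than in Cygwin's own console code. The helper name widen_literally is made up for illustration. It only demonstrates that widening each byte to the code point of the same value is indistinguishable from decoding ISO-8859-1:

```python
def widen_literally(raw: bytes) -> str:
    """Map each byte to the Unicode code point with the same numeric value."""
    return "".join(chr(b) for b in raw)


# The 32 byte values where ISO-8859-1 and CP1252 disagree.
high = bytes(range(0x80, 0xA0))

# Literal widening gives the same string as an ISO-8859-1 decode.
assert widen_literally(high) == high.decode("iso-8859-1")

# Those code points are C1 control characters, so nothing printable appears.
assert not any(ch.isprintable() for ch in widen_literally(high))

# CP1252 places printable characters in this range, e.g. the euro sign at 0x80.
assert b"\x80".decode("cp1252") == "\u20ac"
```

This matches the observation earlier in the thread that the range 0x80...0x9F shows no printable characters.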

So here's the question:  Why is that a problem?  It's just the default
output.  I *can't* use CP1252 as default, because it's only a valid
default on western language versions of Windows.  Rather I would have to
use the default ANSI codepage, whatever that is on the machine.
ISO-8859-1 OTOH is the least intrusive default since it allows a
representation on all machines, independent of their default ANSI
codepage.
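
One way to read "allows a representation on all machines" is that every byte value has a mapping under ISO-8859-1, which is not true of every ANSI codepage. The sketch below checks that reading against Python's codec tables. The helper undecodable_bytes is hypothetical, and Python's tables are only a stand-in here: the Windows conversion functions may treat unassigned bytes differently.

```python
def undecodable_bytes(encoding: str) -> list[int]:
    """Return the single-byte values that the given codec refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# ISO-8859-1 assigns a code point to every byte value.
assert undecodable_bytes("iso-8859-1") == []

# Python's CP1252 table leaves a few byte values unassigned.
print([hex(b) for b in undecodable_bytes("cp1252")])
```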


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat



