default codepage

Corinna Vinschen corinna-cygwin@cygwin.com
Tue Jun 23 16:10:00 GMT 2009


On Jun 23 17:04, Thomas Wolff wrote:
> > I tested this myself and now I understand what you mean.  The console
> > seems to use ISO-8859-1, but actually it doesn't.  What happens is this:
> > The console I/O functions are using UTF-16 under the hood, so each
> > incoming character is converted to Unicode.  The ASCII->Unicode
> > conversion treats all incoming bytes literally.  Since the Unicode
> > values from 0x80 to 0xff are derived from the ISO-8859-1 table, you
> > actually see ISO-8859-1 by default on the console.
> Understood; which means the effective codepage of the terminal is 
> ISO-8859-1 (by whatever mechanism this is achieved). Maybe wcwidth 
> etc. have a different opinion in this configuration (which I haven't 
> tested) which might however raise additional problems.

wcwidth for the "C" locale returns the standard non-CJK values.  The
return values for wcwidth only depend on language and the @cjknarrow
modifier, not on the charset.
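
A rough sketch of that rule in Python, for illustration only.  This is not
the Cygwin implementation: sketch_width, CJK_LANGUAGES and the use of the
Unicode East Asian Width property are assumptions made for this example,
and control or combining characters are ignored.  The point is only that
the charset is not an input to the width decision.

```python
import unicodedata

# Assumption for this sketch: the languages treated as CJK.
CJK_LANGUAGES = {"ja", "ko", "zh"}

def sketch_width(ch, language, cjknarrow=False):
    """Hypothetical column width of one printable character.

    Only the language and the cjknarrow flag are consulted;
    the charset is deliberately not a parameter.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and fullwidth characters always take two columns.
        return 2
    if eaw == "A" and language in CJK_LANGUAGES and not cjknarrow:
        # Ambiguous-width characters are wide only for CJK languages.
        return 2
    return 1

if __name__ == "__main__":
    plus_minus = "\u00b1"  # an ambiguous-width character
    print(sketch_width(plus_minus, "C"))
    print(sketch_width(plus_minus, "ja"))
    print(sketch_width(plus_minus, "ja", cjknarrow=True))
```

With this sketch, the same character gets the same width whether the locale
charset is ISO-8859-1 or UTF-8; only the language part and the modifier
change the result.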

> > So here's the question:  Why is that a problem?  It's just the default
> > output.  I *can't* use CP1252 as default, because it's only a valid
> > default on western language versions of Windows.  Rather I would have to
> > use the default ANSI codepage, whatever that is on the machine.
> OK, if that's how it was in 1.5, it would be fine.
> > ISO-8859-1 OTOH is the least intrusive default since it allows a
> > representation on all machines, independent of their default ANSI
> > codepage.
> The new approach is not a problem for me. I was just wondering about 
> compatibility issues and pondering that keeping the 1.5 default might 
> reduce the number of complaints from various users on this mailing list 
> later when 1.7 goes mainstream...

Well, it's just the output codepage.  The behaviour when using the
alternate charset (for box chars) is still the same.  Right now I can't
think of a reason why this should lead to complaints.  I guess I'd rather
wait and see exactly what problems people run into with this.

> But wait - yet here's my question: Why is there a difference between 
> 	bash --login
> and
> 	bash
> - where in the latter case CP1252 (or the default ANSI codepage) 
> *is* still the default?

It's not different for me.  I started bash --login as well as just bash
right from the Windows start menu.  The output is using ISO-8859-1
values (actually Unicode 0x20 - 0xff) in both cases.  If it's really
different for you, it would be helpful if you could debug this to find
out why.
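
To make the byte-to-Unicode point concrete, here is a small Python sketch.
It is only an illustration and not how the console code is written; the
codec names are Python's.  It shows that treating each byte literally as a
Unicode value gives the same result as decoding ISO-8859-1, while CP1252
differs in the 0x80 - 0x9f range.

```python
# Illustration only: not taken from the Cygwin sources.
high_bytes = bytes(range(0x80, 0x100))

# Treat every byte literally as a Unicode code point.
literal = "".join(chr(b) for b in high_bytes)

# Decoding the same bytes as ISO-8859-1 yields the identical string.
latin1 = high_bytes.decode("iso-8859-1")
assert literal == latin1

# CP1252 maps byte 0x80 to the Euro sign instead of U+0080.
euro = b"\x80".decode("cp1252")
print(hex(ord(euro)))                            # 0x20ac
print(hex(ord(b"\x80".decode("iso-8859-1"))))    # 0x80

# From 0xa0 upwards the two tables agree, e.g. for byte 0xe9.
same_e9 = b"\xe9".decode("cp1252") == b"\xe9".decode("iso-8859-1")
print(same_e9)                                   # True
```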


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat



