This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: Console codepage setting via chcp?

2009/9/23 Corinna Vinschen:
> Right now, if you switch the charset via the setlocale function, you
> also switch the charset used for console output.

Andy wrote:
> That's quite a unique advantage of the Cygwin console actually,
> because it means you always get correct output even if you switch
> charset on the fly.
It might be considered an advantage but the fact that it is unique 
also means it is absolutely not portable.
In the normal Linux/Unix environment, an application that deliberately 
uses setlocale for a switch must be aware that it does NOT switch the 
terminal encoding but does this only for the purpose of specific 
invocations of wide character functions. The same applies to a 
cygwin application running in mintty, xterm, or urxvt. So in order 
to benefit from this "advantage", the application would have 
to check whether TERM=cygwin is set in its environment, a use case 
of very limited value, I assume.
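
To illustrate the check I mean, here is a minimal sketch in Python. The function name is hypothetical, and the only thing it relies on is the convention mentioned above that the Cygwin console announces itself as TERM=cygwin; it does not verify that the console really switches its charset.

```python
import os

def console_follows_setlocale(env=None):
    """Guess whether setlocale would also switch the output charset.

    Assumption (from this thread): only the Cygwin console, identified
    by TERM=cygwin, couples the two. mintty, xterm, urxvt etc. do not.
    """
    env = os.environ if env is None else env
    return env.get("TERM") == "cygwin"
```

An application would have to branch on this result everywhere it switches locale, which is exactly the non-portable special case I would like to avoid.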

Also, from Corinna's last statement after the discussion I had raised 
about "codepage after rlogin" or so, my assumption was that the 
setting of ${LC_ALL:-${LC_CTYPE:-$LANG}} (in shell syntax) before 
the first invocation of a cygwin application would determine the 
console encoding for the whole "cygwin session" (whatever that is, 
considering that one might invoke CMD.EXE, change LC_ALL, invoke another 
bash etc.).
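
For reference, the shell expression above can be spelled out as the following sketch. The function names are hypothetical; the lookup order (LC_ALL, then LC_CTYPE, then LANG, with empty values treated as unset) is just what `${LC_ALL:-${LC_CTYPE:-$LANG}}` does, and the charset extraction assumes the usual `language[_territory][.codeset][@modifier]` form of locale names.

```python
import os

def effective_ctype(env=None):
    """Mirror ${LC_ALL:-${LC_CTYPE:-$LANG}}: first non-empty value wins."""
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # ':-' treats an empty variable like an unset one
            return value
    return ""

def charset_of(locale_name):
    """Return the codeset part of a locale name, or None if it has none."""
    base = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." in base:
        return base.split(".", 1)[1]
    return None  # e.g. "C" or "de_DE" carry no explicit codeset
```

With LC_ALL empty, LC_CTYPE=de_DE.UTF-8 and LANG=C, this yields de_DE.UTF-8 and the charset UTF-8. What a locale without an explicit codeset should map to is a separate question that this sketch does not answer.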

Since a stable solution should be found, and given the 
portability issue, I am not much in favour of switching the terminal 
encoding on the fly, especially not as a side effect of a function 
that was not intended for this. In this sense, I also don't think 
this would produce "correct output".

> A normal terminal, on the other hand, doesn't actually know what
> charset the app running inside it is using. Hence, for correct output,
> the user has to make sure the terminal and application charsets match,
> or use something like 'luit' to translate between them.
If I had not split off this quote, my elaboration could have been shorter...
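
As a rough illustration of what a translating layer such as 'luit' does conceptually (this is not luit's implementation, and the function name is hypothetical): the bytes written by the application are decoded in the application charset and re-encoded in the terminal charset.

```python
def translate(data, app_charset, terminal_charset):
    """Re-encode application output for a terminal using another charset.

    Characters the terminal charset cannot represent are replaced
    rather than raising an error.
    """
    text = data.decode(app_charset)
    return text.encode(terminal_charset, errors="replace")
```

Without such a step, or without matching settings, the two bytes that encode "ä" in UTF-8 show up as two separate characters on an ISO 8859-1 terminal.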

Corinna wrote:
> This is done on the
> grounds that the console isn't capable to switch the console set by
> itself, as it is for terminal emulators like mintty. The problem with
> this approach is even documented in setup2.sgml, just commented out.
> If you use a tool like ssh to connect to a remote machine, then ssh
> uses potentially another locale and charset than the remote shell.
I don't understand this completely; I only hope that the "local" and 
"remote" charsets remain consistent once this problem has been fixed, at 
least if you use a cygwin tool for the remote connection. (If you happen 
to use a Windows telnet, you will arrive remotely with the native 
Windows console codepage instead, which is acceptable in the current 
"hybrid" mode of operation as I described it in another mail.)

> ssh is always running in the "C" locale

Andy wrote:
> Are you sure? Shouldn't it be calling 'setlocale(LC_ALL, "")', thereby
> configuring the console output according to the locale variables?
The need to add setlocale to a number of tools that don't need it on 
Linux/Unix, because they are simply byte-transparent, was discussed before 
and deemed undesirable, if I remember correctly.
I'm not sure why this should be needed again now; maybe it's related 
to Windows file names not being byte-transparent? If a solution 
can be found that avoids this, much trouble would be prevented, I guess.

Coming to the initial question whether the Windows console codepage 
(as affected by chcp) should be used for cygwin, I certainly vote NO;
this would be a step back behind 1.5, using the obnoxious "OEM" codepages 
by default in many cases. My vote might change to "YES" if by adding 
suitable startup conventions (like putting chcp in cygwin.bat and always 
spawning off a new Windows console to prevent changing the current CMD.EXE 
codepage... :( ), it would be assured that the default codepage would 
always be one of:
* CP1252 (like in 1.5)
* ISO 8859-1 (like in 1.7)
* UTF-8 (as discussed in another mail thread)
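
To make the difference concrete, here is a small sketch of how one and the same byte is displayed under the codepages in question. The byte 0xE4 is an arbitrary example of mine; the codec names are those of Python's standard library.

```python
def decode_under(data, codepages):
    """Decode the same bytes under several single-byte codepages."""
    return {cp: data.decode(cp) for cp in codepages}

# 0xE4 is "ä" in CP1252 and ISO 8859-1, but "Σ" in CP437 and "õ" in CP850.
SAMPLE = decode_under(b"\xe4", ["cp1252", "latin-1", "cp437", "cp850"])
```

A file written under 1.5 with CP1252 therefore still reads correctly under ISO 8859-1 for this character, but not under either of the OEM codepages.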

I guess people migrating from 1.5 could be convinced of a transition 
to UTF-8 but not of a transition to archaic CP437 or CP850, and 
teaching them to use "chcp" or "setfont" rather than the locale 
mechanism would be both cumbersome and incompatible with a Linux/Unix 

Kind regards,
