This is the mail archive of the
mailing list for the Cygwin project.
Re: Encoding of German 'umlauts' - please explain
- From: Thomas Wolff <towo at towo dot net>
- To: cygwin at cygwin dot com
- Date: Thu, 24 Sep 2009 17:31:23 +0200 (CEST)
- Subject: Re: Encoding of German 'umlauts' - please explain
- References: <loom.20090924T100848email@example.com>
Ronald Fischer wrote:
> Maybe someone could enlighten me about the following:
> That means, the German letter ü has encoding 0xFC. If I do the same on CMD shell
> (the 'od' used here comes from the Gnu Utilities for Windows), I see:
> That is, ü is encoded as 0x81. Why is this different?
> I am aware that, for historical reasons, different encodings exist (the old
> DOS encoding, Windows ANSI encoding etc.).
So you answered your question yourself :)
> I wouldn't have expected those
> differences, however, when comparing bash.exe vs. cmd.exe.
The encoding is applied by the terminal, not by the application. To bash,
the letter ü is just a sequence of one or two bytes. It is the terminal that
decides which bytes your keyboard sends to the application when you type ü,
and which glyph to display when your program outputs those bytes. (That is
the traditional picture, at least; in the age of locales things may sometimes
get more complicated :( )
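To make the byte values concrete, here is a minimal sketch in Python (not part of the original exchange) that encodes ü under the code pages mentioned in this thread. The helper name encodings_of and the codec list are illustrative assumptions; the codec names are the ones Python's standard library ships.

```python
# Encode one character under several legacy code pages and UTF-8.
# CODECS and encodings_of are illustrative names, not from the thread.
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp437", "cp850", "utf-8"]


def encodings_of(char, codecs=CODECS):
    """Return a dict mapping codec name to the hex string of char's bytes."""
    return {name: char.encode(name).hex() for name in codecs}


if __name__ == "__main__":
    # U+00FC is LATIN SMALL LETTER U WITH DIAERESIS (ü).
    for name, hexbytes in encodings_of("\u00fc").items():
        print(f"{name:12} {hexbytes}")
```

The Latin-1 family yields the single byte fc, the DOS code pages yield 81, and UTF-8 yields the two bytes c3 bc, which is the "one or two bytes" that the application sees.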
Having said this, I also need to adjust the following response:
Matthias Andree wrote:
> Because the code pages differ. 0xFC is ISO-8859-1 ("Latin 1") or -15 ("Latin 9")
> or CP1252/Windows-1252 (Latin 1 Extended; the latter allocates 0x80...0x9f
> differently than ISO-8859-1) and CMD uses CP437 or CP850.
This is not really correct; like bash, CMD does not use a codepage itself.
If you start CMD from Windows, it will implicitly be embedded in a Windows
console which uses CP437 (American), CP850 (Western European) or some other
default of your system configuration.
However, you could also run CMD from a cygwin bash. In this case, maximising
the confusion, there are two different situations:
* Run mintty, start CMD from bash there: CMD will see the same codepage as
bash since it is the one configured for mintty. So echo ü would produce
0xFC even in CMD (assuming mintty runs one of the codepages which map
ü to 0xFC).
* Run the cygwin console and start CMD from bash there: the cygwin console is
a hybrid because, unlike in all other terminals, its encoding is emulated by
the cygwin DLL inside a Windows console. As a result, the effective "codepage"
varies with the application: a cygwin application will use the encoding
configured for the cygwin session, while any non-cygwin application will use
the native Windows console codepage. So you may echo ü from bash, then start
CMD from there, echo ü again, and get different codes for the same key!
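The cygwin console case can be simulated with a short Python sketch. It assumes, purely for illustration, that the cygwin session is configured for ISO-8859-1 and that the native Windows console codepage is CP850; your system may differ. The function names typed_bytes and terminal_shows are hypothetical helpers that model the terminal side only.

```python
# Model a terminal as "the thing that picks the encoding".
# Assumed setup (not from the thread): cygwin session = latin-1,
# native Windows console codepage = cp850.
CYGWIN_SESSION = "latin-1"
WINDOWS_CONSOLE = "cp850"


def typed_bytes(char, terminal_codec):
    """Bytes the application receives when the user types char."""
    return char.encode(terminal_codec)


def terminal_shows(raw, terminal_codec):
    """Text displayed when the application writes raw bytes."""
    return raw.decode(terminal_codec, errors="replace")


if __name__ == "__main__":
    from_bash = typed_bytes("\u00fc", CYGWIN_SESSION)   # cygwin application
    from_cmd = typed_bytes("\u00fc", WINDOWS_CONSOLE)   # non-cygwin application
    print(from_bash.hex(), from_cmd.hex())
    # Bytes written under one encoding but displayed under the other:
    print(terminal_shows(from_bash, WINDOWS_CONSOLE))
```

The same key yields fc in one case and 81 in the other, and a byte written under one assumption is displayed as a different character under the other (0xFC is ³ in CP850).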