This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Locales with wrong umlauts

On Tue, 28 Mar 2006, Lapo Luchini wrote:

> Igor Peshansky wrote:
> > The system has no idea what charset it's using, because it depends on the
> > font you set for your terminal, which is outside of the terminal's
> > control.  Even if you use a Unicode font with charset conversion, the
> > charset is specified outside of the console.
> Oh? I had no idea about that.
> Then the "Arial" distributed in latin1-like CP1252 areas (most western
> europe) is a different font than the "Arial" used in eastern europe
> (CP1250 AFAIR?) or the "Arial" used for cyrillic-using places (CP1251?)?

Nope, the font is probably the same (Unicode/UCS-2), but the encoding
vector is specified in the properties of each terminal window, and thus
not set globally.  That said, there may be a system-default encoding (in
the language preferences) that can be used as a good guess for the output
encoding of filenames as converted to 8-bit from UCS-2.  In particular, my
Windows is set to accept Russian as one of its primary locales (the main
one being en_US), and thus my non-English filenames are rendered in the
CP1251 encoding (as is evident from xterms trying to display them using a
latin1-encoded font).
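
To illustrate the CP1251-vs-latin1 effect described above, here is a minimal
Python sketch. The filename is made up for the example, and the script only
imitates the byte-level conversion; it makes no Windows or Cygwin calls.

```python
# Hypothetical filename, chosen only for illustration.
name = "файл.txt"

# The 8-bit form the name takes when converted from Unicode to CP1251.
raw = name.encode("cp1251")

# What a terminal sees if it interprets those same bytes as latin1.
shown = raw.decode("latin-1")

print(raw.hex(" "))   # the bytes do not change, only their interpretation
print(shown)          # ôàéë.txt

# Reading the bytes with the right codepage recovers the original name.
recovered = shown.encode("latin-1").decode("cp1251")
print(recovered == name)
```

The name itself is intact in both cases; only the rendering differs, which
is why re-decoding with the correct codepage gets it back.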

> Anyway, regarding file names, I don't think it is correct to say that
> the name depends on the font: the "correct" name depends on the system
> default codepage (or, well, since I guess underneath it now uses Unicode
> let's say "the codepage used for retro-compatibility in the non-unicode
> system calls").

Yep, except I would even say "the correct *rendering* of the name depends
on the default codepage".  The name doesn't change if you change the

> If I have a filename with accents I want "ls" to show it "just like
> Explorer", at least by default, with no explicit override on my part
> using .Xdefaults or "rxvt -fn".

Windows terminals use the above system-default encoding.  IIRC, xterm and
rxvt use latin1 by default.

> OK, maybe I prefer to use a CP850-font like LucidaP because I want to
> see line-drawings in "mc" and thus every accent will be messed up, but
> that's another matter 0=)

So, in this case, the encoding vector is part of the font.  And no Windows
API call will identify this vector for you so that OUTPUT_CHARSET can be
set in the terminal...
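
As a rough sketch of why the choice of codepage matters here, the Python
snippet below decodes one and the same byte under the four codepages
mentioned in this thread. The byte 0xC4 is picked arbitrarily for the
example; the codec names are the ones Python's standard library uses.

```python
# One byte, interpreted under each codepage discussed in this thread.
byte = b"\xc4"
codepages = ("cp1252", "cp1250", "cp1251", "cp850")

# Map each codepage name to the character it assigns to this byte.
decoded = {cp: byte.decode(cp) for cp in codepages}

for cp, ch in decoded.items():
    print(f"{cp}: {ch!r} (U+{ord(ch):04X})")
```

The byte comes out as an accented A in CP1252 and CP1250, a Cyrillic letter
in CP1251, and a horizontal line-drawing character in CP850, so a CP850 font
that makes "mc" look right will necessarily garble accented names stored in
a Windows codepage.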

> > Is there any way to tell mv, rm &co to display non-ASCII characters in
> > filenames?  I know this isn't Cygwin-specific, but I'm not even sure what
> > to Google for.
> Ohh, us poor non-ASCII-using people, don't you know it is just plain
> wrong to use "strange accents" in filenames? Even more "wrong" starting
> a filename with a dot or (what horror) using an extension more than 3
> chars long! (just kidding ^_^)

Yes.  Languages with different alphabets have a long history of
transliteration on the Internet, specifically because i18n became
widespread not too long ago (relatively speaking, of course).

>     Lapo
> PS:
> don't we blame Cygwin too much, many Windows apps have problems with
> unicode. E.g. if I create a folder name with japanese characters in it,
> most applications are not even able to save a file in it.

I'm not blaming Cygwin.  If anything, I'm blaming newlib...  J/K. :-)
      |\      _,,,---,,_ |
ZZZzz /,`.-'`'    -.  ;-;;,_		Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'		old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"Las! je suis sot... -Mais non, tu ne l'es pas, puisque tu t'en rends compte."
"But no -- you are no fool; you call yourself a fool, there's proof enough in
that!" -- Rostand, "Cyrano de Bergerac"
