This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: KOI8

On Aug 20 21:43, Andy Koppe wrote:
> One fairly important character encoding not yet supported by Cygwin
> 1.7 is KOI8. Well, two actually, because there are slightly different
> versions for Russian and Ukrainian: KOI8-R and KOI8-U, aka Windows
> codepages 20866 and 21866. Apparently they're de-facto standards for
> Unix machines and the internet in the former Soviet Union. (Windows uses
> CP1251, whereas ISO-8859-5 (Cyrillic) never caught on.)
> Cygwin's Midnight Commander actually uses KOI8 if the locale is set to
> "ru" or "uk", even if a charset is specified explicitly, e.g.
> "ru.CP1251". Hence you get gibberish where a helpful hint in the
> user's language should be. (Of course that's primarily a shortcoming
> in mc.)
> Anyway, to help support them, the attached patch adds the KOI8
> charsets to newlib's Unicode conversion and ctype tables. I took the
> conversion tables from iconv and adapted the ctype tables from the
> CP1251 version. Since KOI8 has printable characters in the C1 range
> from 0x80 to 0x9F, it seems easiest to treat them as Windows
> codepages.
> To complete support, "KOI8-R" and "KOI8-U" would need to be recognised
> in _setlocale_r and mapped to codepages 20866 and 21866.

I'd suggest adding the missing code to loadlocale() (the internally
used charset should be set to "CP20866"/"CP21866", but it seems you know
this already) and send the entire patch, together with a ChangeLog
entry, to the newlib list.  If you could base it on my pending proposal
to make the charset case insensitive, that would be great.
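
As a rough illustration of the mapping described above, here is a minimal
sketch. It is not newlib's actual loadlocale() code; the helper name
koi8_to_internal_charset is hypothetical, and the case-insensitive
comparison assumes the pending case-insensitivity proposal is in effect.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical helper, not part of newlib: map a user-visible KOI8
   charset name to the internal codepage-style charset name.
   Returns NULL for any other name so the caller can fall through
   to its existing charset handling.  */
static const char *
koi8_to_internal_charset (const char *name)
{
  if (strcasecmp (name, "KOI8-R") == 0)
    return "CP20866";   /* Russian */
  if (strcasecmp (name, "KOI8-U") == 0)
    return "CP21866";   /* Ukrainian */
  return NULL;
}
```

With this, both "KOI8-R" and "koi8-r" would resolve to "CP20866", while a
name such as "CP1251" yields NULL and is left to the existing code paths.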

Your patch also requires a minor patch to Cygwin, which can be applied
as obvious after the newlib change has gone in:

RCS file: /cvs/src/src/winsup/cygwin/,v
retrieving revision 1.33
diff -u -p -r1.33
---	30 Jun 2009 21:18:43 -0000	1.33
+++	21 Aug 2009 07:48:19 -0000
@@ -339,6 +339,8 @@ __set_charset_from_codepage (UINT cp, ch
     case 1256:
     case 1257:
     case 1258:
+    case 20866:
+    case 21866:
       __small_sprintf (charset, "CP%u", cp);
       return __cp_mbtowc;
     case 28591:
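
To show what the two added case labels do, the following standalone sketch
mimics the relevant part of the switch. It is an approximation only: the
function name charset_from_codepage is hypothetical, standard snprintf
stands in for Cygwin's __small_sprintf, and an int result replaces the
conversion function pointer that the real function returns.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the patched switch: for the codepages
   listed here, write "CP<number>" into charset and return 1;
   return 0 for anything else.  */
static int
charset_from_codepage (unsigned int cp, char *charset, size_t len)
{
  switch (cp)
    {
    case 1256:
    case 1257:
    case 1258:
    case 20866:   /* KOI8-R */
    case 21866:   /* KOI8-U */
      snprintf (charset, len, "CP%u", cp);
      return 1;
    default:
      return 0;
    }
}
```

Called with 20866 and a sufficiently large buffer, the sketch fills the
buffer with "CP20866", which matches the internal charset name suggested
for KOI8-R above.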


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
