KOI8 character sets

Jeff Johnston jjohnstn@redhat.com
Tue Aug 25 16:58:00 GMT 2009


Andy's patch has been checked in, along with Corinna's documentation
change.  I changed the documentation content to state just <<EUCJP>>
rather than <<EUCJP>>/<<eucJP>>, and later on I used EUCJP and eucJP
instead of UTF-8 as the example of case-insensitivity.  This made the
doc patch easier to apply and made it clear that eucJP is still valid.
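
To illustrate what case-insensitive charset matching means for names
such as EUCJP/eucJP, here is a minimal sketch.  The helper name
charset_equal is hypothetical and is not the code in locale.c; it only
assumes that charset names are plain ASCII strings.

```c
#include <ctype.h>

/* Hypothetical helper (not newlib's implementation): compare two
   charset names while ignoring ASCII case.
   Returns 1 when the names match, 0 otherwise. */
int charset_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        /* Cast to unsigned char so tolower() never sees a negative value. */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

With this sketch, charset_equal("EUCJP", "eucJP") and
charset_equal("KOI8-R", "koi8-r") both return 1.  It does not cover the
dashless "UTF8" spelling, which the documentation treats as a separate
special case.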

-- Jeff J.

Corinna Vinschen wrote:
> On Aug 22 17:20, Andy Koppe wrote:
>   
>> The attached patch adds support for the KOI8-R and KOI8-U character
>> sets. These are the de-facto standard character sets on Unix machines
>> and the Net in Russia, Ukraine, and other ex-Soviet states.
>> (ISO-8859-5, designed for all Cyrillic scripts, apparently never found
>> much acceptance.)
>>
>> Under Windows they are known as codepages 20866 and 21866. Since they
>> are single-byte encodings with printable characters in the C1 range
>> from 0x80 to 0x9F, it seems best to handle them like DOS/Windows
>> codepages. The conversion tables were adapted from the iconv ones.
>>
>> Tested on Cygwin 1.7.
>>
>> ChangeLog:
>>
>> 2009-08-22  Corinna Vinschen  <corinna@vinschen.de>
>>         * libc/locale/locale.c (loadlocale): Map "KOI8-R" and "KOI8-U" to
>>         CP20866 and CP21866.
>>
>> 2009-08-22  Andy Koppe  <andy.koppe@gmail.com>
>>         * libc/stdlib/sb_charsets.c (__cp_conv): Add KOI8-R (Russian, CP20866)
>>         and KOI8-U (Ukrainian, CP21866) to Windows codepage conversion tables.
>>         * libc/ctype/ctype_cp.h (__ctype_cp): Likewise for ctype tables.
>>     
>
> The documentation in libc/locale/locale.c should note the KOI8 charsets
> as well:
>
> Index: libc/locale/locale.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/locale/locale.c,v
> retrieving revision 1.23
> diff -u -p -r1.23 locale.c
> --- libc/locale/locale.c	21 Aug 2009 20:56:13 -0000	1.23
> +++ libc/locale/locale.c	24 Aug 2009 19:57:42 -0000
> @@ -54,20 +54,21 @@ the form
>  <<"language">> is a two character string per ISO 639.  <<"TERRITORY">> is a
>  country code per ISO 3166.  For <<"charset">> and <<"modifier">> see below.
>  
> -Additionally to the POSIX specifier, five extensions are supported for
> +Additionally to the POSIX specifier, seven extensions are supported for
>  backward compatibility with older implementations using newlib:
> -<<"C-UTF-8">>, <<"C-JIS">>, <<"C-EUCJP">>/<<"C-eucJP">>, <<"C-SJIS">>,
> -<<"C-ISO-8859-x">> with 1 <= x <= 15, or <<"C-CPxxx">> with xxx in [437,
> -720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 874, 1125, 1250, 1251,
> -1252, 1253, 1254, 1255, 1256, 1257, 1258].
> +<<"C-UTF-8">>, <<"C-JIS">>, <<"C-eucJP">>, <<"C-SJIS">>, <<C-KOI8-R>>,
> +<<C-KOI8-U>>, <<"C-ISO-8859-x">> with 1 <= x <= 15, or <<"C-CPxxx">> with
> +xxx in [437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 874, 1125,
> +1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258].
>  
>  Even when using POSIX locale strings, the only charsets allowed are
> -<<"UTF-8">>, <<"JIS">>, <<"EUCJP">>/<<"eucJP">>, <<"SJIS">>, <<"ISO-8859-x">>
> -with 1 <= x <= 15, or <<"CPxxx">> with xxx in [437, 720, 737, 775, 850,
> -852, 855, 857, 858, 862, 866, 874, 1125, 1250, 1251, 1252, 1253, 1254,
> -1255, 1256, 1257, 1258].  Charsets are case insensitive.  For instance,
> -<<"UTF-8">> and <<"utf-8">> are equivalent.  <<"UTF-8">> can also be
> -written without dash, as in <<"UTF8">> or <<"utf8">>.
> +<<"UTF-8">>, <<"JIS">>, <<"eucJP">>, <<"SJIS">>, <<KOI8-R>>, <<KOI8-U>>,
> +<<"ISO-8859-x">> with 1 <= x <= 15, or <<"CPxxx">> with xxx in
> +[437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 874, 1125, 1250,
> +1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258].
> +Charsets are case insensitive.  For instance, <<"UTF-8">> and <<"utf-8">>
> +are equivalent.  <<"UTF-8">> can also be written without dash, as in
> +<<"UTF8">> or <<"utf8">>.
>  
>  (<<"">> is also accepted; if given, the settings are read from the
>  corresponding LC_* environment variables and $LANG according to POSIX rules.
>
>
> Corinna
>
>   


