More about charsets

Corinna Vinschen corinna-cygwin@cygwin.com
Sat Mar 27 17:53:00 GMT 2010


On Mar 27 18:24, Corinna Vinschen wrote:
> On Mar 27 16:11, Andy Koppe wrote:
> > Corinna Vinschen:
> > > while looking into the GB18030 issue once again, I found that we still
> > > may have two holes which might be important to support.
> > >
> > > - GB2312 aka EUC-CN
> > >
> > >  We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
> > >  of GBK and apparently GBK is often used while still labeled as
> > >  GB2312.  See the discussion here:
> > >  http://www.mail-archive.com/unicode@unicode.org/msg03516.html
> > >
> > >  So the question is, should we just allow GB2312 and EUC-CN as
> > >  codeset names, but use the GBK conversion functions for them?
> > 
> > Might as well. As you saw, mintty already does that. Thomas Wolff's
> > mined goes even further and handles both GB2312 and GBK with its
> > GB18030 codec, because GBK is a subset of GB18030.
> 
> I think I'll opt for GBK for now, given that GB18030 support doesn't exist yet.

I also intend to make GB2312 the default name, rather than GBK, since
that's the default for these locales on Linux.

Btw., apart from EUC-TW, what's missing as well is BIG5-HKSCS.  I read
http://en.wikipedia.org/wiki/HKSCS and the Windows-specific section,
but I'm still puzzled how this is supposed to work.  Does Vista's
codepage 950 contain the HKSCS elements or not?!?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
