More about charsets
Sat Mar 27 17:53:00 GMT 2010
On Mar 27 18:24, Corinna Vinschen wrote:
> On Mar 27 16:11, Andy Koppe wrote:
> > Corinna Vinschen:
> > > while looking into the GB18030 issue once again, I found that we still
> > > may have two holes which might be important to support.
> > >
> > > - GB2312 aka EUC-CN
> > >
> > >   We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
> > >   of GBK and apparently GBK is often used while still labeled as
> > >   GB2312.  See the discussion here:
> > >   http://firstname.lastname@example.org/msg03516.html
> > >
> > >   So the question is, should we just allow GB2312 and EUC-CN as
> > >   codeset names, but use the GBK conversion functions for them?
> > Might as well. As you saw, mintty already does that. Thomas Wolff's
> > mined goes even further and handles both GB2312 and GBK with its
> > GB18030 codec, because GBK is a subset of GB18030.
> I think I'll opt for GBK for now, given that GB18030 doesn't exist yet.

I also intend to make GB2312 the default name rather than GBK, since
that's the default for these languages on Linux.

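As a rough sketch of the aliasing idea discussed above (accept GB2312
and EUC-CN as codeset names, but route them to the GBK conversion
functions), something like the following would do.  The names
`codeset_alias` and `converter_for` are hypothetical and only serve
the illustration; they are not taken from the Cygwin or newlib sources.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table: "name" is what the user may put into the
   locale string, "converter" names the conversion functions to use. */
struct codeset_alias
{
  const char *name;
  const char *converter;
};

static const struct codeset_alias aliases[] =
{
  { "GB2312", "GBK" },   /* subset of GBK, so the GBK codec covers it */
  { "EUC-CN", "GBK" },   /* other common name for the same encoding */
  { "GBK",    "GBK" },
};

/* Return the converter name for a codeset, or NULL if unknown.
   The comparison is case-insensitive. */
const char *
converter_for (const char *codeset)
{
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcasecmp (codeset, aliases[i].name) == 0)
      return aliases[i].converter;
  return NULL;
}
```

With such a table, `converter_for ("EUC-CN")` and
`converter_for ("GB2312")` both select the GBK functions, while a name
that has no entry, such as BIG5-HKSCS, yields NULL.
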
Btw., apart from EUC-TW, what's missing as well is BIG5-HKSCS. I read
http://en.wikipedia.org/wiki/HKSCS and the Windows specific section,
but I'm still puzzled how this is supposed to work. Does Vista's
codepage 950 contain the HKSCS elements or not?!?

Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com