More about charsets

Corinna Vinschen corinna-cygwin@cygwin.com
Sat Mar 27 17:25:00 GMT 2010


On Mar 27 16:11, Andy Koppe wrote:
> Corinna Vinschen:
> > while looking into the GB18030 issue once again, I found that we still
> > may have two gaps that might be important to fill.
> >
> > - GB2312 aka EUC-CN
> >
> >  We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
> >  of GBK and apparently GBK is often used while still labeled as
> >  GB2312.  See the discussion here:
> >  http://www.mail-archive.com/unicode@unicode.org/msg03516.html
> >
> >  So the question is, should we just allow GB2312 and EUC-CN as
> >  codeset names, but use the GBK conversion functions for them?
> 
> Might as well. As you saw, mintty already does that. Thomas Wolff's
> mined goes even further and handles both GB2312 and GBK with its
> GB18030 codec, because GBK is a subset of GB18030.

I think I'll opt for GBK for now, given that GB18030 support doesn't exist yet.
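
A minimal sketch of what that aliasing could look like, assuming a simple
name-to-codepage lookup table.  The struct, the table and the function name
charset_to_codepage are hypothetical and only illustrate the idea; this is
not the actual Cygwin code.  The case-insensitive comparison is an
assumption as well.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table.  The charset names and codepage 936 come
   from the discussion above; everything else is illustrative. */
struct charset_alias
{
  const char *name;
  unsigned int codepage;
};

static const struct charset_alias aliases[] =
{
  { "GBK",    936 },
  { "GB2312", 936 },  /* subset of GBK, so reuse the GBK conversion */
  { "EUC-CN", 936 },  /* another name for GB2312, same treatment */
};

/* Return the Windows codepage for a charset name, or 0 if the name
   is not in the table.  Matching ignores case (an assumption). */
unsigned int
charset_to_codepage (const char *name)
{
  size_t i;

  if (name == NULL)
    return 0;
  for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcasecmp (name, aliases[i].name) == 0)
      return aliases[i].codepage;
  return 0;
}
```

With this table, charset_to_codepage ("GB2312") and
charset_to_codepage ("EUC-CN") both yield 936, so they end up using the
same conversion as GBK, while a name like "EUC-TW" yields 0 and would
have to be rejected by the caller.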

> >  Otherwise, there's also a codepage 51936, which is called EUC-CN
> >  in the list at
> >  http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
> >  I didn't test it, but it appears to be the real GB2312.  I don't
> >  know if it really makes sense to make the distinction, though.
> 
> Also, it isn't available on any Windows I've tried.
> 
> 
> > - EUC-TW
> >
> >  There's a codepage 51950 which appears to be something like EUC-TW.
> >  I just found this, though:
> >  http://code.google.com/p/mintty/source/detail?r=738
> >
> >  Andy, is that a general rule?  Or did you test on XP and the codepage
> >  was just not installed, by any chance?
> 
> It doesn't show up as an option on XP, and I've just tried it again on
> Windows 7, where codepages are no longer optional. Doesn't work. I
> think I'd read somewhere that 51950 is only available for .Net
> programs, but unfortunately I can't find that again. I guess it's
> possible that Chinese Windows versions do support it anyway, although
> Wikipedia describes EUC-TW as "rarely used".

If only the MSDN documentation told us which codepages exist and are
usable in which environments...

The term "rarely used" is quite fortunate for us.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


