This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: CYGWIN=codepage? Or LC_CTYPE=foo?
On Apr 7 01:04, Kazuhiro Fujieda wrote:
> >>> On Sun, 06 Apr 2008 16:39:43 +0200
> >>> Corinna Vinschen <corinna-cygwin@cygwin.com> said:
>
> > Shouldn't the (default) setting of LANG, LC_CTYPE and friends be based
> > on what the underlying OS is set to? Microsoft maintains a table which
> > defines the relationship between the locale identifier used internally
> > (LCID), the "Culture name" (what's used by POSIX) and the attached
> > codepage. The list is here:
> >
> > http://www.microsoft.com/globaldev/nlsweb/default.mspx
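A rough sketch of the table approach suggested above. It assumes a static table keyed by LCID that holds the culture name and default ANSI codepage. The struct, the function names and the five sample rows are illustrative only and are not Cygwin code. The naive hyphen-to-underscore conversion works for simple names like "en-US". It also shows why such names are not enough on their own, which is the problem raised in the reply that follows.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One row of a hypothetical LCID table: the numeric LCID, the culture
   name as Microsoft spells it, and that locale's default ANSI codepage. */
struct lcid_entry {
    unsigned int lcid;
    const char *culture;
    unsigned int ansi_cp;
};

/* A handful of sample rows; the full Microsoft list has hundreds. */
static const struct lcid_entry lcid_table[] = {
    { 0x0407, "de-DE", 1252 },
    { 0x0409, "en-US", 1252 },
    { 0x0411, "ja-JP", 932 },
    { 0x0404, "zh-TW", 950 },
    { 0x0804, "zh-CN", 936 },
};

/* Linear search for LCID; returns NULL if it is not in the table. */
static const struct lcid_entry *lookup_lcid(unsigned int lcid)
{
    size_t n = sizeof lcid_table / sizeof lcid_table[0];
    for (size_t i = 0; i < n; i++)
        if (lcid_table[i].lcid == lcid)
            return &lcid_table[i];
    return NULL;
}

/* Naive culture-name to POSIX-name conversion: copy the name (truncated
   to OUTSZ-1 chars) and turn the first '-' into '_', so "en-US" becomes
   "en_US". A name like "az-Cyrl-AZ" comes out as "az_Cyrl-AZ", which is
   not a usable POSIX locale name. */
static void culture_to_posix(const char *culture, char *out, size_t outsz)
{
    snprintf(out, outsz, "%s", culture);
    char *dash = strchr(out, '-');
    if (dash != NULL)
        *dash = '_';
}
```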
>
> There are several culture names that do not conform to the convention
> for locale names, for example "gsw-FR", "az-Cyrl-AZ", "zh-Hant",
> and so on. I wrote a table mapping locale names to LCIDs for
> my implementation of setlocale.
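To make the non-conforming names concrete, here is a small check for the simple "ll-CC" shape. That shape is a two-letter lowercase language code, a hyphen and a two-letter uppercase country code, and only names of this form map directly onto a POSIX "ll_CC" name. The function name is hypothetical, and the strict five-character rule is an assumption chosen for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only for culture names of the form "ll-CC".
   Three-letter language codes ("gsw-FR"), script subtags ("az-Cyrl-AZ")
   and script-only names without a country ("zh-Hant") all fail. */
static bool culture_is_simple(const char *culture)
{
    if (strlen(culture) != 5)
        return false;
    return islower((unsigned char)culture[0])
        && islower((unsigned char)culture[1])
        && culture[2] == '-'
        && isupper((unsigned char)culture[3])
        && isupper((unsigned char)culture[4]);
}
```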
I checked the return values from GetLocaleInfo. Apparently the strings
returned by GetLocaleInfo(LOCALE_SISO639LANGNAME) and
GetLocaleInfo(LOCALE_SISO3166CTRYNAME) match what you would expect.
For instance, az-Cyrl-AZ and az-Latn-AZ both return the ISO639 code az
and the ISO3166 code AZ. They differ only in the returned ANSI codepage:
1251 for az-Cyrl-AZ and 1254 for az-Latn-AZ.
zh-CN and zh-Hans both return ISO639 zh and ISO3166 CN; zh-TW and
zh-Hant both return zh and TW. In both cases the difference is just the
returned ANSI codepage.
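The observation above suggests building the POSIX-style name from what the OS reports. On Windows the three inputs would come from GetLocaleInfo with LOCALE_SISO639LANGNAME, LOCALE_SISO3166CTRYNAME and LOCALE_IDEFAULTANSICODEPAGE. In this sketch they are passed in directly so it compiles anywhere. The "CP<number>" codeset spelling and both helper names are assumptions for illustration. The second helper shows that the two Azeri variants end up with the same "az_AZ" part and are told apart only by the codeset.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: combine ISO 639 language code, ISO 3166 country
   code and ANSI codepage into a name like "az_AZ.CP1251". Returns the
   snprintf result, i.e. the length the full name needs. */
static int make_locale_name(char *out, size_t outsz, const char *iso639,
                            const char *iso3166, unsigned int ansi_cp)
{
    return snprintf(out, outsz, "%s_%s.CP%u", iso639, iso3166, ansi_cp);
}

/* Hypothetical helper: true if two such names share the same "ll_CC"
   part before the '.', i.e. they differ at most in the codeset. */
static bool same_lang_territory(const char *a, const char *b)
{
    size_t la = strcspn(a, ".");
    size_t lb = strcspn(b, ".");
    return la == lb && strncmp(a, b, la) == 0;
}
```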
I don't know if gsw-FR is correct, though. gsw is apparently an ISO639-2
code, and there seems to be no two-letter ISO639-1 code covering that
language. What would POSIX do?
> It isn't preferable that such a large table resides in the DLL, so I
> will simplify the implementation. I will make setlocale check only the
> codeset part and accept the default locale or the C locale.
Fine with me, but given the results mentioned above, doesn't it make
sense to use the OS for that, too?
Oh, btw, from what I read in SUSv3, the "C" locale is actually just
the old name for what's today called the "POSIX" locale. The two are
equivalent, and POSIX requires that a conformant system understands
both names. I guess that's no big problem.
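A quick way to check the equivalence with standard C calls is to switch to each name with setlocale(LC_ALL, ...) and inspect localeconv(). The helper name is hypothetical. It tests only two things: the switch succeeds, and the decimal point is ".", as it is in the POSIX locale. The string setlocale returns for "POSIX" is implementation-specific, so the sketch does not compare it.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: switch LC_ALL to NAME and report whether that
   succeeded and the resulting decimal point is ".". */
static bool behaves_like_posix_locale(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return false;
    return strcmp(localeconv()->decimal_point, ".") == 0;
}
```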
> > Or, we check if LANG/LC_CTYPE is set and only set the codepage according
> > to the setting of these variables. Otherwise we just use the default
> > ANSI codepage.
>
> This approach is preferable. I think setting $LANG is overkill.
You're right, of course.
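A sketch of the agreed approach: derive the codepage only if the environment names a locale, and otherwise keep the default ANSI codepage. It follows the standard POSIX precedence LC_ALL, then LC_CTYPE, then LANG, skipping unset or empty variables. The default_acp parameter stands in for what GetACP() would return on Windows. The codeset-to-codepage mapping is a tiny illustrative subset: 65001 for UTF-8 and 932 for Shift_JIS. The function names are hypothetical.

```c
/* Only needed so setenv/unsetenv are declared for the usage example. */
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the first non-empty of LC_ALL, LC_CTYPE, LANG, or NULL. */
static const char *ctype_locale_from_env(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && v[0] != '\0')
            return v;
    }
    return NULL;
}

/* Hypothetical: pick a codepage from the locale's codeset, falling back
   to DEFAULT_ACP when no variable is set, no codeset is given (e.g. "C"),
   or the codeset is not in the illustrative mapping. */
static unsigned int choose_codepage(unsigned int default_acp)
{
    const char *loc = ctype_locale_from_env();
    if (loc == NULL)
        return default_acp;
    const char *dot = strchr(loc, '.');
    if (dot == NULL)
        return default_acp;
    if (strncmp(dot + 1, "UTF-8", 5) == 0)
        return 65001;
    if (strncmp(dot + 1, "SJIS", 4) == 0)
        return 932;
    return default_acp;
}
```

With nothing set, choose_codepage(1252) simply returns 1252, so the user gets the same behaviour as today unless LANG or LC_CTYPE is set explicitly.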
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat