"C" UTF-8 trouble

Corinna Vinschen corinna-cygwin@cygwin.com
Thu Oct 8 14:11:00 GMT 2009


On Oct  8 11:07, Corinna Vinschen wrote:
> On Oct  8 06:12, Andy Koppe wrote:
> > 2009/10/7 Andy Koppe:
> > > 2009/10/7 Corinna Vinschen:
> > >> At least, from the above it looks like all uppercase.  The KOI8s would
> > >> be covered by a translation table.
> > >>
> > >> The problem is, we *must* draw a line somewhere.
> > >
> > > I agree, better to just stick with __locale_charset(), unless problems
> > > do arise. FWIW, vim works fine with *.KOI8 locales.
> > 
> > Actually it's not quite right: on seeing "CP20866", vim falls back to
> > iso-8859-1. While this works on the surface, as it's just another
> > 8-bit charset, things like case conversion or detecting word
> > boundaries might be incorrect.
> > 
> > Anyway, here's a fix that doesn't involve a translation table:
> > 
> > * libc/locale/nl_langinfo.c (nl_langinfo): Fall back to
> > __locale_charset only if the current locale does not specify a
> > charset.
> > 
> > --- newlib/libc/locale/nl_langinfo.c    7 Oct 2009 16:45:23 -0000       1.3
> > +++ newlib/libc/locale/nl_langinfo.c    8 Oct 2009 05:00:23 -0000
> > @@ -59,7 +59,11 @@ _DEFUN(nl_langinfo, (item),
> >     switch (item) {
> >         case CODESET:
> >  #ifdef __CYGWIN__
> > -               ret = __locale_charset ();
> > +               s = setlocale(LC_CTYPE, NULL);
> > +               if (s != NULL && (cs = strchr(s, '.')) != NULL)
> > +                       ret = cs + 1;
> > +               else
> > +                       ret = __locale_charset();
> >  #else
> >                 ret = "";
> >                 if ((s = setlocale(LC_CTYPE, NULL)) != NULL) {
> 
> Thanks for the patch.  However, the value returned by setlocale may
> have a trailing modifier, as in LANG="ja_JP.UTF-8@cjknarrow".  If we
> just return the string after the dot, the codeset is potentially
> wrong.  Either we *do* need the translation table, or we have to
> copy the value into a static buffer and strip the modifier.  That in
> turn would require implementing _nl_langinfo_r.
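
To illustrate the modifier problem, here is a minimal sketch of the
"copy and strip" variant.  codeset_from_locale is a hypothetical helper
name, not anything in newlib, and the caller is assumed to supply the
buffer, which is exactly the part that a static buffer would make
non-reentrant:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of newlib: copy the codeset part of a
   locale name such as "ja_JP.UTF-8@cjknarrow" into buf, dropping any
   trailing "@modifier".  Returns buf, or NULL if the locale names no
   codeset or the buffer is too small. */
static const char *
codeset_from_locale(const char *locale, char *buf, size_t buflen)
{
    const char *cs, *end;
    size_t len;

    if (locale == NULL || (cs = strchr(locale, '.')) == NULL)
        return NULL;
    cs++;                               /* skip the dot */
    end = strchr(cs, '@');              /* start of the modifier, if any */
    len = end ? (size_t)(end - cs) : strlen(cs);
    if (len == 0 || len >= buflen)
        return NULL;
    memcpy(buf, cs, len);
    buf[len] = '\0';
    return buf;
}
```

With "ja_JP.UTF-8@cjknarrow" this yields "UTF-8"; with plain "C" it
returns NULL, so the caller can still fall back to __locale_charset.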

Maybe for now we should just hardcode tests for CP20866 and CP21866
and return "KOI8-R" or "KOI8-U", respectively...
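
Roughly like this, as a sketch only.  map_koi8_charset is a made-up
name, and I'm assuming the string it gets is what __locale_charset
returns, i.e. "CP20866" or "CP21866" for the KOI8 locales:

```c
#include <string.h>

/* Hypothetical helper, not part of newlib: translate the two Windows
   codepage names into the conventional charset names.  Any other name
   is passed through unchanged. */
static const char *
map_koi8_charset(const char *charset)
{
    if (strcmp(charset, "CP20866") == 0)
        return "KOI8-R";
    if (strcmp(charset, "CP21866") == 0)
        return "KOI8-U";
    return charset;
}
```

In the CODESET case that would presumably boil down to something like
ret = map_koi8_charset (__locale_charset ()); with no buffer and no
modifier handling needed.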


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


