Re: "C" UTF-8 trouble

On Oct 7 11:08, Andy Koppe wrote:
2009/10/7 Corinna Vinschen:
Urgh. So we have to change nl_langinfo in newlib as well. Do we have
to return "US-ASCII" if charset is "ASCII", or is it sufficient to
return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?
I'd assume so, but WWLD?
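For illustration only: the question comes down to a small translation step between the charset name newlib keeps internally and the string nl_langinfo (CODESET) hands back. This is a minimal sketch that assumes a hypothetical helper, codeset_for_charset(), which receives whatever __locale_charset() returns. glibc reports "ANSI_X3.4-1968" for the C locale. "US-ASCII" or plain "ASCII" would be the alternatives under discussion.

```c
#include <string.h>

/* Hypothetical helper (not newlib code): map the internal charset name,
   e.g. the value of __locale_charset(), to the string that
   nl_langinfo (CODESET) should report.  Only ASCII is special-cased;
   every other name is passed through unchanged. */
static const char *
codeset_for_charset (const char *charset)
{
  /* glibc reports "ANSI_X3.4-1968" for the C/POSIX locale.  Returning
     "US-ASCII" or "ASCII" instead are the other options discussed. */
  if (strcmp (charset, "ASCII") == 0)
    return "ANSI_X3.4-1968";
  return charset;
}
```

With this approach, every name except "ASCII" passes straight through. Only the spelling of that one special case would need to be agreed on.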

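#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main (void)
{
  char *l;

  /* Pick up LANG / LC_* from the environment. */
  setlocale (LC_ALL, "");
  /* Print the codeset the selected locale reports. */
  l = nl_langinfo (CODESET);
  if (l)
    printf ("%s\n", l);
  return 0;
}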
$ ./nll

$ LANG=C.UTF-8 ./nll

$ LANG=ja_JP ./nll

$ LANG=ru_RU ./nll

$ LANG=ru_UA ./nll

$ LANG=zh_CN ./nll

$ LANG=zh_TW ./nll

Sigh. Do we really need a translation table?
Yes (sigh). And yes, that's what I had suggested before. Actually, "locale charmap" (on a system with a locale command) gives you the same information as "nll".
If you want a table, a fairly complete one is included in my package mined, file src/locales.t (generated from src/locales.cfg).
(Complete in the sense that all locales without an explicit charset suffix that are not listed there map to ISO-8859-1; maybe I should list those as well, to distinguish them from unknown locales ...)
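For anyone who wants to see the shape of such a table, here is a minimal sketch in C. The entries are illustrative and assumed to mirror common glibc defaults. They are not copied from mined's src/locales.t. The function name default_codeset() is hypothetical. It assumes the caller has already stripped any "@modifier" and handled "C"/"POSIX" separately.

```c
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical default-charset table for locale names
   that carry no explicit ".charset" suffix. */
struct locale_charset
{
  const char *locale;   /* language_TERRITORY, no suffix */
  const char *codeset;  /* spelling as nl_langinfo (CODESET) reports it */
};

/* Illustrative entries only (assumed glibc defaults). */
static const struct locale_charset default_charsets[] =
{
  { "ja_JP", "EUC-JP" },
  { "ru_RU", "ISO-8859-5" },
  { "ru_UA", "KOI8-U" },
  { "zh_CN", "GB2312" },
  { "zh_TW", "BIG5" },
};

/* Return the default codeset for a suffix-less locale name.  Anything
   not in the table falls back to ISO-8859-1, as described above. */
static const char *
default_codeset (const char *locale)
{
  size_t i;

  for (i = 0; i < sizeof default_charsets / sizeof default_charsets[0]; i++)
    if (strcmp (locale, default_charsets[i].locale) == 0)
      return default_charsets[i].codeset;
  return "ISO-8859-1";
}
```

A linear search is fine at this size. A real table covering every locale would probably want to be sorted and searched with bsearch().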
And, as becomes clear here, charmap/codeset names are spelled differently in locale names than in nl_langinfo output, e.g. eucJP vs. EUC-JP.
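One way to bridge the two spellings is to compare names loosely, ignoring case and any '-' or '_' separators, and then return the canonical nl_langinfo form. The following is a minimal sketch under that assumption. The list of canonical names is an illustrative subset, and charset_name_eq() and canonical_codeset() are hypothetical helpers, not part of any existing API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Canonical spellings as nl_langinfo (CODESET) would report them
   (illustrative subset only). */
static const char *const canonical_codesets[] =
{
  "UTF-8", "EUC-JP", "EUC-KR", "GB2312", "BIG5", "KOI8-R", "KOI8-U",
  "ISO-8859-1", "ISO-8859-5", "ISO-8859-15",
};

/* Compare two charset names, ignoring case and '-'/'_' separators,
   so "eucJP" matches "EUC-JP" and "utf8" matches "UTF-8". */
static int
charset_name_eq (const char *a, const char *b)
{
  for (;;)
    {
      while (*a == '-' || *a == '_')
        a++;
      while (*b == '-' || *b == '_')
        b++;
      if (*a == '\0' || *b == '\0')
        return *a == *b;
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
}

/* Translate a locale-name suffix (e.g. "eucJP" from "ja_JP.eucJP")
   into the nl_langinfo spelling; unknown names are returned as-is. */
static const char *
canonical_codeset (const char *suffix)
{
  size_t i;

  for (i = 0; i < sizeof canonical_codesets / sizeof canonical_codesets[0]; i++)
    if (charset_name_eq (suffix, canonical_codesets[i]))
      return canonical_codesets[i];
  return suffix;
}
```

For example, canonical_codeset ("eucJP") yields "EUC-JP" and canonical_codeset ("iso885915") yields "ISO-8859-15". Because the comparison checks for matching string ends, "ISO-8859-1" cannot accidentally match "ISO-8859-15".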


