Even though the output format of locale -a is not specified by the Open Group Base Specifications Issue 7 / IEEE Std 1003.1-2008, the way it currently shows the codeset does not match the IANA names [1][4]. For example, en_US.UTF-8 becomes en_US.utf8, and 'utf8' is not the proper name for UTF-8 [2]. Similarly, fr_FR.ISO-8859-15 becomes fr_FR.iso885915.

When the locale-archive file is used, glibc's locale -a output does not match the locale -a output of various BSD systems. Moreover, on a glibc system without locale-archive, codeset names are reported differently, for example the same way as on the BSD systems. I ran into this difference while fixing a test in the git test suite [3].

locale -a should build the codeset part of each name from nl_langinfo(CODESET) instead of from the hash key stored in locale-archive, while still listing the hash-key name for compatibility.

[1] http://www.iana.org/assignments/character-sets
[2] http://en.wikipedia.org/wiki/UTF-8#Official_name_and_incorrect_variants
[3] http://thread.gmane.org/gmane.comp.version-control.git/147283/focus=147285
[4] See the following comment in intl/l10nflist.c:

    /* Normalize codeset name. There is no standard for the codeset
       names. Normalization allows the user to use any of the common
       names. The return value is dynamically allocated and has to be
       freed by the caller. */
    const char *
    _nl_normalize_codeset (codeset, name_len)
"Incorrect" codeset names are also reported when the locale definition was created with an "incorrect" name:

    localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.utf8

instead of

    localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.UTF-8

locale -a uses the directory name, not the CODESET included in the definition.
The output is the normalized name and what glibc will accept. The output is correct.
(In reply to comment #1)
> localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.utf8
>
> Instead of
>
> localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.UTF-8
>

Note that if the en_US.utf8 directory exists, en_US.UTF-8 is also supported, but if only en_US.UTF-8 exists, en_US.utf8 is no longer recognized.
(In reply to comment #2)
> The output is the normalized name and what glibc will accept. The output is
> correct.

Yes, it is correct. My main concern here is the difference from other systems: all the systems I have checked report UTF-8, not utf8. The normalized form does not use IANA codeset names.