11629 – locale -a reports incorrect codeset name especially when using locale-archive

Summary: locale -a reports incorrect codeset name especially when using locale-archive

Status:	RESOLVED INVALID

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc
Version:	2.12

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Ulrich Drepper

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-05-24 16:49 UTC by Yann Droneaud
Modified:	2014-06-30 18:02 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Description Yann Droneaud 2010-05-24 16:49:04 UTC

Although the locale -a output format is not specified by the Open Group Base
Specifications Issue 7 / IEEE Std 1003.1-2008, the codeset names it currently
shows do not match the IANA names[1][4].

For example, en_US.UTF-8 is listed as en_US.utf8, and 'utf8' is not the proper
name for UTF-8[2]. Likewise, fr_FR.ISO-8859-15 is listed as fr_FR.iso885915.
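
Both examples are consistent with a normalization that keeps only the letters and digits of the codeset name and lowercases them. The following is a minimal Python sketch of that rule, not glibc code: the function names normalize_codeset and normalize_locale_name are hypothetical, and the "iso" prefix for all-digit names is an assumption about what _nl_normalize_codeset does.

```python
def normalize_codeset(codeset: str) -> str:
    """Illustrative model of codeset normalization (not glibc code)."""
    # Keep only ASCII letters and digits; punctuation such as '-' is dropped.
    kept = [c for c in codeset if c.isascii() and c.isalnum()]
    # Assumption: an all-digit name gets an "iso" prefix ("8859-1" -> "iso88591").
    prefix = "iso" if kept and all(c.isdigit() for c in kept) else ""
    return prefix + "".join(kept).lower()


def normalize_locale_name(name: str) -> str:
    """Normalize only the codeset part of 'language_TERRITORY.codeset[@modifier]'."""
    base, dot, rest = name.partition(".")
    if not dot:
        return name  # no codeset part, nothing to normalize
    codeset, at, modifier = rest.partition("@")
    return base + dot + normalize_codeset(codeset) + at + modifier


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "fr_FR.ISO-8859-15"):
        print(name, "->", normalize_locale_name(name))
```

The language and territory parts are left untouched, which matches the names quoted above: only the part after the dot changes.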

When the locale-archive file is used, glibc's locale -a output doesn't match
the locale -a output of various BSD systems.

Moreover, on a glibc system without locale-archive, codeset names are reported
differently, e.g. as on the BSD systems.

This difference in behavior hit me while fixing a test in the git test suite[3].

locale -a should use nl_langinfo(CODESET) instead of the hash key stored in
locale-archive, and still report the hash key for compatibility.
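
For illustration, the codeset a locale itself declares can be read by loading the locale and calling nl_langinfo(CODESET). A minimal sketch using Python's locale module, which wraps the C setlocale() and nl_langinfo() calls on Unix-like systems; the helper name codeset_of and the locale names in the demo loop are assumptions, and the result depends on which locales are installed:

```python
import locale
from typing import Optional


def codeset_of(locale_name: str) -> Optional[str]:
    """Return nl_langinfo(CODESET) for locale_name, or None if it cannot be loaded."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None  # locale not available on this system
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous locale


if __name__ == "__main__":
    for name in ("C", "en_US.utf8", "en_US.UTF-8"):
        print(name, "->", codeset_of(name))
```

Printing this value next to the name under which the locale is stored would keep the existing names usable while also showing the codeset recorded in the locale definition.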

[1] http://www.iana.org/assignments/character-sets
[2] http://en.wikipedia.org/wiki/UTF-8#Official_name_and_incorrect_variants
[3] http://thread.gmane.org/gmane.comp.version-control.git/147283/focus=147285
[4] See the following comment in intl/l10nflist.c :
    /* Normalize codeset name.  There is no standard for the codeset
       names.  Normalization allows the user to use any of the common
       names.  The return value is dynamically allocated and has to be
       freed by the caller.  */
    const char *
    _nl_normalize_codeset (codeset, name_len)

Comment 1 Yann Droneaud 2010-05-25 15:10:49 UTC

"Incorrect" codeset names are also reported when a locale definition was created
under an "incorrect" name:

  localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.utf8 

Instead of

  localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.UTF-8

locale -a will use the directory name, not the CODESET included in the definition.

Comment 2 Ulrich Drepper 2010-05-25 15:12:22 UTC

The output is the normalized name and what glibc will accept.  The output is
correct.

Comment 3 Yann Droneaud 2010-05-25 15:14:56 UTC

(In reply to comment #1)
>   localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.utf8 
> 
> Instead of
> 
>   localedef -f UTF-8 -i en_US /usr/lib/locale/en_US.UTF-8
> 

Note that if the en_US.utf8 directory exists, then en_US.UTF-8 is also supported,
but if only en_US.UTF-8 exists, en_US.utf8 is not recognized.
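
This asymmetry is what one would expect if the lookup tries the requested name and its normalized form, but never "denormalizes" a name. The sketch below is a simplified Python model of that idea, not glibc's actual lookup code; candidate_dirs and is_supported are hypothetical helpers, and the order in which candidates are tried is an assumption.

```python
def normalize_codeset(codeset: str) -> str:
    # Hypothetical stand-in for the normalization: keep letters/digits, lowercase.
    return "".join(c for c in codeset if c.isascii() and c.isalnum()).lower()


def candidate_dirs(requested: str) -> list:
    """Directory names tried for a requested locale, in order (illustrative model)."""
    base, dot, codeset = requested.partition(".")
    if not dot:
        return [requested]
    candidates = [requested, base + dot + normalize_codeset(codeset)]
    # Drop the duplicate when the request is already in normalized form.
    return list(dict.fromkeys(candidates))


def is_supported(requested: str, installed: set) -> bool:
    """True if any candidate name matches an installed locale directory."""
    return any(name in installed for name in candidate_dirs(requested))


if __name__ == "__main__":
    for installed in ({"en_US.utf8"}, {"en_US.UTF-8"}):
        for requested in ("en_US.utf8", "en_US.UTF-8"):
            print(sorted(installed), requested, is_supported(requested, installed))
```

In this model a request for en_US.utf8 only ever looks for en_US.utf8, so a directory named en_US.UTF-8 is never found, while a request for en_US.UTF-8 finds either directory.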

Comment 4 Yann Droneaud 2010-05-25 15:18:58 UTC

(In reply to comment #2)
> The output is the normalized name and what glibc will accept.  The output is
> correct.

Yes, it is correct.

My main concern here is the difference from other systems: all the systems
I've checked report UTF-8, not utf8. The normalized form doesn't use
IANA codeset names.