Bug 13237 - LC_ADDRESS.country_name: update all locales w/latest CLDR data
Summary: LC_ADDRESS.country_name: update all locales w/latest CLDR data
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.14
Importance: P2 normal
Target Milestone: 2.24
Assignee: GNU C Library Locale Maintainers
Depends on: 11484 13949 13950 13951 13952
Reported: 2011-09-30 06:34 UTC by Chris Leonard
Modified: 2016-04-22 03:59 UTC

fweimer: security-


Attachments
Summary of glibc country_name field entries. (33.67 KB, application/vnd.oasis.opendocument.spreadsheet)
2011-09-30 06:34 UTC, Chris Leonard

Description Chris Leonard 2011-09-30 06:34:20 UTC
Created attachment 5954
Summary of glibc country_name field entries.

I have performed a comprehensive analysis of how the LC_ADDRESS country_name field is populated.  The findings concern me: this field should hold the name of the country in the language of the locale, and both of those pieces of information are inherent in the locale name.


There are 279 locales (excluding the deprecated iw_IL).

Of those 279, only 84 locales have populated country_name fields. 



84 populated

43 empty (correct value not readily determined)

152 empty, but can be easily determined by look-up in ISO-3166 L10n files.

equals 279 total


Of the 84 populated country_name fields:

37 can be confirmed from ISO-3166 L10n files.

31 cannot be confirmed from ISO-3166 L10n files (not necessarily a problem).

16 have obvious encoding errors or require review and / or correction.
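
A tally like the one above could be reproduced with a short script.  This is only a minimal sketch, not the method used for the spreadsheet: it assumes each locale source has an LC_ADDRESS ... END LC_ADDRESS section, that "%" is the comment character, and that country_name sits on a single line.  The function names are made up for illustration.

```python
import re


def extract_country_name(locale_source: str):
    """Return the raw country_name value from LC_ADDRESS, or None if absent.

    Assumptions: '%' starts a comment line and the value is not split
    across continuation lines.
    """
    in_address = False
    for line in locale_source.splitlines():
        stripped = line.strip()
        if stripped == "LC_ADDRESS":
            in_address = True
        elif stripped == "END LC_ADDRESS":
            in_address = False
        elif in_address and not stripped.startswith("%"):
            match = re.match(r'country_name\s+"(.*)"', stripped)
            if match:
                return match.group(1)
    return None


def tally(sources: dict) -> dict:
    """Count locales whose country_name is non-empty versus empty or missing."""
    populated = sum(1 for text in sources.values() if extract_country_name(text))
    return {"populated": populated, "empty": len(sources) - populated}
```

Locales that inherit LC_ADDRESS through a "copy" directive show up as empty here; following the copy target would need extra handling.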


Examples of errors:

km_KH encodes Lao characters spelling Laos, not Khmer characters spelling Cambodia.

bg_BG, ku_TR, mk_MK, mn_MN, tr_TR encode English, not native language/script names

bo_CN and bo_IN coded as FIXME, should be commented out.

dz_BT coded as BHU

ur_IN uses "copy hi_IN", thus encoding the Hindi, not the Urdu, name of India.

en_US encodes USA (not United States)
es_US encodes USA (not Estados Unidos)

Others include conflicts with ISO-3166 entries that require clarification.
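
Script mismatches such as the km_KH case can be flagged mechanically.  The sketch below assumes the value is written in the <UXXXX> escape notation and approximates a character's script by the first word of its Unicode name, which is a heuristic rather than a real script lookup.  The function names are hypothetical.

```python
import re
import unicodedata


def decode_ucs_escapes(value: str) -> str:
    """Turn '<U0055><U0053>' style escapes into the characters they name."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda m: chr(int(m.group(1), 16)),
        value,
    )


def scripts_used(text: str) -> set:
    """Approximate each letter's script by the first word of its Unicode name.

    Combining marks, digits and punctuation are skipped via isalpha().
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


def uses_expected_script(value: str, expected: str) -> bool:
    """True if every letter in the decoded value belongs to the expected script."""
    return scripts_used(decode_ucs_escapes(value)) == {expected}
```

For example, uses_expected_script(value, "KHMER") would be False for a km_KH entry made of Lao letters, and a check against "CYRILLIC" would flag a bg_BG entry written in Latin letters.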

Some consideration should be given to correcting the obvious errors and making the easily confirmed additions so that the LC_ADDRESS country_name field is more usefully populated with the country name of the locale in the language of the locale.


The first column of the attached spreadsheet contains links to 2xlibre.net locale files (purely for convenience).  That data was recently refreshed from the 2.14 release.

All details were checked against the original sources at:
http://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales;h=aa17c365ce474cfb9c7dab92b623bfb5a8786208;hb=HEAD

The key columns are the "Action" (suggested) and the "Corrected country_name" column.  The entries in the "Evidence ISO-3166" column link directly to the relevant location within the PO files.
Comment 1 Ulrich Drepper 2011-12-22 16:54:29 UTC
Either you provide a real patch and not something as useless as a spreadsheet or you close the bug.
Comment 2 Chris Leonard 2011-12-22 17:06:57 UTC
Ulrich, thank you for commenting, I wasn't sure anyone had looked at this bug.  

As it was a global analysis a spreadsheet seemed the best way to communicate about it.  Are you saying that the only way to get action would be to provide several hundred individual patches?  Just trying to understand.
Comment 3 Claude Paroz 2011-12-23 08:30:43 UTC
If I were the one receiving the patch, I would want one patch for changes/fixes and one for new field additions, but that's just me...
Comment 4 Ulrich Drepper 2011-12-23 14:53:49 UTC
> As it was a global analysis a spreadsheet seemed the best way to communicate
> about it.  Are you saying that the only way to get action would be to provide
> several hundred individual patches?  Just trying to understand.

For all files where the change has been created the same way, a single patch is sufficient.  But yes, a patch is needed.
Comment 5 Chris Leonard 2011-12-23 14:59:08 UTC
I will work on several patches as suggested.  I was not sure that such a multi-locale patch would be acceptable.  I will break them out as logically as possible: additions supported by ISO-3166, simple fixes (commenting out the "FIXME"), etc.
Comment 6 Roumen Petrov 2012-04-06 20:33:18 UTC

*** This bug has been marked as a duplicate of bug 11484 ***
Comment 7 Roumen Petrov 2012-04-06 20:36:25 UTC
I'm sorry, I marked the wrong issue as a duplicate.
Comment 8 Roumen Petrov 2012-04-06 20:43:18 UTC
Restored to waiting
Comment 9 Mike Frysinger 2016-02-19 07:19:41 UTC
i'm in the process of updating all locales to the CLDR entries.  since it'll be automated from a vetted source, doing multiple patches shouldn't be needed.
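
The generation step of such an automated update might look roughly like the sketch below.  It is not the script actually used: it assumes the <UXXXX> escape notation for locale sources, and the NAMES table is a hypothetical stand-in for data read from CLDR, filled with the two values suggested in the description.

```python
# Hypothetical stand-in for names loaded from CLDR (not real CLDR parsing).
NAMES = {
    "en_US": "United States",
    "es_US": "Estados Unidos",
}


def to_ucs_escapes(text: str) -> str:
    """Encode each character as a <UXXXX> escape (at least four hex digits)."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


def country_name_line(locale: str) -> str:
    """Build the country_name line for a locale present in NAMES."""
    return f'country_name "{to_ucs_escapes(NAMES[locale])}"'


if __name__ == "__main__":
    for locale in sorted(NAMES):
        print(locale, country_name_line(locale))
```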