Summary: | es_US locale has invalid collation rules for 'ch' and 'll' | ||
---|---|---|---|
Product: | glibc | Reporter: | Aldo <aldocassola> |
Component: | localedata | Assignee: | Mike FABIAN <maiku.fabian> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | carlos, libc-locales, maiku.fabian |
Priority: | P2 | Flags: | fweimer: security- |
Version: | unspecified | ||
Target Milestone: | 2.38 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | icu sort sample program |
Description
Aldo
2014-02-19 17:38:18 UTC
Do we know what CLDR does here?

Not entirely sure if this link is the right one, but it seems they agree with the rules:
http://st.unicode.org/cldr-apps/v#/es_EC/Alphabetic_Information/

(In reply to Aldo from comment #2)
> Not entirely sure if this link is the right one, but it seems they agree
> with the rules:
>
> http://st.unicode.org/cldr-apps/v#/es_EC/Alphabetic_Information/

That doesn't provide enough information. For example, if you instead use libicu (http://site.icu-project.org/) to do the sorting and it comes out as expected, then that argues CLDR has the same interpretation. In light of our desire to harmonize better with CLDR, we would make the change locally.

That makes sense. However, es_EC is the only locale of a Latin American country inheriting collation from es_US (we do use the US dollar, but text is collated as specified by the authority, which is the case for the other countries), which looks more like a bug to me.

Created attachment 7429
icu sort sample program
Comment on attachment 7429
icu sort sample program
I have written a small sort program using libicu to sort the strings "ca", "ch", "cz", and "cñ". Compile it with
gcc sort.c -licui18n -licuuc -licuio
It takes the locale as the first command-line argument.
A sample run:
$ ./a.out es_EC
Unsorted array (using: es_EC)
ch cñ cz ca
Sorted array (using: es_EC)
ca ch cñ cz
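
For comparison with the ICU run above, the same check can be made against glibc's own locale data through `strcoll`. The following is only a minimal sketch, not the attached program: the helper name `sort_in_locale` is hypothetical, and it assumes the requested locale (for example `es_EC.UTF-8`) is installed on the system.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two C strings by the current LC_COLLATE rules. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcoll(*left, *right);
}

/*
 * Hypothetical helper (not part of the attached ICU program): sort
 * `items` in place using the collation rules of `locale_name`.
 * Returns 0 on success, -1 if setlocale() rejects the locale name.
 */
int sort_in_locale(const char **items, size_t count, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(items, count, sizeof items[0], compare_collated);
    return 0;
}
```

Calling `sort_in_locale` on the four strings with the locale under test and printing the array gives a result that can be placed side by side with the ICU output. If the two orderings differ, that points at the glibc locale data rather than at CLDR.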
I think this is fixed. Currently **all** es locales inherit their collation from es_ES:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ grep -A2 ^LC_COLLATE es_*
es_AR:LC_COLLATE
es_AR-copy "es_ES"
es_AR-END LC_COLLATE
--
es_BO:LC_COLLATE
es_BO-copy "es_ES"
es_BO-END LC_COLLATE
--
es_CL:LC_COLLATE
es_CL-copy "es_ES"
es_CL-END LC_COLLATE
--
es_CO:LC_COLLATE
es_CO-copy "es_ES"
es_CO-END LC_COLLATE
--
es_CR:LC_COLLATE
es_CR-copy "es_ES"
es_CR-END LC_COLLATE
--
es_CU:LC_COLLATE
es_CU-copy "es_ES"
es_CU-END LC_COLLATE
--
es_DO:LC_COLLATE
es_DO-copy "es_ES"
es_DO-END LC_COLLATE
--
es_EC:LC_COLLATE
es_EC-copy "es_ES"
es_EC-END LC_COLLATE
--
es_ES:LC_COLLATE
es_ES-% CLDR collation rules for Spanish:
es_ES-% (see: https://unicode.org/cldr/trac/browser/trunk/common/collation/es.xml)
--
es_ES@euro:LC_COLLATE
es_ES@euro-copy "es_ES"
es_ES@euro-END LC_COLLATE
--
es_GT:LC_COLLATE
es_GT-copy "es_ES"
es_GT-END LC_COLLATE
--
es_HN:LC_COLLATE
es_HN-copy "es_ES"
es_HN-END LC_COLLATE
--
es_MX:LC_COLLATE
es_MX-copy "es_ES"
es_MX-END LC_COLLATE
--
es_NI:LC_COLLATE
es_NI-copy "es_ES"
es_NI-END LC_COLLATE
--
es_PA:LC_COLLATE
es_PA-copy "es_ES"
es_PA-END LC_COLLATE
--
es_PE:LC_COLLATE
es_PE-copy "es_ES"
es_PE-END LC_COLLATE
--
es_PR:LC_COLLATE
es_PR-copy "es_ES"
es_PR-END LC_COLLATE
--
es_PY:LC_COLLATE
es_PY-copy "es_ES"
es_PY-END LC_COLLATE
--
es_SV:LC_COLLATE
es_SV-copy "es_ES"
es_SV-END LC_COLLATE
--
es_US:LC_COLLATE
es_US-copy "es_ES"
es_US-END LC_COLLATE
--
es_UY:LC_COLLATE
es_UY-copy "es_ES"
es_UY-END LC_COLLATE
--
es_VE:LC_COLLATE
es_VE-copy "es_ES"
es_VE-END LC_COLLATE
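
The grep check can also be expressed as a small helper that reads a locale source file's text and reports which locale its LC_COLLATE section copies from. This is only an illustrative sketch: `collate_copy_target` is a hypothetical name, and it assumes the `copy "NAME"` directive sits at the start of the line immediately after `LC_COLLATE`, as in the output shown in this comment.

```c
#include <string.h>

/*
 * Hypothetical helper: scan the text of a locale source file for a line
 * that is exactly "LC_COLLATE". If the very next line is a
 * `copy "NAME"` directive, store NAME in `out` and return 1.
 * Returns 0 if there is no such section or it does not start with copy.
 */
int collate_copy_target(const char *text, char *out, size_t out_size)
{
    const char *line = text;
    int in_collate = 0;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");   /* length of the current line */

        if (in_collate) {
            /* Only the line right after LC_COLLATE is inspected. */
            const char *prefix = "copy \"";
            size_t plen = strlen(prefix);
            if (len > plen && strncmp(line, prefix, plen) == 0) {
                const char *name = line + plen;
                size_t name_len = strcspn(name, "\"\n");
                if (name[name_len] == '"' && name_len < out_size) {
                    memcpy(out, name, name_len);
                    out[name_len] = '\0';
                    return 1;
                }
            }
            return 0;
        }
        if (len == strlen("LC_COLLATE") && strncmp(line, "LC_COLLATE", len) == 0)
            in_collate = 1;

        line += len;
        if (*line == '\n')
            line++;                         /* step past the newline */
    }
    return 0;
}
```

For a file such as es_EC this would yield `es_ES`, while for es_ES itself, whose section opens with a comment line instead of a copy directive, it returns 0.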