16608 – es_US locale has invalid collation rules for 'ch' and 'll'


Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	localedata
Version:	unspecified

Importance:	P2 normal
Target Milestone:	2.38
Assignee:	Mike FABIAN

Reported:	2014-02-19 17:38 UTC by Aldo
Modified:	2024-01-04 12:05 UTC
CC List:	3 users

Flags:	fweimer: security-

Attachments
icu sort sample program (577 bytes, text/plain) 2014-02-19 21:04 UTC, Aldo

Description Aldo 2014-02-19 17:38:18 UTC

The es_EC locale file (which depends on es_US) defines 'ch' and 'll' as standalone letters for collation. This collation is incorrect according to the rules of the Spanish Royal Academy since 1997 (see http://www.rae.es/consultas/exclusion-de-ch-y-ll-del-abecedario).

According to those rules, words containing 'ch' and 'll' are to be sorted as plain two-letter sequences: 'c' followed by 'h', and 'l' followed by 'l'.

E.g.:
incorrect (current): file_ce, file_cf, file_cg, file_cz, file_ch
correct (expected): file_ce, file_cf, file_cg, file_ch, file_cz
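
To make the difference concrete, here is a minimal Python sketch of the two orderings. It is not the glibc implementation; the alphabets, the `sort_key` helper, and the choice to sort non-letters such as '_' before all letters are assumptions made for illustration only.

```python
# Modern Spanish order: 'ch' and 'll' are ordinary letter pairs.
MODERN = list("abcdefghijklmn") + ["ñ"] + list("opqrstuvwxyz")

# Traditional order: 'ch' is a letter after 'c', 'll' a letter after 'l'.
TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
               + list("mn") + ["ñ"] + list("opqrstuvwxyz"))


def sort_key(word, alphabet):
    """Map a lowercase word to a list of letter ranks in the given alphabet."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    digraphs = {letter for letter in alphabet if len(letter) == 2}
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in digraphs:
            # The alphabet treats this pair as one collating element.
            key.append(rank[pair])
            i += 2
        else:
            # Characters outside the alphabet (e.g. '_') sort first.
            key.append(rank.get(word[i], -1))
            i += 1
    return key


names = ["file_cz", "file_ch", "file_cg", "file_ce", "file_cf"]
traditional = sorted(names, key=lambda w: sort_key(w, TRADITIONAL))
modern = sorted(names, key=lambda w: sort_key(w, MODERN))
print("traditional:", traditional)
print("modern:     ", modern)
```

With the traditional alphabet, file_ch lands after file_cz, which is the behavior reported here; with the modern alphabet it lands between file_cg and file_cz.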

The es_ES file specifies the correct behavior and the rest of es_* files depend on it.

Please either make es_EC depend on es_ES or fix es_US.

Comment 1 Carlos O'Donell 2014-02-19 18:28:09 UTC

Do we know what CLDR does here?

Comment 2 Aldo 2014-02-19 18:42:06 UTC

Not entirely sure if this link is the right one, but it seems they agree with the rules:

http://st.unicode.org/cldr-apps/v#/es_EC/Alphabetic_Information/

Comment 3 Carlos O'Donell 2014-02-19 18:52:09 UTC

(In reply to Aldo from comment #2)
> Not entirely sure if this link is the right one, but it seems they agree
> with the rules:
> 
> http://st.unicode.org/cldr-apps/v#/es_EC/Alphabetic_Information/

That doesn't provide enough information. For example, if you instead use libicu (http://site.icu-project.org/) to do the sorting and the result comes out as expected, then that argues CLDR has the same interpretation. Given our desire to harmonize better with CLDR, we would then make the change locally.

Comment 4 Aldo 2014-02-19 19:08:19 UTC

That makes sense. However, es_EC is the only locale of a Latin American country that inherits its collation from es_US. We do use the US dollar, but text is collated as specified by the language authority, as it is in the other countries, so this inheritance looks more like a bug to me.

Comment 5 Aldo 2014-02-19 21:04:50 UTC

Created attachment 7429
icu sort sample program

Comment 6 Aldo 2014-02-19 21:05:56 UTC

Comment on attachment 7429
icu sort sample program

I have written a small sort program using libicu to sort the strings "ca", "ch", "cz", and "cñ". Compile it with:
gcc sort.c -licui18n -licuuc -licuio

It takes the locale as the first command-line argument.

A sample run:

$ ./a.out es_EC
Unsorted array (using: es_EC)
ch cñ cz ca
Sorted array (using: es_EC)
ca ch cñ cz
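
For comparison with the ICU run above, the same four strings can be sorted through the C library's collation. The following is a hedged Python sketch, not the attached program: the `sort_with_locale` helper is hypothetical, and the locale name "es_EC.UTF-8" is an assumption that only works if that locale is installed on the system.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the C library's LC_COLLATE rules for locale_name.

    Returns None if the locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares per LC_COLLATE.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


words = ["ch", "cñ", "cz", "ca"]
result = sort_with_locale(words, "es_EC.UTF-8")  # assumed locale name
if result is None:
    print("locale es_EC.UTF-8 is not installed")
else:
    print(" ".join(result))
```

If the installed locale follows the same rules as ICU, the output should match the sorted array shown above; a result with "ch" after "cz" would indicate the old behavior.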

Comment 7 Mike FABIAN 2024-01-04 12:05:22 UTC

I think this is fixed.

Currently **all** es locales inherit their collation from es_ES:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ grep -A2 ^LC_COLLATE es_*
es_AR:LC_COLLATE
es_AR-copy "es_ES"
es_AR-END LC_COLLATE
--
es_BO:LC_COLLATE
es_BO-copy "es_ES"
es_BO-END LC_COLLATE
--
es_CL:LC_COLLATE
es_CL-copy "es_ES"
es_CL-END LC_COLLATE
--
es_CO:LC_COLLATE
es_CO-copy "es_ES"
es_CO-END LC_COLLATE
--
es_CR:LC_COLLATE
es_CR-copy "es_ES"
es_CR-END LC_COLLATE
--
es_CU:LC_COLLATE
es_CU-copy "es_ES"
es_CU-END LC_COLLATE
--
es_DO:LC_COLLATE
es_DO-copy "es_ES"
es_DO-END LC_COLLATE
--
es_EC:LC_COLLATE
es_EC-copy "es_ES"
es_EC-END LC_COLLATE
--
es_ES:LC_COLLATE
es_ES-% CLDR collation rules for Spanish:
es_ES-% (see: https://unicode.org/cldr/trac/browser/trunk/common/collation/es.xml)
--
es_ES@euro:LC_COLLATE
es_ES@euro-copy "es_ES"
es_ES@euro-END LC_COLLATE
--
es_GT:LC_COLLATE
es_GT-copy "es_ES"
es_GT-END LC_COLLATE
--
es_HN:LC_COLLATE
es_HN-copy "es_ES"
es_HN-END LC_COLLATE
--
es_MX:LC_COLLATE
es_MX-copy "es_ES"
es_MX-END LC_COLLATE
--
es_NI:LC_COLLATE
es_NI-copy "es_ES"
es_NI-END LC_COLLATE
--
es_PA:LC_COLLATE
es_PA-copy "es_ES"
es_PA-END LC_COLLATE
--
es_PE:LC_COLLATE
es_PE-copy "es_ES"
es_PE-END LC_COLLATE
--
es_PR:LC_COLLATE
es_PR-copy "es_ES"
es_PR-END LC_COLLATE
--
es_PY:LC_COLLATE
es_PY-copy "es_ES"
es_PY-END LC_COLLATE
--
es_SV:LC_COLLATE
es_SV-copy "es_ES"
es_SV-END LC_COLLATE
--
es_US:LC_COLLATE
es_US-copy "es_ES"
es_US-END LC_COLLATE
--
es_UY:LC_COLLATE
es_UY-copy "es_ES"
es_UY-END LC_COLLATE
--
es_VE:LC_COLLATE
es_VE-copy "es_ES"
es_VE-END LC_COLLATE