Bug 22336 - cs_CZ LC_COLLATE does not use i18n
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.26
Importance: P2 normal
Target Milestone: 2.27
Assignee: Mike FABIAN
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-23 07:10 IST by Andreas Schwab
Modified: 2017-11-29 11:07 IST
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
0001-cs_CZ-locale-Base-collation-on-iso14651_t1-BZ-22336.patch (16.63 KB, patch)
2017-11-24 11:28 IST, Mike FABIAN

Description Andreas Schwab 2017-10-23 07:10:20 IST
localedata/locales/cs_CZ does not build upon localedata/locales/i18n, so it misses all updates made there.
Comment 1 Mike FABIAN 2017-11-24 11:28:31 IST
Created attachment 10632
0001-cs_CZ-locale-Base-collation-on-iso14651_t1-BZ-22336.patch

Patch to fix the problem.

Difference in sorting of the added test file (added in the patch)
before and after applying the sorting changes of the patch:

$ diff -u cs_CZ.UTF-8.in.old cs_CZ.UTF-8.in 
--- cs_CZ.UTF-8.in.old  2017-11-24 16:47:52.688348458 +0530
+++ cs_CZ.UTF-8.in      2017-11-24 16:42:08.364221944 +0530
@@ -1,7 +1,3 @@
-ȥ
-Ȥ
-ʒ
-Ʒ
 a
 a
 a
@@ -65,7 +61,6 @@
 cenných
 cenným
 cenou
-cH
 cvrček
 cz
 cZ
@@ -94,8 +89,9 @@
 H
 hruška
 ch
-CH
+cH
 Ch
+CH
 chřestýšům
 Chřestýšům
 chřipka
@@ -188,6 +184,8 @@
 Z
 ź
 Ź
+ȥ
+Ȥ
 za
 Za
 źa
@@ -209,6 +207,8 @@
 Žb
 žluva
 Žluva
+ʒ
+Ʒ
 0
 1
 1

I think "cH" was sorted completely wrong before and "CH" slightly
wrong as well. So this patch not only bases the Czech LC_COLLATE
implementation on the iso14651_t1 file, as requested in this bug, but
also improves the sorting of the uppercase/lowercase variants of the
ch digraph.
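
To make the expected placement of the ch variants concrete, here is a
minimal sketch in Python. It is a simplified model written for
illustration, not glibc's collation algorithm and not the rules in the
patch: it assumes "ch" in any case combination is one letter between h
and i, that č, ř, š and ž are separate letters, that other accents can
be ignored, and that case only breaks ties (lowercase first). The names
ALPHABET, RANK and czech_key are hypothetical.

```python
import unicodedata

# Simplified primary alphabet (an assumption of this sketch): "ch" is a
# single letter between h and i; č, ř, š, ž follow c, r, s, z.
ALPHABET = ["a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j",
            "k", "l", "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t",
            "u", "v", "w", "x", "y", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def _base(char):
    """Lowercase a character and drop accents this model treats as minor."""
    low = char.lower()
    if low in RANK:
        return low
    # NFD puts the base letter first, followed by combining marks.
    return unicodedata.normalize("NFD", low)[0]


def czech_key(word):
    """Return a sort key: primary letter ranks first, then case pattern."""
    primary, case = [], []
    i = 0
    while i < len(word):
        # c followed by h, in any case combination, is one element.
        if word[i:i + 2].lower() == "ch":
            unit, i = word[i:i + 2], i + 2
            base = "ch"
        else:
            unit, i = word[i], i + 1
            base = _base(unit)
        # Characters outside the model sort after the alphabet.
        primary.append(RANK.get(base, len(ALPHABET)))
        case.append(tuple(c.isupper() for c in unit))
    return (primary, case)


words = ["CH", "hruška", "Ch", "cvrček", "cH", "ch", "chřipka", "H"]
print(sorted(words, key=czech_key))
```

Under these assumptions the four variants come out as ch, cH, Ch, CH,
all after hruška and before chřipka, which is the order shown in the
new cs_CZ.UTF-8.in above.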

And of course it improves the sorting of some non-Czech characters
like ʒ and ȥ because these were not handled at all in the old
Czech LC_COLLATE implementation.
Comment 2 cvs-commit@gcc.gnu.org 2017-11-28 09:33:20 IST
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  f9bb1ef233f7c99312ddc3efb6bc652537e077bc (commit)
      from  f433d0b3bbde748fa7f0941980c3d4d2863dc483 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f9bb1ef233f7c99312ddc3efb6bc652537e077bc

commit f9bb1ef233f7c99312ddc3efb6bc652537e077bc
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Fri Nov 24 15:17:18 2017 +0530

    cs_CZ locale: Base collation on iso14651_t1 [BZ #22336]
    
    	[BZ #22336]
    	* localedata/locales/cs_CZ (LC_COLLATE): Use “copy "iso14651_t1"”
    	and implement the collation rules for cs from CLDR on top of that.
    	* Makefile: Add cs_CZ.UTF-8 to test-input and to the list
    	of locales to be built for testing.
    	* cs_CZ.UTF-8.in: New file with test data to test the Czech sorting.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 localedata/Makefile       |    4 +-
 localedata/cs_CZ.UTF-8.in |  228 +++++
 localedata/locales/cs_CZ  | 2216 ++-------------------------------------------
 3 files changed, 286 insertions(+), 2162 deletions(-)
 create mode 100644 localedata/cs_CZ.UTF-8.in
Comment 3 cvs-commit@gcc.gnu.org 2017-11-28 16:02:39 IST
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  22c69b6ad63dce90955d3a1d654dd152e7972fdd (commit)
      from  8d7d3ba8c505d84b2bd1946f474d9ddfe4219f0f (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=22c69b6ad63dce90955d3a1d654dd152e7972fdd

commit 22c69b6ad63dce90955d3a1d654dd152e7972fdd
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Tue Nov 28 16:42:13 2017 +0100

    Add the Changelog entry for “cs_CZ locale: Base collation on iso14651_t1 [BZ #22336]”

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)
Comment 4 Mike FABIAN 2017-11-29 11:07:49 IST
Fixed in glibc master.
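
One possible way to check an installed system is sketched below in
Python, whose locale.strxfrm calls the C library's strxfrm. This
assumes a cs_CZ.UTF-8 locale is installed; the helper name
sort_with_locale is hypothetical, and the function returns None when
the locale cannot be set.

```python
import locale


def sort_with_locale(words, name="cs_CZ.UTF-8"):
    """Sort words by the named locale's LC_COLLATE; None if unavailable."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_with_locale(["hruška", "cH", "cvrček"]))
```

Going by the test-file diff in comment 1, the old data placed cH before
cvrček, while the patched data places it after hruška. Results on C
libraries other than glibc may differ.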