LC_COLLATE in localedata/locales/hsb_DE is not based on copy "iso14651_t1", so it misses all updates made there.
https://unicode.org/cldr/trac/browser/trunk/common/collation/hsb.xml

contains:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ldml SYSTEM "../../common/dtd/ldml.dtd">
<!--
Copyright © 2014 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<ldml>
    <identity>
        <version number="$Revision: 11914 $" />
        <language type="hsb" />
    </identity>
    <collations>
        <collation type="standard" references="Prawopisny słownik hornjoserbskeje rěče, Pawoł Völkel, wobdźěłał Timo Meškank, 1970/2005, ISBN 3-7420-1920-1 ">
            <cr><![CDATA[
                &C<č<<<Č<ć<<<Ć
                &E<ě<<<Ě
                &H<ch<<<cH<<<Ch<<<CH
                &[before 1] L<ł<<<Ł
                &R<ř<<<Ř
                &S<š<<<Š
                &Z<ž<<<Ž<ź<<<Ź
            ]]></cr>
        </collation>
    </collations>
</ldml>

In glibc, in localedata/locales/hsb_DE, LC_COLLATE contains:

collating-element <D-Z'> from "<U0044><U0179>"
collating-element <D-z'> from "<U0044><U017A>"
collating-element <d-Z'> from "<U0064><U0179>"
collating-element <d-z'> from "<U0064><U017A>"
[...]
<d8>
<D-Z'> <D-Z'>;<NONE>;<CAPITAL>;IGNORE
<D-z'> <D-Z'>;<NONE>;<CAPITAL-SMALL>;IGNORE
<d-Z'> <D-Z'>;<NONE>;<SMALL-CAPITAL>;IGNORE
<d-z'> <D-Z'>;<NONE>;<SMALL>;IGNORE
[...]

I.e. it contains special rules to sort dź, which CLDR does not have.
The current hsb_DE locale sorts ć and Ć after t:

<t8>
<U0106> <U0106>;<NONE>;<CAPITAL>;IGNORE
<U0107> <U0106>;<NONE>;<SMALL>;IGNORE

I.e. it sorts like this:

S š Š ć Ć Z

This seems wrong: the CLDR rule &C<č<<<Č<ć<<<Ć places ć and Ć directly after č and Č.
The current hsb_DE sorting also contradicts the CLDR sort order in sorting like this:

Z ź Ź ž Ž

i.e. sorting ž after ź. In CLDR it is the other way round:

&Z<ž<<<Ž<ź<<<Ź
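To make the expected order easier to compare against, here is a rough Python sketch of the primary order implied by the CLDR rules quoted above. It is only an illustration under simplifying assumptions: `PRIMARY_ORDER` and `hsb_sort_key` are hypothetical names, only the basic Latin letters plus the tailored ones are ranked, and case is the only level handled below the primary one. It is neither the CLDR algorithm nor what glibc does internally.

```python
# Primary-level order implied by the CLDR hsb tailoring (simplified assumption:
# only the basic Latin letters plus the tailored ones are ranked).
PRIMARY_ORDER = [
    "a", "b", "c", "č", "ć", "d", "e", "ě", "f", "g", "h", "ch",
    "i", "j", "k", "ł", "l", "m", "n", "o", "p", "q", "r", "ř",
    "s", "š", "t", "u", "v", "w", "x", "y", "z", "ž", "ź",
]
RANK = {letter: index for index, letter in enumerate(PRIMARY_ORDER)}


def hsb_sort_key(word):
    """Return a (primary, case) key; hypothetical helper, not CLDR or glibc code."""
    primary, case = [], []
    i = 0
    while i < len(word):
        # Treat "ch" in any casing as a single collating element.
        step = 2 if word[i:i + 2].lower() == "ch" else 1
        unit = word[i:i + step]
        folded = unit.lower()
        if folded in RANK:
            primary.append((0, RANK[folded]))
        else:
            # Unranked characters go after all ranked letters, by code point.
            primary.append((1, ord(folded[0])))
        # Tertiary level: lowercase before uppercase, per character of the unit.
        case.append(tuple(ch.isupper() for ch in unit))
        i += step
    return (primary, case)


print(sorted(["Ź", "ž", "Z", "Ž", "ź"], key=hsb_sort_key))
# ['Z', 'ž', 'Ž', 'ź', 'Ź']
print(sorted(["t", "Ć", "š", "ć", "S", "č", "c"], key=hsb_sort_key))
# ['c', 'č', 'ć', 'Ć', 'S', 'š', 't']
```

With this key, ž sorts before ź and ć lands right after č, which is the order the CLDR rules describe and the opposite of what the current hsb_DE locale does.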
There is a little bit of a contradiction in the CLDR data for collation.

https://unicode.org/cldr/trac/browser/trunk/common/collation/hsb.xml

contains:

&C<č<<<Č<ć<<<Ć
&E<ě<<<Ě
&H<ch<<<cH<<<Ch<<<CH
&[before 1] L<ł<<<Ł
&R<ř<<<Ř
&S<š<<<Š
&Z<ž<<<Ž<ź<<<Ź

but

https://unicode.org/cldr/trac/browser/trunk/common/main/hsb.xml

contains:

<exemplarCharacters type="index">[A B C Č Ć D {DŹ} E F G H {CH} I J K Ł L M N O P Q R S Š T U V W X Y Z Ž]</exemplarCharacters>

I.e. in the index, DŹ is considered as a special character whereas in the sorting rules it is not.

Also, Ź is special in the sorting rules but not in the index.
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  62ea2193ee4b538b13da1c579113761e0b92376c (commit)
      from  37ac8e635a29810318f6d79902102e2e96b2b5bf (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=62ea2193ee4b538b13da1c579113761e0b92376c

commit 62ea2193ee4b538b13da1c579113761e0b92376c
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Wed Dec 6 10:02:48 2017 +0100

    hsb_DE locale: Base collation on copy "iso14651_t1" [BZ #22515]

            [BZ #22515]
            * localedata/Makefile: Add hsb_DE.UTF-8 to test-input and to the
            list of locales to be built for testing.
            * localedata/hsb_DE.UTF-8.in: New file for testing the collation.
            * localedata/locales/hsb_DE (LC_COLLATE): Use “copy "iso14651_t1"”
            and build the collation rules upon that.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                  |    9 +
 localedata/Makefile        |    5 +-
 localedata/hsb_DE.UTF-8.in |   35 +
 localedata/locales/hsb_DE  | 2159 ++------------------------------------------
 4 files changed, 133 insertions(+), 2075 deletions(-)
 create mode 100644 localedata/hsb_DE.UTF-8.in
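For anyone who wants to check the result outside the glibc test suite, a quick way might be to sort a few letters with Python's `locale` module. This is only a sketch: it assumes a locale named `hsb_DE.UTF-8` has been generated on the system (the function returns `None` if it has not), and `sort_with_locale` is a hypothetical helper name. It makes no claim about what the output will be on any particular glibc version.

```python
import locale


def sort_with_locale(words, name="hsb_DE.UTF-8"):
    """Sort words with the LC_COLLATE rules of the named locale.

    Returns None when the locale is not installed (assumption: the
    locale name matches what localedef generated on this system).
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the previous collation locale for the rest of the process.
        locale.setlocale(locale.LC_COLLATE, saved)


letters = ["Z", "ź", "Ź", "ž", "Ž", "c", "č", "ć", "t"]
print(sort_with_locale(letters))
```

On a system where the locale was built from the old hsb_DE source, the order of ž and ź should differ from what a build from current master gives.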
Fixed in glibc master.
(In reply to Mike FABIAN from comment #4)
> There is a little bit of a contradiction in the CLDR data
> for collation.
>
> https://unicode.org/cldr/trac/browser/trunk/common/collation/hsb.xml
>
> contains:
>
> &C<č<<<Č<ć<<<Ć
> &E<ě<<<Ě
> &H<ch<<<cH<<<Ch<<<CH
> &[before 1] L<ł<<<Ł
> &R<ř<<<Ř
> &S<š<<<Š
> &Z<ž<<<Ž<ź<<<Ź
>
> but
>
> https://unicode.org/cldr/trac/browser/trunk/common/main/hsb.xml
>
> contains:
>
> <exemplarCharacters type="index">[A B C Č Ć D {DŹ} E F G H {CH} I J K Ł L M
> N O P Q R S Š T U V W X Y Z Ž]</exemplarCharacters>
>
> I.e. in the index, DŹ is considered as a special character whereas in
> the sorting rules it is not.
>
> Also, Ź is special in the sorting rules but not in the index.

I reported this to CLDR:

https://unicode.org/cldr/trac/ticket/10797