This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.
[PATCH][BZ 17293] Fix sorting order for Ukrainian locale
- From: Andriy Rysin <arysin at gmail dot com>
- To: libc-locales at sourceware dot org
- Date: Thu, 1 Jan 2015 23:22:15 -0500
- Subject: [PATCH][BZ 17293] Fix sorting order for Ukrainian locale
The sorting order for several characters was wrong in the uk_UA locale.
This patch fixes two problems:
1) the soft sign position (it has its own position in the alphabet and should not be ignored)
2) UKR-IE should follow CYR-IE (they are separate letters and have
their own positions)
Collation order tests added.
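To make the two fixes concrete, here is a minimal Python sketch that models only the primary (letter-by-letter) level of the order. It assumes one rank per letter of the 33-letter alphabet, case folding, and that the apostrophe forms ASCII ', U+2019 and U+02BC are skipped. The names `new_key` and `old_key` and the sample words are hypothetical illustrations, not glibc code. `old_key` is only a rough approximation of the previous uk_UA rules (soft sign dropped, Є folded into Е at the primary level, secondary tie-breaks ignored).

```python
# Hypothetical primary-level model of Ukrainian collation, for illustration only.
# This is NOT glibc's ISO 14651 multi-level algorithm.

# The official 33-letter alphabet: Є follows Е, Ь sits between Щ and Ю.
UK_ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
RANK = {ch: i for i, ch in enumerate(UK_ALPHABET)}

# Assumed apostrophe forms; they are skipped so they do not affect order.
APOSTROPHES = {"'", "\u2019", "\u02bc"}


def _ranks(text: str) -> tuple:
    # Any character other than a Ukrainian letter or an apostrophe raises
    # KeyError; a real collation would also handle digits, hyphens, Latin.
    return tuple(RANK[ch] for ch in text if ch not in APOSTROPHES)


def new_key(word: str) -> tuple:
    """Patched order: the soft sign and Є each have their own position."""
    return _ranks(word.lower())


def old_key(word: str) -> tuple:
    """Rough model of the old order: soft sign ignored, Є compared as Е."""
    return _ranks(word.lower().replace("є", "е").replace("ь", ""))


if __name__ == "__main__":
    words = ["суп", "сьогодні", "сім'я", "сіль", "єнот", "ера", "ешелон"]
    print("new:", sorted(words, key=new_key))
    print("old:", sorted(words, key=old_key))
```

Under this model the patched order places "сьогодні" after "суп" (Ь comes after У) and "єнот" after every word starting with Е; the old approximation moves both of them earlier.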
Unfortunately, there is no publicly available official collation standard
for the Ukrainian language, but this new order is confirmed to be used
in official documents and dictionaries in Ukrainian.
Some links:
http://spelling.ulif.org.ua/peredmova.htm - Official spelling rules
for Ukrainian (the alphabet is listed there, and the only note concerns
the apostrophe, which should not affect sorting)
http://lcorp.ulif.org.ua/dictua/ - Ukrainian dictionaries from the
National Academy of Sciences use a sorting order that matches the one
provided by the patch
Also, with this patch the order for the soft sign and UKR-IE/CYR-IE matches
that in ICU, which follows the Unicode standard.
Thanks
Andriy
From e2cdfa3b916a2dbac80184ed7918aaebb88d57e7 Mon Sep 17 00:00:00 2001
From: Andriy Rysin <arysin@gmail.com>
Date: Thu, 1 Jan 2015 22:16:10 -0500
Subject: [PATCH] Fix sorting order for Ukrainian locale: soft sign has its
own position and UKR-IE should follow CYR-IE; added collation tests for
Ukrainian locale symbols
---
localedata/ChangeLog | 5 ++
localedata/Makefile | 4 +-
localedata/locales/uk_UA | 116 ++++++++++++++++++++++++-----------------------
localedata/uk_UA.in | 56 +++++++++++++++++++++++
4 files changed, 122 insertions(+), 59 deletions(-)
create mode 100644 localedata/uk_UA.in
diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 1636e52..147df93 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-01 Andriy Rysin <arysin@gmail.com>
+
+ [BZ #17293]
+ * uk_UA: Fix sorting order for Ukrainian locale
+
2014-12-01 Pravin Satpute <psatpute@redhat.com>
[BZ #16857]
diff --git a/localedata/Makefile b/localedata/Makefile
index 0826b36..5f8ca7f 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
tst-ctype tst-langinfo tst-langinfo-static tst-numeric
test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
- si_LK.UTF-8
+ si_LK.UTF-8 uk_UA.UTF-8
test-input-data = $(addsuffix .in, $(basename $(test-input)))
test-output := $(foreach s, .out .xout, \
$(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
- tr_TR.ISO-8859-9 en_GB.UTF-8
+ tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
CHARMAPS := $(shell echo "$(LOCALES)" | \
sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
diff --git a/localedata/locales/uk_UA b/localedata/locales/uk_UA
index d9194b8..2bd30eb 100644
--- a/localedata/locales/uk_UA
+++ b/localedata/locales/uk_UA
@@ -349,61 +349,63 @@ collating-symbol <UKR-GHE>
% Soft sign '<U044C>' may follow only this set of nine characters [<U0432><U0434><U0437><U043B><U043D><U0440><U0441><U0442><U0446>].
% It only softens pronunciation of these characters so it's should not impact
% sorting.
-
-
-collating-symbol <V+SS>
-collating-element <V-SS> from "<U0412><U042C>"
-collating-element <V-ss> from "<U0412><U044C>"
-collating-element <v-SS> from "<U0432><U042C>"
-collating-element <v-ss> from "<U0432><U044C>"
-
-collating-symbol <D+SS>
-collating-element <D-SS> from "<U0414><U042C>"
-collating-element <D-ss> from "<U0414><U044C>"
-collating-element <d-SS> from "<U0434><U042C>"
-collating-element <d-ss> from "<U0434><U044C>"
-
-collating-symbol <Z+SS>
-collating-element <Z-SS> from "<U0417><U042C>"
-collating-element <Z-ss> from "<U0417><U044C>"
-collating-element <z-SS> from "<U0437><U042C>"
-collating-element <z-ss> from "<U0437><U044C>"
-
-collating-symbol <L+SS>
-collating-element <L-SS> from "<U041B><U042C>"
-collating-element <L-ss> from "<U041B><U044C>"
-collating-element <l-SS> from "<U043B><U042C>"
-collating-element <l-ss> from "<U043B><U044C>"
-
-collating-symbol <N+SS>
-collating-element <N-SS> from "<U041D><U042C>"
-collating-element <N-ss> from "<U041D><U044C>"
-collating-element <n-SS> from "<U043D><U042C>"
-collating-element <n-ss> from "<U043D><U044C>"
-
-collating-symbol <R+SS>
-collating-element <R-SS> from "<U0420><U042C>"
-collating-element <R-ss> from "<U0420><U044C>"
-collating-element <r-SS> from "<U0440><U042C>"
-collating-element <r-ss> from "<U0440><U044C>"
-
-collating-symbol <S+SS>
-collating-element <S-SS> from "<U0421><U042C>"
-collating-element <S-ss> from "<U0421><U044C>"
-collating-element <s-SS> from "<U0441><U042C>"
-collating-element <s-ss> from "<U0441><U044C>"
-
-collating-symbol <T+SS>
-collating-element <T-SS> from "<U0422><U042C>"
-collating-element <T-ss> from "<U0422><U044C>"
-collating-element <t-SS> from "<U0442><U042C>"
-collating-element <t-ss> from "<U0442><U044C>"
-
-collating-symbol <TSE+SS>
-collating-element <TS-SS> from "<U0426><U042C>"
-collating-element <TS-ss> from "<U0426><U044C>"
-collating-element <ts-SS> from "<U0446><U042C>"
-collating-element <ts-ss> from "<U0446><U044C>"
+%
+% Note: in the official alphabet the soft sign is a letter and has a hard position in the order
+
+
+%collating-symbol <V+SS>
+%collating-element <V-SS> from "<U0412><U042C>"
+%collating-element <V-ss> from "<U0412><U044C>"
+%collating-element <v-SS> from "<U0432><U042C>"
+%collating-element <v-ss> from "<U0432><U044C>"
+%
+%collating-symbol <D+SS>
+%collating-element <D-SS> from "<U0414><U042C>"
+%collating-element <D-ss> from "<U0414><U044C>"
+%collating-element <d-SS> from "<U0434><U042C>"
+%collating-element <d-ss> from "<U0434><U044C>"
+%
+%collating-symbol <Z+SS>
+%collating-element <Z-SS> from "<U0417><U042C>"
+%collating-element <Z-ss> from "<U0417><U044C>"
+%collating-element <z-SS> from "<U0437><U042C>"
+%collating-element <z-ss> from "<U0437><U044C>"
+%
+%collating-symbol <L+SS>
+%collating-element <L-SS> from "<U041B><U042C>"
+%collating-element <L-ss> from "<U041B><U044C>"
+%collating-element <l-SS> from "<U043B><U042C>"
+%collating-element <l-ss> from "<U043B><U044C>"
+%
+%collating-symbol <N+SS>
+%collating-element <N-SS> from "<U041D><U042C>"
+%collating-element <N-ss> from "<U041D><U044C>"
+%collating-element <n-SS> from "<U043D><U042C>"
+%collating-element <n-ss> from "<U043D><U044C>"
+%
+%collating-symbol <R+SS>
+%collating-element <R-SS> from "<U0420><U042C>"
+%collating-element <R-ss> from "<U0420><U044C>"
+%collating-element <r-SS> from "<U0440><U042C>"
+%collating-element <r-ss> from "<U0440><U044C>"
+%
+%collating-symbol <S+SS>
+%collating-element <S-SS> from "<U0421><U042C>"
+%collating-element <S-ss> from "<U0421><U044C>"
+%collating-element <s-SS> from "<U0441><U042C>"
+%collating-element <s-ss> from "<U0441><U044C>"
+%
+%collating-symbol <T+SS>
+%collating-element <T-SS> from "<U0422><U042C>"
+%collating-element <T-ss> from "<U0422><U044C>"
+%collating-element <t-SS> from "<U0442><U042C>"
+%collating-element <t-ss> from "<U0442><U044C>"
+%
+%collating-symbol <TSE+SS>
+%collating-element <TS-SS> from "<U0426><U042C>"
+%collating-element <TS-ss> from "<U0426><U044C>"
+%collating-element <ts-SS> from "<U0446><U042C>"
+%collating-element <ts-ss> from "<U0446><U044C>"
collating-symbol <CAP-MIN>
@@ -489,11 +491,11 @@ reorder-after <U0434>
<U0455> "<U003C><U0043><U0059><U0052><U002D><U0044><U0045><U003E><U003C><U0043><U0059><U0052><U002D><U005A><U0045><U003E>";"<U003C><U004C><U0049><U0047><U003E><U003C><U004C><U0049><U0047><U003E>";"<U003C><U004D><U0049><U004E><U003E><U003C><U004D><U0049><U004E><U003E>";IGNORE % CYR-DZE
reorder-after <U0435>
-<U0454> <CYR-IE>;<UKR-IE>;<MIN>;IGNORE
+%<U0454> <CYR-IE>;<UKR-IE>;<MIN>;IGNORE
<U0451> <CYR-IE>;<CYR-IO>;<MIN>;IGNORE
<U044D> <CYR-IE>;<CYR-E>;<MIN>;IGNORE
reorder-after <U0415>
-<U0404> <CYR-IE>;<UKR-IE>;<CAP>;IGNORE
+%<U0404> <CYR-IE>;<UKR-IE>;<CAP>;IGNORE
<U0401> <CYR-IE>;<CYR-IO>;<CAP>;IGNORE
<U042D> <CYR-IE>;<CYR-E>;<CAP>;IGNORE
diff --git a/localedata/uk_UA.in b/localedata/uk_UA.in
new file mode 100644
index 0000000..ff4d284
--- /dev/null
+++ b/localedata/uk_UA.in
@@ -0,0 +1,56 @@
+01010
+ÐÐÐÐÑÑ
+ÐÐÐÐÑÑ
+ÐÐÐÐÑÑ-10
+ÐÑÐÐÐ
+ÐÐÑÐÑÑÐÐÑ
+ÐÑÐÑÐ
+ÐÑÐÑÑÑ
+ÐÑÐÑÑÑ
+ÒÑÐÑÐ
+ÐÐÐÐÑÑÐÐÐÐ
+ÐÐÑÐÐÑÑ
+ÐÐÑÐÐÑÐÐ
+ÐÐÑ-ÐÐÑÐÐ
+ÐÐÑÐÐÑÐÐÑÑ
+ÐÐÑÐÑÐÑÑÐÐÐÐ
+ÐÐÑ-ÐÑÐÑÑÐÐÐÐ
+ÐÐÐÑÑÐÑÐÑÑÑÑ
+ÐÐÐÐÑÑÐÐÐ
+ÐÐÑÐÐÑ
+ÐÐÑÐÐÑ
+ÐÐÐÑÐÐ
+ÑÐÐÐÐÑÐÐ
+ÐÐÑÐÐÐ
+ÑÐÐÑÑÑÑ
+ÐÐÐÑ
+ÐÐÐÑ
+ÐÐÑÑ
+Ð
+Ñ
+Ñ
+Ð
+ÐÐÑÐÐÑÐ
+ÐÑÐÐÐÑÑ
+ÐÐÐÑÐÑ
+ÐÑÑÑÑÐÐÐÐ
+ÐÑÑÑÑÑ
+ÐÐÑÐÐÑ
+ÐÐÐÐÑÐ
+ÐÐ'ÑÐÐ
+ÐÐâÑÐÐ
+ÐÐÊÑÐÐ
+ÐÐÑÐÐ
+ÐÐÑÑ
+ÐÑÐÐÐ
+ÑÐÐÑÐ
+ÑÐÑÐÐÐ
+ÑÐÑÐÐÐÐÐ
+ÑÐÐÑÐÑÑÐÐÐ
+ÑÐÐÑÑ
+ÑÑÑÐÑÐÑÑ
+Ñ
+Ñ
+Ñ
+Ñ
+Ñ
--
2.1.0