This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
Re: [PATCH][BZ 17293] Fix sorting order for Ukrainian locale
- From: Andriy Rysin <arysin at gmail dot com>
- To: libc-locales at sourceware dot org
- Date: Tue, 10 Feb 2015 19:59:53 -0500
- Subject: Re: [PATCH][BZ 17293] Fix sorting order for Ukrainian locale
- Authentication-results: sourceware.org; auth=none
- References: <CAOZT1o3UbFDfbiHufL0gy7daz81EV+cGFtB_nHERpRtV9CjxrQ at mail dot gmail dot com>
OK, I reread the guidelines; here are the patch and the ChangeLog entry,
each inlined separately in this email.
I hope this works better.
Thanks
Andriy
2015-02-10 Andriy Rysin <arysin@gmail.com>
[BZ #17293]
uk_UA: Fix sorting order for Ukrainian locale
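* localedata/Makefile (test-input): Add uk_UA.UTF-8.
(LOCALES): Likewise.
* localedata/locales/uk_UA: Comment out the soft-sign collating
elements and the UKR-IE reorder entries, so that the soft sign keeps
its own position and UKR-IE follows CYR-IE.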
* localedata/uk_UA.in: New file.
From 060dca0cabdd85de4ae2cbfad3e3027539987252 Mon Sep 17 00:00:00 2001
From: Andriy Rysin <arysin@gmail.com>
Date: Tue, 10 Feb 2015 19:46:56 -0500
Subject: [PATCH] Fix sorting order for Ukrainian locale: soft sign has its
own position and UKR-IE should follow CYR-IE; added collation tests for
Ukrainian locale symbols
---
localedata/Makefile | 4 +-
localedata/locales/uk_UA | 116 ++++++++++++++++++++++++-----------------------
localedata/uk_UA.in | 56 +++++++++++++++++++++++
3 files changed, 117 insertions(+), 59 deletions(-)
create mode 100644 localedata/uk_UA.in
diff --git a/localedata/Makefile b/localedata/Makefile
index d1218f5..83c7047 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
tst-ctype tst-langinfo tst-langinfo-static tst-numeric
test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
- si_LK.UTF-8
+ si_LK.UTF-8 uk_UA.UTF-8
test-input-data = $(addsuffix .in, $(basename $(test-input)))
test-output := $(foreach s, .out .xout, \
$(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
- tr_TR.ISO-8859-9 en_GB.UTF-8
+ tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
CHARMAPS := $(shell echo "$(LOCALES)" | \
sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
diff --git a/localedata/locales/uk_UA b/localedata/locales/uk_UA
index d9194b8..2bd30eb 100644
--- a/localedata/locales/uk_UA
+++ b/localedata/locales/uk_UA
@@ -349,61 +349,63 @@ collating-symbol <UKR-GHE>
% Soft sign '<U044C>' may follow only this set of nine characters
[<U0432><U0434><U0437><U043B><U043D><U0440><U0441><U0442><U0446>].
% It only softens pronunciation of these characters so it's should not impact
% sorting.
-
-
-collating-symbol <V+SS>
-collating-element <V-SS> from "<U0412><U042C>"
-collating-element <V-ss> from "<U0412><U044C>"
-collating-element <v-SS> from "<U0432><U042C>"
-collating-element <v-ss> from "<U0432><U044C>"
-
-collating-symbol <D+SS>
-collating-element <D-SS> from "<U0414><U042C>"
-collating-element <D-ss> from "<U0414><U044C>"
-collating-element <d-SS> from "<U0434><U042C>"
-collating-element <d-ss> from "<U0434><U044C>"
-
-collating-symbol <Z+SS>
-collating-element <Z-SS> from "<U0417><U042C>"
-collating-element <Z-ss> from "<U0417><U044C>"
-collating-element <z-SS> from "<U0437><U042C>"
-collating-element <z-ss> from "<U0437><U044C>"
-
-collating-symbol <L+SS>
-collating-element <L-SS> from "<U041B><U042C>"
-collating-element <L-ss> from "<U041B><U044C>"
-collating-element <l-SS> from "<U043B><U042C>"
-collating-element <l-ss> from "<U043B><U044C>"
-
-collating-symbol <N+SS>
-collating-element <N-SS> from "<U041D><U042C>"
-collating-element <N-ss> from "<U041D><U044C>"
-collating-element <n-SS> from "<U043D><U042C>"
-collating-element <n-ss> from "<U043D><U044C>"
-
-collating-symbol <R+SS>
-collating-element <R-SS> from "<U0420><U042C>"
-collating-element <R-ss> from "<U0420><U044C>"
-collating-element <r-SS> from "<U0440><U042C>"
-collating-element <r-ss> from "<U0440><U044C>"
-
-collating-symbol <S+SS>
-collating-element <S-SS> from "<U0421><U042C>"
-collating-element <S-ss> from "<U0421><U044C>"
-collating-element <s-SS> from "<U0441><U042C>"
-collating-element <s-ss> from "<U0441><U044C>"
-
-collating-symbol <T+SS>
-collating-element <T-SS> from "<U0422><U042C>"
-collating-element <T-ss> from "<U0422><U044C>"
-collating-element <t-SS> from "<U0442><U042C>"
-collating-element <t-ss> from "<U0442><U044C>"
-
-collating-symbol <TSE+SS>
-collating-element <TS-SS> from "<U0426><U042C>"
-collating-element <TS-ss> from "<U0426><U044C>"
-collating-element <ts-SS> from "<U0446><U042C>"
-collating-element <ts-ss> from "<U0446><U044C>"
+%
+% Note: in the official alphabet the soft sign is a letter and has a hard position in the order
+
+
+%collating-symbol <V+SS>
+%collating-element <V-SS> from "<U0412><U042C>"
+%collating-element <V-ss> from "<U0412><U044C>"
+%collating-element <v-SS> from "<U0432><U042C>"
+%collating-element <v-ss> from "<U0432><U044C>"
+%
+%collating-symbol <D+SS>
+%collating-element <D-SS> from "<U0414><U042C>"
+%collating-element <D-ss> from "<U0414><U044C>"
+%collating-element <d-SS> from "<U0434><U042C>"
+%collating-element <d-ss> from "<U0434><U044C>"
+%
+%collating-symbol <Z+SS>
+%collating-element <Z-SS> from "<U0417><U042C>"
+%collating-element <Z-ss> from "<U0417><U044C>"
+%collating-element <z-SS> from "<U0437><U042C>"
+%collating-element <z-ss> from "<U0437><U044C>"
+%
+%collating-symbol <L+SS>
+%collating-element <L-SS> from "<U041B><U042C>"
+%collating-element <L-ss> from "<U041B><U044C>"
+%collating-element <l-SS> from "<U043B><U042C>"
+%collating-element <l-ss> from "<U043B><U044C>"
+%
+%collating-symbol <N+SS>
+%collating-element <N-SS> from "<U041D><U042C>"
+%collating-element <N-ss> from "<U041D><U044C>"
+%collating-element <n-SS> from "<U043D><U042C>"
+%collating-element <n-ss> from "<U043D><U044C>"
+%
+%collating-symbol <R+SS>
+%collating-element <R-SS> from "<U0420><U042C>"
+%collating-element <R-ss> from "<U0420><U044C>"
+%collating-element <r-SS> from "<U0440><U042C>"
+%collating-element <r-ss> from "<U0440><U044C>"
+%
+%collating-symbol <S+SS>
+%collating-element <S-SS> from "<U0421><U042C>"
+%collating-element <S-ss> from "<U0421><U044C>"
+%collating-element <s-SS> from "<U0441><U042C>"
+%collating-element <s-ss> from "<U0441><U044C>"
+%
+%collating-symbol <T+SS>
+%collating-element <T-SS> from "<U0422><U042C>"
+%collating-element <T-ss> from "<U0422><U044C>"
+%collating-element <t-SS> from "<U0442><U042C>"
+%collating-element <t-ss> from "<U0442><U044C>"
+%
+%collating-symbol <TSE+SS>
+%collating-element <TS-SS> from "<U0426><U042C>"
+%collating-element <TS-ss> from "<U0426><U044C>"
+%collating-element <ts-SS> from "<U0446><U042C>"
+%collating-element <ts-ss> from "<U0446><U044C>"
collating-symbol <CAP-MIN>
@@ -489,11 +491,11 @@ reorder-after <U0434>
<U0455> "<U003C><U0043><U0059><U0052><U002D><U0044><U0045><U003E><U003C><U0043><U0059><U0052><U002D><U005A><U0045><U003E>";"<U003C><U004C><U0049><U0047><U003E><U003C><U004C><U0049><U0047><U003E>";"<U003C><U004D><U0049><U004E><U003E><U003C><U004D><U0049><U004E><U003E>";IGNORE
% CYR-DZE
reorder-after <U0435>
-<U0454> <CYR-IE>;<UKR-IE>;<MIN>;IGNORE
+%<U0454> <CYR-IE>;<UKR-IE>;<MIN>;IGNORE
<U0451> <CYR-IE>;<CYR-IO>;<MIN>;IGNORE
<U044D> <CYR-IE>;<CYR-E>;<MIN>;IGNORE
reorder-after <U0415>
-<U0404> <CYR-IE>;<UKR-IE>;<CAP>;IGNORE
+%<U0404> <CYR-IE>;<UKR-IE>;<CAP>;IGNORE
<U0401> <CYR-IE>;<CYR-IO>;<CAP>;IGNORE
<U042D> <CYR-IE>;<CYR-E>;<CAP>;IGNORE
diff --git a/localedata/uk_UA.in b/localedata/uk_UA.in
new file mode 100644
index 0000000..ff4d284
--- /dev/null
+++ b/localedata/uk_UA.in
@@ -0,0 +1,56 @@
+01010
+ÐÐÐÐÑÑ
+ÐÐÐÐÑÑ
+ÐÐÐÐÑÑ-10
+ÐÑÐÐÐ
+ÐÐÑÐÑÑÐÐÑ
+ÐÑÐÑÐ
+ÐÑÐÑÑÑ
+ÐÑÐÑÑÑ
+ÒÑÐÑÐ
+ÐÐÐÐÑÑÐÐÐÐ
+ÐÐÑÐÐÑÑ
+ÐÐÑÐÐÑÐÐ
+ÐÐÑ-ÐÐÑÐÐ
+ÐÐÑÐÐÑÐÐÑÑ
+ÐÐÑÐÑÐÑÑÐÐÐÐ
+ÐÐÑ-ÐÑÐÑÑÐÐÐÐ
+ÐÐÐÑÑÐÑÐÑÑÑÑ
+ÐÐÐÐÑÑÐÐÐ
+ÐÐÑÐÐÑ
+ÐÐÑÐÐÑ
+ÐÐÐÑÐÐ
+ÑÐÐÐÐÑÐÐ
+ÐÐÑÐÐÐ
+ÑÐÐÑÑÑÑ
+ÐÐÐÑ
+ÐÐÐÑ
+ÐÐÑÑ
+Ð
+Ñ
+Ñ
+Ð
+ÐÐÑÐÐÑÐ
+ÐÑÐÐÐÑÑ
+ÐÐÐÑÐÑ
+ÐÑÑÑÑÐÐÐÐ
+ÐÑÑÑÑÑ
+ÐÐÑÐÐÑ
+ÐÐÐÐÑÐ
+ÐÐ'ÑÐÐ
+ÐÐâÑÐÐ
+ÐÐÊÑÐÐ
+ÐÐÑÐÐ
+ÐÐÑÑ
+ÐÑÐÐÐ
+ÑÐÐÑÐ
+ÑÐÑÐÐÐ
+ÑÐÑÐÐÐÐÐ
+ÑÐÐÑÐÑÑÐÐÐ
+ÑÐÐÑÑ
+ÑÑÑÐÑÐÑÑ
+Ñ
+Ñ
+Ñ
+Ñ
+Ñ
--
2.1.0
2015-01-01 23:22 GMT-05:00 Andriy Rysin <arysin@gmail.com>:
> The sorting order for several characters was wrong in the uk_UA locale.
> This patch fixes two problems:
> 1) the soft sign position (it has its own position in the alphabet and
> should not be ignored)
> 2) UKR-IE should follow CYR-IE (they are separate letters and have
> their own positions)
>
> Collation order tests added.
>
> Unfortunately, there is no publicly available official collation
> standard for Ukrainian, but this new order is confirmed to be used
> in official documents and dictionaries in Ukrainian.
> Some links:
> http://spelling.ulif.org.ua/peredmova.htm - Official spelling rules
> for Ukrainian (the alphabet is listed there and the only note is about
> apostrophe which should not affect the sorting)
> http://lcorp.ulif.org.ua/dictua/ - Ukrainian dictionaries from
> National Academy of Science use the sorting order that matches the one
> provided by the patch
>
> Also, with this patch the order for the soft sign and UKR-IE/CYR-IE
> matches that in ICU, which follows the Unicode standard.
>
> Thanks
> Andriy
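
For anyone who wants to sanity-check the intended order without building
the locale, here is a minimal Python sketch of the two rules described
above: the soft sign has its own position in the alphabet, and UKR-IE
directly follows CYR-IE. It is only an illustration and does not use
glibc or the locale definition at all. The names UK_ALPHABET, RANK,
APOSTROPHES and uk_sort_key are made up for this example, and the handling
of characters outside the alphabet (they sort after all letters) is an
assumption, not something the patch specifies.

```python
# Ukrainian alphabet in its official order (33 letters).
# The soft sign sits between SHCHA and YU; UKR-IE directly follows CYR-IE.
UK_ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"

# Position of every letter in the alphabet.
RANK = {ch: i for i, ch in enumerate(UK_ALPHABET)}

# Apostrophe variants; per the spelling rules the apostrophe
# should not affect sorting, so the key skips them.
APOSTROPHES = {"'", "\u2019", "\u02bc"}


def uk_sort_key(word):
    """Return a list of alphabet positions usable as a sort key.

    Assumption for this sketch: characters that are not Ukrainian
    letters all get the same rank, placed after the last letter.
    """
    letters = [ch for ch in word.lower() if ch not in APOSTROPHES]
    return [RANK.get(ch, len(UK_ALPHABET)) for ch in letters]


if __name__ == "__main__":
    words = ["жук", "єдність", "кінь", "ера", "кіно"]
    for w in sorted(words, key=uk_sort_key):
        print(w)
```

Sorting with key=uk_sort_key places words starting with UKR-IE between
those starting with CYR-IE and ZHE, whereas a plain code point sort would
put them after YA.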