[PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.

Egmont Koblinger egmont@gmail.com
Sat Apr 16 08:50:00 GMT 2016


Hi guys,

This is about the sixth time I'm sending this patch to the list.

No response whatsoever so far -- I see you're in the middle of fixing
tons of locales right now, so I really do hope it's going to be
different this time.

The patch fixes a few bugs (as detailed in previous mails and the
bugzilla entry), and backs them up with by far the most extensive
unittest any locale definition has.
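
For reviewers who don't read Hungarian, the first-level ordering that
the tests exercise can be roughly sketched in Python. This is only an
illustration of the rules quoted in the hu_HU.in comments, not how
localedef or the locale source implements them. The names ALPHABET,
tokenize and primary_key are made up for this sketch, and it
deliberately ignores case, accent tie-breaking and foreign letters.

```python
# Hungarian alphabet at the first (primary) level. Long vowels are folded
# into their short partners (a-á, e-é, ...), as in the AkH-14-c1 tests.
ALPHABET = [
    "a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy", "h", "i",
    "j", "k", "l", "ly", "m", "n", "ny", "o", "ö", "p", "q", "r", "s", "sz",
    "t", "ty", "u", "ü", "v", "w", "x", "y", "z", "zs",
]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
FOLD = str.maketrans("áéíóőúű", "aeioöuü")
# Longest compound first, so that "dzs" is tried before "dz".
COMPOUNDS = sorted((l for l in ALPHABET if len(l) > 1), key=len, reverse=True)


def tokenize(word):
    """Split a word into Hungarian letters (hypothetical helper).

    Assumes, like the patch, that the shorthand "ccs" means <cs><cs>.
    Spaces and hyphens are skipped (AkH #14-d1), but a shorthand is never
    matched across them.
    """
    text = word.lower().translate(FOLD)
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in " -":
            i += 1
            continue
        for comp in COMPOUNDS:
            if text.startswith(comp[0] + comp, i):  # e.g. "ccs", "ddzs"
                tokens += [comp, comp]
                i += len(comp) + 1
                break
            if text.startswith(comp, i):  # e.g. "cs", "dzs"
                tokens.append(comp)
                i += len(comp)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def primary_key(word):
    """First-level sort key; raises KeyError for non-Hungarian letters."""
    return [RANK[token] for token in tokenize(word)]


words = ["kaszt", "kassza", "kaszinó", "kasza"]
print(sorted(words, key=primary_key))
```

In this sketch ccs and cscs get the same first-level key; the patch
tells them apart only at the second level.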

Because the fixes were driven by the unit tests, splitting the patch
into smaller changes (which could not be tested individually) would
have required a lot of extra work: I would have had to create
intermediate, deliberately somewhat broken definitions in addition to
the correct one. I hope that's not a problem; let me know if it is.

Please kindly review and apply,

Cheers,
Egmont


diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 541c34f..59320a1 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,10 @@
+2015-09-09  Egmont Koblinger  <egmont@gmail.com>
+
+    [BZ #18934]
+    * locales/hu_HU: Fix multiple collate bugs.
+    * hu_HU.in: New file.
+    * Makefile (test-input): Add hu_HU.UTF-8.
+
 2016-04-15  Mike Frysinger  <vapier@gentoo.org>

     [BZ #16374]
diff --git a/localedata/Makefile b/localedata/Makefile
index 4ecb192..7e62b7e 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
          tst-ctype tst-langinfo tst-langinfo-static tst-numeric
 test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
           hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
-          si_LK.UTF-8 uk_UA.UTF-8
+          si_LK.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 test-input-data = $(addsuffix .in, $(basename $(test-input)))
 test-output := $(foreach s, .out .xout, \
              $(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
        hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
        nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
        zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-       tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
+       tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 include ../gen-locales.mk
 endif

diff --git a/localedata/hu_HU.in b/localedata/hu_HU.in
new file mode 100644
index 0000000..4eb8eee
--- /dev/null
+++ b/localedata/hu_HU.in
@@ -0,0 +1,560 @@
+AkH-14-a1 acél          ; These tests are from:
+AkH-14-a1 cukor         ;
+AkH-14-a1 csók          ; A magyar helyesírás szabályai, 12. kiadás
+AkH-14-a1 gép           ; [The Rules of Hungarian Orthography, 12th edition]
+AkH-14-a1 hideg         ;
+AkH-14-a1 kettő         ; often referred to as akadémiai helyesírás (AkH.) [academic orthography]
+AkH-14-a1 Nagy          ;
+AkH-14-a1 nyúl          ; http://helyesiras.mta.hu/helyesiras/default/akh12
+AkH-14-a1 olasz         ;
+AkH-14-a1 öröm          ; Alphabetical ordering described in #14-16.
+AkH-14-a1 remény
+AkH-14-a1 sokáig        ; #14-a1: Sort based on first letter.
+AkH-14-a1 szabad
+AkH-14-a1 Tamás
+AkH-14-a1 vásárol
+AkH-14-a2 jácint        ; #14-a2: If no other difference, lowercase initial precedes uppercase.
+AkH-14-a2 Jácint
+AkH-14-a2 opera
+AkH-14-a2 Opera
+AkH-14-a2 szűcs
+AkH-14-a2 Szűcs
+AkH-14-a2 viola
+AkH-14-a2 Viola
+AkH-14-a3 cudar         ; #14-a3: Compound letters (cs, dz, dzs, gy, ly, ny, sz, ty, zs)
+AkH-14-a3 cukor         ; are sorted separately, after their first letter:
+AkH-14-a3 cuppant       ; a b c cs d dz dzs e f g gy h ... l ly m n ny o ... s sz t ty u ... z zs
+AkH-14-a3 csalit
+AkH-14-a3 csata
+AkH-14-a3 Csepel
+AkH-14-a3 Zoltán
+AkH-14-a3 zongora
+AkH-14-a3 zúdul
+AkH-14-a3 zsalu
+AkH-14-a3 zseni
+AkH-14-a3 Zsigmond
+AkH-14-b1 lom           ; #14-b1: The first difference matters.
+AkH-14-b1 lomb
+AkH-14-b1 lombik
+AkH-14-b1 Lontay
+AkH-14-b1 lovagol
+AkH-14-b1 pirinkó
+AkH-14-b1 pirinyó
+AkH-14-b1 pirít
+AkH-14-b1 pirkad
+AkH-14-b1 Piroska
+AkH-14-b1 tükör
+AkH-14-b1 Tünde
+AkH-14-b1 tünemény
+AkH-14-b1 tüntet
+AkH-14-b1 tüzér
+AkH-14-b2 kas           ; #14-b2: If a compound letter is pronounced long, only the first letter
+AkH-14-b2 Kasmír        ; is duplicated in writing: <cs><cs> becomes "ccs", <dzs><dzs> is "ddzs" etc.
+AkH-14-b2 Kassák        ; (unless it's at the boundary of a compound word when it's written out twice).
+AkH-14-b2 kastély       ; Sort according to the actual tokens, not the shorthand written form.
+AkH-14-b2 kasza         ; <k><a><sz><a>
+AkH-14-b2 kaszinó       ; <k><a><sz><i><n><ó>
+AkH-14-b2 kassza        ; <k><a><sz><sz><a>
+AkH-14-b2 kaszt         ; <k><a><sz><t>
+AkH-14-b2 mennek
+AkH-14-b2 mennének
+AkH-14-b2 menü
+AkH-14-b2 menza
+AkH-14-b2 meny          ; <m><e><ny>
+AkH-14-b2 Menyhért      ; <M><e><ny><h><é><r><t>
+AkH-14-b2 mennybolt     ; <m><e><ny><ny><b><o><l><t>
+AkH-14-b2 mennyi        ; <m><e><ny><ny><i>
+AkH-14-b2 nagy          ; <n><a><gy>
+AkH-14-b2 naggyá        ; <n><a><gy><gy><á>
+AkH-14-b2 nagygyakorlat ; <n><a><gy><gy><a><k><o><r><l><a><t> (compound word: nagy+gyakorlat)
+AkH-14-b2 naggyal       ; <n><a><gy><gy><a><l>
+AkH-14-b2 nagyít        ; <n><a><gy><í><t>
+AkH-14-b2 nagyobb
+AkH-14-b2 nagyol
+AkH-14-b2 nagyoll
+AkH-14-c1 ír            ; #14-c1: Vowels collate equally in pairs: a-á, e-é, i-í, o-ó, ö-ő, u-ú, ü-ű.
+AkH-14-c1 Irak
+AkH-14-c1 iram
+AkH-14-c1 Irán
+AkH-14-c1 írandó
+AkH-14-c1 iránt
+AkH-14-c1 író
+AkH-14-c1 iroda
+AkH-14-c1 irónia
+AkH-14-c2 Eger          ; #14-c2: Short vowel (unaccented, or with diaeresis) comes first if that's the only difference.
+AkH-14-c2 egér
+AkH-14-c2 egyfelé
+AkH-14-c2 egyféle
+AkH-14-c2 elöl
+AkH-14-c2 elől
+AkH-14-c2 kerek
+AkH-14-c2 kerék
+AkH-14-c2 keres
+AkH-14-c2 kérés
+AkH-14-c2 koros
+AkH-14-c2 kóros
+AkH-14-c2 szel
+AkH-14-c2 szél
+AkH-14-c2 szeles
+AkH-14-c2 széles
+AkH-14-c2 szüret
+AkH-14-c2 szűret
+AkH-14-d1 kis részben   ; #14-d1: Spaces, hyphens are ignored.
+AkH-14-d1 kissé
+AkH-14-d1 Kiss Ernő
+AkH-14-d1 kis sorozat
+AkH-14-d1 kissorozat-gyártás
+AkH-14-d1 kis számban
+AkH-14-d1 kistányér
+AkH-14-d1 kis virág
+AkH-14-d1 márvány
+AkH-14-d1 márványkő
+AkH-14-d1 márvány sírkő
+AkH-14-d1 Márvány-tenger
+AkH-14-d1 márványtömb
+AkH-14-d1 Márvány Zsolt
+AkH-14-d1 másféle
+AkH-14-d1 másol
+AkH-14-d1 tiszafa
+AkH-14-d1 Tiszahát
+AkH-14-d1 Tisza Kálmán
+AkH-14-d1 Tisza menti
+AkH-14-d1 Tiszántúl
+AkH-14-d1 Tisza-part
+AkH-14-d1 tiszavirág
+AkH-14-d1 tiszt
+AkH-15 cérna            ; #15: Foreign accents are ignored, unless they're the only difference,
+AkH-15 Černý            ; in which case they are sorted after the Hungarian ones (in unspecified order).
+AkH-15 Champagne
+AkH-15 Cholnoky
+AkH-15 címez
+AkH-15 cukor
+AkH-15 Czuczor
+AkH-15 csapat
+AkH-15 Gaal
+AkH-15 galamb
+AkH-15 Gärtner
+AkH-15 gáz
+AkH-15 geodézia
+AkH-15 Georges
+AkH-15 góc
+AkH-15 Goethe
+AkH-15 moshat
+AkH-15 mosna
+AkH-15 Mošna
+AkH-15 mosópor
+AkH-15 Møsstrand
+AkH-15 mostan
+AkH-15 munka
+AkH-15 Muñoz
+alphabet a              ; These tests were created by egmont@gmail.com.
+alphabet á
+alphabet aa             ; a = á unless that's the only difference in which case a < á.
+alphabet aá             ; (Same for e = é, i = í, o = ó, ö = ő, u = ú, ü = ű below.)
+alphabet áa             ; Differences in accents matter from left to right.
+alphabet áá
+alphabet áp
+alphabet aq
+alphabet b
+alphabet c
+alphabet cz             ; <c><z>
+alphabet cs             ; <cs>        -- or rarely <c><s>, can't tell for sure, assume <cs>.
+alphabet csc            ; <cs><c>
+alphabet ccs            ; <cs><cs>    -- or rarely <c><cs>, can't tell for sure, assume <cs><cs>.
+alphabet cscs           ; <cs><cs>    -- Make sure ccs and cscs don't collate as equal, see bug 13547.
+alphabet ccsa           ; <cs><cs><a>
+alphabet cscsa          ; <cs><cs><a> -- (These comments also apply to all other compound letters below.)
+alphabet csd            ; <cs><d>
+alphabet d
+alphabet dz             ; <dz>
+alphabet dzd            ; <dz><d>
+alphabet ddz            ; <dz><dz>
+alphabet dzdz           ; <dz><dz>
+alphabet ddza           ; <dz><dz><a>
+alphabet dzdza          ; <dz><dz><a>
+alphabet dzdzs          ; <dz><dzs>
+alphabet dze            ; <dz><e>
+alphabet dzz            ; <dz><z>
+alphabet dzs            ; <dzs>
+alphabet dzsdz          ; <dzs><dz>
+alphabet ddzs           ; <dzs><dzs>
+alphabet dzsdzs         ; <dzs><dzs>
+alphabet ddzsa          ; <dzs><dzs><a>
+alphabet dzsdzsa        ; <dzs><dzs><a>
+alphabet dzse           ; <dzs><e>
+alphabet e
+alphabet é
+alphabet ee
+alphabet eé
+alphabet ée
+alphabet éé
+alphabet ép
+alphabet eq
+alphabet f
+alphabet g
+alphabet gz             ; <g><z>
+alphabet gy             ; <gy>
+alphabet gyg            ; <gy><g>
+alphabet ggy            ; <gy><gy>
+alphabet gygy           ; <gy><gy>
+alphabet ggya           ; <gy><gy><a>
+alphabet gygya          ; <gy><gy><a>
+alphabet gyh            ; <gy><h>
+alphabet h
+alphabet i
+alphabet í
+alphabet ii
+alphabet ií
+alphabet íi
+alphabet íí
+alphabet íp
+alphabet iq
+alphabet j
+alphabet k
+alphabet l
+alphabet lz             ; <l><z>
+alphabet ly             ; <ly>
+alphabet lyl            ; <ly><l>
+alphabet lly            ; <ly><ly>
+alphabet lyly           ; <ly><ly>
+alphabet llya           ; <ly><ly><a>
+alphabet lylya          ; <ly><ly><a>
+alphabet lym            ; <ly><m>
+alphabet m
+alphabet n
+alphabet nz             ; <n><z>
+alphabet ny             ; <ny>
+alphabet nyn            ; <ny><n>
+alphabet nny            ; <ny><ny>
+alphabet nyny           ; <ny><ny>
+alphabet nnya           ; <ny><ny><a>
+alphabet nynya          ; <ny><ny><a>
+alphabet nyo            ; <ny><o>
+alphabet o
+alphabet ó
+alphabet oo
+alphabet oó
+alphabet óo
+alphabet óó
+alphabet óp
+alphabet oq
+alphabet ö              ; ö = ő (unless that's the only difference), but these come strictly after o and ó.
+alphabet ő
+alphabet öö
+alphabet öő
+alphabet őö
+alphabet őő
+alphabet őp
+alphabet öq
+alphabet p
+alphabet q
+alphabet r
+alphabet s
+alphabet sz             ; <sz>
+alphabet szs            ; <sz><s>
+alphabet ssz            ; <sz><sz>
+alphabet szsz           ; <sz><sz>
+alphabet ssza           ; <sz><sz><a>
+alphabet szsza          ; <sz><sz><a>
+alphabet szt            ; <sz><t>
+alphabet t
+alphabet tz             ; <t><z>
+alphabet ty             ; <ty>
+alphabet tyt            ; <ty><t>
+alphabet tty            ; <ty><ty>
+alphabet tyty           ; <ty><ty>
+alphabet ttya           ; <ty><ty><a>
+alphabet tytya          ; <ty><ty><a>
+alphabet tyu            ; <ty><u>
+alphabet u
+alphabet ú
+alphabet úp
+alphabet uq
+alphabet uu
+alphabet uú
+alphabet úu
+alphabet úú
+alphabet ü              ; ü = ű (unless that's the only difference), but these come strictly after u and ú.
+alphabet ű
+alphabet űp
+alphabet üq
+alphabet üü
+alphabet üű
+alphabet űü
+alphabet űű
+alphabet v
+alphabet w
+alphabet x
+alphabet y
+alphabet z
+alphabet zz             ; <z><z>
+alphabet zs             ; <zs>
+alphabet zsz            ; <zs><z>
+alphabet zzs            ; <zs><zs>
+alphabet zszs           ; <zs><zs>
+alphabet zzsa           ; <zs><zs><a>
+alphabet zszsa          ; <zs><zs><a>
+case a                  ; #14-a2 specifies that if the same word appears in lowercase as well as with
+case A                  ; uppercase initial, the lowercase one is to be sorted first.
+case á                  ; Extend this to all other weird combinations of upper- and lowercases.
+case Á
+case cs                 ; <cs>
+case cS
+case Cs
+case CS
+case ccs                ; <cs><cs>
+case ccS
+case cCs
+case cCS
+case Ccs
+case CcS
+case CCs
+case CCS
+case dz                 ; <dz>
+case dZ
+case Dz
+case DZ
+case ddz                ; <dz><dz>
+case ddZ
+case dDz
+case dDZ
+case Ddz
+case DdZ
+case DDz
+case DDZ
+case dzs                ; <dzs>
+case dzS
+case dZs
+case dZS
+case Dzs
+case DzS
+case DZs
+case DZS
+case ddzs               ; <dzs><dzs>
+case ddzS
+case ddZs
+case ddZS
+case dDzs
+case dDzS
+case dDZs
+case dDZS
+case Ddzs
+case DdzS
+case DdZs
+case DdZS
+case DDzs
+case DDzS
+case DDZs
+case DDZS
+case e
+case E
+case é
+case É
+case gy                 ; <gy>
+case gY
+case Gy
+case GY
+case ggy                ; <gy><gy>
+case ggY
+case gGy
+case gGY
+case Ggy
+case GgY
+case GGy
+case GGY
+case i
+case I
+case í
+case Í
+case ly                 ; <ly>
+case lY
+case Ly
+case LY
+case lly                ; <ly><ly>
+case llY
+case lLy
+case lLY
+case Lly
+case LlY
+case LLy
+case LLY
+case ny                 ; <ny>
+case nY
+case Ny
+case NY
+case nny                ; <ny><ny>
+case nnY
+case nNy
+case nNY
+case Nny
+case NnY
+case NNy
+case NNY
+case o
+case O
+case ó
+case Ó
+case ö
+case Ö
+case ő
+case Ő
+case sz                 ; <sz>
+case sZ
+case Sz
+case SZ
+case ssz                ; <sz><sz>
+case ssZ
+case sSz
+case sSZ
+case Ssz
+case SsZ
+case SSz
+case SSZ
+case ty                 ; <ty>
+case tY
+case Ty
+case TY
+case tty                ; <ty><ty>
+case ttY
+case tTy
+case tTY
+case Tty
+case TtY
+case TTy
+case TTY
+case u
+case U
+case ú
+case Ú
+case ü
+case Ü
+case ű
+case Ű
+case zs                 ; <zs>
+case zS
+case Zs
+case ZS
+case zzs                ; <zs><zs>
+case zzS
+case zZs
+case zZS
+case Zzs
+case ZzS
+case ZZs
+case ZZS
+foreign-a1 á            ; More thorough tests for foreign accents (#15).
+foreign-a1 à
+foreign-a1 àp
+foreign-a1 áq
+foreign-a2 á
+foreign-a2 â
+foreign-a2 âp
+foreign-a2 áq
+foreign-a3 á
+foreign-a3 ã
+foreign-a3 ãp
+foreign-a3 áq
+foreign-a4 á
+foreign-a4 ä
+foreign-a4 äp
+foreign-a4 áq
+foreign-a5 á
+foreign-a5 å
+foreign-a5 åp
+foreign-a5 áq
+foreign-a6 á
+foreign-a6 ă
+foreign-a6 ăp
+foreign-a6 áq
+foreign-c1 c
+foreign-c1 ç
+foreign-c1 çp
+foreign-c1 cq
+foreign-d1 d
+foreign-d1 đ
+foreign-d1 đp
+foreign-d1 dq
+foreign-e1 é
+foreign-e1 è
+foreign-e1 èp
+foreign-e1 éq
+foreign-e2 é
+foreign-e2 ê
+foreign-e2 êp
+foreign-e2 éq
+foreign-e3 é
+foreign-e3 ë
+foreign-e3 ëp
+foreign-e3 éq
+foreign-e4 é
+foreign-e4 ě
+foreign-e4 ěp
+foreign-e4 éq
+foreign-i1 í
+foreign-i1 ì
+foreign-i1 ìp
+foreign-i1 íq
+foreign-i2 í
+foreign-i2 î
+foreign-i2 îp
+foreign-i2 íq
+foreign-i3 í
+foreign-i3 ï
+foreign-i3 ïp
+foreign-i3 íq
+foreign-l1 l
+foreign-l1 ł
+foreign-l1 łp
+foreign-l1 lq
+foreign-n1 n
+foreign-n1 ñ
+foreign-n1 ñp
+foreign-n1 nq
+foreign-n2 n
+foreign-n2 ň
+foreign-n2 ňp
+foreign-n2 nq
+foreign-o1 ó            ; The rules are not explicit whether foreign accents on top of o or u
+foreign-o1 ò            ; should be sorted among o-ó and u-ú, or among ö-ő and ü-ű,
+foreign-o1 òp           ; but the example with Møsstrand makes it clear that it's the former.
+foreign-o1 óq
+foreign-o2 ó
+foreign-o2 ô
+foreign-o2 ôp
+foreign-o2 óq
+foreign-o3 ó
+foreign-o3 õ
+foreign-o3 õp
+foreign-o3 óq
+foreign-o4 ó
+foreign-o4 ø
+foreign-o4 øp
+foreign-o4 óq
+foreign-r1 r
+foreign-r1 ř
+foreign-r1 řp
+foreign-r1 rq
+foreign-s1 s
+foreign-s1 š
+foreign-s1 šp
+foreign-s1 sq
+foreign-u1 ú
+foreign-u1 ù
+foreign-u1 ùp
+foreign-u1 úq
+foreign-u2 ú
+foreign-u2 û
+foreign-u2 ûp
+foreign-u2 úq
+foreign-u3 ú
+foreign-u3 ũ
+foreign-u3 ũp
+foreign-u3 úq
+foreign-u4 ú
+foreign-u4 ů
+foreign-u4 ůp
+foreign-u4 úq
+foreign-y1 y
+foreign-y1 ÿ
+foreign-y1 ÿp
+foreign-y1 yq
diff --git a/localedata/locales/hu_HU b/localedata/locales/hu_HU
index d76226d..8d1d95b 100644
--- a/localedata/locales/hu_HU
+++ b/localedata/locales/hu_HU
@@ -64,6 +64,7 @@ category "i18n:2012";LC_MEASUREMENT
 END LC_IDENTIFICATION

 LC_COLLATE
+define DIACRIT_FORWARD
 copy "iso14651_t1"

 %% a b c cs d dz dzs e f g gy h i j k l ly m n ny o o: p q
@@ -77,15 +78,18 @@ copy "iso14651_t1"
 %% dzs+dzs becomes ddzs, and so on.
 %% However, c+cs is also spelled as ccs, you need to speak
 %% the language to tell which one is the case.
-%% Tokenize ccs as <c_or_cs><cs>, and sort the tokens as
-%% a b c c_or_cs cs d... This effectively assumes cs+cs
-%% which is more frequent than c+cs, but guarantees that the
-%% strings ccs and cscs don't collate as equal.
+%% Tokenize ccs as <cs><cs> since this is much more frequent
+%% than <c><cs>, but apply SINGLE-OR-COMPOUND and COMPOUND
+%% to the tokens so that the strings ccs and cscs don't collate
+%% as equal.
+%% The same goes for all other compound consonants.

 collating-symbol  <odouble>
 collating-symbol  <udouble>

-collating-symbol  <c_or_cs>
+collating-symbol  <SINGLE-OR-COMPOUND>
+collating-symbol  <COMPOUND>
+
 collating-symbol  <cs>
 collating-element <C-S> from "<U0043><U0053>"
 collating-element <C-s> from "<U0043><U0073>"
@@ -100,7 +104,6 @@ collating-element <c-C-s> from "<U0063><U0043><U0073>"
 collating-element <c-c-S> from "<U0063><U0063><U0053>"
 collating-element <c-c-s> from "<U0063><U0063><U0073>"

-collating-symbol  <d_or_dz>
 collating-symbol  <dz>
 collating-element <D-Z> from "<U0044><U005A>"
 collating-element <D-z> from "<U0044><U007A>"
@@ -115,7 +118,6 @@ collating-element <d-D-z> from "<U0064><U0044><U007A>"
 collating-element <d-d-Z> from "<U0064><U0064><U005A>"
 collating-element <d-d-z> from "<U0064><U0064><U007A>"

-collating-symbol  <d_or_dzs>
 collating-symbol  <dzs>
 collating-element <D-Z-S> from "<U0044><U005A><U0053>"
 collating-element <D-Z-s> from "<U0044><U005A><U0073>"
@@ -142,7 +144,6 @@ collating-element <d-d-Z-s> from "<U0064><U0064><U005A><U0073>"
 collating-element <d-d-z-S> from "<U0064><U0064><U007A><U0053>"
 collating-element <d-d-z-s> from "<U0064><U0064><U007A><U0073>"

-collating-symbol  <g_or_gy>
 collating-symbol  <gy>
 collating-element <G-Y> from "<U0047><U0059>"
 collating-element <G-y> from "<U0047><U0079>"
@@ -157,7 +158,6 @@ collating-element <g-G-y> from "<U0067><U0047><U0079>"
 collating-element <g-g-Y> from "<U0067><U0067><U0059>"
 collating-element <g-g-y> from "<U0067><U0067><U0079>"

-collating-symbol  <l_or_ly>
 collating-symbol  <ly>
 collating-element <L-Y> from "<U004C><U0059>"
 collating-element <L-y> from "<U004C><U0079>"
@@ -172,7 +172,6 @@ collating-element <l-L-y> from "<U006C><U004C><U0079>"
 collating-element <l-l-Y> from "<U006C><U006C><U0059>"
 collating-element <l-l-y> from "<U006C><U006C><U0079>"

-collating-symbol  <n_or_ny>
 collating-symbol  <ny>
 collating-element <N-Y> from "<U004E><U0059>"
 collating-element <N-y> from "<U004E><U0079>"
@@ -187,7 +186,6 @@ collating-element <n-N-y> from "<U006E><U004E><U0079>"
 collating-element <n-n-Y> from "<U006E><U006E><U0059>"
 collating-element <n-n-y> from "<U006E><U006E><U0079>"

-collating-symbol  <s_or_sz>
 collating-symbol  <sz>
 collating-element <S-Z> from "<U0053><U005A>"
 collating-element <S-z> from "<U0053><U007A>"
@@ -202,7 +200,6 @@ collating-element <s-S-z> from "<U0073><U0053><U007A>"
 collating-element <s-s-Z> from "<U0073><U0073><U005A>"
 collating-element <s-s-z> from "<U0073><U0073><U007A>"

-collating-symbol  <t_or_ty>
 collating-symbol  <ty>
 collating-element <T-Y> from "<U0054><U0059>"
 collating-element <T-y> from "<U0054><U0079>"
@@ -217,7 +214,6 @@ collating-element <t-T-y> from "<U0074><U0054><U0079>"
 collating-element <t-t-Y> from "<U0074><U0074><U0059>"
 collating-element <t-t-y> from "<U0074><U0074><U0079>"

-collating-symbol  <z_or_zs>
 collating-symbol  <zs>
 collating-element <Z-S> from "<U005A><U0053>"
 collating-element <Z-s> from "<U005A><U0073>"
@@ -232,8 +228,10 @@ collating-element <z-Z-s> from "<U007A><U005A><U0073>"
 collating-element <z-z-S> from "<U007A><U007A><U0053>"
 collating-element <z-z-s> from "<U007A><U007A><U0073>"

+collating-symbol <CAP-CAP>
 collating-symbol <CAP-MIN>
 collating-symbol <MIN-CAP>
+collating-symbol <MIN-MIN>
 collating-symbol <CAP-CAP-CAP>
 collating-symbol <CAP-CAP-MIN>
 collating-symbol <CAP-MIN-CAP>
@@ -244,6 +242,7 @@ collating-symbol <MIN-MIN-CAP>
 collating-symbol <MIN-MIN-MIN>

 reorder-after <MIN>
+<MIN-MIN>
 <MIN-CAP>
 <MIN-MIN-MIN>
 <MIN-MIN-CAP>
@@ -252,42 +251,38 @@ reorder-after <MIN>

 reorder-after <CAP>
 <CAP-MIN>
+<CAP-CAP>
 <CAP-MIN-MIN>
 <CAP-MIN-CAP>
 <CAP-CAP-MIN>
 <CAP-CAP-CAP>

 reorder-after <c>
-<c_or_cs>
 <cs>
 reorder-after <d>
-<d_or_dz>
-<d_or_dzs>
 <dz>
 <dzs>
 reorder-after <g>
-<g_or_gy>
 <gy>
 reorder-after <l>
-<l_or_ly>
 <ly>
 reorder-after <n>
-<n_or_ny>
 <ny>
 reorder-after <o>
 <odouble>
 reorder-after <s>
-<s_or_sz>
 <sz>
 reorder-after <t>
-<t_or_ty>
 <ty>
 reorder-after <u>
 <udouble>
 reorder-after <z>
-<z_or_zs>
 <zs>

+reorder-after <BAS>
+<SINGLE-OR-COMPOUND>
+<COMPOUND>
+
 reorder-after <o>
 <U00F6>    <odouble>;<REU>;<MIN>;IGNORE
 <U0151>    <odouble>;<DAC>;<MIN>;IGNORE
@@ -300,152 +295,157 @@ reorder-after <u>
 <U00DC>    <udouble>;<REU>;<CAP>;IGNORE
 <U0170>    <udouble>;<DAC>;<CAP>;IGNORE

+reorder-after <BAS>
+<ACA>
+<REU>
+<DAC>
+
 reorder-after <U0043>
-<C-S>        <cs>;<BAS>;<CAP>;IGNORE
-<C-s>        <cs>;<BAS>;<CAP-MIN>;IGNORE
-<C-C-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<C-C-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<C-c-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<C-c-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<C-S>        <cs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<C-s>        <cs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<C-C-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<C-C-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<C-c-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<C-c-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0063>
-<c-S>        <cs>;<BAS>;<MIN-CAP>;IGNORE
-<c-s>        <cs>;<BAS>;<MIN>;IGNORE
-<c-C-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<c-C-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<c-c-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<c-c-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<c-S>        <cs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<c-s>        <cs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<c-C-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<c-C-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<c-c-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<c-c-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0044>
-<D-Z>        <dz>;<BAS>;<CAP>;IGNORE
-<D-z>        <dz>;<BAS>;<CAP-MIN>;IGNORE
-<D-D-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<D-D-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<D-d-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<D-d-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<D-Z>        <dz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<D-z>        <dz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<D-D-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<D-D-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<D-d-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<D-d-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z>        <dz>;<BAS>;<MIN-CAP>;IGNORE
-<d-z>        <dz>;<BAS>;<MIN>;IGNORE
-<d-D-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<d-D-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<d-d-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<d-d-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<d-Z>        <dz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<d-z>        <dz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<d-D-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<d-D-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<d-d-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<d-d-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0044>
-<D-Z-S>        <dzs>;<BAS>;<CAP-CAP-CAP>;IGNORE
-<D-Z-s>        <dzs>;<BAS>;<CAP-CAP-MIN>;IGNORE
-<D-z-S>        <dzs>;<BAS>;<CAP-MIN-CAP>;IGNORE
-<D-z-s>        <dzs>;<BAS>;<CAP-MIN-MIN>;IGNORE
-<D-D-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-D-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-D-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-D-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
-<D-d-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-d-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-d-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-d-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-Z-S>        <dzs>;<COMPOUND>;<CAP-CAP-CAP>;IGNORE
+<D-Z-s>        <dzs>;<COMPOUND>;<CAP-CAP-MIN>;IGNORE
+<D-z-S>        <dzs>;<COMPOUND>;<CAP-MIN-CAP>;IGNORE
+<D-z-s>        <dzs>;<COMPOUND>;<CAP-MIN-MIN>;IGNORE
+<D-D-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-CAP>";IGNORE
+<D-D-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-MIN>";IGNORE
+<D-D-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-CAP>";IGNORE
+<D-D-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-d-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-CAP>";IGNORE
+<D-d-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-MIN>";IGNORE
+<D-d-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-CAP>";IGNORE
+<D-d-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z-S>        <dzs>;<BAS>;<MIN-CAP-CAP>;IGNORE
-<d-Z-s>        <dzs>;<BAS>;<MIN-CAP-MIN>;IGNORE
-<d-z-S>        <dzs>;<BAS>;<MIN-MIN-CAP>;IGNORE
-<d-z-s>        <dzs>;<BAS>;<MIN-MIN-MIN>;IGNORE
-<d-D-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-D-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-D-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-D-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
-<d-d-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-d-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-d-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-d-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-Z-S>        <dzs>;<COMPOUND>;<MIN-CAP-CAP>;IGNORE
+<d-Z-s>        <dzs>;<COMPOUND>;<MIN-CAP-MIN>;IGNORE
+<d-z-S>        <dzs>;<COMPOUND>;<MIN-MIN-CAP>;IGNORE
+<d-z-s>        <dzs>;<COMPOUND>;<MIN-MIN-MIN>;IGNORE
+<d-D-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-CAP>";IGNORE
+<d-D-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-MIN>";IGNORE
+<d-D-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-CAP>";IGNORE
+<d-D-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-d-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-CAP>";IGNORE
+<d-d-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-MIN>";IGNORE
+<d-d-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-CAP>";IGNORE
+<d-d-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-MIN>";IGNORE

 reorder-after <U0047>
-<G-Y>        <gy>;<BAS>;<CAP>;IGNORE
-<G-y>        <gy>;<BAS>;<CAP-MIN>;IGNORE
-<G-G-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<G-G-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<G-g-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<G-g-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<G-Y>        <gy>;<COMPOUND>;<CAP-CAP>;IGNORE
+<G-y>        <gy>;<COMPOUND>;<CAP-MIN>;IGNORE
+<G-G-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<G-G-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<G-g-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<G-g-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0067>
-<g-y>        <gy>;<BAS>;<MIN>;IGNORE
-<g-Y>        <gy>;<BAS>;<MIN-CAP>;IGNORE
-<g-G-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<g-G-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<g-g-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<g-g-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<g-Y>        <gy>;<COMPOUND>;<MIN-CAP>;IGNORE
+<g-y>        <gy>;<COMPOUND>;<MIN-MIN>;IGNORE
+<g-G-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<g-G-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<g-g-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<g-g-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U004C>
-<L-Y>        <ly>;<BAS>;<CAP>;IGNORE
-<L-y>        <ly>;<BAS>;<CAP-MIN>;IGNORE
-<L-L-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<L-L-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<L-l-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<L-l-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<L-Y>        <ly>;<COMPOUND>;<CAP-CAP>;IGNORE
+<L-y>        <ly>;<COMPOUND>;<CAP-MIN>;IGNORE
+<L-L-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<L-L-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<L-l-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<L-l-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006C>
-<l-y>        <ly>;<BAS>;<MIN>;IGNORE
-<l-Y>        <ly>;<BAS>;<MIN-CAP>;IGNORE
-<l-L-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<l-L-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<l-l-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<l-l-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<l-Y>        <ly>;<COMPOUND>;<MIN-CAP>;IGNORE
+<l-y>        <ly>;<COMPOUND>;<MIN-MIN>;IGNORE
+<l-L-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<l-L-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<l-l-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<l-l-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U004E>
-<N-Y>        <ny>;<BAS>;<CAP>;IGNORE
-<N-y>        <ny>;<BAS>;<CAP-MIN>;IGNORE
-<N-N-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<N-N-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<N-n-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<N-n-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<N-Y>        <ny>;<COMPOUND>;<CAP-CAP>;IGNORE
+<N-y>        <ny>;<COMPOUND>;<CAP-MIN>;IGNORE
+<N-N-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<N-N-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<N-n-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<N-n-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006E>
-<n-y>        <ny>;<BAS>;<MIN>;IGNORE
-<n-Y>        <ny>;<BAS>;<MIN-CAP>;IGNORE
-<n-N-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<n-N-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<n-n-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<n-n-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<n-Y>        <ny>;<COMPOUND>;<MIN-CAP>;IGNORE
+<n-y>        <ny>;<COMPOUND>;<MIN-MIN>;IGNORE
+<n-N-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<n-N-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<n-n-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<n-n-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0053>
-<S-Z>        <sz>;<BAS>;<CAP>;IGNORE
-<S-z>        <sz>;<BAS>;<CAP-MIN>;IGNORE
-<S-S-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<S-S-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<S-s-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<S-s-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<S-Z>        <sz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<S-z>        <sz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<S-S-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<S-S-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<S-s-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<S-s-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0073>
-<s-Z>        <sz>;<BAS>;<MIN-CAP>;IGNORE
-<s-z>        <sz>;<BAS>;<MIN>;IGNORE
-<s-S-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<s-S-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<s-s-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<s-s-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<s-Z>        <sz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<s-z>        <sz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<s-S-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<s-S-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<s-s-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<s-s-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0054>
-<T-Y>        <ty>;<BAS>;<CAP>;IGNORE
-<T-y>        <ty>;<BAS>;<CAP-MIN>;IGNORE
-<T-T-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<T-T-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<T-t-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<T-t-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<T-Y>        <ty>;<COMPOUND>;<CAP-CAP>;IGNORE
+<T-y>        <ty>;<COMPOUND>;<CAP-MIN>;IGNORE
+<T-T-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<T-T-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<T-t-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<T-t-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0074>
-<t-Y>        <ty>;<BAS>;<MIN-CAP>;IGNORE
-<t-y>        <ty>;<BAS>;<MIN>;IGNORE
-<t-T-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<t-T-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<t-t-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<t-t-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<t-Y>        <ty>;<COMPOUND>;<MIN-CAP>;IGNORE
+<t-y>        <ty>;<COMPOUND>;<MIN-MIN>;IGNORE
+<t-T-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<t-T-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<t-t-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<t-t-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U005A>
-<Z-S>        <zs>;<BAS>;<CAP>;IGNORE
-<Z-s>        <zs>;<BAS>;<CAP-MIN>;IGNORE
-<Z-Z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<Z-Z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<Z-z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<Z-z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<Z-S>        <zs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<Z-s>        <zs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<Z-Z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<Z-Z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<Z-z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<Z-z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U007A>
-<z-S>        <zs>;<BAS>;<MIN-CAP>;IGNORE
-<z-s>        <zs>;<BAS>;<MIN>;IGNORE
-<z-Z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<z-Z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<z-z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<z-z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<z-S>        <zs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<z-s>        <zs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<z-Z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<z-Z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<z-z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<z-z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-end
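
For readers unfamiliar with multi-level collation, here is a minimal Python
sketch of what the first (primary) column of the compound-letter entries
above expresses. It is not how glibc implements collation: the alphabet
fragment, the numeric weights, and the names ALPHABET, PRIMARY, ELEMENTS and
primary_key are all made up for illustration. The only facts taken from the
patch and from Hungarian are that "zs" is a letter sorting after "z", and
that the long form "zzs" carries the primary weights "<zs><zs>".

```python
# Illustrative sketch only: a fragment of the Hungarian alphabet, in order.
# The numeric weights are made up; only their relative order matters here.
ALPHABET = ["a", "b", "e", "g", "u", "z", "zs"]
PRIMARY = {symbol: rank for rank, symbol in enumerate(ALPHABET)}

# Collating elements and the primary symbols they expand to.
ELEMENTS = {symbol: [symbol] for symbol in ALPHABET}
ELEMENTS["zzs"] = ["zs", "zs"]  # mirrors <z-z-s> -> "<zs><zs>" in the patch


def primary_key(word):
    """Return the list of (hypothetical) primary weights for a lowercase word."""
    key = []
    pos = 0
    while pos < len(word):
        # Prefer the longest collating element starting at this position.
        for length in (3, 2, 1):
            chunk = word[pos:pos + length]
            if len(chunk) == length and chunk in ELEMENTS:
                key.extend(PRIMARY[s] for s in ELEMENTS[chunk])
                pos += length
                break
        else:
            raise ValueError(f"no collating element for {word[pos]!r}")
    return key


words = ["zseb", "zug", "zab"]
by_code_point = sorted(words)                # "zseb" lands before "zug"
by_primary = sorted(words, key=primary_key)  # "zseb" lands after "zug"
```

In this sketch primary_key("zzs") and primary_key("zszs") come out equal,
which is presumably why the patch gives the long form a distinct second-level
weight ("<SINGLE-OR-COMPOUND><COMPOUND>", versus "<COMPOUND>" for the plain
digraph).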



On Thu, Jan 14, 2016 at 1:53 PM, Egmont Koblinger <egmont@gmail.com> wrote:
> Hi,
>
> Friendly ping...
>
> Is there anything I could do to help this patch get accepted?
>
> Regards,
> egmont
>
> On Sun, Nov 15, 2015 at 10:34 PM, Egmont Koblinger <egmont@gmail.com> wrote:
>> Hi,
>>
>> Friendly ping... what's going on with this one?
>>
>> I was the guy making the last few changes to this locale (even an
>> unfortunate regression), and now I also add the most extensive
>> unittesting any locale has (protecting against such regressions now or
>> in the future), so without even looking at this patch I guess you
>> should be quite confident that the patch only makes things better, not
>> worse.
>>
>> Would it help if I broke it down to like 4 or 5 small patches on top
>> of each other, and added the unittests in the last step?
>>
>> thanks,
>> egmont
>>
>>
>> On Mon, Oct 26, 2015 at 4:24 PM, Egmont Koblinger <egmont@gmail.com> wrote:
>>> Hello,
>>>
>>> Friendly ping - could you please take a look at this patch (version 5)?
>>>
>>> Is there anything I can help you with?
>>>
>>> Thanks,
>>> egmont
>>>
>>> On Wed, Oct 14, 2015 at 12:36 AM, Egmont Koblinger <egmont@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Please use the patch attached to this mail, not the one attached to
>>>> the previous mail. Sorry for the confusion!
>>>>
>>>> I checked the previous patch many times, yet I missed something that
>>>> I discovered only after sending the previous mail: I left one of
>>>> the compound letters out of the unittest.
>>>>
>>>> The only change from the previous patch is the addition of a few
>>>> more lines to the unittest, so it has even better coverage. The
>>>> patch to the locale definition is unchanged.
>>>>
>>>> I've re-run the test and of course it still passes :)
>>>>
>>>> Thanks,
>>>> egmont
>>>>
>>>> On Tue, Oct 13, 2015 at 11:56 PM, Egmont Koblinger <egmont@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Could you please review and apply the attached patch?
>>>>>
>>>>> Recommended commit message body (feel free to edit as you please):
>>>>> -----
>>>>> Fix sorting of long consonants, a regression introduced by #13547. Fix
>>>>> inconsistencies in uppercase vs. lowercase sorting. Fix diacritic
>>>>> ordering. Fix ordering of foreign accents.
>>>>>
>>>>> Add an extensive test file.
>>>>>
>>>>>     [BZ #18934]
>>>>>     * locales/hu_HU: Fix multiple bugs.
>>>>>     * hu_HU.in: New file.
>>>>>     * Makefile (test-input): Add hu_HU.UTF-8.
>>>>> -----
>>>>>
>>>>> I know that generally one patch per issue is a cleaner approach, but
>>>>> this time I apologize for an all-in-one patch: separate patches would
>>>>> heavily conflict, and it would be really cumbersome to unittest an
>>>>> incremental series. Instead, think of it as TDD (test-driven
>>>>> development): I attach a decent unittest with explanations and
>>>>> pointers to the rules, and a locale definition that implements them.
>>>>>
>>>>> The addressed bugs are:
>>>>>
>>>>> - The fix to bug 13547 was incorrect and introduced a regression. It
>>>>> fixed a corner case, but I didn't realize it broke a more typical
>>>>> one. See the details in that bug.
>>>>>
>>>>> - Two minor bugs/inconsistencies wrt. sorting upper/lowercase values,
>>>>> as described in bug 18587.
>>>>>
>>>>> - Someone enabled backwards ordering of diacritics by default (bug
>>>>> 17750), breaking tons of locales including Hungarian. So disable
>>>>> backwards ordering in this locale definition.
>>>>>
>>>>> - Foreign accents should be sorted after the native Hungarian ones;
>>>>> this wasn't the case so far.
>>>>>
>>>>> Plus, a unittest is added which is far more extensive than that of
>>>>> any other locale. It includes all the examples from the corresponding
>>>>> sections of the official rules of Hungarian orthography, as well as
>>>>> thorough tests, written by me, of all the corner cases I could think
>>>>> of, with comments throughout.
>>>>>
>>>>> In addition to fixing an (unfortunately relatively insignificant)
>>>>> locale, I hope that this unittest file will encourage other locale
>>>>> maintainers to create similarly extensive tests, increasing the
>>>>> quality of other locales in the long run and preventing regressions
>>>>> (such as the backward diacritics ordering) from sneaking in.
>>>>>
>>>>> Thanks a lot,
>>>>> egmont
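
As a rough illustration of the test-driven approach described in the thread,
the following Python sketch checks that a list of strings is already in the
collation order of a given locale. It is not glibc's test harness: the helper
name sorts_as_listed is hypothetical, and it assumes the test data lists its
lines in the expected sorted order, as the thread suggests hu_HU.in does.

```python
import locale


def sorts_as_listed(lines, locale_name):
    """Return True if `lines` are already in `locale_name` collation order.

    Returns None when the locale is not installed, so callers can skip.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares in locale order.
        return sorted(lines, key=locale.strxfrm) == list(lines)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

On a system with the patched locale installed, one might call it as
sorts_as_listed(["zab", "zug", "zseb"], "hu_HU.UTF-8"); a None result means
the locale is missing rather than that the order is wrong.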


