This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).


On 07/20/2018 03:19 PM, Florian Weimer wrote:
> On 07/20/2018 08:49 PM, Carlos O'Donell wrote:
>> On 07/19/2018 04:39 PM, Florian Weimer wrote:
>>> On 07/19/2018 09:43 PM, Carlos O'Donell wrote:
>>>> * Add back tests to tst-fnmatch.input and tst-regexloc.c which
>>>> exercise that [a-z] does not match A or Z.
>>>
>>> [a-z] still matches ñ, 𝚗, but not 𝚣, which I doubt is useful.
>>
>> Sorry, I don't follow, it absolutely matches ASCII z.
> 
> The z I wrote above is one of the non-BMP math characters.

Thanks :-}

It was a conservative solution.

>> We deinterlace the collation element ordering (not sequence) to get
>> the right range expression resolution.
>>
>> See the added fnmatch tests:
>>
>> +en_US.UTF-8     "a"                    "[a-z]"                0
>> +en_US.UTF-8     "z"                    "[a-z]"                0
>> +en_US.UTF-8     "A"                    "[a-z]"                NOMATCH
>> +en_US.UTF-8     "Z"                    "[a-z]"                NOMATCH
>> +en_US.UTF-8     "a"                    "[A-Z]"                NOMATCH
>> +en_US.UTF-8     "z"                    "[A-Z]"                NOMATCH
>> +en_US.UTF-8     "A"                    "[A-Z]"                0
>> +en_US.UTF-8     "Z"                    "[A-Z]"                0
>> +en_US.UTF-8     "0"                    "[0-9]"                0
>> +en_US.UTF-8     "9"                    "[0-9]"                0
>>
>> [a-z] matches a-z (including z), *and* all the lowercase characters
>> in between, and so effectively behaves like [:lower:].
> 
> There are characters equivalent to ASCII z (like the z above), but
> which sort after z, so they are not matched.  This is one reason why
> I think this is a bad idea: it looks like [:lower:], but it's not.
> Same for [0-9], I assume.

Again, this is the conservative choice: it restores the behaviour we had
before, while retaining the improvement from the added ISO 14651 data.
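
To make the range discussion concrete, here is a toy model, not the glibc
implementation. It assumes a range [x-y] matches every character whose
position in the collation element order lies between x and y. The
`resolve_range` helper and the three-letter orders are hypothetical and
heavily shortened; only ASCII a-c and their FULLWIDTH variants appear.

```python
def resolve_range(order, first, last):
    """Return the characters of `order` positioned from `first` to `last`.

    Toy model only: a range is resolved purely by position in the
    element order, inclusive at both ends.
    """
    lo, hi = order.index(first), order.index(last)
    if lo > hi:
        raise ValueError("Invalid range end")
    return order[lo:hi + 1]


# Hypothetical variants: FULLWIDTH LATIN SMALL LETTER A, B, C.
FULLWIDTH = {"a": "\uff41", "b": "\uff42", "c": "\uff43"}

# Interleaved order: each ASCII letter is followed by its variants.
interleaved = []
for ch in "abc":
    interleaved += [ch, FULLWIDTH[ch]]

# Deinterlaced order: the ASCII letters form one contiguous block,
# and the variants follow afterwards.
deinterlaced = list("abc") + [FULLWIDTH[ch] for ch in "abc"]

print(resolve_range(interleaved, "a", "c"))
print(resolve_range(deinterlaced, "a", "c"))
```

In the interleaved order the range picks up the FULLWIDTH variants of a
and b but not that of c, since it sorts after ASCII c. In the deinterlaced
order the range is exactly a, b, c.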
 
>>> It's an improvement, and it may be good enough for glibc 2.28, but I would
>>> rather see us implement the rational ranges interpretation.
>>
>> That requires all ranges behave rationally?
>>
>> We could fix a-z, A-Z, and 0-9 easily.
>>
>> Patch attached.
> 
> (NB: Patch is relative to the previous patch.)
> 
> My enumeration tester likes it much more. 8-)

It was designed exactly for your enumerator ;-)

>   actual:   "abcdefghijklmnopqrstuvwxyz"
>   actual:   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>   actual:   "0123456789"
> 
> That's for [a-z], [A-Z], [0-9], in en_US.UTF-8 and de_DE.ISO-8859-1. However, I still get this:
> 
> tst-regex-classes.script:85:0: result character set difference in locale tr_TR.ISO-8859-9
> enumerate_chars '[a-z]' "abcdefghijklmnopqrstuvwxyz";
> ^
>   expected: "abcdefghijklmnopqrstuvwxyz"
>   actual:   "abcdefghjklmnopqrstuvwxyz"
>
> tst-regex-classes.script:86:0: result character set difference in locale tr_TR.ISO-8859-9
> enumerate_chars '[A-Z]' "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> ^
>   expected: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>   actual:   "ABCDEFGHJKLMNOPQRSTUVWXYZ"
> error: 2 test failures
> 
> Can you fix this with data-only changes, too?

Yes, I need to duplicate the rational range for A-Z in tr_TR and
remove 'i' since it's just fine the way it is, the existing

New patch attached with additional tests in tst-fnmatch.input to
test tr_TR.UTF-8, and ISO-8859-9.
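
For reference, the tr_TR differences quoted above can be reduced to the
missing characters with a few lines. This is only a sketch of the
comparison, not the enumeration tester itself; `charset_difference` is a
hypothetical helper, and the strings are copied from the report above.

```python
def charset_difference(expected, actual):
    """Return (missing, unexpected) characters between two enumerations."""
    missing = "".join(ch for ch in expected if ch not in actual)
    unexpected = "".join(ch for ch in actual if ch not in expected)
    return missing, unexpected


# Values reported for tr_TR.ISO-8859-9 with '[a-z]' and '[A-Z]'.
lower = charset_difference("abcdefghijklmnopqrstuvwxyz",
                           "abcdefghjklmnopqrstuvwxyz")
upper = charset_difference("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                           "ABCDEFGHJKLMNOPQRSTUVWXYZ")
print(lower, upper)
```

Both comparisons report a single missing character, 'i' and 'I'
respectively, and nothing unexpected.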

I noticed equivalence class issues, filed a bug, and added an XFAIL-ish
test case in tst-fnmatch.input:
https://sourceware.org/bugzilla/show_bug.cgi?id=23437

> posix/bug-regex17 regresses as well in the test for bug 9697, but I
> can incorporate that into my enumeration tester.  I don't think the
> bug is actually regressing, it's just that the test objective is not
> expressed properly in it.

Fixed.

> 
> posix/tst-rxspencer fails as well, presumably due to this:
> 
> UTF-8 aA FAIL regcomp failed: Invalid range end
> UTF-8 aAcC FAIL regcomp failed: Invalid range end
> 
> I think this happens because the test blindly replaces ASCII
> characters with non-ASCII characters, which causes issues if they are
> not ordered as expected.

Fixed.

v2
- Fixed tr_TR by duplicating A-Z rational range.
- Fixed tst-rxspencer.
- Fixed bug-regex17.

Tell me how the new version does.

-- 
Cheers,
Carlos.
diff --git a/localedata/locales/iso14651_t1_common b/localedata/locales/iso14651_t1_common
index 227400cc4e..7248074a8b 100644
--- a/localedata/locales/iso14651_t1_common
+++ b/localedata/locales/iso14651_t1_common
@@ -63177,7 +63177,19 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U20BC> <S20BC>;<BASE>;<MIN>;<U20BC> % MANAT SIGN
 <U20BD> <S20BD>;<BASE>;<MIN>;<U20BD> % RUBLE SIGN
 <U20BE> <S20BE>;<BASE>;<MIN>;<U20BE> % LARI SIGN
+% Implement rational range for [0-9] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
 <U0030> <S0030>;<BASE>;<MIN>;<U0030> % DIGIT ZERO
+<U0031> <S0031>;<BASE>;<MIN>;<U0031> % DIGIT ONE
+<U0032> <S0032>;<BASE>;<MIN>;<U0032> % DIGIT TWO
+<U0033> <S0033>;<BASE>;<MIN>;<U0033> % DIGIT THREE
+<U0034> <S0034>;<BASE>;<MIN>;<U0034> % DIGIT FOUR
+<U0035> <S0035>;<BASE>;<MIN>;<U0035> % DIGIT FIVE
+<U0036> <S0036>;<BASE>;<MIN>;<U0036> % DIGIT SIX
+<U0037> <S0037>;<BASE>;<MIN>;<U0037> % DIGIT SEVEN
+<U0038> <S0038>;<BASE>;<MIN>;<U0038> % DIGIT EIGHT
+<U0039> <S0039>;<BASE>;<MIN>;<U0039> % DIGIT NINE
 <U0660> <S0030>;<BASE>;<MIN>;<U0660> % ARABIC-INDIC DIGIT ZERO
 <U06F0> <S0030>;<BASE>;<MIN>;<U06F0> % EXTENDED ARABIC-INDIC DIGIT ZERO
 <U07C0> <S0030>;<BASE>;<MIN>;<U07C0> % NKO DIGIT ZERO
@@ -63250,7 +63262,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U2080> <S0030>;<BASE>;<MNS>;<U2080> % SUBSCRIPT ZERO
 <U2189> "<S0030><S0033>";"<BASE><BASE>";"<FRACTION><FRACTION>";<U2189> % VULGAR FRACTION ZERO THIRDS
 <U3358> "<S0030><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U3358> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ZERO
-<U0031> <S0031>;<BASE>;<MIN>;<U0031> % DIGIT ONE
 <U0661> <S0031>;<BASE>;<MIN>;<U0661> % ARABIC-INDIC DIGIT ONE
 <U06F1> <S0031>;<BASE>;<MIN>;<U06F1> % EXTENDED ARABIC-INDIC DIGIT ONE
 <U07C1> <S0031>;<BASE>;<MIN>;<U07C1> % NKO DIGIT ONE
@@ -63440,7 +63451,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E0> "<S0031><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E0> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ONE
 <U32C0> "<S0031><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C0> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY
 <U3359> "<S0031><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U3359> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ONE
-<U0032> <S0032>;<BASE>;<MIN>;<U0032> % DIGIT TWO
 <U0662> <S0032>;<BASE>;<MIN>;<U0662> % ARABIC-INDIC DIGIT TWO
 <U06F2> <S0032>;<BASE>;<MIN>;<U06F2> % EXTENDED ARABIC-INDIC DIGIT TWO
 <U07C2> <S0032>;<BASE>;<MIN>;<U07C2> % NKO DIGIT TWO
@@ -63583,7 +63593,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E1> "<S0032><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E1> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY TWO
 <U32C1> "<S0032><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C1> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR FEBRUARY
 <U335A> "<S0032><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335A> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR TWO
-<U0033> <S0033>;<BASE>;<MIN>;<U0033> % DIGIT THREE
 <U0663> <S0033>;<BASE>;<MIN>;<U0663> % ARABIC-INDIC DIGIT THREE
 <U06F3> <S0033>;<BASE>;<MIN>;<U06F3> % EXTENDED ARABIC-INDIC DIGIT THREE
 <U07C3> <S0033>;<BASE>;<MIN>;<U07C3> % NKO DIGIT THREE
@@ -63709,7 +63718,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E2> "<S0033><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E2> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY THREE
 <U32C2> "<S0033><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C2> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR MARCH
 <U335B> "<S0033><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335B> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR THREE
-<U0034> <S0034>;<BASE>;<MIN>;<U0034> % DIGIT FOUR
 <U0664> <S0034>;<BASE>;<MIN>;<U0664> % ARABIC-INDIC DIGIT FOUR
 <U06F4> <S0034>;<BASE>;<MIN>;<U06F4> % EXTENDED ARABIC-INDIC DIGIT FOUR
 <U07C4> <S0034>;<BASE>;<MIN>;<U07C4> % NKO DIGIT FOUR
@@ -63829,7 +63837,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E3> "<S0034><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E3> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY FOUR
 <U32C3> "<S0034><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C3> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR APRIL
 <U335C> "<S0034><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335C> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR FOUR
-<U0035> <S0035>;<BASE>;<MIN>;<U0035> % DIGIT FIVE
 <U0665> <S0035>;<BASE>;<MIN>;<U0665> % ARABIC-INDIC DIGIT FIVE
 <U06F5> <S0035>;<BASE>;<MIN>;<U06F5> % EXTENDED ARABIC-INDIC DIGIT FIVE
 <U07C5> <S0035>;<BASE>;<MIN>;<U07C5> % NKO DIGIT FIVE
@@ -63941,7 +63948,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E4> "<S0035><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E4> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY FIVE
 <U32C4> "<S0035><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C4> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR MAY
 <U335D> "<S0035><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335D> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR FIVE
-<U0036> <S0036>;<BASE>;<MIN>;<U0036> % DIGIT SIX
 <U0666> <S0036>;<BASE>;<MIN>;<U0666> % ARABIC-INDIC DIGIT SIX
 <U06F6> <S0036>;<BASE>;<MIN>;<U06F6> % EXTENDED ARABIC-INDIC DIGIT SIX
 <U07C6> <S0036>;<BASE>;<MIN>;<U07C6> % NKO DIGIT SIX
@@ -64036,7 +64042,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E5> "<S0036><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E5> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY SIX
 <U32C5> "<S0036><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C5> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR JUNE
 <U335E> "<S0036><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335E> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR SIX
-<U0037> <S0037>;<BASE>;<MIN>;<U0037> % DIGIT SEVEN
 <U0667> <S0037>;<BASE>;<MIN>;<U0667> % ARABIC-INDIC DIGIT SEVEN
 <U06F7> <S0037>;<BASE>;<MIN>;<U06F7> % EXTENDED ARABIC-INDIC DIGIT SEVEN
 <U07C7> <S0037>;<BASE>;<MIN>;<U07C7> % NKO DIGIT SEVEN
@@ -64132,7 +64137,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E6> "<S0037><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E6> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY SEVEN
 <U32C6> "<S0037><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C6> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR JULY
 <U335F> "<S0037><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335F> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR SEVEN
-<U0038> <S0038>;<BASE>;<MIN>;<U0038> % DIGIT EIGHT
 <U0668> <S0038>;<BASE>;<MIN>;<U0668> % ARABIC-INDIC DIGIT EIGHT
 <U06F8> <S0038>;<BASE>;<MIN>;<U06F8> % EXTENDED ARABIC-INDIC DIGIT EIGHT
 <U07C8> <S0038>;<BASE>;<MIN>;<U07C8> % NKO DIGIT EIGHT
@@ -64226,7 +64230,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
 <U33E7> "<S0038><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E7> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY EIGHT
 <U32C7> "<S0038><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C7> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR AUGUST
 <U3360> "<S0038><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U3360> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR EIGHT
-<U0039> <S0039>;<BASE>;<MIN>;<U0039> % DIGIT NINE
 <U0669> <S0039>;<BASE>;<MIN>;<U0669> % ARABIC-INDIC DIGIT NINE
 <U06F9> <S0039>;<BASE>;<MIN>;<U06F9> % EXTENDED ARABIC-INDIC DIGIT NINE
 <U07C9> <S0039>;<BASE>;<MIN>;<U07C9> % NKO DIGIT NINE
@@ -64326,7 +64329,35 @@ order_start <LATIN>;forward;backward;forward;forward,position
 else
 order_start <LATIN>;forward;forward;forward;forward,position
 endif
+% Implement rational range for [a-z] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
 <U0061> <S0061>;<BASE>;<MIN>;<U0061> % LATIN SMALL LETTER A
+<U0062> <S0062>;<BASE>;<MIN>;<U0062> % LATIN SMALL LETTER B
+<U0063> <S0063>;<BASE>;<MIN>;<U0063> % LATIN SMALL LETTER C
+<U0064> <S0064>;<BASE>;<MIN>;<U0064> % LATIN SMALL LETTER D
+<U0065> <S0065>;<BASE>;<MIN>;<U0065> % LATIN SMALL LETTER E
+<U0066> <S0066>;<BASE>;<MIN>;<U0066> % LATIN SMALL LETTER F
+<U0067> <S0067>;<BASE>;<MIN>;<U0067> % LATIN SMALL LETTER G
+<U0068> <S0068>;<BASE>;<MIN>;<U0068> % LATIN SMALL LETTER H
+<U0069> <S0069>;<BASE>;<MIN>;<U0069> % LATIN SMALL LETTER I
+<U006A> <S006A>;<BASE>;<MIN>;<U006A> % LATIN SMALL LETTER J
+<U006B> <S006B>;<BASE>;<MIN>;<U006B> % LATIN SMALL LETTER K
+<U006C> <S006C>;<BASE>;<MIN>;<U006C> % LATIN SMALL LETTER L
+<U006D> <S006D>;<BASE>;<MIN>;<U006D> % LATIN SMALL LETTER M
+<U006E> <S006E>;<BASE>;<MIN>;<U006E> % LATIN SMALL LETTER N
+<U006F> <S006F>;<BASE>;<MIN>;<U006F> % LATIN SMALL LETTER O
+<U0070> <S0070>;<BASE>;<MIN>;<U0070> % LATIN SMALL LETTER P
+<U0071> <S0071>;<BASE>;<MIN>;<U0071> % LATIN SMALL LETTER Q
+<U0072> <S0072>;<BASE>;<MIN>;<U0072> % LATIN SMALL LETTER R
+<U0073> <S0073>;<BASE>;<MIN>;<U0073> % LATIN SMALL LETTER S
+<U0074> <S0074>;<BASE>;<MIN>;<U0074> % LATIN SMALL LETTER T
+<U0075> <S0075>;<BASE>;<MIN>;<U0075> % LATIN SMALL LETTER U
+<U0076> <S0076>;<BASE>;<MIN>;<U0076> % LATIN SMALL LETTER V
+<U0077> <S0077>;<BASE>;<MIN>;<U0077> % LATIN SMALL LETTER W
+<U0078> <S0078>;<BASE>;<MIN>;<U0078> % LATIN SMALL LETTER X
+<U0079> <S0079>;<BASE>;<MIN>;<U0079> % LATIN SMALL LETTER Y
+<U007A> <S007A>;<BASE>;<MIN>;<U007A> % LATIN SMALL LETTER Z
 <UFF41> <S0061>;<BASE>;<WIDE>;<UFF41> % FULLWIDTH LATIN SMALL LETTER A
 <U0363> <S0061>;<BASE>;<COMPAT>;<U0363> % COMBINING LATIN SMALL LETTER A
 <U249C> <S0061>;<BASE>;<COMPAT>;<U249C> % PARENTHESIZED LATIN SMALL LETTER A
@@ -64418,7 +64449,6 @@ endif
 <U0252> <S0252>;<BASE>;<MIN>;<U0252> % LATIN SMALL LETTER TURNED ALPHA
 <U1D9B> <S0252>;<BASE>;<MNN>;<U1D9B> % MODIFIER LETTER SMALL TURNED ALPHA
 <UAB64> <SAB64>;<BASE>;<MIN>;<UAB64> % LATIN SMALL LETTER INVERTED ALPHA
-<U0062> <S0062>;<BASE>;<MIN>;<U0062> % LATIN SMALL LETTER B
 <UFF42> <S0062>;<BASE>;<WIDE>;<UFF42> % FULLWIDTH LATIN SMALL LETTER B
 <U1DE8> <S0062>;<BASE>;<COMPAT>;<U1DE8> % COMBINING LATIN SMALL LETTER B
 <U249D> <S0062>;<BASE>;<COMPAT>;<U249D> % PARENTHESIZED LATIN SMALL LETTER B
@@ -64454,7 +64484,6 @@ endif
 <U0183> <S0183>;<BASE>;<MIN>;<U0183> % LATIN SMALL LETTER B WITH TOPBAR
 <UA7B5> <SA7B5>;<BASE>;<MIN>;<UA7B5> % LATIN SMALL LETTER BETA
 <U1DE9> <SA7B5>;<BASE>;<COMPAT>;<U1DE9> % COMBINING LATIN SMALL LETTER BETA
-<U0063> <S0063>;<BASE>;<MIN>;<U0063> % LATIN SMALL LETTER C
 <UFF43> <S0063>;<BASE>;<WIDE>;<UFF43> % FULLWIDTH LATIN SMALL LETTER C
 <U0368> <S0063>;<BASE>;<COMPAT>;<U0368> % COMBINING LATIN SMALL LETTER C
 <U217D> <S0063>;<BASE>;<COMPAT>;<U217D> % SMALL ROMAN NUMERAL ONE HUNDRED
@@ -64504,7 +64533,6 @@ endif
 <U1D9D> <S0255>;<BASE>;<MNN>;<U1D9D> % MODIFIER LETTER SMALL C WITH CURL
 <U2184> <S2184>;<BASE>;<MIN>;<U2184> % LATIN SMALL LETTER REVERSED C
 <UA73F> <SA73F>;<BASE>;<MIN>;<UA73F> % LATIN SMALL LETTER REVERSED C WITH DOT
-<U0064> <S0064>;<BASE>;<MIN>;<U0064> % LATIN SMALL LETTER D
 <UFF44> <S0064>;<BASE>;<WIDE>;<UFF44> % FULLWIDTH LATIN SMALL LETTER D
 <U0369> <S0064>;<BASE>;<COMPAT>;<U0369> % COMBINING LATIN SMALL LETTER D
 <U217E> <S0064>;<BASE>;<COMPAT>;<U217E> % SMALL ROMAN NUMERAL FIVE HUNDRED
@@ -64563,7 +64591,6 @@ endif
 <U0221> <S0221>;<BASE>;<MIN>;<U0221> % LATIN SMALL LETTER D WITH CURL
 <UA771> <SA771>;<BASE>;<MIN>;<UA771> % LATIN SMALL LETTER DUM
 <U1E9F> <S1E9F>;<BASE>;<MIN>;<U1E9F> % LATIN SMALL LETTER DELTA
-<U0065> <S0065>;<BASE>;<MIN>;<U0065> % LATIN SMALL LETTER E
 <UFF45> <S0065>;<BASE>;<WIDE>;<UFF45> % FULLWIDTH LATIN SMALL LETTER E
 <U0364> <S0065>;<BASE>;<COMPAT>;<U0364> % COMBINING LATIN SMALL LETTER E
 <U24A0> <S0065>;<BASE>;<COMPAT>;<U24A0> % PARENTHESIZED LATIN SMALL LETTER E
@@ -64641,7 +64668,6 @@ endif
 <U025E> <S025E>;<BASE>;<MIN>;<U025E> % LATIN SMALL LETTER CLOSED REVERSED OPEN E
 <U029A> <S029A>;<BASE>;<MIN>;<U029A> % LATIN SMALL LETTER CLOSED OPEN E
 <U0264> <S0264>;<BASE>;<MIN>;<U0264> % LATIN SMALL LETTER RAMS HORN
-<U0066> <S0066>;<BASE>;<MIN>;<U0066> % LATIN SMALL LETTER F
 <UFF46> <S0066>;<BASE>;<WIDE>;<UFF46> % FULLWIDTH LATIN SMALL LETTER F
 <U1DEB> <S0066>;<BASE>;<COMPAT>;<U1DEB> % COMBINING LATIN SMALL LETTER F
 <U24A1> <S0066>;<BASE>;<COMPAT>;<U24A1> % PARENTHESIZED LATIN SMALL LETTER F
@@ -64680,7 +64706,6 @@ endif
 <U0192> <S0192>;<BASE>;<MIN>;<U0192> % LATIN SMALL LETTER F WITH HOOK
 <U214E> <S214E>;<BASE>;<MIN>;<U214E> % TURNED SMALL F
 <UA7FB> <SA7FB>;<BASE>;<MIN>;<UA7FB> % LATIN EPIGRAPHIC LETTER REVERSED F
-<U0067> <S0067>;<BASE>;<MIN>;<U0067> % LATIN SMALL LETTER G
 <UFF47> <S0067>;<BASE>;<WIDE>;<UFF47> % FULLWIDTH LATIN SMALL LETTER G
 <U1DDA> <S0067>;<BASE>;<COMPAT>;<U1DDA> % COMBINING LATIN SMALL LETTER G
 <U24A2> <S0067>;<BASE>;<COMPAT>;<U24A2> % PARENTHESIZED LATIN SMALL LETTER G
@@ -64727,7 +64752,6 @@ endif
 <U0263> <S0263>;<BASE>;<MIN>;<U0263> % LATIN SMALL LETTER GAMMA
 <U02E0> <S0263>;<BASE>;<MNN>;<U02E0> % MODIFIER LETTER SMALL GAMMA
 <U01A3> <S01A3>;<BASE>;<MIN>;<U01A3> % LATIN SMALL LETTER OI
-<U0068> <S0068>;<BASE>;<MIN>;<U0068> % LATIN SMALL LETTER H
 <UFF48> <S0068>;<BASE>;<WIDE>;<UFF48> % FULLWIDTH LATIN SMALL LETTER H
 <U036A> <S0068>;<BASE>;<COMPAT>;<U036A> % COMBINING LATIN SMALL LETTER H
 <U24A3> <S0068>;<BASE>;<COMPAT>;<U24A3> % PARENTHESIZED LATIN SMALL LETTER H
@@ -64780,7 +64804,6 @@ endif
 <U0267> <S0267>;<BASE>;<MIN>;<U0267> % LATIN SMALL LETTER HENG WITH HOOK
 <U02BB> <S02BB>;<BASE>;<MIN>;<U02BB> % MODIFIER LETTER TURNED COMMA
 <U02BD> <S02BD>;<BASE>;<MIN>;<U02BD> % MODIFIER LETTER REVERSED COMMA
-<U0069> <S0069>;<BASE>;<MIN>;<U0069> % LATIN SMALL LETTER I
 <UFF49> <S0069>;<BASE>;<WIDE>;<UFF49> % FULLWIDTH LATIN SMALL LETTER I
 <U0365> <S0069>;<BASE>;<COMPAT>;<U0365> % COMBINING LATIN SMALL LETTER I
 <U2170> <S0069>;<BASE>;<COMPAT>;<U2170> % SMALL ROMAN NUMERAL ONE
@@ -64844,7 +64867,6 @@ endif
 <U0269> <S0269>;<BASE>;<MIN>;<U0269> % LATIN SMALL LETTER IOTA
 <U1DA5> <S0269>;<BASE>;<MNN>;<U1DA5> % MODIFIER LETTER SMALL IOTA
 <U1D7C> <S1D7C>;<BASE>;<MIN>;<U1D7C> % LATIN SMALL LETTER IOTA WITH STROKE
-<U006A> <S006A>;<BASE>;<MIN>;<U006A> % LATIN SMALL LETTER J
 <UFF4A> <S006A>;<BASE>;<WIDE>;<UFF4A> % FULLWIDTH LATIN SMALL LETTER J
 <U24A5> <S006A>;<BASE>;<COMPAT>;<U24A5> % PARENTHESIZED LATIN SMALL LETTER J
 <U2149> <S006A>;<BASE>;<FONT>;<U2149> % DOUBLE-STRUCK ITALIC SMALL J
@@ -64876,7 +64898,6 @@ endif
 <U025F> <S025F>;<BASE>;<MIN>;<U025F> % LATIN SMALL LETTER DOTLESS J WITH STROKE
 <U1DA1> <S025F>;<BASE>;<MNN>;<U1DA1> % MODIFIER LETTER SMALL DOTLESS J WITH STROKE
 <U0284> <S0284>;<BASE>;<MIN>;<U0284> % LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK
-<U006B> <S006B>;<BASE>;<MIN>;<U006B> % LATIN SMALL LETTER K
 <UFF4B> <S006B>;<BASE>;<WIDE>;<UFF4B> % FULLWIDTH LATIN SMALL LETTER K
 <U1DDC> <S006B>;<BASE>;<COMPAT>;<U1DDC> % COMBINING LATIN SMALL LETTER K
 <U24A6> <S006B>;<BASE>;<COMPAT>;<U24A6> % PARENTHESIZED LATIN SMALL LETTER K
@@ -64926,7 +64947,6 @@ endif
 <UA743> <SA743>;<BASE>;<MIN>;<UA743> % LATIN SMALL LETTER K WITH DIAGONAL STROKE
 <UA745> <SA745>;<BASE>;<MIN>;<UA745> % LATIN SMALL LETTER K WITH STROKE AND DIAGONAL STROKE
 <U029E> <S029E>;<BASE>;<MIN>;<U029E> % LATIN SMALL LETTER TURNED K
-<U006C> <S006C>;<BASE>;<MIN>;<U006C> % LATIN SMALL LETTER L
 <UFF4C> <S006C>;<BASE>;<WIDE>;<UFF4C> % FULLWIDTH LATIN SMALL LETTER L
 <U1DDD> <S006C>;<BASE>;<COMPAT>;<U1DDD> % COMBINING LATIN SMALL LETTER L
 <U217C> <S006C>;<BASE>;<COMPAT>;<U217C> % SMALL ROMAN NUMERAL FIFTY
@@ -64996,7 +65016,6 @@ endif
 <UA781> <SA781>;<BASE>;<MIN>;<UA781> % LATIN SMALL LETTER TURNED L
 <U019B> <S019B>;<BASE>;<MIN>;<U019B> % LATIN SMALL LETTER LAMBDA WITH STROKE
 <U028E> <S028E>;<BASE>;<MIN>;<U028E> % LATIN SMALL LETTER TURNED Y
-<U006D> <S006D>;<BASE>;<MIN>;<U006D> % LATIN SMALL LETTER M
 <UFF4D> <S006D>;<BASE>;<WIDE>;<UFF4D> % FULLWIDTH LATIN SMALL LETTER M
 <U036B> <S006D>;<BASE>;<COMPAT>;<U036B> % COMBINING LATIN SMALL LETTER M
 <U217F> <S006D>;<BASE>;<COMPAT>;<U217F> % SMALL ROMAN NUMERAL ONE THOUSAND
@@ -65055,7 +65074,6 @@ endif
 <UA7FD> <SA7FD>;<BASE>;<MIN>;<UA7FD> % LATIN EPIGRAPHIC LETTER INVERTED M
 <UA7FF> <SA7FF>;<BASE>;<MIN>;<UA7FF> % LATIN EPIGRAPHIC LETTER ARCHAIC M
 <UA773> <SA773>;<BASE>;<MIN>;<UA773> % LATIN SMALL LETTER MUM
-<U006E> <S006E>;<BASE>;<MIN>;<U006E> % LATIN SMALL LETTER N
 <UFF4E> <S006E>;<BASE>;<WIDE>;<UFF4E> % FULLWIDTH LATIN SMALL LETTER N
 <U1DE0> <S006E>;<BASE>;<COMPAT>;<U1DE0> % COMBINING LATIN SMALL LETTER N
 <U24A9> <S006E>;<BASE>;<COMPAT>;<U24A9> % PARENTHESIZED LATIN SMALL LETTER N
@@ -65114,7 +65132,6 @@ endif
 <U014B> <S014B>;<BASE>;<MIN>;<U014B> % LATIN SMALL LETTER ENG
 <U1D51> <S014B>;<BASE>;<MNN>;<U1D51> % MODIFIER LETTER SMALL ENG
 <UAB3C> <SAB3C>;<BASE>;<MIN>;<UAB3C> % LATIN SMALL LETTER ENG WITH CROSSED-TAIL
-<U006F> <S006F>;<BASE>;<MIN>;<U006F> % LATIN SMALL LETTER O
 <UFF4F> <S006F>;<BASE>;<WIDE>;<UFF4F> % FULLWIDTH LATIN SMALL LETTER O
 <U0366> <S006F>;<BASE>;<COMPAT>;<U0366> % COMBINING LATIN SMALL LETTER O
 <U24AA> <S006F>;<BASE>;<COMPAT>;<U24AA> % PARENTHESIZED LATIN SMALL LETTER O
@@ -65213,7 +65230,6 @@ endif
 <U0223> <S0223>;<BASE>;<MIN>;<U0223> % LATIN SMALL LETTER OU
 <U1D3D> <S0223>;<BASE>;<MISCCAP>;<U1D3D> % MODIFIER LETTER CAPITAL OU
 <U1D15> <S1D15>;<BASE>;<MIN>;<U1D15> % LATIN LETTER SMALL CAPITAL OU
-<U0070> <S0070>;<BASE>;<MIN>;<U0070> % LATIN SMALL LETTER P
 <UFF50> <S0070>;<BASE>;<WIDE>;<UFF50> % FULLWIDTH LATIN SMALL LETTER P
 <U1DEE> <S0070>;<BASE>;<COMPAT>;<U1DEE> % COMBINING LATIN SMALL LETTER P
 <U24AB> <S0070>;<BASE>;<COMPAT>;<U24AB> % PARENTHESIZED LATIN SMALL LETTER P
@@ -65262,7 +65278,6 @@ endif
 <U0278> <S0278>;<BASE>;<MIN>;<U0278> % LATIN SMALL LETTER PHI
 <U1DB2> <S0278>;<BASE>;<MNN>;<U1DB2> % MODIFIER LETTER SMALL PHI
 <U2C77> <S2C77>;<BASE>;<MIN>;<U2C77> % LATIN SMALL LETTER TAILLESS PHI
-<U0071> <S0071>;<BASE>;<MIN>;<U0071> % LATIN SMALL LETTER Q
 <UFF51> <S0071>;<BASE>;<WIDE>;<UFF51> % FULLWIDTH LATIN SMALL LETTER Q
 <U24AC> <S0071>;<BASE>;<COMPAT>;<U24AC> % PARENTHESIZED LATIN SMALL LETTER Q
 <U0001D42A> <S0071>;<BASE>;<FONT>;<U0001D42A> % MATHEMATICAL BOLD SMALL Q
@@ -65285,7 +65300,6 @@ endif
 <U02A0> <S02A0>;<BASE>;<MIN>;<U02A0> % LATIN SMALL LETTER Q WITH HOOK
 <U024B> <S024B>;<BASE>;<MIN>;<U024B> % LATIN SMALL LETTER Q WITH HOOK TAIL
 <U0138> <S0138>;<BASE>;<MIN>;<U0138> % LATIN SMALL LETTER KRA
-<U0072> <S0072>;<BASE>;<MIN>;<U0072> % LATIN SMALL LETTER R
 <UFF52> <S0072>;<BASE>;<WIDE>;<UFF52> % FULLWIDTH LATIN SMALL LETTER R
 <U036C> <S0072>;<BASE>;<COMPAT>;<U036C> % COMBINING LATIN SMALL LETTER R
 <U1DCA> <S0072>;<BASE>;<COMPAT>;<U1DCA> % COMBINING LATIN SMALL LETTER R BELOW
@@ -65354,7 +65368,6 @@ endif
 <UA775> <SA775>;<BASE>;<MIN>;<UA775> % LATIN SMALL LETTER RUM
 <UA776> <SA776>;<BASE>;<MIN>;<UA776> % LATIN LETTER SMALL CAPITAL RUM
 <UA75D> <SA75D>;<BASE>;<MIN>;<UA75D> % LATIN SMALL LETTER RUM ROTUNDA
-<U0073> <S0073>;<BASE>;<MIN>;<U0073> % LATIN SMALL LETTER S
 <UFF53> <S0073>;<BASE>;<WIDE>;<UFF53> % FULLWIDTH LATIN SMALL LETTER S
 <U1DE4> <S0073>;<BASE>;<COMPAT>;<U1DE4> % COMBINING LATIN SMALL LETTER S
 <U24AE> <S0073>;<BASE>;<COMPAT>;<U24AE> % PARENTHESIZED LATIN SMALL LETTER S
@@ -65417,7 +65430,6 @@ endif
 <U0285> <S0285>;<BASE>;<MIN>;<U0285> % LATIN SMALL LETTER SQUAT REVERSED ESH
 <U1D98> <S1D98>;<BASE>;<MIN>;<U1D98> % LATIN SMALL LETTER ESH WITH RETROFLEX HOOK
 <U0286> <S0286>;<BASE>;<MIN>;<U0286> % LATIN SMALL LETTER ESH WITH CURL
-<U0074> <S0074>;<BASE>;<MIN>;<U0074> % LATIN SMALL LETTER T
 <UFF54> <S0074>;<BASE>;<WIDE>;<UFF54> % FULLWIDTH LATIN SMALL LETTER T
 <U036D> <S0074>;<BASE>;<COMPAT>;<U036D> % COMBINING LATIN SMALL LETTER T
 <U24AF> <S0074>;<BASE>;<COMPAT>;<U24AF> % PARENTHESIZED LATIN SMALL LETTER T
@@ -65467,7 +65479,6 @@ endif
 <U0236> <S0236>;<BASE>;<MIN>;<U0236> % LATIN SMALL LETTER T WITH CURL
 <UA777> <SA777>;<BASE>;<MIN>;<UA777> % LATIN SMALL LETTER TUM
 <U0287> <S0287>;<BASE>;<MIN>;<U0287> % LATIN SMALL LETTER TURNED T
-<U0075> <S0075>;<BASE>;<MIN>;<U0075> % LATIN SMALL LETTER U
 <UFF55> <S0075>;<BASE>;<WIDE>;<UFF55> % FULLWIDTH LATIN SMALL LETTER U
 <U0367> <S0075>;<BASE>;<COMPAT>;<U0367> % COMBINING LATIN SMALL LETTER U
 <U24B0> <S0075>;<BASE>;<COMPAT>;<U24B0> % PARENTHESIZED LATIN SMALL LETTER U
@@ -65552,7 +65563,6 @@ endif
 <U028A> <S028A>;<BASE>;<MIN>;<U028A> % LATIN SMALL LETTER UPSILON
 <U1DB7> <S028A>;<BASE>;<MNN>;<U1DB7> % MODIFIER LETTER SMALL UPSILON
 <U1D7F> <S1D7F>;<BASE>;<MIN>;<U1D7F> % LATIN SMALL LETTER UPSILON WITH STROKE
-<U0076> <S0076>;<BASE>;<MIN>;<U0076> % LATIN SMALL LETTER V
 <UFF56> <S0076>;<BASE>;<WIDE>;<UFF56> % FULLWIDTH LATIN SMALL LETTER V
 <U036E> <S0076>;<BASE>;<COMPAT>;<U036E> % COMBINING LATIN SMALL LETTER V
 <U2174> <S0076>;<BASE>;<COMPAT>;<U2174> % SMALL ROMAN NUMERAL FIVE
@@ -65593,7 +65603,6 @@ endif
 <U1EFD> <S1EFD>;<BASE>;<MIN>;<U1EFD> % LATIN SMALL LETTER MIDDLE-WELSH V
 <U028C> <S028C>;<BASE>;<MIN>;<U028C> % LATIN SMALL LETTER TURNED V
 <U1DBA> <S028C>;<BASE>;<MNN>;<U1DBA> % MODIFIER LETTER SMALL TURNED V
-<U0077> <S0077>;<BASE>;<MIN>;<U0077> % LATIN SMALL LETTER W
 <UFF57> <S0077>;<BASE>;<WIDE>;<UFF57> % FULLWIDTH LATIN SMALL LETTER W
 <U1DF1> <S0077>;<BASE>;<COMPAT>;<U1DF1> % COMBINING LATIN SMALL LETTER W
 <U24B2> <S0077>;<BASE>;<COMPAT>;<U24B2> % PARENTHESIZED LATIN SMALL LETTER W
@@ -65627,7 +65636,6 @@ endif
 <U1D21> <S1D21>;<BASE>;<MIN>;<U1D21> % LATIN LETTER SMALL CAPITAL W
 <U2C73> <S2C73>;<BASE>;<MIN>;<U2C73> % LATIN SMALL LETTER W WITH HOOK
 <U028D> <S028D>;<BASE>;<MIN>;<U028D> % LATIN SMALL LETTER TURNED W
-<U0078> <S0078>;<BASE>;<MIN>;<U0078> % LATIN SMALL LETTER X
 <UFF58> <S0078>;<BASE>;<WIDE>;<UFF58> % FULLWIDTH LATIN SMALL LETTER X
 <U036F> <S0078>;<BASE>;<COMPAT>;<U036F> % COMBINING LATIN SMALL LETTER X
 <U2179> <S0078>;<BASE>;<COMPAT>;<U2179> % SMALL ROMAN NUMERAL TEN
@@ -65660,7 +65668,6 @@ endif
 <UAB53> <SAB53>;<BASE>;<MIN>;<UAB53> % LATIN SMALL LETTER CHI
 <UAB54> <SAB54>;<BASE>;<MIN>;<UAB54> % LATIN SMALL LETTER CHI WITH LOW RIGHT RING
 <UAB55> <SAB55>;<BASE>;<MIN>;<UAB55> % LATIN SMALL LETTER CHI WITH LOW LEFT SERIF
-<U0079> <S0079>;<BASE>;<MIN>;<U0079> % LATIN SMALL LETTER Y
 <UFF59> <S0079>;<BASE>;<WIDE>;<UFF59> % FULLWIDTH LATIN SMALL LETTER Y
 <U24B4> <S0079>;<BASE>;<COMPAT>;<U24B4> % PARENTHESIZED LATIN SMALL LETTER Y
 <U0001D432> <S0079>;<BASE>;<FONT>;<U0001D432> % MATHEMATICAL BOLD SMALL Y
@@ -65694,7 +65701,6 @@ endif
 <U1EFF> <S1EFF>;<BASE>;<MIN>;<U1EFF> % LATIN SMALL LETTER Y WITH LOOP
 <UAB5A> <SAB5A>;<BASE>;<MIN>;<UAB5A> % LATIN SMALL LETTER Y WITH SHORT RIGHT LEG
 <U021D> <S021D>;<BASE>;<MIN>;<U021D> % LATIN SMALL LETTER YOGH
-<U007A> <S007A>;<BASE>;<MIN>;<U007A> % LATIN SMALL LETTER Z
 <UFF5A> <S007A>;<BASE>;<WIDE>;<UFF5A> % FULLWIDTH LATIN SMALL LETTER Z
 <U1DE6> <S007A>;<BASE>;<COMPAT>;<U1DE6> % COMBINING LATIN SMALL LETTER Z
 <U24B5> <S007A>;<BASE>;<COMPAT>;<U24B5> % PARENTHESIZED LATIN SMALL LETTER Z
@@ -65796,7 +65802,35 @@ endif
 <U0001D736> <S03B1>;<BASE>;<FONT>;<U0001D736> % MATHEMATICAL BOLD ITALIC SMALL ALPHA
 <U0001D770> <S03B1>;<BASE>;<FONT>;<U0001D770> % MATHEMATICAL SANS-SERIF BOLD SMALL ALPHA
 <U0001D7AA> <S03B1>;<BASE>;<FONT>;<U0001D7AA> % MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA
+% Implement rational range for [A-Z] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
 <U0041> <S0061>;<BASE>;<CAP>;<U0041> % LATIN CAPITAL LETTER A
+<U0042> <S0062>;<BASE>;<CAP>;<U0042> % LATIN CAPITAL LETTER B
+<U0043> <S0063>;<BASE>;<CAP>;<U0043> % LATIN CAPITAL LETTER C
+<U0044> <S0064>;<BASE>;<CAP>;<U0044> % LATIN CAPITAL LETTER D
+<U0045> <S0065>;<BASE>;<CAP>;<U0045> % LATIN CAPITAL LETTER E
+<U0046> <S0066>;<BASE>;<CAP>;<U0046> % LATIN CAPITAL LETTER F
+<U0047> <S0067>;<BASE>;<CAP>;<U0047> % LATIN CAPITAL LETTER G
+<U0048> <S0068>;<BASE>;<CAP>;<U0048> % LATIN CAPITAL LETTER H
+<U0049> <S0069>;<BASE>;<CAP>;<U0049> % LATIN CAPITAL LETTER I
+<U004A> <S006A>;<BASE>;<CAP>;<U004A> % LATIN CAPITAL LETTER J
+<U004B> <S006B>;<BASE>;<CAP>;<U004B> % LATIN CAPITAL LETTER K
+<U004C> <S006C>;<BASE>;<CAP>;<U004C> % LATIN CAPITAL LETTER L
+<U004D> <S006D>;<BASE>;<CAP>;<U004D> % LATIN CAPITAL LETTER M
+<U004E> <S006E>;<BASE>;<CAP>;<U004E> % LATIN CAPITAL LETTER N
+<U004F> <S006F>;<BASE>;<CAP>;<U004F> % LATIN CAPITAL LETTER O
+<U0050> <S0070>;<BASE>;<CAP>;<U0050> % LATIN CAPITAL LETTER P
+<U0051> <S0071>;<BASE>;<CAP>;<U0051> % LATIN CAPITAL LETTER Q
+<U0052> <S0072>;<BASE>;<CAP>;<U0052> % LATIN CAPITAL LETTER R
+<U0053> <S0073>;<BASE>;<CAP>;<U0053> % LATIN CAPITAL LETTER S
+<U0054> <S0074>;<BASE>;<CAP>;<U0054> % LATIN CAPITAL LETTER T
+<U0055> <S0075>;<BASE>;<CAP>;<U0055> % LATIN CAPITAL LETTER U
+<U0056> <S0076>;<BASE>;<CAP>;<U0056> % LATIN CAPITAL LETTER V
+<U0057> <S0077>;<BASE>;<CAP>;<U0057> % LATIN CAPITAL LETTER W
+<U0058> <S0078>;<BASE>;<CAP>;<U0058> % LATIN CAPITAL LETTER X
+<U0059> <S0079>;<BASE>;<CAP>;<U0059> % LATIN CAPITAL LETTER Y
+<U005A> <S007A>;<BASE>;<CAP>;<U005A> % LATIN CAPITAL LETTER Z
 <UFF21> <S0061>;<BASE>;<WIDECAP>;<UFF21> % FULLWIDTH LATIN CAPITAL LETTER A
 <U0001F110> <S0061>;<BASE>;<COMPATCAP>;<U0001F110> % PARENTHESIZED LATIN CAPITAL LETTER A
 <U0001D400> <S0061>;<BASE>;<FONTCAP>;<U0001D400> % MATHEMATICAL BOLD CAPITAL A
@@ -65860,7 +65894,6 @@ endif
 <U2C6F> <S0250>;<BASE>;<CAP>;<U2C6F> % LATIN CAPITAL LETTER TURNED A
 <U2C6D> <S0251>;<BASE>;<CAP>;<U2C6D> % LATIN CAPITAL LETTER ALPHA
 <U2C70> <S0252>;<BASE>;<CAP>;<U2C70> % LATIN CAPITAL LETTER TURNED ALPHA
-<U0042> <S0062>;<BASE>;<CAP>;<U0042> % LATIN CAPITAL LETTER B
 <UFF22> <S0062>;<BASE>;<WIDECAP>;<UFF22> % FULLWIDTH LATIN CAPITAL LETTER B
 <U0001F111> <S0062>;<BASE>;<COMPATCAP>;<U0001F111> % PARENTHESIZED LATIN CAPITAL LETTER B
 <U212C> <S0062>;<BASE>;<FONTCAP>;<U212C> % SCRIPT CAPITAL B
@@ -65888,7 +65921,6 @@ endif
 <U0181> <S0253>;<BASE>;<CAP>;<U0181> % LATIN CAPITAL LETTER B WITH HOOK
 <U0182> <S0183>;<BASE>;<CAP>;<U0182> % LATIN CAPITAL LETTER B WITH TOPBAR
 <UA7B4> <SA7B5>;<BASE>;<CAP>;<UA7B4> % LATIN CAPITAL LETTER BETA
-<U0043> <S0063>;<BASE>;<CAP>;<U0043> % LATIN CAPITAL LETTER C
 <UFF23> <S0063>;<BASE>;<WIDECAP>;<UFF23> % FULLWIDTH LATIN CAPITAL LETTER C
 <U216D> <S0063>;<BASE>;<COMPATCAP>;<U216D> % ROMAN NUMERAL ONE HUNDRED
 <U0001F112> <S0063>;<BASE>;<COMPATCAP>;<U0001F112> % PARENTHESIZED LATIN CAPITAL LETTER C
@@ -65921,7 +65953,6 @@ endif
 <U0187> <S0188>;<BASE>;<CAP>;<U0187> % LATIN CAPITAL LETTER C WITH HOOK
 <U2183> <S2184>;<BASE>;<CAP>;<U2183> % ROMAN NUMERAL REVERSED ONE HUNDRED
 <UA73E> <SA73F>;<BASE>;<CAP>;<UA73E> % LATIN CAPITAL LETTER REVERSED C WITH DOT
-<U0044> <S0064>;<BASE>;<CAP>;<U0044> % LATIN CAPITAL LETTER D
 <UFF24> <S0064>;<BASE>;<WIDECAP>;<UFF24> % FULLWIDTH LATIN CAPITAL LETTER D
 <U216E> <S0064>;<BASE>;<COMPATCAP>;<U216E> % ROMAN NUMERAL FIVE HUNDRED
 <U0001F113> <S0064>;<BASE>;<COMPATCAP>;<U0001F113> % PARENTHESIZED LATIN CAPITAL LETTER D
@@ -65959,7 +65990,6 @@ endif
 <U0189> <S0256>;<BASE>;<CAP>;<U0189> % LATIN CAPITAL LETTER AFRICAN D
 <U018A> <S0257>;<BASE>;<CAP>;<U018A> % LATIN CAPITAL LETTER D WITH HOOK
 <U018B> <S018C>;<BASE>;<CAP>;<U018B> % LATIN CAPITAL LETTER D WITH TOPBAR
-<U0045> <S0065>;<BASE>;<CAP>;<U0045> % LATIN CAPITAL LETTER E
 <UFF25> <S0065>;<BASE>;<WIDECAP>;<UFF25> % FULLWIDTH LATIN CAPITAL LETTER E
 <U0001F114> <S0065>;<BASE>;<COMPATCAP>;<U0001F114> % PARENTHESIZED LATIN CAPITAL LETTER E
 <U2130> <S0065>;<BASE>;<FONTCAP>;<U2130> % SCRIPT CAPITAL E
@@ -66010,7 +66040,6 @@ endif
 <U0190> <S025B>;<BASE>;<CAP>;<U0190> % LATIN CAPITAL LETTER OPEN E
 <U2107> <S025B>;<BASE>;<COMPATCAP>;<U2107> % EULER CONSTANT
 <UA7AB> <S025C>;<BASE>;<CAP>;<UA7AB> % LATIN CAPITAL LETTER REVERSED OPEN E
-<U0046> <S0066>;<BASE>;<CAP>;<U0046> % LATIN CAPITAL LETTER F
 <UFF26> <S0066>;<BASE>;<WIDECAP>;<UFF26> % FULLWIDTH LATIN CAPITAL LETTER F
 <U0001F115> <S0066>;<BASE>;<COMPATCAP>;<U0001F115> % PARENTHESIZED LATIN CAPITAL LETTER F
 <U2131> <S0066>;<BASE>;<FONTCAP>;<U2131> % SCRIPT CAPITAL F
@@ -66035,7 +66064,6 @@ endif
 <UA798> <SA799>;<BASE>;<CAP>;<UA798> % LATIN CAPITAL LETTER F WITH STROKE
 <U0191> <S0192>;<BASE>;<CAP>;<U0191> % LATIN CAPITAL LETTER F WITH HOOK
 <U2132> <S214E>;<BASE>;<CAP>;<U2132> % TURNED CAPITAL F
-<U0047> <S0067>;<BASE>;<CAP>;<U0047> % LATIN CAPITAL LETTER G
 <UFF27> <S0067>;<BASE>;<WIDECAP>;<UFF27> % FULLWIDTH LATIN CAPITAL LETTER G
 <U0001F116> <S0067>;<BASE>;<COMPATCAP>;<U0001F116> % PARENTHESIZED LATIN CAPITAL LETTER G
 <U0001D406> <S0067>;<BASE>;<FONTCAP>;<U0001D406> % MATHEMATICAL BOLD CAPITAL G
@@ -66071,7 +66099,6 @@ endif
 <UA77E> <SA77F>;<BASE>;<CAP>;<UA77E> % LATIN CAPITAL LETTER TURNED INSULAR G
 <U0194> <S0263>;<BASE>;<CAP>;<U0194> % LATIN CAPITAL LETTER GAMMA
 <U01A2> <S01A3>;<BASE>;<CAP>;<U01A2> % LATIN CAPITAL LETTER OI
-<U0048> <S0068>;<BASE>;<CAP>;<U0048> % LATIN CAPITAL LETTER H
 <UFF28> <S0068>;<BASE>;<WIDECAP>;<UFF28> % FULLWIDTH LATIN CAPITAL LETTER H
 <U0001F117> <S0068>;<BASE>;<COMPATCAP>;<U0001F117> % PARENTHESIZED LATIN CAPITAL LETTER H
 <U210B> <S0068>;<BASE>;<FONTCAP>;<U210B> % SCRIPT CAPITAL H
@@ -66104,7 +66131,6 @@ endif
 <U2C67> <S2C68>;<BASE>;<CAP>;<U2C67> % LATIN CAPITAL LETTER H WITH DESCENDER
 <U2C75> <S2C76>;<BASE>;<CAP>;<U2C75> % LATIN CAPITAL LETTER HALF H
 <UA726> <SA727>;<BASE>;<CAP>;<UA726> % LATIN CAPITAL LETTER HENG
-<U0049> <S0069>;<BASE>;<CAP>;<U0049> % LATIN CAPITAL LETTER I
 <UFF29> <S0069>;<BASE>;<WIDECAP>;<UFF29> % FULLWIDTH LATIN CAPITAL LETTER I
 <U2160> <S0069>;<BASE>;<COMPATCAP>;<U2160> % ROMAN NUMERAL ONE
 <U0001F118> <S0069>;<BASE>;<COMPATCAP>;<U0001F118> % PARENTHESIZED LATIN CAPITAL LETTER I
@@ -66149,7 +66175,6 @@ endif
 <UA7AE> <S026A>;<BASE>;<CAP>;<UA7AE> % LATIN CAPITAL LETTER SMALL CAPITAL I
 <U0197> <S0268>;<BASE>;<CAP>;<U0197> % LATIN CAPITAL LETTER I WITH STROKE
 <U0196> <S0269>;<BASE>;<CAP>;<U0196> % LATIN CAPITAL LETTER IOTA
-<U004A> <S006A>;<BASE>;<CAP>;<U004A> % LATIN CAPITAL LETTER J
 <UFF2A> <S006A>;<BASE>;<WIDECAP>;<UFF2A> % FULLWIDTH LATIN CAPITAL LETTER J
 <U0001F119> <S006A>;<BASE>;<COMPATCAP>;<U0001F119> % PARENTHESIZED LATIN CAPITAL LETTER J
 <U0001D409> <S006A>;<BASE>;<FONTCAP>;<U0001D409> % MATHEMATICAL BOLD CAPITAL J
@@ -66172,7 +66197,6 @@ endif
 <U0134> <S006A>;"<BASE><CIRCF>";"<CAP><MIN>";<U0134> % LATIN CAPITAL LETTER J WITH CIRCUMFLEX
 <U0248> <S0249>;<BASE>;<CAP>;<U0248> % LATIN CAPITAL LETTER J WITH STROKE
 <UA7B2> <S029D>;<BASE>;<CAP>;<UA7B2> % LATIN CAPITAL LETTER J WITH CROSSED-TAIL
-<U004B> <S006B>;<BASE>;<CAP>;<U004B> % LATIN CAPITAL LETTER K
 <U212A> <S006B>;<BASE>;<CAP>;<U212A> % KELVIN SIGN
 <UFF2B> <S006B>;<BASE>;<WIDECAP>;<UFF2B> % FULLWIDTH LATIN CAPITAL LETTER K
 <U0001F11A> <S006B>;<BASE>;<COMPATCAP>;<U0001F11A> % PARENTHESIZED LATIN CAPITAL LETTER K
@@ -66206,7 +66230,6 @@ endif
 <UA742> <SA743>;<BASE>;<CAP>;<UA742> % LATIN CAPITAL LETTER K WITH DIAGONAL STROKE
 <UA744> <SA745>;<BASE>;<CAP>;<UA744> % LATIN CAPITAL LETTER K WITH STROKE AND DIAGONAL STROKE
 <UA7B0> <S029E>;<BASE>;<CAP>;<UA7B0> % LATIN CAPITAL LETTER TURNED K
-<U004C> <S006C>;<BASE>;<CAP>;<U004C> % LATIN CAPITAL LETTER L
 <UFF2C> <S006C>;<BASE>;<WIDECAP>;<UFF2C> % FULLWIDTH LATIN CAPITAL LETTER L
 <U216C> <S006C>;<BASE>;<COMPATCAP>;<U216C> % ROMAN NUMERAL FIFTY
 <U0001F11B> <S006C>;<BASE>;<COMPATCAP>;<U0001F11B> % PARENTHESIZED LATIN CAPITAL LETTER L
@@ -66249,7 +66272,6 @@ endif
 <U2C62> <S026B>;<BASE>;<CAP>;<U2C62> % LATIN CAPITAL LETTER L WITH MIDDLE TILDE
 <UA7AD> <S026C>;<BASE>;<CAP>;<UA7AD> % LATIN CAPITAL LETTER L WITH BELT
 <UA780> <SA781>;<BASE>;<CAP>;<UA780> % LATIN CAPITAL LETTER TURNED L
-<U004D> <S006D>;<BASE>;<CAP>;<U004D> % LATIN CAPITAL LETTER M
 <UFF2D> <S006D>;<BASE>;<WIDECAP>;<UFF2D> % FULLWIDTH LATIN CAPITAL LETTER M
 <U216F> <S006D>;<BASE>;<COMPATCAP>;<U216F> % ROMAN NUMERAL ONE THOUSAND
 <U0001F11C> <S006D>;<BASE>;<COMPATCAP>;<U0001F11C> % PARENTHESIZED LATIN CAPITAL LETTER M
@@ -66275,7 +66297,6 @@ endif
 <U1E42> <S006D>;"<BASE><POINS>";"<CAP><MIN>";<U1E42> % LATIN CAPITAL LETTER M WITH DOT BELOW
 <U1DDF> <S1D0D>;<BASE>;<COMPAT>;<U1DDF> % COMBINING LATIN LETTER SMALL CAPITAL M
 <U2C6E> <S0271>;<BASE>;<CAP>;<U2C6E> % LATIN CAPITAL LETTER M WITH HOOK
-<U004E> <S006E>;<BASE>;<CAP>;<U004E> % LATIN CAPITAL LETTER N
 <UFF2E> <S006E>;<BASE>;<WIDECAP>;<UFF2E> % FULLWIDTH LATIN CAPITAL LETTER N
 <U0001F11D> <S006E>;<BASE>;<COMPATCAP>;<U0001F11D> % PARENTHESIZED LATIN CAPITAL LETTER N
 <U2115> <S006E>;<BASE>;<FONTCAP>;<U2115> % DOUBLE-STRUCK CAPITAL N
@@ -66312,7 +66333,6 @@ endif
 <U0220> <S019E>;<BASE>;<CAP>;<U0220> % LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
 <UA790> <SA791>;<BASE>;<CAP>;<UA790> % LATIN CAPITAL LETTER N WITH DESCENDER
 <U014A> <S014B>;<BASE>;<CAP>;<U014A> % LATIN CAPITAL LETTER ENG
-<U004F> <S006F>;<BASE>;<CAP>;<U004F> % LATIN CAPITAL LETTER O
 <UFF2F> <S006F>;<BASE>;<WIDECAP>;<UFF2F> % FULLWIDTH LATIN CAPITAL LETTER O
 <U0001F11E> <S006F>;<BASE>;<COMPATCAP>;<U0001F11E> % PARENTHESIZED LATIN CAPITAL LETTER O
 <U0001D40E> <S006F>;<BASE>;<FONTCAP>;<U0001D40E> % MATHEMATICAL BOLD CAPITAL O
@@ -66377,7 +66397,6 @@ endif
 <UA74A> <SA74B>;<BASE>;<CAP>;<UA74A> % LATIN CAPITAL LETTER O WITH LONG STROKE OVERLAY
 <UA7B6> <SA7B7>;<BASE>;<CAP>;<UA7B6> % LATIN CAPITAL LETTER OMEGA
 <U0222> <S0223>;<BASE>;<CAP>;<U0222> % LATIN CAPITAL LETTER OU
-<U0050> <S0070>;<BASE>;<CAP>;<U0050> % LATIN CAPITAL LETTER P
 <UFF30> <S0070>;<BASE>;<WIDECAP>;<UFF30> % FULLWIDTH LATIN CAPITAL LETTER P
 <U0001F11F> <S0070>;<BASE>;<COMPATCAP>;<U0001F11F> % PARENTHESIZED LATIN CAPITAL LETTER P
 <U2119> <S0070>;<BASE>;<FONTCAP>;<U2119> % DOUBLE-STRUCK CAPITAL P
@@ -66405,7 +66424,6 @@ endif
 <U01A4> <S01A5>;<BASE>;<CAP>;<U01A4> % LATIN CAPITAL LETTER P WITH HOOK
 <UA752> <SA753>;<BASE>;<CAP>;<UA752> % LATIN CAPITAL LETTER P WITH FLOURISH
 <UA754> <SA755>;<BASE>;<CAP>;<UA754> % LATIN CAPITAL LETTER P WITH SQUIRREL TAIL
-<U0051> <S0071>;<BASE>;<CAP>;<U0051> % LATIN CAPITAL LETTER Q
 <UFF31> <S0071>;<BASE>;<WIDECAP>;<UFF31> % FULLWIDTH LATIN CAPITAL LETTER Q
 <U0001F120> <S0071>;<BASE>;<COMPATCAP>;<U0001F120> % PARENTHESIZED LATIN CAPITAL LETTER Q
 <U211A> <S0071>;<BASE>;<FONTCAP>;<U211A> % DOUBLE-STRUCK CAPITAL Q
@@ -66428,7 +66446,6 @@ endif
 <UA756> <SA757>;<BASE>;<CAP>;<UA756> % LATIN CAPITAL LETTER Q WITH STROKE THROUGH DESCENDER
 <UA758> <SA759>;<BASE>;<CAP>;<UA758> % LATIN CAPITAL LETTER Q WITH DIAGONAL STROKE
 <U024A> <S024B>;<BASE>;<CAP>;<U024A> % LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL
-<U0052> <S0072>;<BASE>;<CAP>;<U0052> % LATIN CAPITAL LETTER R
 <UFF32> <S0072>;<BASE>;<WIDECAP>;<UFF32> % FULLWIDTH LATIN CAPITAL LETTER R
 <U0001F121> <S0072>;<BASE>;<COMPATCAP>;<U0001F121> % PARENTHESIZED LATIN CAPITAL LETTER R
 <U211B> <S0072>;<BASE>;<FONTCAP>;<U211B> % SCRIPT CAPITAL R
@@ -66466,7 +66483,6 @@ endif
 <U024C> <S024D>;<BASE>;<CAP>;<U024C> % LATIN CAPITAL LETTER R WITH STROKE
 <U2C64> <S027D>;<BASE>;<CAP>;<U2C64> % LATIN CAPITAL LETTER R WITH TAIL
 <UA75C> <SA75D>;<BASE>;<CAP>;<UA75C> % LATIN CAPITAL LETTER RUM ROTUNDA
-<U0053> <S0073>;<BASE>;<CAP>;<U0053> % LATIN CAPITAL LETTER S
 <UFF33> <S0073>;<BASE>;<WIDECAP>;<UFF33> % FULLWIDTH LATIN CAPITAL LETTER S
 <U0001F122> <S0073>;<BASE>;<COMPATCAP>;<U0001F122> % PARENTHESIZED LATIN CAPITAL LETTER S
 <U0001F12A> <S0073>;<BASE>;<COMPATCAP>;<U0001F12A> % TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
@@ -66502,7 +66518,6 @@ endif
 <U1E9E> "<S0073><S0073>";"<BASE><VRNT1><BASE>";"<COMPATCAP><COMPAT><COMPATCAP>";<U1E9E> % LATIN CAPITAL LETTER SHARP S
 <U2C7E> <S023F>;<BASE>;<CAP>;<U2C7E> % LATIN CAPITAL LETTER S WITH SWASH TAIL
 <U01A9> <S0283>;<BASE>;<CAP>;<U01A9> % LATIN CAPITAL LETTER ESH
-<U0054> <S0074>;<BASE>;<CAP>;<U0054> % LATIN CAPITAL LETTER T
 <UFF34> <S0074>;<BASE>;<WIDECAP>;<UFF34> % FULLWIDTH LATIN CAPITAL LETTER T
 <U0001F123> <S0074>;<BASE>;<COMPATCAP>;<U0001F123> % PARENTHESIZED LATIN CAPITAL LETTER T
 <U0001D413> <S0074>;<BASE>;<FONTCAP>;<U0001D413> % MATHEMATICAL BOLD CAPITAL T
@@ -66536,7 +66551,6 @@ endif
 <U01AC> <S01AD>;<BASE>;<CAP>;<U01AC> % LATIN CAPITAL LETTER T WITH HOOK
 <U01AE> <S0288>;<BASE>;<CAP>;<U01AE> % LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
 <UA7B1> <S0287>;<BASE>;<CAP>;<UA7B1> % LATIN CAPITAL LETTER TURNED T
-<U0055> <S0075>;<BASE>;<CAP>;<U0055> % LATIN CAPITAL LETTER U
 <UFF35> <S0075>;<BASE>;<WIDECAP>;<UFF35> % FULLWIDTH LATIN CAPITAL LETTER U
 <U0001F124> <S0075>;<BASE>;<COMPATCAP>;<U0001F124> % PARENTHESIZED LATIN CAPITAL LETTER U
 <U0001D414> <S0075>;<BASE>;<FONTCAP>;<U0001D414> % MATHEMATICAL BOLD CAPITAL U
@@ -66591,7 +66605,6 @@ endif
 <UA78D> <S0265>;<BASE>;<CAP>;<UA78D> % LATIN CAPITAL LETTER TURNED H
 <U019C> <S026F>;<BASE>;<CAP>;<U019C> % LATIN CAPITAL LETTER TURNED M
 <U01B1> <S028A>;<BASE>;<CAP>;<U01B1> % LATIN CAPITAL LETTER UPSILON
-<U0056> <S0076>;<BASE>;<CAP>;<U0056> % LATIN CAPITAL LETTER V
 <UFF36> <S0076>;<BASE>;<WIDECAP>;<UFF36> % FULLWIDTH LATIN CAPITAL LETTER V
 <U2164> <S0076>;<BASE>;<COMPATCAP>;<U2164> % ROMAN NUMERAL FIVE
 <U0001F125> <S0076>;<BASE>;<COMPATCAP>;<U0001F125> % PARENTHESIZED LATIN CAPITAL LETTER V
@@ -66622,7 +66635,6 @@ endif
 <U01B2> <S028B>;<BASE>;<CAP>;<U01B2> % LATIN CAPITAL LETTER V WITH HOOK
 <U1EFC> <S1EFD>;<BASE>;<CAP>;<U1EFC> % LATIN CAPITAL LETTER MIDDLE-WELSH V
 <U0245> <S028C>;<BASE>;<CAP>;<U0245> % LATIN CAPITAL LETTER TURNED V
-<U0057> <S0077>;<BASE>;<CAP>;<U0057> % LATIN CAPITAL LETTER W
 <UFF37> <S0077>;<BASE>;<WIDECAP>;<UFF37> % FULLWIDTH LATIN CAPITAL LETTER W
 <U0001F126> <S0077>;<BASE>;<COMPATCAP>;<U0001F126> % PARENTHESIZED LATIN CAPITAL LETTER W
 <U0001D416> <S0077>;<BASE>;<FONTCAP>;<U0001D416> % MATHEMATICAL BOLD CAPITAL W
@@ -66649,7 +66661,6 @@ endif
 <U1E86> <S0077>;"<BASE><POINT>";"<CAP><MIN>";<U1E86> % LATIN CAPITAL LETTER W WITH DOT ABOVE
 <U1E88> <S0077>;"<BASE><POINS>";"<CAP><MIN>";<U1E88> % LATIN CAPITAL LETTER W WITH DOT BELOW
 <U2C72> <S2C73>;<BASE>;<CAP>;<U2C72> % LATIN CAPITAL LETTER W WITH HOOK
-<U0058> <S0078>;<BASE>;<CAP>;<U0058> % LATIN CAPITAL LETTER X
 <UFF38> <S0078>;<BASE>;<WIDECAP>;<UFF38> % FULLWIDTH LATIN CAPITAL LETTER X
 <U2169> <S0078>;<BASE>;<COMPATCAP>;<U2169> % ROMAN NUMERAL TEN
 <U0001F127> <S0078>;<BASE>;<COMPATCAP>;<U0001F127> % PARENTHESIZED LATIN CAPITAL LETTER X
@@ -66675,7 +66686,6 @@ endif
 <U216A> "<S0078><S0069>";"<BASE><BASE>";"<COMPATCAP><COMPATCAP>";<U216A> % ROMAN NUMERAL ELEVEN
 <U216B> "<S0078><S0069><S0069>";"<BASE><BASE><BASE>";"<COMPATCAP><COMPATCAP><COMPATCAP>";<U216B> % ROMAN NUMERAL TWELVE
 <UA7B3> <SAB53>;<BASE>;<CAP>;<UA7B3> % LATIN CAPITAL LETTER CHI
-<U0059> <S0079>;<BASE>;<CAP>;<U0059> % LATIN CAPITAL LETTER Y
 <UFF39> <S0079>;<BASE>;<WIDECAP>;<UFF39> % FULLWIDTH LATIN CAPITAL LETTER Y
 <U0001F128> <S0079>;<BASE>;<COMPATCAP>;<U0001F128> % PARENTHESIZED LATIN CAPITAL LETTER Y
 <U0001D418> <S0079>;<BASE>;<FONTCAP>;<U0001D418> % MATHEMATICAL BOLD CAPITAL Y
@@ -66708,7 +66718,6 @@ endif
 <U01B3> <S01B4>;<BASE>;<CAP>;<U01B3> % LATIN CAPITAL LETTER Y WITH HOOK
 <U1EFE> <S1EFF>;<BASE>;<CAP>;<U1EFE> % LATIN CAPITAL LETTER Y WITH LOOP
 <U021C> <S021D>;<BASE>;<CAP>;<U021C> % LATIN CAPITAL LETTER YOGH
-<U005A> <S007A>;<BASE>;<CAP>;<U005A> % LATIN CAPITAL LETTER Z
 <UFF3A> <S007A>;<BASE>;<WIDECAP>;<UFF3A> % FULLWIDTH LATIN CAPITAL LETTER Z
 <U0001F129> <S007A>;<BASE>;<COMPATCAP>;<U0001F129> % PARENTHESIZED LATIN CAPITAL LETTER Z
 <U2124> <S007A>;<BASE>;<FONTCAP>;<U2124> % DOUBLE-STRUCK CAPITAL Z
diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
index f7c13ddf4b..7d5c9d878e 100644
--- a/localedata/locales/tr_TR
+++ b/localedata/locales/tr_TR
@@ -81,6 +81,8 @@ copy "iso14651_t1"
 %
 % The following rules implement the same order for glibc.
 
+% All of these collating symbols are used as primary weights
+% and cause equivalence class problems, see Bug 23437.
 collating-symbol <c-cedilla>
 collating-symbol <g-breve>
 collating-symbol <i-dotless>
@@ -111,8 +113,40 @@ reorder-after <AFTER-U>
 <U011F> <g-breve>;<BASE>;<MIN>;IGNORE % ğ
 <U011E> <g-breve>;<BASE>;<CAP>;IGNORE % Ğ
 <U0131> <i-dotless>;<BASE>;<MIN>;IGNORE % ı
+
+% tr_TR must copy the rational range definition here for CEO:
+% Implement rational range for [A-Z] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
+<U0041> <S0061>;<BASE>;<CAP>;<U0041> % LATIN CAPITAL LETTER A
+<U0042> <S0062>;<BASE>;<CAP>;<U0042> % LATIN CAPITAL LETTER B
+<U0043> <S0063>;<BASE>;<CAP>;<U0043> % LATIN CAPITAL LETTER C
+<U0044> <S0064>;<BASE>;<CAP>;<U0044> % LATIN CAPITAL LETTER D
+<U0045> <S0065>;<BASE>;<CAP>;<U0045> % LATIN CAPITAL LETTER E
+<U0046> <S0066>;<BASE>;<CAP>;<U0046> % LATIN CAPITAL LETTER F
+<U0047> <S0067>;<BASE>;<CAP>;<U0047> % LATIN CAPITAL LETTER G
+<U0048> <S0068>;<BASE>;<CAP>;<U0048> % LATIN CAPITAL LETTER H
+% Turkish sorting of I, but within rational range.
+% FIXME: 'I' is no longer in the equivalence class of i's.
 <U0049> <i-dotless>;<BASE>;<CAP>;IGNORE % I
-<U0069> <S0069>;<BASE>;<MIN>;IGNORE % i
+<U004A> <S006A>;<BASE>;<CAP>;<U004A> % LATIN CAPITAL LETTER J
+<U004B> <S006B>;<BASE>;<CAP>;<U004B> % LATIN CAPITAL LETTER K
+<U004C> <S006C>;<BASE>;<CAP>;<U004C> % LATIN CAPITAL LETTER L
+<U004D> <S006D>;<BASE>;<CAP>;<U004D> % LATIN CAPITAL LETTER M
+<U004E> <S006E>;<BASE>;<CAP>;<U004E> % LATIN CAPITAL LETTER N
+<U004F> <S006F>;<BASE>;<CAP>;<U004F> % LATIN CAPITAL LETTER O
+<U0050> <S0070>;<BASE>;<CAP>;<U0050> % LATIN CAPITAL LETTER P
+<U0051> <S0071>;<BASE>;<CAP>;<U0051> % LATIN CAPITAL LETTER Q
+<U0052> <S0072>;<BASE>;<CAP>;<U0052> % LATIN CAPITAL LETTER R
+<U0053> <S0073>;<BASE>;<CAP>;<U0053> % LATIN CAPITAL LETTER S
+<U0054> <S0074>;<BASE>;<CAP>;<U0054> % LATIN CAPITAL LETTER T
+<U0055> <S0075>;<BASE>;<CAP>;<U0055> % LATIN CAPITAL LETTER U
+<U0056> <S0076>;<BASE>;<CAP>;<U0056> % LATIN CAPITAL LETTER V
+<U0057> <S0077>;<BASE>;<CAP>;<U0057> % LATIN CAPITAL LETTER W
+<U0058> <S0078>;<BASE>;<CAP>;<U0058> % LATIN CAPITAL LETTER X
+<U0059> <S0079>;<BASE>;<CAP>;<U0059> % LATIN CAPITAL LETTER Y
+<U005A> <S007A>;<BASE>;<CAP>;<U005A> % LATIN CAPITAL LETTER Z
+
 <U0130> <S0069>;<BASE>;<CAP>;IGNORE % İ
 <U00F6> <o-diaresis>;<BASE>;<MIN>;IGNORE % ö
 <U00D6> <o-diaresis>;<BASE>;<CAP>;IGNORE % Ö
diff --git a/posix/bug-regex17.c b/posix/bug-regex17.c
index 893b9654b8..341fe4d827 100644
--- a/posix/bug-regex17.c
+++ b/posix/bug-regex17.c
@@ -46,14 +46,25 @@ struct
     { { 2, 10 }, { -1, -1 } } },
 
   /* Tests for bug 9697:
+     Look for a multibyte sequence in a range.  We pick the range based
+     on collation element order, because [a-z] is now a rational range
+     and no longer matches these multibyte characters.
+
+     We use U+FF53 FULLWIDTH LATIN SMALL LETTER S as the start of the
+     range, and U+33DC SQUARE SV as the end of the range.  These were
+     chosen by looking at collation element ordering and picking a range
+     in which the matching character was listed.
+
+     U+02E2	\xcb\xa2	MODIFIER LETTER SMALL S
      U+00DF	\xc3\x9f	LATIN SMALL LETTER SHARP S
      U+02DA	\xcb\x9a	RING ABOVE
-     U+02E2	\xcb\xa2	MODIFIER LETTER SMALL S  */
-  { "[a-z]|[^a-z]", "\xcb\xa2", REG_EXTENDED, 2,
+     
+     The U+02DA RING ABOVE is chosen because it's not in [ｓ-㏜].  */
+  { "[ｓ-㏜]|[^ｓ-㏜]", "\xcb\xa2", REG_EXTENDED, 2,
     { { 0, 2 }, { -1, -1 } } },
-  { "[a-z]", "\xc3\x9f", REG_EXTENDED, 2,
+  { "[ｓ-㏜]", "\xc3\x9f", REG_EXTENDED, 2,
     { { 0, 2 }, { -1, -1 } } },
-  { "[^a-z]", "\xcb\x9a", REG_EXTENDED, 2,
+  { "[^ｓ-㏜]", "\xcb\x9a", REG_EXTENDED, 2,
     { { 0, 2 }, { -1, -1 } } },
 };
 
diff --git a/posix/tst-fnmatch.input b/posix/tst-fnmatch.input
index dc2ca8d01a..2131d1e437 100644
--- a/posix/tst-fnmatch.input
+++ b/posix/tst-fnmatch.input
@@ -67,9 +67,11 @@
 # https://sourceware.org/bugzilla/show_bug.cgi?id=23393
 # https://sourceware.org/bugzilla/show_bug.cgi?id=23420
 #
-# No consensus exists on how best to handle the changes so the
-# iso14651_t1_common collation element order (CEO) has been changed to
-# deinterlace the a-z and A-Z regions.
+# The solution was to implement rational ranges for [a-z], [A-Z], and
+# [0-9] by rearranging the collation element order.  Likewise, the
+# upper and lower case letters are deinterlaced so that accented
+# ranges exclude uppercase, e.g. [a-ñ] should not include any
+# uppercase letters but may include a-z and more.
 #
 # With the deinterlacing commit ac3a3b4b0d561d776b60317d6a926050c8541655
 # could be reverted to re-test the correct non-interleaved expectations.
@@ -77,9 +79,7 @@
 # Please note that despite the region being deinterlaced, the ordering
 # of collation remains the same.  In glibc we implement CEO and because of
 # that we can reorder the elements to reorder ranges without impacting
-# collation which depends on weights.  The collation element ordering
-# could have been changed to include just a-z, A-Z, and 0-9 in three
-# distinct blocks, but this needs more discussion by the community.
+# collation which depends on weights.
 
 # B.6 004(C)
 C		 "!#%+,-./01234567889"	"!#%+,-./01234567889"  0
@@ -477,9 +477,9 @@ C		"-"			"[Z-\\]]"	       NOMATCH
 # handling of ranges and the recognition of character (vs bytes).
 de_DE.ISO-8859-1 "a"			"[a-z]"		       0
 de_DE.ISO-8859-1 "z"			"[a-z]"		       0
-de_DE.ISO-8859-1 "ä"			"[a-z]"		       0
-de_DE.ISO-8859-1 "ö"			"[a-z]"		       0
-de_DE.ISO-8859-1 "ü"			"[a-z]"		       0
+de_DE.ISO-8859-1 "ä"			"[a-z]"		       NOMATCH
+de_DE.ISO-8859-1 "ö"			"[a-z]"		       NOMATCH
+de_DE.ISO-8859-1 "ü"			"[a-z]"		       NOMATCH
 de_DE.ISO-8859-1 "A"			"[a-z]"		       NOMATCH
 de_DE.ISO-8859-1 "Z"			"[a-z]"		       NOMATCH
 de_DE.ISO-8859-1 "Ä"			"[a-z]"		       NOMATCH
@@ -492,9 +492,9 @@ de_DE.ISO-8859-1 "
 de_DE.ISO-8859-1 "ü"			"[A-Z]"		       NOMATCH
 de_DE.ISO-8859-1 "A"			"[A-Z]"		       0
 de_DE.ISO-8859-1 "Z"			"[A-Z]"		       0
-de_DE.ISO-8859-1 "Ä"			"[A-Z]"		       0
-de_DE.ISO-8859-1 "Ö"			"[A-Z]"		       0
-de_DE.ISO-8859-1 "Ü"			"[A-Z]"		       0
+de_DE.ISO-8859-1 "Ä"			"[A-Z]"		       NOMATCH
+de_DE.ISO-8859-1 "Ö"			"[A-Z]"		       NOMATCH
+de_DE.ISO-8859-1 "Ü"			"[A-Z]"		       NOMATCH
 de_DE.ISO-8859-1 "a"			"[[:lower:]]"	       0
 de_DE.ISO-8859-1 "z"			"[[:lower:]]"	       0
 de_DE.ISO-8859-1 "ä"			"[[:lower:]]"	       0
@@ -566,22 +566,46 @@ de_DE.ISO-8859-1 "aa"			"[[.a.]]a"	       0
 de_DE.ISO-8859-1 "ba"			"[[.a.]]a"	       NOMATCH
 
 
-# And with a multibyte character set.
+# And with a multibyte character set:
+# Ensure that Turkish reordering rules don't move 'i' out of a-z set,
+# or 'I' out of A-Z set.
+tr_TR.UTF-8	 "i"			"[a-z]"		       0
+tr_TR.UTF-8	 "ı"			"[a-z]"		       NOMATCH
+tr_TR.UTF-8	 "I"			"[A-Z]"		       0
+tr_TR.UTF-8	 "İ"			"[A-Z]"		       NOMATCH
+tr_TR.ISO-8859-9 "i"			"[a-z]"		       0
+tr_TR.ISO-8859-9 "I"			"[A-Z]"		       0
+# See bug 23437 for I not being in [=i=].
+tr_TR.UTF-8	 "I"			"[=i=]"		       NOMATCH
 en_US.UTF-8	 "a"			"[a-z]"		       0
+# Test that <U00F1> LATIN SMALL LETTER N WITH TILDE is not in [a-z].
+en_US.UTF-8	 "ñ"			"[a-z]"		       NOMATCH
 en_US.UTF-8	 "z"			"[a-z]"		       0
 en_US.UTF-8	 "A"			"[a-z]"		       NOMATCH
+# Test that <U00D1> LATIN CAPITAL LETTER N WITH TILDE is not in [a-z].
+en_US.UTF-8	 "Ñ"			"[a-z]"		       NOMATCH
 en_US.UTF-8	 "Z"			"[a-z]"		       NOMATCH
 en_US.UTF-8	 "a"			"[A-Z]"		       NOMATCH
+# Test that <U00F1> LATIN SMALL LETTER N WITH TILDE is not in [A-Z].
+en_US.UTF-8	 "ñ"			"[A-Z]"		       NOMATCH
 en_US.UTF-8	 "z"			"[A-Z]"		       NOMATCH
 en_US.UTF-8	 "A"			"[A-Z]"		       0
+# Test that <U00D1> LATIN CAPITAL LETTER N WITH TILDE is not in [A-Z].
+en_US.UTF-8	 "Ñ"			"[A-Z]"		       NOMATCH
 en_US.UTF-8	 "Z"			"[A-Z]"		       0
 en_US.UTF-8	 "0"			"[0-9]"		       0
+# Test that <UFF10> FULLWIDTH DIGIT ZERO is not in [0-9].
+en_US.UTF-8	 "０"			"[0-9]"		       NOMATCH
+# Test that <U00BD> VULGAR FRACTION ONE HALF is not in [0-9].
+en_US.UTF-8	 "½"			"[0-9]"		       NOMATCH
 en_US.UTF-8	 "9"			"[0-9]"		       0
+# Test that <UFF19> FULLWIDTH DIGIT NINE is not in [0-9].
+en_US.UTF-8	 "９"			"[0-9]"		       NOMATCH
 de_DE.UTF-8	 "a"			"[a-z]"		       0
 de_DE.UTF-8	 "z"			"[a-z]"		       0
-de_DE.UTF-8	 "ä"			"[a-z]"		       0
-de_DE.UTF-8	 "ö"			"[a-z]"		       0
-de_DE.UTF-8	 "ü"			"[a-z]"		       0
+de_DE.UTF-8	 "ä"			"[a-z]"		       NOMATCH
+de_DE.UTF-8	 "ö"			"[a-z]"		       NOMATCH
+de_DE.UTF-8	 "ü"			"[a-z]"		       NOMATCH
 de_DE.UTF-8	 "A"			"[a-z]"		       NOMATCH
 de_DE.UTF-8	 "Z"			"[a-z]"		       NOMATCH
 de_DE.UTF-8	 "Ä"			"[a-z]"		       NOMATCH
@@ -594,9 +618,9 @@ de_DE.UTF-8	 "ö"			"[A-Z]"		       NOMATCH
 de_DE.UTF-8	 "ü"			"[A-Z]"		       NOMATCH
 de_DE.UTF-8	 "A"			"[A-Z]"		       0
 de_DE.UTF-8	 "Z"			"[A-Z]"		       0
-de_DE.UTF-8	 "Ä"			"[A-Z]"		       0
-de_DE.UTF-8	 "Ö"			"[A-Z]"		       0
-de_DE.UTF-8	 "Ü"			"[A-Z]"		       0
+de_DE.UTF-8	 "Ä"		"[A-Z]"		       NOMATCH
+de_DE.UTF-8	 "Ö"		"[A-Z]"		       NOMATCH
+de_DE.UTF-8	 "Ü"		"[A-Z]"		       NOMATCH
 de_DE.UTF-8	 "a"			"[[:lower:]]"	       0
 de_DE.UTF-8	 "z"			"[[:lower:]]"	       0
 de_DE.UTF-8	 "ä"			"[[:lower:]]"	       0
diff --git a/posix/tst-rxspencer.c b/posix/tst-rxspencer.c
index 9d597ef3e9..a3d836679a 100644
--- a/posix/tst-rxspencer.c
+++ b/posix/tst-rxspencer.c
@@ -155,7 +155,12 @@ mb_frob_pattern (const char *str, const char *letters)
 	*dst++ = *src;
 	continue;
       }
-    else if (!in_class && strchr (letters, *src))
+    /* We do a replacement, but not for the start of ranges, because
+       mb_replace will create invalid rational ranges.  For example
+       [á-z] is an invalid range because á comes after z, but [a-á]
+       is a valid range.  So we avoid replacing the start of ranges
+       to avoid this problem.  */
+    else if (!in_class && src[1] != '-' && strchr (letters, *src))
       dst = mb_replace (dst, *src);
     else
       {
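
The tst-fnmatch.input rows above can also be spot-checked by hand. The
following is only a minimal sketch, not part of the patch: the helper
name match_in_locale is hypothetical, and it assumes the requested
locale is installed on the system. It uses only setlocale and fnmatch.
Results for en_US.UTF-8, de_DE.UTF-8 and tr_TR.UTF-8 depend on the
glibc version and locale data in use, so no expected output is claimed
for them here; only the "C" locale behaviour is fixed.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not in glibc or in this patch).
   Switches to LOCALE and matches STRING against PATTERN with fnmatch.
   Returns 1 on a match, 0 on FNM_NOMATCH, and -1 if the locale is not
   available or fnmatch reports some other error.  */
int
match_in_locale (const char *locale, const char *pattern,
		 const char *string)
{
  /* setlocale returns NULL when the locale cannot be installed.  */
  if (setlocale (LC_ALL, locale) == NULL)
    return -1;

  int result = fnmatch (pattern, string, 0);
  if (result == 0)
    return 1;
  if (result == FNM_NOMATCH)
    return 0;
  return -1;
}
```

For example, match_in_locale ("en_US.UTF-8", "[a-z]", "ñ") corresponds
to the new en_US.UTF-8 row for <U00F1>, which the patched test file
expects to be NOMATCH.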
