This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).
On 07/20/2018 03:19 PM, Florian Weimer wrote:
> On 07/20/2018 08:49 PM, Carlos O'Donell wrote:
>> On 07/19/2018 04:39 PM, Florian Weimer wrote:
>>> On 07/19/2018 09:43 PM, Carlos O'Donell wrote:
>>>> * Add back tests to tst-fnmatch.input and tst-regexloc.c which
>>>> exercise that [a-z] does not match A or Z.
>>>
>>> [a-z] still matches ñ, 𝚗, but not 𝚣, which I doubt is useful.
>>
>> Sorry, I don't follow, it absolutely matches ASCII z.
>
> The z I wrote above is one of the non-BMP math characters.
Thanks :-}
It was a conservative solution.
>> We deinterlace the collation element ordering (not sequence) to get
>> the right range expression resolution.
>>
>> See the added fnmatch tests:
>>
>> +en_US.UTF-8 "a" "[a-z]" 0
>> +en_US.UTF-8 "z" "[a-z]" 0
>> +en_US.UTF-8 "A" "[a-z]" NOMATCH
>> +en_US.UTF-8 "Z" "[a-z]" NOMATCH
>> +en_US.UTF-8 "a" "[A-Z]" NOMATCH
>> +en_US.UTF-8 "z" "[A-Z]" NOMATCH
>> +en_US.UTF-8 "A" "[A-Z]" 0
>> +en_US.UTF-8 "Z" "[A-Z]" 0
>> +en_US.UTF-8 "0" "[0-9]" 0
>> +en_US.UTF-8 "9" "[0-9]" 0
>>
>> [a-z] matches a-z (including z), *and* all the lowercase in between,
>> and so effectively behaves like [:lower:].
>
> There are characters equivalent to ASCII z (like the z above), but
> which sort after z, so they are not matched. This is one reason why
> I think this is a bad idea: it looks like [:lower:], but it's not.
> Same for [0-9], I assume.
Again, conservatively: this is how it worked before, and it now works the
same way again, while retaining the improvement of the added ISO 14651 data.
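
For anyone who wants to try the fnmatch cases quoted above outside the
test suite, here is a minimal sketch. The helper name match_in_locale is
made up for illustration and is not part of glibc or of the patch; it only
wraps setlocale and fnmatch, and what it returns for a non-C locale depends
on the locale data installed on the system.

```c
#include <fnmatch.h>
#include <locale.h>

/* Hypothetical helper: select LOCALE_NAME, then match STRING against
   PATTERN.  Returns 0 on a match, FNM_NOMATCH on a mismatch, another
   nonzero value if fnmatch reports an error, or -1 if the locale cannot
   be selected on this system.  Note that it changes the process-wide
   locale as a side effect.  */
int match_in_locale(const char *locale_name, const char *pattern,
                    const char *string)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    return fnmatch(pattern, string, 0);
}
```

A line such as `en_US.UTF-8 "A" "[a-z]" NOMATCH` in tst-fnmatch.input
corresponds to calling match_in_locale("en_US.UTF-8", "[a-z]", "A") and
expecting FNM_NOMATCH.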
>>> It's an improvement, and it may be good enough for glibc 2.28, but I would
>>> rather see us implement the rational ranges interpretation.
>>
>> That requires all ranges behave rationally?
>>
>> We could fix a-z, A-Z, and 0-9 easily.
>>
>> Patch attached.
>
> (NB: Patch is relative to the previous patch.)
>
> My enumeration tester likes it much more. 8-)
It was designed exactly for your enumerator ;-)
> actual: "abcdefghijklmnopqrstuvwxyz"
> actual: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> actual: "0123456789"
>
> That's for [a-z], [A-Z], [0-9], in en_US.UTF-8 and de_DE.ISO-8859-1. However, I still get this:
>
> tst-regex-classes.script:85:0: result character set difference in locale tr_TR.ISO-8859-9
> enumerate_chars '[a-z]' "abcdefghijklmnopqrstuvwxyz";
> ^
> expected: "abcdefghijklmnopqrstuvwxyz"
> actual: "abcdefghjklmnopqrstuvwxyz"
>
> tst-regex-classes.script:86:0: result character set difference in locale tr_TR.ISO-8859-9
> enumerate_chars '[A-Z]' "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> ^
> expected: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> actual: "ABCDEFGHJKLMNOPQRSTUVWXYZ"
> error: 2 test failures
>
> Can you fix this with data-only changes, too?
Yes, I need to duplicate the rational range for A-Z in tr_TR and
remove 'i' since it's just fine the way it is, the existing
New patch attached with additional tests in tst-fnmatch.input to
test tr_TR.UTF-8, and ISO-8859-9.
Noticed equivalence class issues, filed a bug, and added an XFAIL-ish
test case in tst-fnmatch.input:
https://sourceware.org/bugzilla/show_bug.cgi?id=23437
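
The kind of check the enumeration output quoted above performs (which
characters a bracket expression matches in a given locale) can be
approximated for single-byte characters with plain regcomp/regexec. This
is only a sketch and is not the enumeration tester used in this thread;
the function name enumerate_single_bytes is invented here. It is only
meaningful for single-byte charsets such as ISO-8859-9, since in a UTF-8
locale the bytes above 127 are not complete characters.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: collect every single-byte character (1..255)
   that matches PATTERN under LOCALE_NAME into OUT as a NUL-terminated
   string.  Returns the number of characters stored, or -1 if the locale
   or the pattern cannot be used.  Changes the process-wide locale.  */
int enumerate_single_bytes(const char *locale_name, const char *pattern,
                           char *out, size_t out_size)
{
    regex_t re;
    size_t n = 0;

    if (out_size == 0 || setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (int c = 1; c < 256; c++) {
        /* One-character subject string, so no anchoring is needed.  */
        char subject[2] = { (char) c, '\0' };
        if (regexec(&re, subject, 0, NULL, 0) == 0 && n + 1 < out_size)
            out[n++] = (char) c;
    }
    out[n] = '\0';
    regfree(&re);
    return (int) n;
}
```

With a 256-byte buffer, comparing the result for "[a-z]" under
tr_TR.ISO-8859-9 against "abcdefghijklmnopqrstuvwxyz" reproduces the kind
of expected/actual comparison shown in the quoted failures, provided that
locale is installed.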
> posix/bug-regex17 regresses as well in the test for bug 9697, but I
> can incorporate that into my enumeration tester. I don't think the
> bug is actually regressing, it's just that the test objective is not
> expressed properly in it.
Fixed.
>
> posix/tst-rxspencer fails as well, presumably due to this:
>
> UTF-8 aA FAIL regcomp failed: Invalid range end
> UTF-8 aAcC FAIL regcomp failed: Invalid range end
>
> I think this happens because the test blindly replaces ASCII
> characters with non-ASCII characters, which causes issues if they are
> not ordered as expected.
Fixed.
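
For reference, "Invalid range end" is the message for REG_ERANGE, which
regcomp returns when the end point of a range sorts before its start
point. A small sketch that surfaces the status and message for a pattern
follows; the helper name compile_status is made up for this example, and
the pattern "[z-a]" in the usage note is an ASCII stand-in, not one of the
patterns tst-rxspencer actually generates.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as an extended regular expression
   in the current locale and return the regcomp status (0 on success).
   On failure, the regerror text is copied into MSG.  */
int compile_status(const char *pattern, char *msg, size_t msg_size)
{
    regex_t re;
    int status = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (status != 0) {
        /* regerror accepts the regex_t that regcomp failed on.  */
        regerror(status, &re, msg, msg_size);
        return status;
    }
    if (msg_size > 0)
        msg[0] = '\0';
    regfree(&re);
    return 0;
}
```

Calling compile_status("[z-a]", buf, sizeof buf) yields REG_ERANGE, while
"[a-z]" compiles cleanly. The same failure appears when the two end points
are non-ASCII characters whose collation element order is reversed
relative to what the test assumed.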
v2
- Fixed tr_TR by duplicating A-Z rational range.
- Fixed tst-rxspencer.
- Fixed bug-regex17.
Tell me how the new version does.
--
Cheers,
Carlos.
diff --git a/localedata/locales/iso14651_t1_common b/localedata/locales/iso14651_t1_common
index 227400cc4e..7248074a8b 100644
--- a/localedata/locales/iso14651_t1_common
+++ b/localedata/locales/iso14651_t1_common
@@ -63177,7 +63177,19 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U20BC> <S20BC>;<BASE>;<MIN>;<U20BC> % MANAT SIGN
<U20BD> <S20BD>;<BASE>;<MIN>;<U20BD> % RUBLE SIGN
<U20BE> <S20BE>;<BASE>;<MIN>;<U20BE> % LARI SIGN
+% Implement rational range for [0-9] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
<U0030> <S0030>;<BASE>;<MIN>;<U0030> % DIGIT ZERO
+<U0031> <S0031>;<BASE>;<MIN>;<U0031> % DIGIT ONE
+<U0032> <S0032>;<BASE>;<MIN>;<U0032> % DIGIT TWO
+<U0033> <S0033>;<BASE>;<MIN>;<U0033> % DIGIT THREE
+<U0034> <S0034>;<BASE>;<MIN>;<U0034> % DIGIT FOUR
+<U0035> <S0035>;<BASE>;<MIN>;<U0035> % DIGIT FIVE
+<U0036> <S0036>;<BASE>;<MIN>;<U0036> % DIGIT SIX
+<U0037> <S0037>;<BASE>;<MIN>;<U0037> % DIGIT SEVEN
+<U0038> <S0038>;<BASE>;<MIN>;<U0038> % DIGIT EIGHT
+<U0039> <S0039>;<BASE>;<MIN>;<U0039> % DIGIT NINE
<U0660> <S0030>;<BASE>;<MIN>;<U0660> % ARABIC-INDIC DIGIT ZERO
<U06F0> <S0030>;<BASE>;<MIN>;<U06F0> % EXTENDED ARABIC-INDIC DIGIT ZERO
<U07C0> <S0030>;<BASE>;<MIN>;<U07C0> % NKO DIGIT ZERO
@@ -63250,7 +63262,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U2080> <S0030>;<BASE>;<MNS>;<U2080> % SUBSCRIPT ZERO
<U2189> "<S0030><S0033>";"<BASE><BASE>";"<FRACTION><FRACTION>";<U2189> % VULGAR FRACTION ZERO THIRDS
<U3358> "<S0030><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U3358> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ZERO
-<U0031> <S0031>;<BASE>;<MIN>;<U0031> % DIGIT ONE
<U0661> <S0031>;<BASE>;<MIN>;<U0661> % ARABIC-INDIC DIGIT ONE
<U06F1> <S0031>;<BASE>;<MIN>;<U06F1> % EXTENDED ARABIC-INDIC DIGIT ONE
<U07C1> <S0031>;<BASE>;<MIN>;<U07C1> % NKO DIGIT ONE
@@ -63440,7 +63451,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E0> "<S0031><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E0> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ONE
<U32C0> "<S0031><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C0> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY
<U3359> "<S0031><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U3359> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ONE
-<U0032> <S0032>;<BASE>;<MIN>;<U0032> % DIGIT TWO
<U0662> <S0032>;<BASE>;<MIN>;<U0662> % ARABIC-INDIC DIGIT TWO
<U06F2> <S0032>;<BASE>;<MIN>;<U06F2> % EXTENDED ARABIC-INDIC DIGIT TWO
<U07C2> <S0032>;<BASE>;<MIN>;<U07C2> % NKO DIGIT TWO
@@ -63583,7 +63593,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E1> "<S0032><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E1> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY TWO
<U32C1> "<S0032><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C1> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR FEBRUARY
<U335A> "<S0032><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335A> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR TWO
-<U0033> <S0033>;<BASE>;<MIN>;<U0033> % DIGIT THREE
<U0663> <S0033>;<BASE>;<MIN>;<U0663> % ARABIC-INDIC DIGIT THREE
<U06F3> <S0033>;<BASE>;<MIN>;<U06F3> % EXTENDED ARABIC-INDIC DIGIT THREE
<U07C3> <S0033>;<BASE>;<MIN>;<U07C3> % NKO DIGIT THREE
@@ -63709,7 +63718,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E2> "<S0033><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E2> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY THREE
<U32C2> "<S0033><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C2> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR MARCH
<U335B> "<S0033><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335B> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR THREE
-<U0034> <S0034>;<BASE>;<MIN>;<U0034> % DIGIT FOUR
<U0664> <S0034>;<BASE>;<MIN>;<U0664> % ARABIC-INDIC DIGIT FOUR
<U06F4> <S0034>;<BASE>;<MIN>;<U06F4> % EXTENDED ARABIC-INDIC DIGIT FOUR
<U07C4> <S0034>;<BASE>;<MIN>;<U07C4> % NKO DIGIT FOUR
@@ -63829,7 +63837,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E3> "<S0034><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E3> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY FOUR
<U32C3> "<S0034><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C3> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR APRIL
<U335C> "<S0034><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335C> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR FOUR
-<U0035> <S0035>;<BASE>;<MIN>;<U0035> % DIGIT FIVE
<U0665> <S0035>;<BASE>;<MIN>;<U0665> % ARABIC-INDIC DIGIT FIVE
<U06F5> <S0035>;<BASE>;<MIN>;<U06F5> % EXTENDED ARABIC-INDIC DIGIT FIVE
<U07C5> <S0035>;<BASE>;<MIN>;<U07C5> % NKO DIGIT FIVE
@@ -63941,7 +63948,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E4> "<S0035><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E4> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY FIVE
<U32C4> "<S0035><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C4> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR MAY
<U335D> "<S0035><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335D> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR FIVE
-<U0036> <S0036>;<BASE>;<MIN>;<U0036> % DIGIT SIX
<U0666> <S0036>;<BASE>;<MIN>;<U0666> % ARABIC-INDIC DIGIT SIX
<U06F6> <S0036>;<BASE>;<MIN>;<U06F6> % EXTENDED ARABIC-INDIC DIGIT SIX
<U07C6> <S0036>;<BASE>;<MIN>;<U07C6> % NKO DIGIT SIX
@@ -64036,7 +64042,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E5> "<S0036><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E5> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY SIX
<U32C5> "<S0036><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C5> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR JUNE
<U335E> "<S0036><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335E> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR SIX
-<U0037> <S0037>;<BASE>;<MIN>;<U0037> % DIGIT SEVEN
<U0667> <S0037>;<BASE>;<MIN>;<U0667> % ARABIC-INDIC DIGIT SEVEN
<U06F7> <S0037>;<BASE>;<MIN>;<U06F7> % EXTENDED ARABIC-INDIC DIGIT SEVEN
<U07C7> <S0037>;<BASE>;<MIN>;<U07C7> % NKO DIGIT SEVEN
@@ -64132,7 +64137,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E6> "<S0037><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E6> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY SEVEN
<U32C6> "<S0037><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C6> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR JULY
<U335F> "<S0037><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U335F> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR SEVEN
-<U0038> <S0038>;<BASE>;<MIN>;<U0038> % DIGIT EIGHT
<U0668> <S0038>;<BASE>;<MIN>;<U0668> % ARABIC-INDIC DIGIT EIGHT
<U06F8> <S0038>;<BASE>;<MIN>;<U06F8> % EXTENDED ARABIC-INDIC DIGIT EIGHT
<U07C8> <S0038>;<BASE>;<MIN>;<U07C8> % NKO DIGIT EIGHT
@@ -64226,7 +64230,6 @@ order_start <SPECIAL>;forward;backward;forward;forward,position
<U33E7> "<S0038><RFB40><TE5E5>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U33E7> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY EIGHT
<U32C7> "<S0038><RFB40><TE708>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U32C7> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR AUGUST
<U3360> "<S0038><RFB40><TF0B9>";"<BASE><BASE>";"<COMPAT><COMPAT>";<U3360> % IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR EIGHT
-<U0039> <S0039>;<BASE>;<MIN>;<U0039> % DIGIT NINE
<U0669> <S0039>;<BASE>;<MIN>;<U0669> % ARABIC-INDIC DIGIT NINE
<U06F9> <S0039>;<BASE>;<MIN>;<U06F9> % EXTENDED ARABIC-INDIC DIGIT NINE
<U07C9> <S0039>;<BASE>;<MIN>;<U07C9> % NKO DIGIT NINE
@@ -64326,7 +64329,35 @@ order_start <LATIN>;forward;backward;forward;forward,position
else
order_start <LATIN>;forward;forward;forward;forward,position
endif
+% Implement rational range for [a-z] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
<U0061> <S0061>;<BASE>;<MIN>;<U0061> % LATIN SMALL LETTER A
+<U0062> <S0062>;<BASE>;<MIN>;<U0062> % LATIN SMALL LETTER B
+<U0063> <S0063>;<BASE>;<MIN>;<U0063> % LATIN SMALL LETTER C
+<U0064> <S0064>;<BASE>;<MIN>;<U0064> % LATIN SMALL LETTER D
+<U0065> <S0065>;<BASE>;<MIN>;<U0065> % LATIN SMALL LETTER E
+<U0066> <S0066>;<BASE>;<MIN>;<U0066> % LATIN SMALL LETTER F
+<U0067> <S0067>;<BASE>;<MIN>;<U0067> % LATIN SMALL LETTER G
+<U0068> <S0068>;<BASE>;<MIN>;<U0068> % LATIN SMALL LETTER H
+<U0069> <S0069>;<BASE>;<MIN>;<U0069> % LATIN SMALL LETTER I
+<U006A> <S006A>;<BASE>;<MIN>;<U006A> % LATIN SMALL LETTER J
+<U006B> <S006B>;<BASE>;<MIN>;<U006B> % LATIN SMALL LETTER K
+<U006C> <S006C>;<BASE>;<MIN>;<U006C> % LATIN SMALL LETTER L
+<U006D> <S006D>;<BASE>;<MIN>;<U006D> % LATIN SMALL LETTER M
+<U006E> <S006E>;<BASE>;<MIN>;<U006E> % LATIN SMALL LETTER N
+<U006F> <S006F>;<BASE>;<MIN>;<U006F> % LATIN SMALL LETTER O
+<U0070> <S0070>;<BASE>;<MIN>;<U0070> % LATIN SMALL LETTER P
+<U0071> <S0071>;<BASE>;<MIN>;<U0071> % LATIN SMALL LETTER Q
+<U0072> <S0072>;<BASE>;<MIN>;<U0072> % LATIN SMALL LETTER R
+<U0073> <S0073>;<BASE>;<MIN>;<U0073> % LATIN SMALL LETTER S
+<U0074> <S0074>;<BASE>;<MIN>;<U0074> % LATIN SMALL LETTER T
+<U0075> <S0075>;<BASE>;<MIN>;<U0075> % LATIN SMALL LETTER U
+<U0076> <S0076>;<BASE>;<MIN>;<U0076> % LATIN SMALL LETTER V
+<U0077> <S0077>;<BASE>;<MIN>;<U0077> % LATIN SMALL LETTER W
+<U0078> <S0078>;<BASE>;<MIN>;<U0078> % LATIN SMALL LETTER X
+<U0079> <S0079>;<BASE>;<MIN>;<U0079> % LATIN SMALL LETTER Y
+<U007A> <S007A>;<BASE>;<MIN>;<U007A> % LATIN SMALL LETTER Z
<UFF41> <S0061>;<BASE>;<WIDE>;<UFF41> % FULLWIDTH LATIN SMALL LETTER A
<U0363> <S0061>;<BASE>;<COMPAT>;<U0363> % COMBINING LATIN SMALL LETTER A
<U249C> <S0061>;<BASE>;<COMPAT>;<U249C> % PARENTHESIZED LATIN SMALL LETTER A
@@ -64418,7 +64449,6 @@ endif
<U0252> <S0252>;<BASE>;<MIN>;<U0252> % LATIN SMALL LETTER TURNED ALPHA
<U1D9B> <S0252>;<BASE>;<MNN>;<U1D9B> % MODIFIER LETTER SMALL TURNED ALPHA
<UAB64> <SAB64>;<BASE>;<MIN>;<UAB64> % LATIN SMALL LETTER INVERTED ALPHA
-<U0062> <S0062>;<BASE>;<MIN>;<U0062> % LATIN SMALL LETTER B
<UFF42> <S0062>;<BASE>;<WIDE>;<UFF42> % FULLWIDTH LATIN SMALL LETTER B
<U1DE8> <S0062>;<BASE>;<COMPAT>;<U1DE8> % COMBINING LATIN SMALL LETTER B
<U249D> <S0062>;<BASE>;<COMPAT>;<U249D> % PARENTHESIZED LATIN SMALL LETTER B
@@ -64454,7 +64484,6 @@ endif
<U0183> <S0183>;<BASE>;<MIN>;<U0183> % LATIN SMALL LETTER B WITH TOPBAR
<UA7B5> <SA7B5>;<BASE>;<MIN>;<UA7B5> % LATIN SMALL LETTER BETA
<U1DE9> <SA7B5>;<BASE>;<COMPAT>;<U1DE9> % COMBINING LATIN SMALL LETTER BETA
-<U0063> <S0063>;<BASE>;<MIN>;<U0063> % LATIN SMALL LETTER C
<UFF43> <S0063>;<BASE>;<WIDE>;<UFF43> % FULLWIDTH LATIN SMALL LETTER C
<U0368> <S0063>;<BASE>;<COMPAT>;<U0368> % COMBINING LATIN SMALL LETTER C
<U217D> <S0063>;<BASE>;<COMPAT>;<U217D> % SMALL ROMAN NUMERAL ONE HUNDRED
@@ -64504,7 +64533,6 @@ endif
<U1D9D> <S0255>;<BASE>;<MNN>;<U1D9D> % MODIFIER LETTER SMALL C WITH CURL
<U2184> <S2184>;<BASE>;<MIN>;<U2184> % LATIN SMALL LETTER REVERSED C
<UA73F> <SA73F>;<BASE>;<MIN>;<UA73F> % LATIN SMALL LETTER REVERSED C WITH DOT
-<U0064> <S0064>;<BASE>;<MIN>;<U0064> % LATIN SMALL LETTER D
<UFF44> <S0064>;<BASE>;<WIDE>;<UFF44> % FULLWIDTH LATIN SMALL LETTER D
<U0369> <S0064>;<BASE>;<COMPAT>;<U0369> % COMBINING LATIN SMALL LETTER D
<U217E> <S0064>;<BASE>;<COMPAT>;<U217E> % SMALL ROMAN NUMERAL FIVE HUNDRED
@@ -64563,7 +64591,6 @@ endif
<U0221> <S0221>;<BASE>;<MIN>;<U0221> % LATIN SMALL LETTER D WITH CURL
<UA771> <SA771>;<BASE>;<MIN>;<UA771> % LATIN SMALL LETTER DUM
<U1E9F> <S1E9F>;<BASE>;<MIN>;<U1E9F> % LATIN SMALL LETTER DELTA
-<U0065> <S0065>;<BASE>;<MIN>;<U0065> % LATIN SMALL LETTER E
<UFF45> <S0065>;<BASE>;<WIDE>;<UFF45> % FULLWIDTH LATIN SMALL LETTER E
<U0364> <S0065>;<BASE>;<COMPAT>;<U0364> % COMBINING LATIN SMALL LETTER E
<U24A0> <S0065>;<BASE>;<COMPAT>;<U24A0> % PARENTHESIZED LATIN SMALL LETTER E
@@ -64641,7 +64668,6 @@ endif
<U025E> <S025E>;<BASE>;<MIN>;<U025E> % LATIN SMALL LETTER CLOSED REVERSED OPEN E
<U029A> <S029A>;<BASE>;<MIN>;<U029A> % LATIN SMALL LETTER CLOSED OPEN E
<U0264> <S0264>;<BASE>;<MIN>;<U0264> % LATIN SMALL LETTER RAMS HORN
-<U0066> <S0066>;<BASE>;<MIN>;<U0066> % LATIN SMALL LETTER F
<UFF46> <S0066>;<BASE>;<WIDE>;<UFF46> % FULLWIDTH LATIN SMALL LETTER F
<U1DEB> <S0066>;<BASE>;<COMPAT>;<U1DEB> % COMBINING LATIN SMALL LETTER F
<U24A1> <S0066>;<BASE>;<COMPAT>;<U24A1> % PARENTHESIZED LATIN SMALL LETTER F
@@ -64680,7 +64706,6 @@ endif
<U0192> <S0192>;<BASE>;<MIN>;<U0192> % LATIN SMALL LETTER F WITH HOOK
<U214E> <S214E>;<BASE>;<MIN>;<U214E> % TURNED SMALL F
<UA7FB> <SA7FB>;<BASE>;<MIN>;<UA7FB> % LATIN EPIGRAPHIC LETTER REVERSED F
-<U0067> <S0067>;<BASE>;<MIN>;<U0067> % LATIN SMALL LETTER G
<UFF47> <S0067>;<BASE>;<WIDE>;<UFF47> % FULLWIDTH LATIN SMALL LETTER G
<U1DDA> <S0067>;<BASE>;<COMPAT>;<U1DDA> % COMBINING LATIN SMALL LETTER G
<U24A2> <S0067>;<BASE>;<COMPAT>;<U24A2> % PARENTHESIZED LATIN SMALL LETTER G
@@ -64727,7 +64752,6 @@ endif
<U0263> <S0263>;<BASE>;<MIN>;<U0263> % LATIN SMALL LETTER GAMMA
<U02E0> <S0263>;<BASE>;<MNN>;<U02E0> % MODIFIER LETTER SMALL GAMMA
<U01A3> <S01A3>;<BASE>;<MIN>;<U01A3> % LATIN SMALL LETTER OI
-<U0068> <S0068>;<BASE>;<MIN>;<U0068> % LATIN SMALL LETTER H
<UFF48> <S0068>;<BASE>;<WIDE>;<UFF48> % FULLWIDTH LATIN SMALL LETTER H
<U036A> <S0068>;<BASE>;<COMPAT>;<U036A> % COMBINING LATIN SMALL LETTER H
<U24A3> <S0068>;<BASE>;<COMPAT>;<U24A3> % PARENTHESIZED LATIN SMALL LETTER H
@@ -64780,7 +64804,6 @@ endif
<U0267> <S0267>;<BASE>;<MIN>;<U0267> % LATIN SMALL LETTER HENG WITH HOOK
<U02BB> <S02BB>;<BASE>;<MIN>;<U02BB> % MODIFIER LETTER TURNED COMMA
<U02BD> <S02BD>;<BASE>;<MIN>;<U02BD> % MODIFIER LETTER REVERSED COMMA
-<U0069> <S0069>;<BASE>;<MIN>;<U0069> % LATIN SMALL LETTER I
<UFF49> <S0069>;<BASE>;<WIDE>;<UFF49> % FULLWIDTH LATIN SMALL LETTER I
<U0365> <S0069>;<BASE>;<COMPAT>;<U0365> % COMBINING LATIN SMALL LETTER I
<U2170> <S0069>;<BASE>;<COMPAT>;<U2170> % SMALL ROMAN NUMERAL ONE
@@ -64844,7 +64867,6 @@ endif
<U0269> <S0269>;<BASE>;<MIN>;<U0269> % LATIN SMALL LETTER IOTA
<U1DA5> <S0269>;<BASE>;<MNN>;<U1DA5> % MODIFIER LETTER SMALL IOTA
<U1D7C> <S1D7C>;<BASE>;<MIN>;<U1D7C> % LATIN SMALL LETTER IOTA WITH STROKE
-<U006A> <S006A>;<BASE>;<MIN>;<U006A> % LATIN SMALL LETTER J
<UFF4A> <S006A>;<BASE>;<WIDE>;<UFF4A> % FULLWIDTH LATIN SMALL LETTER J
<U24A5> <S006A>;<BASE>;<COMPAT>;<U24A5> % PARENTHESIZED LATIN SMALL LETTER J
<U2149> <S006A>;<BASE>;<FONT>;<U2149> % DOUBLE-STRUCK ITALIC SMALL J
@@ -64876,7 +64898,6 @@ endif
<U025F> <S025F>;<BASE>;<MIN>;<U025F> % LATIN SMALL LETTER DOTLESS J WITH STROKE
<U1DA1> <S025F>;<BASE>;<MNN>;<U1DA1> % MODIFIER LETTER SMALL DOTLESS J WITH STROKE
<U0284> <S0284>;<BASE>;<MIN>;<U0284> % LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK
-<U006B> <S006B>;<BASE>;<MIN>;<U006B> % LATIN SMALL LETTER K
<UFF4B> <S006B>;<BASE>;<WIDE>;<UFF4B> % FULLWIDTH LATIN SMALL LETTER K
<U1DDC> <S006B>;<BASE>;<COMPAT>;<U1DDC> % COMBINING LATIN SMALL LETTER K
<U24A6> <S006B>;<BASE>;<COMPAT>;<U24A6> % PARENTHESIZED LATIN SMALL LETTER K
@@ -64926,7 +64947,6 @@ endif
<UA743> <SA743>;<BASE>;<MIN>;<UA743> % LATIN SMALL LETTER K WITH DIAGONAL STROKE
<UA745> <SA745>;<BASE>;<MIN>;<UA745> % LATIN SMALL LETTER K WITH STROKE AND DIAGONAL STROKE
<U029E> <S029E>;<BASE>;<MIN>;<U029E> % LATIN SMALL LETTER TURNED K
-<U006C> <S006C>;<BASE>;<MIN>;<U006C> % LATIN SMALL LETTER L
<UFF4C> <S006C>;<BASE>;<WIDE>;<UFF4C> % FULLWIDTH LATIN SMALL LETTER L
<U1DDD> <S006C>;<BASE>;<COMPAT>;<U1DDD> % COMBINING LATIN SMALL LETTER L
<U217C> <S006C>;<BASE>;<COMPAT>;<U217C> % SMALL ROMAN NUMERAL FIFTY
@@ -64996,7 +65016,6 @@ endif
<UA781> <SA781>;<BASE>;<MIN>;<UA781> % LATIN SMALL LETTER TURNED L
<U019B> <S019B>;<BASE>;<MIN>;<U019B> % LATIN SMALL LETTER LAMBDA WITH STROKE
<U028E> <S028E>;<BASE>;<MIN>;<U028E> % LATIN SMALL LETTER TURNED Y
-<U006D> <S006D>;<BASE>;<MIN>;<U006D> % LATIN SMALL LETTER M
<UFF4D> <S006D>;<BASE>;<WIDE>;<UFF4D> % FULLWIDTH LATIN SMALL LETTER M
<U036B> <S006D>;<BASE>;<COMPAT>;<U036B> % COMBINING LATIN SMALL LETTER M
<U217F> <S006D>;<BASE>;<COMPAT>;<U217F> % SMALL ROMAN NUMERAL ONE THOUSAND
@@ -65055,7 +65074,6 @@ endif
<UA7FD> <SA7FD>;<BASE>;<MIN>;<UA7FD> % LATIN EPIGRAPHIC LETTER INVERTED M
<UA7FF> <SA7FF>;<BASE>;<MIN>;<UA7FF> % LATIN EPIGRAPHIC LETTER ARCHAIC M
<UA773> <SA773>;<BASE>;<MIN>;<UA773> % LATIN SMALL LETTER MUM
-<U006E> <S006E>;<BASE>;<MIN>;<U006E> % LATIN SMALL LETTER N
<UFF4E> <S006E>;<BASE>;<WIDE>;<UFF4E> % FULLWIDTH LATIN SMALL LETTER N
<U1DE0> <S006E>;<BASE>;<COMPAT>;<U1DE0> % COMBINING LATIN SMALL LETTER N
<U24A9> <S006E>;<BASE>;<COMPAT>;<U24A9> % PARENTHESIZED LATIN SMALL LETTER N
@@ -65114,7 +65132,6 @@ endif
<U014B> <S014B>;<BASE>;<MIN>;<U014B> % LATIN SMALL LETTER ENG
<U1D51> <S014B>;<BASE>;<MNN>;<U1D51> % MODIFIER LETTER SMALL ENG
<UAB3C> <SAB3C>;<BASE>;<MIN>;<UAB3C> % LATIN SMALL LETTER ENG WITH CROSSED-TAIL
-<U006F> <S006F>;<BASE>;<MIN>;<U006F> % LATIN SMALL LETTER O
<UFF4F> <S006F>;<BASE>;<WIDE>;<UFF4F> % FULLWIDTH LATIN SMALL LETTER O
<U0366> <S006F>;<BASE>;<COMPAT>;<U0366> % COMBINING LATIN SMALL LETTER O
<U24AA> <S006F>;<BASE>;<COMPAT>;<U24AA> % PARENTHESIZED LATIN SMALL LETTER O
@@ -65213,7 +65230,6 @@ endif
<U0223> <S0223>;<BASE>;<MIN>;<U0223> % LATIN SMALL LETTER OU
<U1D3D> <S0223>;<BASE>;<MISCCAP>;<U1D3D> % MODIFIER LETTER CAPITAL OU
<U1D15> <S1D15>;<BASE>;<MIN>;<U1D15> % LATIN LETTER SMALL CAPITAL OU
-<U0070> <S0070>;<BASE>;<MIN>;<U0070> % LATIN SMALL LETTER P
<UFF50> <S0070>;<BASE>;<WIDE>;<UFF50> % FULLWIDTH LATIN SMALL LETTER P
<U1DEE> <S0070>;<BASE>;<COMPAT>;<U1DEE> % COMBINING LATIN SMALL LETTER P
<U24AB> <S0070>;<BASE>;<COMPAT>;<U24AB> % PARENTHESIZED LATIN SMALL LETTER P
@@ -65262,7 +65278,6 @@ endif
<U0278> <S0278>;<BASE>;<MIN>;<U0278> % LATIN SMALL LETTER PHI
<U1DB2> <S0278>;<BASE>;<MNN>;<U1DB2> % MODIFIER LETTER SMALL PHI
<U2C77> <S2C77>;<BASE>;<MIN>;<U2C77> % LATIN SMALL LETTER TAILLESS PHI
-<U0071> <S0071>;<BASE>;<MIN>;<U0071> % LATIN SMALL LETTER Q
<UFF51> <S0071>;<BASE>;<WIDE>;<UFF51> % FULLWIDTH LATIN SMALL LETTER Q
<U24AC> <S0071>;<BASE>;<COMPAT>;<U24AC> % PARENTHESIZED LATIN SMALL LETTER Q
<U0001D42A> <S0071>;<BASE>;<FONT>;<U0001D42A> % MATHEMATICAL BOLD SMALL Q
@@ -65285,7 +65300,6 @@ endif
<U02A0> <S02A0>;<BASE>;<MIN>;<U02A0> % LATIN SMALL LETTER Q WITH HOOK
<U024B> <S024B>;<BASE>;<MIN>;<U024B> % LATIN SMALL LETTER Q WITH HOOK TAIL
<U0138> <S0138>;<BASE>;<MIN>;<U0138> % LATIN SMALL LETTER KRA
-<U0072> <S0072>;<BASE>;<MIN>;<U0072> % LATIN SMALL LETTER R
<UFF52> <S0072>;<BASE>;<WIDE>;<UFF52> % FULLWIDTH LATIN SMALL LETTER R
<U036C> <S0072>;<BASE>;<COMPAT>;<U036C> % COMBINING LATIN SMALL LETTER R
<U1DCA> <S0072>;<BASE>;<COMPAT>;<U1DCA> % COMBINING LATIN SMALL LETTER R BELOW
@@ -65354,7 +65368,6 @@ endif
<UA775> <SA775>;<BASE>;<MIN>;<UA775> % LATIN SMALL LETTER RUM
<UA776> <SA776>;<BASE>;<MIN>;<UA776> % LATIN LETTER SMALL CAPITAL RUM
<UA75D> <SA75D>;<BASE>;<MIN>;<UA75D> % LATIN SMALL LETTER RUM ROTUNDA
-<U0073> <S0073>;<BASE>;<MIN>;<U0073> % LATIN SMALL LETTER S
<UFF53> <S0073>;<BASE>;<WIDE>;<UFF53> % FULLWIDTH LATIN SMALL LETTER S
<U1DE4> <S0073>;<BASE>;<COMPAT>;<U1DE4> % COMBINING LATIN SMALL LETTER S
<U24AE> <S0073>;<BASE>;<COMPAT>;<U24AE> % PARENTHESIZED LATIN SMALL LETTER S
@@ -65417,7 +65430,6 @@ endif
<U0285> <S0285>;<BASE>;<MIN>;<U0285> % LATIN SMALL LETTER SQUAT REVERSED ESH
<U1D98> <S1D98>;<BASE>;<MIN>;<U1D98> % LATIN SMALL LETTER ESH WITH RETROFLEX HOOK
<U0286> <S0286>;<BASE>;<MIN>;<U0286> % LATIN SMALL LETTER ESH WITH CURL
-<U0074> <S0074>;<BASE>;<MIN>;<U0074> % LATIN SMALL LETTER T
<UFF54> <S0074>;<BASE>;<WIDE>;<UFF54> % FULLWIDTH LATIN SMALL LETTER T
<U036D> <S0074>;<BASE>;<COMPAT>;<U036D> % COMBINING LATIN SMALL LETTER T
<U24AF> <S0074>;<BASE>;<COMPAT>;<U24AF> % PARENTHESIZED LATIN SMALL LETTER T
@@ -65467,7 +65479,6 @@ endif
<U0236> <S0236>;<BASE>;<MIN>;<U0236> % LATIN SMALL LETTER T WITH CURL
<UA777> <SA777>;<BASE>;<MIN>;<UA777> % LATIN SMALL LETTER TUM
<U0287> <S0287>;<BASE>;<MIN>;<U0287> % LATIN SMALL LETTER TURNED T
-<U0075> <S0075>;<BASE>;<MIN>;<U0075> % LATIN SMALL LETTER U
<UFF55> <S0075>;<BASE>;<WIDE>;<UFF55> % FULLWIDTH LATIN SMALL LETTER U
<U0367> <S0075>;<BASE>;<COMPAT>;<U0367> % COMBINING LATIN SMALL LETTER U
<U24B0> <S0075>;<BASE>;<COMPAT>;<U24B0> % PARENTHESIZED LATIN SMALL LETTER U
@@ -65552,7 +65563,6 @@ endif
<U028A> <S028A>;<BASE>;<MIN>;<U028A> % LATIN SMALL LETTER UPSILON
<U1DB7> <S028A>;<BASE>;<MNN>;<U1DB7> % MODIFIER LETTER SMALL UPSILON
<U1D7F> <S1D7F>;<BASE>;<MIN>;<U1D7F> % LATIN SMALL LETTER UPSILON WITH STROKE
-<U0076> <S0076>;<BASE>;<MIN>;<U0076> % LATIN SMALL LETTER V
<UFF56> <S0076>;<BASE>;<WIDE>;<UFF56> % FULLWIDTH LATIN SMALL LETTER V
<U036E> <S0076>;<BASE>;<COMPAT>;<U036E> % COMBINING LATIN SMALL LETTER V
<U2174> <S0076>;<BASE>;<COMPAT>;<U2174> % SMALL ROMAN NUMERAL FIVE
@@ -65593,7 +65603,6 @@ endif
<U1EFD> <S1EFD>;<BASE>;<MIN>;<U1EFD> % LATIN SMALL LETTER MIDDLE-WELSH V
<U028C> <S028C>;<BASE>;<MIN>;<U028C> % LATIN SMALL LETTER TURNED V
<U1DBA> <S028C>;<BASE>;<MNN>;<U1DBA> % MODIFIER LETTER SMALL TURNED V
-<U0077> <S0077>;<BASE>;<MIN>;<U0077> % LATIN SMALL LETTER W
<UFF57> <S0077>;<BASE>;<WIDE>;<UFF57> % FULLWIDTH LATIN SMALL LETTER W
<U1DF1> <S0077>;<BASE>;<COMPAT>;<U1DF1> % COMBINING LATIN SMALL LETTER W
<U24B2> <S0077>;<BASE>;<COMPAT>;<U24B2> % PARENTHESIZED LATIN SMALL LETTER W
@@ -65627,7 +65636,6 @@ endif
<U1D21> <S1D21>;<BASE>;<MIN>;<U1D21> % LATIN LETTER SMALL CAPITAL W
<U2C73> <S2C73>;<BASE>;<MIN>;<U2C73> % LATIN SMALL LETTER W WITH HOOK
<U028D> <S028D>;<BASE>;<MIN>;<U028D> % LATIN SMALL LETTER TURNED W
-<U0078> <S0078>;<BASE>;<MIN>;<U0078> % LATIN SMALL LETTER X
<UFF58> <S0078>;<BASE>;<WIDE>;<UFF58> % FULLWIDTH LATIN SMALL LETTER X
<U036F> <S0078>;<BASE>;<COMPAT>;<U036F> % COMBINING LATIN SMALL LETTER X
<U2179> <S0078>;<BASE>;<COMPAT>;<U2179> % SMALL ROMAN NUMERAL TEN
@@ -65660,7 +65668,6 @@ endif
<UAB53> <SAB53>;<BASE>;<MIN>;<UAB53> % LATIN SMALL LETTER CHI
<UAB54> <SAB54>;<BASE>;<MIN>;<UAB54> % LATIN SMALL LETTER CHI WITH LOW RIGHT RING
<UAB55> <SAB55>;<BASE>;<MIN>;<UAB55> % LATIN SMALL LETTER CHI WITH LOW LEFT SERIF
-<U0079> <S0079>;<BASE>;<MIN>;<U0079> % LATIN SMALL LETTER Y
<UFF59> <S0079>;<BASE>;<WIDE>;<UFF59> % FULLWIDTH LATIN SMALL LETTER Y
<U24B4> <S0079>;<BASE>;<COMPAT>;<U24B4> % PARENTHESIZED LATIN SMALL LETTER Y
<U0001D432> <S0079>;<BASE>;<FONT>;<U0001D432> % MATHEMATICAL BOLD SMALL Y
@@ -65694,7 +65701,6 @@ endif
<U1EFF> <S1EFF>;<BASE>;<MIN>;<U1EFF> % LATIN SMALL LETTER Y WITH LOOP
<UAB5A> <SAB5A>;<BASE>;<MIN>;<UAB5A> % LATIN SMALL LETTER Y WITH SHORT RIGHT LEG
<U021D> <S021D>;<BASE>;<MIN>;<U021D> % LATIN SMALL LETTER YOGH
-<U007A> <S007A>;<BASE>;<MIN>;<U007A> % LATIN SMALL LETTER Z
<UFF5A> <S007A>;<BASE>;<WIDE>;<UFF5A> % FULLWIDTH LATIN SMALL LETTER Z
<U1DE6> <S007A>;<BASE>;<COMPAT>;<U1DE6> % COMBINING LATIN SMALL LETTER Z
<U24B5> <S007A>;<BASE>;<COMPAT>;<U24B5> % PARENTHESIZED LATIN SMALL LETTER Z
@@ -65796,7 +65802,35 @@ endif
<U0001D736> <S03B1>;<BASE>;<FONT>;<U0001D736> % MATHEMATICAL BOLD ITALIC SMALL ALPHA
<U0001D770> <S03B1>;<BASE>;<FONT>;<U0001D770> % MATHEMATICAL SANS-SERIF BOLD SMALL ALPHA
<U0001D7AA> <S03B1>;<BASE>;<FONT>;<U0001D7AA> % MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA
+% Implement rational range for [A-Z] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
<U0041> <S0061>;<BASE>;<CAP>;<U0041> % LATIN CAPITAL LETTER A
+<U0042> <S0062>;<BASE>;<CAP>;<U0042> % LATIN CAPITAL LETTER B
+<U0043> <S0063>;<BASE>;<CAP>;<U0043> % LATIN CAPITAL LETTER C
+<U0044> <S0064>;<BASE>;<CAP>;<U0044> % LATIN CAPITAL LETTER D
+<U0045> <S0065>;<BASE>;<CAP>;<U0045> % LATIN CAPITAL LETTER E
+<U0046> <S0066>;<BASE>;<CAP>;<U0046> % LATIN CAPITAL LETTER F
+<U0047> <S0067>;<BASE>;<CAP>;<U0047> % LATIN CAPITAL LETTER G
+<U0048> <S0068>;<BASE>;<CAP>;<U0048> % LATIN CAPITAL LETTER H
+<U0049> <S0069>;<BASE>;<CAP>;<U0049> % LATIN CAPITAL LETTER I
+<U004A> <S006A>;<BASE>;<CAP>;<U004A> % LATIN CAPITAL LETTER J
+<U004B> <S006B>;<BASE>;<CAP>;<U004B> % LATIN CAPITAL LETTER K
+<U004C> <S006C>;<BASE>;<CAP>;<U004C> % LATIN CAPITAL LETTER L
+<U004D> <S006D>;<BASE>;<CAP>;<U004D> % LATIN CAPITAL LETTER M
+<U004E> <S006E>;<BASE>;<CAP>;<U004E> % LATIN CAPITAL LETTER N
+<U004F> <S006F>;<BASE>;<CAP>;<U004F> % LATIN CAPITAL LETTER O
+<U0050> <S0070>;<BASE>;<CAP>;<U0050> % LATIN CAPITAL LETTER P
+<U0051> <S0071>;<BASE>;<CAP>;<U0051> % LATIN CAPITAL LETTER Q
+<U0052> <S0072>;<BASE>;<CAP>;<U0052> % LATIN CAPITAL LETTER R
+<U0053> <S0073>;<BASE>;<CAP>;<U0053> % LATIN CAPITAL LETTER S
+<U0054> <S0074>;<BASE>;<CAP>;<U0054> % LATIN CAPITAL LETTER T
+<U0055> <S0075>;<BASE>;<CAP>;<U0055> % LATIN CAPITAL LETTER U
+<U0056> <S0076>;<BASE>;<CAP>;<U0056> % LATIN CAPITAL LETTER V
+<U0057> <S0077>;<BASE>;<CAP>;<U0057> % LATIN CAPITAL LETTER W
+<U0058> <S0078>;<BASE>;<CAP>;<U0058> % LATIN CAPITAL LETTER X
+<U0059> <S0079>;<BASE>;<CAP>;<U0059> % LATIN CAPITAL LETTER Y
+<U005A> <S007A>;<BASE>;<CAP>;<U005A> % LATIN CAPITAL LETTER Z
<UFF21> <S0061>;<BASE>;<WIDECAP>;<UFF21> % FULLWIDTH LATIN CAPITAL LETTER A
<U0001F110> <S0061>;<BASE>;<COMPATCAP>;<U0001F110> % PARENTHESIZED LATIN CAPITAL LETTER A
<U0001D400> <S0061>;<BASE>;<FONTCAP>;<U0001D400> % MATHEMATICAL BOLD CAPITAL A
@@ -65860,7 +65894,6 @@ endif
<U2C6F> <S0250>;<BASE>;<CAP>;<U2C6F> % LATIN CAPITAL LETTER TURNED A
<U2C6D> <S0251>;<BASE>;<CAP>;<U2C6D> % LATIN CAPITAL LETTER ALPHA
<U2C70> <S0252>;<BASE>;<CAP>;<U2C70> % LATIN CAPITAL LETTER TURNED ALPHA
-<U0042> <S0062>;<BASE>;<CAP>;<U0042> % LATIN CAPITAL LETTER B
<UFF22> <S0062>;<BASE>;<WIDECAP>;<UFF22> % FULLWIDTH LATIN CAPITAL LETTER B
<U0001F111> <S0062>;<BASE>;<COMPATCAP>;<U0001F111> % PARENTHESIZED LATIN CAPITAL LETTER B
<U212C> <S0062>;<BASE>;<FONTCAP>;<U212C> % SCRIPT CAPITAL B
@@ -65888,7 +65921,6 @@ endif
<U0181> <S0253>;<BASE>;<CAP>;<U0181> % LATIN CAPITAL LETTER B WITH HOOK
<U0182> <S0183>;<BASE>;<CAP>;<U0182> % LATIN CAPITAL LETTER B WITH TOPBAR
<UA7B4> <SA7B5>;<BASE>;<CAP>;<UA7B4> % LATIN CAPITAL LETTER BETA
-<U0043> <S0063>;<BASE>;<CAP>;<U0043> % LATIN CAPITAL LETTER C
<UFF23> <S0063>;<BASE>;<WIDECAP>;<UFF23> % FULLWIDTH LATIN CAPITAL LETTER C
<U216D> <S0063>;<BASE>;<COMPATCAP>;<U216D> % ROMAN NUMERAL ONE HUNDRED
<U0001F112> <S0063>;<BASE>;<COMPATCAP>;<U0001F112> % PARENTHESIZED LATIN CAPITAL LETTER C
@@ -65921,7 +65953,6 @@ endif
<U0187> <S0188>;<BASE>;<CAP>;<U0187> % LATIN CAPITAL LETTER C WITH HOOK
<U2183> <S2184>;<BASE>;<CAP>;<U2183> % ROMAN NUMERAL REVERSED ONE HUNDRED
<UA73E> <SA73F>;<BASE>;<CAP>;<UA73E> % LATIN CAPITAL LETTER REVERSED C WITH DOT
-<U0044> <S0064>;<BASE>;<CAP>;<U0044> % LATIN CAPITAL LETTER D
<UFF24> <S0064>;<BASE>;<WIDECAP>;<UFF24> % FULLWIDTH LATIN CAPITAL LETTER D
<U216E> <S0064>;<BASE>;<COMPATCAP>;<U216E> % ROMAN NUMERAL FIVE HUNDRED
<U0001F113> <S0064>;<BASE>;<COMPATCAP>;<U0001F113> % PARENTHESIZED LATIN CAPITAL LETTER D
@@ -65959,7 +65990,6 @@ endif
<U0189> <S0256>;<BASE>;<CAP>;<U0189> % LATIN CAPITAL LETTER AFRICAN D
<U018A> <S0257>;<BASE>;<CAP>;<U018A> % LATIN CAPITAL LETTER D WITH HOOK
<U018B> <S018C>;<BASE>;<CAP>;<U018B> % LATIN CAPITAL LETTER D WITH TOPBAR
-<U0045> <S0065>;<BASE>;<CAP>;<U0045> % LATIN CAPITAL LETTER E
<UFF25> <S0065>;<BASE>;<WIDECAP>;<UFF25> % FULLWIDTH LATIN CAPITAL LETTER E
<U0001F114> <S0065>;<BASE>;<COMPATCAP>;<U0001F114> % PARENTHESIZED LATIN CAPITAL LETTER E
<U2130> <S0065>;<BASE>;<FONTCAP>;<U2130> % SCRIPT CAPITAL E
@@ -66010,7 +66040,6 @@ endif
<U0190> <S025B>;<BASE>;<CAP>;<U0190> % LATIN CAPITAL LETTER OPEN E
<U2107> <S025B>;<BASE>;<COMPATCAP>;<U2107> % EULER CONSTANT
<UA7AB> <S025C>;<BASE>;<CAP>;<UA7AB> % LATIN CAPITAL LETTER REVERSED OPEN E
-<U0046> <S0066>;<BASE>;<CAP>;<U0046> % LATIN CAPITAL LETTER F
<UFF26> <S0066>;<BASE>;<WIDECAP>;<UFF26> % FULLWIDTH LATIN CAPITAL LETTER F
<U0001F115> <S0066>;<BASE>;<COMPATCAP>;<U0001F115> % PARENTHESIZED LATIN CAPITAL LETTER F
<U2131> <S0066>;<BASE>;<FONTCAP>;<U2131> % SCRIPT CAPITAL F
@@ -66035,7 +66064,6 @@ endif
<UA798> <SA799>;<BASE>;<CAP>;<UA798> % LATIN CAPITAL LETTER F WITH STROKE
<U0191> <S0192>;<BASE>;<CAP>;<U0191> % LATIN CAPITAL LETTER F WITH HOOK
<U2132> <S214E>;<BASE>;<CAP>;<U2132> % TURNED CAPITAL F
-<U0047> <S0067>;<BASE>;<CAP>;<U0047> % LATIN CAPITAL LETTER G
<UFF27> <S0067>;<BASE>;<WIDECAP>;<UFF27> % FULLWIDTH LATIN CAPITAL LETTER G
<U0001F116> <S0067>;<BASE>;<COMPATCAP>;<U0001F116> % PARENTHESIZED LATIN CAPITAL LETTER G
<U0001D406> <S0067>;<BASE>;<FONTCAP>;<U0001D406> % MATHEMATICAL BOLD CAPITAL G
@@ -66071,7 +66099,6 @@ endif
<UA77E> <SA77F>;<BASE>;<CAP>;<UA77E> % LATIN CAPITAL LETTER TURNED INSULAR G
<U0194> <S0263>;<BASE>;<CAP>;<U0194> % LATIN CAPITAL LETTER GAMMA
<U01A2> <S01A3>;<BASE>;<CAP>;<U01A2> % LATIN CAPITAL LETTER OI
-<U0048> <S0068>;<BASE>;<CAP>;<U0048> % LATIN CAPITAL LETTER H
<UFF28> <S0068>;<BASE>;<WIDECAP>;<UFF28> % FULLWIDTH LATIN CAPITAL LETTER H
<U0001F117> <S0068>;<BASE>;<COMPATCAP>;<U0001F117> % PARENTHESIZED LATIN CAPITAL LETTER H
<U210B> <S0068>;<BASE>;<FONTCAP>;<U210B> % SCRIPT CAPITAL H
@@ -66104,7 +66131,6 @@ endif
<U2C67> <S2C68>;<BASE>;<CAP>;<U2C67> % LATIN CAPITAL LETTER H WITH DESCENDER
<U2C75> <S2C76>;<BASE>;<CAP>;<U2C75> % LATIN CAPITAL LETTER HALF H
<UA726> <SA727>;<BASE>;<CAP>;<UA726> % LATIN CAPITAL LETTER HENG
-<U0049> <S0069>;<BASE>;<CAP>;<U0049> % LATIN CAPITAL LETTER I
<UFF29> <S0069>;<BASE>;<WIDECAP>;<UFF29> % FULLWIDTH LATIN CAPITAL LETTER I
<U2160> <S0069>;<BASE>;<COMPATCAP>;<U2160> % ROMAN NUMERAL ONE
<U0001F118> <S0069>;<BASE>;<COMPATCAP>;<U0001F118> % PARENTHESIZED LATIN CAPITAL LETTER I
@@ -66149,7 +66175,6 @@ endif
<UA7AE> <S026A>;<BASE>;<CAP>;<UA7AE> % LATIN CAPITAL LETTER SMALL CAPITAL I
<U0197> <S0268>;<BASE>;<CAP>;<U0197> % LATIN CAPITAL LETTER I WITH STROKE
<U0196> <S0269>;<BASE>;<CAP>;<U0196> % LATIN CAPITAL LETTER IOTA
-<U004A> <S006A>;<BASE>;<CAP>;<U004A> % LATIN CAPITAL LETTER J
<UFF2A> <S006A>;<BASE>;<WIDECAP>;<UFF2A> % FULLWIDTH LATIN CAPITAL LETTER J
<U0001F119> <S006A>;<BASE>;<COMPATCAP>;<U0001F119> % PARENTHESIZED LATIN CAPITAL LETTER J
<U0001D409> <S006A>;<BASE>;<FONTCAP>;<U0001D409> % MATHEMATICAL BOLD CAPITAL J
@@ -66172,7 +66197,6 @@ endif
<U0134> <S006A>;"<BASE><CIRCF>";"<CAP><MIN>";<U0134> % LATIN CAPITAL LETTER J WITH CIRCUMFLEX
<U0248> <S0249>;<BASE>;<CAP>;<U0248> % LATIN CAPITAL LETTER J WITH STROKE
<UA7B2> <S029D>;<BASE>;<CAP>;<UA7B2> % LATIN CAPITAL LETTER J WITH CROSSED-TAIL
-<U004B> <S006B>;<BASE>;<CAP>;<U004B> % LATIN CAPITAL LETTER K
<U212A> <S006B>;<BASE>;<CAP>;<U212A> % KELVIN SIGN
<UFF2B> <S006B>;<BASE>;<WIDECAP>;<UFF2B> % FULLWIDTH LATIN CAPITAL LETTER K
<U0001F11A> <S006B>;<BASE>;<COMPATCAP>;<U0001F11A> % PARENTHESIZED LATIN CAPITAL LETTER K
@@ -66206,7 +66230,6 @@ endif
<UA742> <SA743>;<BASE>;<CAP>;<UA742> % LATIN CAPITAL LETTER K WITH DIAGONAL STROKE
<UA744> <SA745>;<BASE>;<CAP>;<UA744> % LATIN CAPITAL LETTER K WITH STROKE AND DIAGONAL STROKE
<UA7B0> <S029E>;<BASE>;<CAP>;<UA7B0> % LATIN CAPITAL LETTER TURNED K
-<U004C> <S006C>;<BASE>;<CAP>;<U004C> % LATIN CAPITAL LETTER L
<UFF2C> <S006C>;<BASE>;<WIDECAP>;<UFF2C> % FULLWIDTH LATIN CAPITAL LETTER L
<U216C> <S006C>;<BASE>;<COMPATCAP>;<U216C> % ROMAN NUMERAL FIFTY
<U0001F11B> <S006C>;<BASE>;<COMPATCAP>;<U0001F11B> % PARENTHESIZED LATIN CAPITAL LETTER L
@@ -66249,7 +66272,6 @@ endif
<U2C62> <S026B>;<BASE>;<CAP>;<U2C62> % LATIN CAPITAL LETTER L WITH MIDDLE TILDE
<UA7AD> <S026C>;<BASE>;<CAP>;<UA7AD> % LATIN CAPITAL LETTER L WITH BELT
<UA780> <SA781>;<BASE>;<CAP>;<UA780> % LATIN CAPITAL LETTER TURNED L
-<U004D> <S006D>;<BASE>;<CAP>;<U004D> % LATIN CAPITAL LETTER M
<UFF2D> <S006D>;<BASE>;<WIDECAP>;<UFF2D> % FULLWIDTH LATIN CAPITAL LETTER M
<U216F> <S006D>;<BASE>;<COMPATCAP>;<U216F> % ROMAN NUMERAL ONE THOUSAND
<U0001F11C> <S006D>;<BASE>;<COMPATCAP>;<U0001F11C> % PARENTHESIZED LATIN CAPITAL LETTER M
@@ -66275,7 +66297,6 @@ endif
<U1E42> <S006D>;"<BASE><POINS>";"<CAP><MIN>";<U1E42> % LATIN CAPITAL LETTER M WITH DOT BELOW
<U1DDF> <S1D0D>;<BASE>;<COMPAT>;<U1DDF> % COMBINING LATIN LETTER SMALL CAPITAL M
<U2C6E> <S0271>;<BASE>;<CAP>;<U2C6E> % LATIN CAPITAL LETTER M WITH HOOK
-<U004E> <S006E>;<BASE>;<CAP>;<U004E> % LATIN CAPITAL LETTER N
<UFF2E> <S006E>;<BASE>;<WIDECAP>;<UFF2E> % FULLWIDTH LATIN CAPITAL LETTER N
<U0001F11D> <S006E>;<BASE>;<COMPATCAP>;<U0001F11D> % PARENTHESIZED LATIN CAPITAL LETTER N
<U2115> <S006E>;<BASE>;<FONTCAP>;<U2115> % DOUBLE-STRUCK CAPITAL N
@@ -66312,7 +66333,6 @@ endif
<U0220> <S019E>;<BASE>;<CAP>;<U0220> % LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
<UA790> <SA791>;<BASE>;<CAP>;<UA790> % LATIN CAPITAL LETTER N WITH DESCENDER
<U014A> <S014B>;<BASE>;<CAP>;<U014A> % LATIN CAPITAL LETTER ENG
-<U004F> <S006F>;<BASE>;<CAP>;<U004F> % LATIN CAPITAL LETTER O
<UFF2F> <S006F>;<BASE>;<WIDECAP>;<UFF2F> % FULLWIDTH LATIN CAPITAL LETTER O
<U0001F11E> <S006F>;<BASE>;<COMPATCAP>;<U0001F11E> % PARENTHESIZED LATIN CAPITAL LETTER O
<U0001D40E> <S006F>;<BASE>;<FONTCAP>;<U0001D40E> % MATHEMATICAL BOLD CAPITAL O
@@ -66377,7 +66397,6 @@ endif
<UA74A> <SA74B>;<BASE>;<CAP>;<UA74A> % LATIN CAPITAL LETTER O WITH LONG STROKE OVERLAY
<UA7B6> <SA7B7>;<BASE>;<CAP>;<UA7B6> % LATIN CAPITAL LETTER OMEGA
<U0222> <S0223>;<BASE>;<CAP>;<U0222> % LATIN CAPITAL LETTER OU
-<U0050> <S0070>;<BASE>;<CAP>;<U0050> % LATIN CAPITAL LETTER P
<UFF30> <S0070>;<BASE>;<WIDECAP>;<UFF30> % FULLWIDTH LATIN CAPITAL LETTER P
<U0001F11F> <S0070>;<BASE>;<COMPATCAP>;<U0001F11F> % PARENTHESIZED LATIN CAPITAL LETTER P
<U2119> <S0070>;<BASE>;<FONTCAP>;<U2119> % DOUBLE-STRUCK CAPITAL P
@@ -66405,7 +66424,6 @@ endif
<U01A4> <S01A5>;<BASE>;<CAP>;<U01A4> % LATIN CAPITAL LETTER P WITH HOOK
<UA752> <SA753>;<BASE>;<CAP>;<UA752> % LATIN CAPITAL LETTER P WITH FLOURISH
<UA754> <SA755>;<BASE>;<CAP>;<UA754> % LATIN CAPITAL LETTER P WITH SQUIRREL TAIL
-<U0051> <S0071>;<BASE>;<CAP>;<U0051> % LATIN CAPITAL LETTER Q
<UFF31> <S0071>;<BASE>;<WIDECAP>;<UFF31> % FULLWIDTH LATIN CAPITAL LETTER Q
<U0001F120> <S0071>;<BASE>;<COMPATCAP>;<U0001F120> % PARENTHESIZED LATIN CAPITAL LETTER Q
<U211A> <S0071>;<BASE>;<FONTCAP>;<U211A> % DOUBLE-STRUCK CAPITAL Q
@@ -66428,7 +66446,6 @@ endif
<UA756> <SA757>;<BASE>;<CAP>;<UA756> % LATIN CAPITAL LETTER Q WITH STROKE THROUGH DESCENDER
<UA758> <SA759>;<BASE>;<CAP>;<UA758> % LATIN CAPITAL LETTER Q WITH DIAGONAL STROKE
<U024A> <S024B>;<BASE>;<CAP>;<U024A> % LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL
-<U0052> <S0072>;<BASE>;<CAP>;<U0052> % LATIN CAPITAL LETTER R
<UFF32> <S0072>;<BASE>;<WIDECAP>;<UFF32> % FULLWIDTH LATIN CAPITAL LETTER R
<U0001F121> <S0072>;<BASE>;<COMPATCAP>;<U0001F121> % PARENTHESIZED LATIN CAPITAL LETTER R
<U211B> <S0072>;<BASE>;<FONTCAP>;<U211B> % SCRIPT CAPITAL R
@@ -66466,7 +66483,6 @@ endif
<U024C> <S024D>;<BASE>;<CAP>;<U024C> % LATIN CAPITAL LETTER R WITH STROKE
<U2C64> <S027D>;<BASE>;<CAP>;<U2C64> % LATIN CAPITAL LETTER R WITH TAIL
<UA75C> <SA75D>;<BASE>;<CAP>;<UA75C> % LATIN CAPITAL LETTER RUM ROTUNDA
-<U0053> <S0073>;<BASE>;<CAP>;<U0053> % LATIN CAPITAL LETTER S
<UFF33> <S0073>;<BASE>;<WIDECAP>;<UFF33> % FULLWIDTH LATIN CAPITAL LETTER S
<U0001F122> <S0073>;<BASE>;<COMPATCAP>;<U0001F122> % PARENTHESIZED LATIN CAPITAL LETTER S
<U0001F12A> <S0073>;<BASE>;<COMPATCAP>;<U0001F12A> % TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
@@ -66502,7 +66518,6 @@ endif
<U1E9E> "<S0073><S0073>";"<BASE><VRNT1><BASE>";"<COMPATCAP><COMPAT><COMPATCAP>";<U1E9E> % LATIN CAPITAL LETTER SHARP S
<U2C7E> <S023F>;<BASE>;<CAP>;<U2C7E> % LATIN CAPITAL LETTER S WITH SWASH TAIL
<U01A9> <S0283>;<BASE>;<CAP>;<U01A9> % LATIN CAPITAL LETTER ESH
-<U0054> <S0074>;<BASE>;<CAP>;<U0054> % LATIN CAPITAL LETTER T
<UFF34> <S0074>;<BASE>;<WIDECAP>;<UFF34> % FULLWIDTH LATIN CAPITAL LETTER T
<U0001F123> <S0074>;<BASE>;<COMPATCAP>;<U0001F123> % PARENTHESIZED LATIN CAPITAL LETTER T
<U0001D413> <S0074>;<BASE>;<FONTCAP>;<U0001D413> % MATHEMATICAL BOLD CAPITAL T
@@ -66536,7 +66551,6 @@ endif
<U01AC> <S01AD>;<BASE>;<CAP>;<U01AC> % LATIN CAPITAL LETTER T WITH HOOK
<U01AE> <S0288>;<BASE>;<CAP>;<U01AE> % LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
<UA7B1> <S0287>;<BASE>;<CAP>;<UA7B1> % LATIN CAPITAL LETTER TURNED T
-<U0055> <S0075>;<BASE>;<CAP>;<U0055> % LATIN CAPITAL LETTER U
<UFF35> <S0075>;<BASE>;<WIDECAP>;<UFF35> % FULLWIDTH LATIN CAPITAL LETTER U
<U0001F124> <S0075>;<BASE>;<COMPATCAP>;<U0001F124> % PARENTHESIZED LATIN CAPITAL LETTER U
<U0001D414> <S0075>;<BASE>;<FONTCAP>;<U0001D414> % MATHEMATICAL BOLD CAPITAL U
@@ -66591,7 +66605,6 @@ endif
<UA78D> <S0265>;<BASE>;<CAP>;<UA78D> % LATIN CAPITAL LETTER TURNED H
<U019C> <S026F>;<BASE>;<CAP>;<U019C> % LATIN CAPITAL LETTER TURNED M
<U01B1> <S028A>;<BASE>;<CAP>;<U01B1> % LATIN CAPITAL LETTER UPSILON
-<U0056> <S0076>;<BASE>;<CAP>;<U0056> % LATIN CAPITAL LETTER V
<UFF36> <S0076>;<BASE>;<WIDECAP>;<UFF36> % FULLWIDTH LATIN CAPITAL LETTER V
<U2164> <S0076>;<BASE>;<COMPATCAP>;<U2164> % ROMAN NUMERAL FIVE
<U0001F125> <S0076>;<BASE>;<COMPATCAP>;<U0001F125> % PARENTHESIZED LATIN CAPITAL LETTER V
@@ -66622,7 +66635,6 @@ endif
<U01B2> <S028B>;<BASE>;<CAP>;<U01B2> % LATIN CAPITAL LETTER V WITH HOOK
<U1EFC> <S1EFD>;<BASE>;<CAP>;<U1EFC> % LATIN CAPITAL LETTER MIDDLE-WELSH V
<U0245> <S028C>;<BASE>;<CAP>;<U0245> % LATIN CAPITAL LETTER TURNED V
-<U0057> <S0077>;<BASE>;<CAP>;<U0057> % LATIN CAPITAL LETTER W
<UFF37> <S0077>;<BASE>;<WIDECAP>;<UFF37> % FULLWIDTH LATIN CAPITAL LETTER W
<U0001F126> <S0077>;<BASE>;<COMPATCAP>;<U0001F126> % PARENTHESIZED LATIN CAPITAL LETTER W
<U0001D416> <S0077>;<BASE>;<FONTCAP>;<U0001D416> % MATHEMATICAL BOLD CAPITAL W
@@ -66649,7 +66661,6 @@ endif
<U1E86> <S0077>;"<BASE><POINT>";"<CAP><MIN>";<U1E86> % LATIN CAPITAL LETTER W WITH DOT ABOVE
<U1E88> <S0077>;"<BASE><POINS>";"<CAP><MIN>";<U1E88> % LATIN CAPITAL LETTER W WITH DOT BELOW
<U2C72> <S2C73>;<BASE>;<CAP>;<U2C72> % LATIN CAPITAL LETTER W WITH HOOK
-<U0058> <S0078>;<BASE>;<CAP>;<U0058> % LATIN CAPITAL LETTER X
<UFF38> <S0078>;<BASE>;<WIDECAP>;<UFF38> % FULLWIDTH LATIN CAPITAL LETTER X
<U2169> <S0078>;<BASE>;<COMPATCAP>;<U2169> % ROMAN NUMERAL TEN
<U0001F127> <S0078>;<BASE>;<COMPATCAP>;<U0001F127> % PARENTHESIZED LATIN CAPITAL LETTER X
@@ -66675,7 +66686,6 @@ endif
<U216A> "<S0078><S0069>";"<BASE><BASE>";"<COMPATCAP><COMPATCAP>";<U216A> % ROMAN NUMERAL ELEVEN
<U216B> "<S0078><S0069><S0069>";"<BASE><BASE><BASE>";"<COMPATCAP><COMPATCAP><COMPATCAP>";<U216B> % ROMAN NUMERAL TWELVE
<UA7B3> <SAB53>;<BASE>;<CAP>;<UA7B3> % LATIN CAPITAL LETTER CHI
-<U0059> <S0079>;<BASE>;<CAP>;<U0059> % LATIN CAPITAL LETTER Y
<UFF39> <S0079>;<BASE>;<WIDECAP>;<UFF39> % FULLWIDTH LATIN CAPITAL LETTER Y
<U0001F128> <S0079>;<BASE>;<COMPATCAP>;<U0001F128> % PARENTHESIZED LATIN CAPITAL LETTER Y
<U0001D418> <S0079>;<BASE>;<FONTCAP>;<U0001D418> % MATHEMATICAL BOLD CAPITAL Y
@@ -66708,7 +66718,6 @@ endif
<U01B3> <S01B4>;<BASE>;<CAP>;<U01B3> % LATIN CAPITAL LETTER Y WITH HOOK
<U1EFE> <S1EFF>;<BASE>;<CAP>;<U1EFE> % LATIN CAPITAL LETTER Y WITH LOOP
<U021C> <S021D>;<BASE>;<CAP>;<U021C> % LATIN CAPITAL LETTER YOGH
-<U005A> <S007A>;<BASE>;<CAP>;<U005A> % LATIN CAPITAL LETTER Z
<UFF3A> <S007A>;<BASE>;<WIDECAP>;<UFF3A> % FULLWIDTH LATIN CAPITAL LETTER Z
<U0001F129> <S007A>;<BASE>;<COMPATCAP>;<U0001F129> % PARENTHESIZED LATIN CAPITAL LETTER Z
<U2124> <S007A>;<BASE>;<FONTCAP>;<U2124> % DOUBLE-STRUCK CAPITAL Z
diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
index f7c13ddf4b..7d5c9d878e 100644
--- a/localedata/locales/tr_TR
+++ b/localedata/locales/tr_TR
@@ -81,6 +81,8 @@ copy "iso14651_t1"
%
% The following rules implement the same order for glibc.
+% All of these collating symbols are used as primary weights
+% and cause equivalence class problems, see Bug 23437.
collating-symbol <c-cedilla>
collating-symbol <g-breve>
collating-symbol <i-dotless>
@@ -111,8 +113,40 @@ reorder-after <AFTER-U>
<U011F> <g-breve>;<BASE>;<MIN>;IGNORE % ğ
<U011E> <g-breve>;<BASE>;<CAP>;IGNORE % Ğ
<U0131> <i-dotless>;<BASE>;<MIN>;IGNORE % ı
+
+% tr_TR must copy the rational range definition here for CEO:
+% Implement rational range for [A-Z] in regular expressions.
+% We order the collation element order to support rational ranges.
+% Collation is unaffected because the 4-level weights remain the same.
+<U0041> <S0061>;<BASE>;<CAP>;<U0041> % LATIN CAPITAL LETTER A
+<U0042> <S0062>;<BASE>;<CAP>;<U0042> % LATIN CAPITAL LETTER B
+<U0043> <S0063>;<BASE>;<CAP>;<U0043> % LATIN CAPITAL LETTER C
+<U0044> <S0064>;<BASE>;<CAP>;<U0044> % LATIN CAPITAL LETTER D
+<U0045> <S0065>;<BASE>;<CAP>;<U0045> % LATIN CAPITAL LETTER E
+<U0046> <S0066>;<BASE>;<CAP>;<U0046> % LATIN CAPITAL LETTER F
+<U0047> <S0067>;<BASE>;<CAP>;<U0047> % LATIN CAPITAL LETTER G
+<U0048> <S0068>;<BASE>;<CAP>;<U0048> % LATIN CAPITAL LETTER H
+% Turkish sorting of I, but within rational range.
+% FIXME: 'I' is no longer in the equivalence class of i's.
<U0049> <i-dotless>;<BASE>;<CAP>;IGNORE % I
-<U0069> <S0069>;<BASE>;<MIN>;IGNORE % i
+<U004A> <S006A>;<BASE>;<CAP>;<U004A> % LATIN CAPITAL LETTER J
+<U004B> <S006B>;<BASE>;<CAP>;<U004B> % LATIN CAPITAL LETTER K
+<U004C> <S006C>;<BASE>;<CAP>;<U004C> % LATIN CAPITAL LETTER L
+<U004D> <S006D>;<BASE>;<CAP>;<U004D> % LATIN CAPITAL LETTER M
+<U004E> <S006E>;<BASE>;<CAP>;<U004E> % LATIN CAPITAL LETTER N
+<U004F> <S006F>;<BASE>;<CAP>;<U004F> % LATIN CAPITAL LETTER O
+<U0050> <S0070>;<BASE>;<CAP>;<U0050> % LATIN CAPITAL LETTER P
+<U0051> <S0071>;<BASE>;<CAP>;<U0051> % LATIN CAPITAL LETTER Q
+<U0052> <S0072>;<BASE>;<CAP>;<U0052> % LATIN CAPITAL LETTER R
+<U0053> <S0073>;<BASE>;<CAP>;<U0053> % LATIN CAPITAL LETTER S
+<U0054> <S0074>;<BASE>;<CAP>;<U0054> % LATIN CAPITAL LETTER T
+<U0055> <S0075>;<BASE>;<CAP>;<U0055> % LATIN CAPITAL LETTER U
+<U0056> <S0076>;<BASE>;<CAP>;<U0056> % LATIN CAPITAL LETTER V
+<U0057> <S0077>;<BASE>;<CAP>;<U0057> % LATIN CAPITAL LETTER W
+<U0058> <S0078>;<BASE>;<CAP>;<U0058> % LATIN CAPITAL LETTER X
+<U0059> <S0079>;<BASE>;<CAP>;<U0059> % LATIN CAPITAL LETTER Y
+<U005A> <S007A>;<BASE>;<CAP>;<U005A> % LATIN CAPITAL LETTER Z
+
<U0130> <S0069>;<BASE>;<CAP>;IGNORE % İ
<U00F6> <o-diaresis>;<BASE>;<MIN>;IGNORE % ö
<U00D6> <o-diaresis>;<BASE>;<CAP>;IGNORE % Ö
diff --git a/posix/bug-regex17.c b/posix/bug-regex17.c
index 893b9654b8..341fe4d827 100644
--- a/posix/bug-regex17.c
+++ b/posix/bug-regex17.c
@@ -46,14 +46,25 @@ struct
{ { 2, 10 }, { -1, -1 } } },
/* Tests for bug 9697:
+ Look for a multibyte sequence in a range. We pick the range based
+ on collation element order, because [a-z] is now a rational range
+ and no longer covers these multibyte characters.
+
+ We use U+FF53 FULLWIDTH LATIN SMALL LETTER S as the start of the
+ range, and U+33DC SQUARE SV as the end of the range. These were
+ chosen by looking at collation element ordering and picking a range
+ in which the matching character was listed.
+
+ U+02E2 \xcb\xa2 MODIFIER LETTER SMALL S
U+00DF \xc3\x9f LATIN SMALL LETTER SHARP S
U+02DA \xcb\x9a RING ABOVE
- U+02E2 \xcb\xa2 MODIFIER LETTER SMALL S */
- { "[a-z]|[^a-z]", "\xcb\xa2", REG_EXTENDED, 2,
+
+ The U+02DA RING ABOVE is chosen because it's not in [ｓ-㏜]. */
+ { "[ｓ-㏜]|[^ｓ-㏜]", "\xcb\xa2", REG_EXTENDED, 2,
{ { 0, 2 }, { -1, -1 } } },
- { "[a-z]", "\xc3\x9f", REG_EXTENDED, 2,
+ { "[ｓ-㏜]", "\xc3\x9f", REG_EXTENDED, 2,
{ { 0, 2 }, { -1, -1 } } },
- { "[^a-z]", "\xcb\x9a", REG_EXTENDED, 2,
+ { "[^ｓ-㏜]", "\xcb\x9a", REG_EXTENDED, 2,
{ { 0, 2 }, { -1, -1 } } },
};
diff --git a/posix/tst-fnmatch.input b/posix/tst-fnmatch.input
index dc2ca8d01a..2131d1e437 100644
--- a/posix/tst-fnmatch.input
+++ b/posix/tst-fnmatch.input
@@ -67,9 +67,11 @@
# https://sourceware.org/bugzilla/show_bug.cgi?id=23393
# https://sourceware.org/bugzilla/show_bug.cgi?id=23420
#
-# No consensus exists on how best to handle the changes so the
-# iso14651_t1_common collation element order (CEO) has been changed to
-# deinterlace the a-z and A-Z regions.
+# The solution was to implement rational ranges by moving the collation
+# element order to fix this for [a-z], [A-Z], and [0-9]. Likewise the
+# upper and lower case letters are deinterlaced to allow for accented
+# ranges that don't include uppercase e.g. [a-ñ] should not include
+# any uppercase letters but may include a-z and more.
#
# With the deinterlacing commit ac3a3b4b0d561d776b60317d6a926050c8541655
# could be reverted to re-test the correct non-interleaved expectations.
@@ -77,9 +79,7 @@
# Please note that despite the region being deinterlaced, the ordering
# of collation remains the same. In glibc we implement CEO and because of
# that we can reorder the elements to reorder ranges without impacting
-# collation which depends on weights. The collation element ordering
-# could have been changed to include just a-z, A-Z, and 0-9 in three
-# distinct blocks, but this needs more discussion by the community.
+# collation which depends on weights.
# B.6 004(C)
C "!#%+,-./01234567889" "!#%+,-./01234567889" 0
@@ -477,9 +477,9 @@ C "-" "[Z-\\]]" NOMATCH
# handling of ranges and the recognition of character (vs bytes).
de_DE.ISO-8859-1 "a" "[a-z]" 0
de_DE.ISO-8859-1 "z" "[a-z]" 0
-de_DE.ISO-8859-1 "ä" "[a-z]" 0
-de_DE.ISO-8859-1 "ö" "[a-z]" 0
-de_DE.ISO-8859-1 "ü" "[a-z]" 0
+de_DE.ISO-8859-1 "ä" "[a-z]" NOMATCH
+de_DE.ISO-8859-1 "ö" "[a-z]" NOMATCH
+de_DE.ISO-8859-1 "ü" "[a-z]" NOMATCH
de_DE.ISO-8859-1 "A" "[a-z]" NOMATCH
de_DE.ISO-8859-1 "Z" "[a-z]" NOMATCH
de_DE.ISO-8859-1 "Ä" "[a-z]" NOMATCH
@@ -492,9 +492,9 @@ de_DE.ISO-8859-1 "
de_DE.ISO-8859-1 "ü" "[A-Z]" NOMATCH
de_DE.ISO-8859-1 "A" "[A-Z]" 0
de_DE.ISO-8859-1 "Z" "[A-Z]" 0
-de_DE.ISO-8859-1 "Ä" "[A-Z]" 0
-de_DE.ISO-8859-1 "Ö" "[A-Z]" 0
-de_DE.ISO-8859-1 "Ü" "[A-Z]" 0
+de_DE.ISO-8859-1 "Ä" "[A-Z]" NOMATCH
+de_DE.ISO-8859-1 "Ö" "[A-Z]" NOMATCH
+de_DE.ISO-8859-1 "Ü" "[A-Z]" NOMATCH
de_DE.ISO-8859-1 "a" "[[:lower:]]" 0
de_DE.ISO-8859-1 "z" "[[:lower:]]" 0
de_DE.ISO-8859-1 "ä" "[[:lower:]]" 0
@@ -566,22 +566,46 @@ de_DE.ISO-8859-1 "aa" "[[.a.]]a" 0
de_DE.ISO-8859-1 "ba" "[[.a.]]a" NOMATCH
-# And with a multibyte character set.
+# And with a multibyte character set:
+# Ensure that Turkish reordering rules don't move 'i' out of a-z set,
+# or 'I' out of A-Z set.
+tr_TR.UTF-8 "i" "[a-z]" 0
+tr_TR.UTF-8 "ı" "[a-z]" NOMATCH
+tr_TR.UTF-8 "I" "[A-Z]" 0
+tr_TR.UTF-8 "İ" "[A-Z]" NOMATCH
+tr_TR.ISO-8859-9 "i" "[a-z]" 0
+tr_TR.ISO-8859-9 "I" "[A-Z]" 0
+# See bug 23437 for I not being in [=i=].
+tr_TR.UTF-8 "I" "[=i=]" NOMATCH
en_US.UTF-8 "a" "[a-z]" 0
+# Test that <U00F1> LATIN SMALL LETTER N WITH TILDE is not in [a-z].
+en_US.UTF-8 "ñ" "[a-z]" NOMATCH
en_US.UTF-8 "z" "[a-z]" 0
en_US.UTF-8 "A" "[a-z]" NOMATCH
+# Test that <U00D1> LATIN CAPITAL LETTER N WITH TILDE is not in [a-z].
+en_US.UTF-8 "Ñ" "[a-z]" NOMATCH
en_US.UTF-8 "Z" "[a-z]" NOMATCH
en_US.UTF-8 "a" "[A-Z]" NOMATCH
+# Test that <U00F1> LATIN SMALL LETTER N WITH TILDE is not in [A-Z].
+en_US.UTF-8 "ñ" "[A-Z]" NOMATCH
en_US.UTF-8 "z" "[A-Z]" NOMATCH
en_US.UTF-8 "A" "[A-Z]" 0
+# Test that <U00D1> LATIN CAPITAL LETTER N WITH TILDE is not in [A-Z].
+en_US.UTF-8 "Ñ" "[A-Z]" NOMATCH
en_US.UTF-8 "Z" "[A-Z]" 0
en_US.UTF-8 "0" "[0-9]" 0
+# Test that <UFF10> FULLWIDTH DIGIT ZERO is not in [0-9].
+en_US.UTF-8 "０" "[0-9]" NOMATCH
+# Test that <U00BD> VULGAR FRACTION ONE HALF is not in [0-9].
+en_US.UTF-8 "½" "[0-9]" NOMATCH
en_US.UTF-8 "9" "[0-9]" 0
+# Test that <UFF19> FULLWIDTH DIGIT NINE is not in [0-9].
+en_US.UTF-8 "９" "[0-9]" NOMATCH
de_DE.UTF-8 "a" "[a-z]" 0
de_DE.UTF-8 "z" "[a-z]" 0
-de_DE.UTF-8 "ä" "[a-z]" 0
-de_DE.UTF-8 "ö" "[a-z]" 0
-de_DE.UTF-8 "ü" "[a-z]" 0
+de_DE.UTF-8 "ä" "[a-z]" NOMATCH
+de_DE.UTF-8 "ö" "[a-z]" NOMATCH
+de_DE.UTF-8 "ü" "[a-z]" NOMATCH
de_DE.UTF-8 "A" "[a-z]" NOMATCH
de_DE.UTF-8 "Z" "[a-z]" NOMATCH
de_DE.UTF-8 "Ä" "[a-z]" NOMATCH
@@ -594,9 +618,9 @@ de_DE.UTF-8 "ö" "[A-Z]" NOMATCH
de_DE.UTF-8 "ü" "[A-Z]" NOMATCH
de_DE.UTF-8 "A" "[A-Z]" 0
de_DE.UTF-8 "Z" "[A-Z]" 0
-de_DE.UTF-8 "Ä" "[A-Z]" 0
-de_DE.UTF-8 "Ö" "[A-Z]" 0
-de_DE.UTF-8 "Ü" "[A-Z]" 0
+de_DE.UTF-8 "Ä" "[A-Z]" NOMATCH
+de_DE.UTF-8 "Ö" "[A-Z]" NOMATCH
+de_DE.UTF-8 "Ü" "[A-Z]" NOMATCH
de_DE.UTF-8 "a" "[[:lower:]]" 0
de_DE.UTF-8 "z" "[[:lower:]]" 0
de_DE.UTF-8 "ä" "[[:lower:]]" 0
diff --git a/posix/tst-rxspencer.c b/posix/tst-rxspencer.c
index 9d597ef3e9..a3d836679a 100644
--- a/posix/tst-rxspencer.c
+++ b/posix/tst-rxspencer.c
@@ -155,7 +155,12 @@ mb_frob_pattern (const char *str, const char *letters)
*dst++ = *src;
continue;
}
- else if (!in_class && strchr (letters, *src))
+ /* We do a replacement, but not for the start of ranges, because
+ mb_replace will create invalid rational ranges. For example
+ [á-z] is an invalid range because á comes after z, but [a-á]
+ is a valid range. So we avoid replacing the start of ranges
+ to avoid this problem. */
+ else if (!in_class && src[1] != '-' && strchr (letters, *src))
dst = mb_replace (dst, *src);
else
{