This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug regex/23036] regex equivalence class regression


https://sourceware.org/bugzilla/show_bug.cgi?id=23036

--- Comment #16 from Florian Weimer <fweimer at redhat dot com> ---
I looked at this from another angle.  After the update in bug 14095, the
following characters in the en_US.utf8 locale match [[=a=]]:

U+000041 [[A]]
U+000061 [[a]]
U+0000AA [[ª]]
U+000363 [[ͣ]]
U+001D2C [[ᴬ]]
U+001D43 [[ᵃ]]
U+002090 [[ₐ]]
U+00249C [[⒜]]
U+0024B6 [[Ⓐ]]
U+0024D0 [[ⓐ]]
U+00FF21 [[Ａ]]
U+00FF41 [[ａ]]
U+01D400 [[𝐀]]
U+01D41A [[𝐚]]
U+01D434 [[𝐴]]
U+01D44E [[𝑎]]
U+01D468 [[𝑨]]
U+01D482 [[𝒂]]
U+01D49C [[𝒜]]
U+01D4B6 [[𝒶]]
U+01D4D0 [[𝓐]]
U+01D4EA [[𝓪]]
U+01D504 [[𝔄]]
U+01D51E [[𝔞]]
U+01D538 [[𝔸]]
U+01D552 [[𝕒]]
U+01D56C [[𝕬]]
U+01D586 [[𝖆]]
U+01D5A0 [[𝖠]]
U+01D5BA [[𝖺]]
U+01D5D4 [[𝗔]]
U+01D5EE [[𝗮]]
U+01D608 [[𝘈]]
U+01D622 [[𝘢]]
U+01D63C [[𝘼]]
U+01D656 [[𝙖]]
U+01D670 [[𝙰]]
U+01D68A [[𝚊]]
U+01F110 [[🄐]]
U+01F130 [[🄰]]
U+01F150 [[🅐]]
U+01F170 [[🅰]]

Before, the list was:

U+000041 [[A]]
U+000061 [[a]]
U+0000AA [[ª]]
U+0000C0 [[À]]
U+0000C1 [[Á]]
U+0000C2 [[Â]]
U+0000C3 [[Ã]]
U+0000C4 [[Ä]]
U+0000C5 [[Å]]
U+0000E0 [[à]]
U+0000E1 [[á]]
U+0000E2 [[â]]
U+0000E3 [[ã]]
U+0000E4 [[ä]]
U+0000E5 [[å]]
U+000100 [[Ā]]
U+000101 [[ā]]
U+000102 [[Ă]]
U+000103 [[ă]]
U+000104 [[Ą]]
U+000105 [[ą]]
U+0001CD [[Ǎ]]
U+0001CE [[ǎ]]
U+0001DE [[Ǟ]]
U+0001DF [[ǟ]]
U+0001E0 [[Ǡ]]
U+0001E1 [[ǡ]]
U+0001FA [[Ǻ]]
U+0001FB [[ǻ]]
U+000200 [[Ȁ]]
U+000201 [[ȁ]]
U+000202 [[Ȃ]]
U+000203 [[ȃ]]
U+000226 [[Ȧ]]
U+000227 [[ȧ]]
U+001E00 [[Ḁ]]
U+001E01 [[ḁ]]
U+001E9A [[ẚ]]
U+001EA0 [[Ạ]]
U+001EA1 [[ạ]]
U+001EA2 [[Ả]]
U+001EA3 [[ả]]
U+001EA4 [[Ấ]]
U+001EA5 [[ấ]]
U+001EA6 [[Ầ]]
U+001EA7 [[ầ]]
U+001EA8 [[Ẩ]]
U+001EA9 [[ẩ]]
U+001EAA [[Ẫ]]
U+001EAB [[ẫ]]
U+001EAC [[Ậ]]
U+001EAD [[ậ]]
U+001EAE [[Ắ]]
U+001EAF [[ắ]]
U+001EB0 [[Ằ]]
U+001EB1 [[ằ]]
U+001EB2 [[Ẳ]]
U+001EB3 [[ẳ]]
U+001EB4 [[Ẵ]]
U+001EB5 [[ẵ]]
U+001EB6 [[Ặ]]
U+001EB7 [[ặ]]

Conversely, the *new* list for [[=á=]] looks like this (the old list is the
same as the old list for [[=a=]]):

U+0000C0 [[À]]
U+0000C1 [[Á]]
U+0000C2 [[Â]]
U+0000C3 [[Ã]]
U+0000C4 [[Ä]]
U+0000C5 [[Å]]
U+0000E0 [[à]]
U+0000E1 [[á]]
U+0000E2 [[â]]
U+0000E3 [[ã]]
U+0000E4 [[ä]]
U+0000E5 [[å]]
U+000100 [[Ā]]
U+000101 [[ā]]
U+000102 [[Ă]]
U+000103 [[ă]]
U+000104 [[Ą]]
U+000105 [[ą]]
U+0001CD [[Ǎ]]
U+0001CE [[ǎ]]
U+000200 [[Ȁ]]
U+000201 [[ȁ]]
U+000202 [[Ȃ]]
U+000203 [[ȃ]]
U+000226 [[Ȧ]]
U+000227 [[ȧ]]
U+001DF2 [[ᷲ]]
U+001E00 [[Ḁ]]
U+001E01 [[ḁ]]
U+001EA0 [[Ạ]]
U+001EA1 [[ạ]]
U+001EA2 [[Ả]]
U+001EA3 [[ả]]
U+00212B [[Å]]
U+00A79A [[Ꞛ]]
U+00A79B [[ꞛ]]

I do not know which choice is more desirable.  This no longer looks like an
algorithmic issue to me.
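
For anyone who wants to regenerate lists like the ones above on their own
glibc build, here is a minimal sketch.  It is not necessarily how the lists
in this comment were produced.  It assumes the en_US.utf8 locale is
installed; the names encode_utf8, list_matches and EQUIV_DEMO_MAIN are made
up for this example.  It compiles an anchored bracket expression with
regcomp, then tries every code point, encoded as UTF-8, with regexec.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>

/* Encode one Unicode scalar value as NUL-terminated UTF-8 in OUT, which
   must have room for 5 bytes.  Returns the number of bytes before the
   NUL, or 0 for surrogates and values above U+10FFFF.  */
int
encode_utf8 (unsigned long cp, char *out)
{
  int n;
  assert (out != NULL);
  if (cp < 0x80)
    {
      out[0] = (char) cp;
      n = 1;
    }
  else if (cp < 0x800)
    {
      out[0] = (char) (0xC0 | (cp >> 6));
      out[1] = (char) (0x80 | (cp & 0x3F));
      n = 2;
    }
  else if (cp < 0x10000)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* Surrogates are not encodable.  */
      out[0] = (char) (0xE0 | (cp >> 12));
      out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (char) (0x80 | (cp & 0x3F));
      n = 3;
    }
  else if (cp <= 0x10FFFF)
    {
      out[0] = (char) (0xF0 | (cp >> 18));
      out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (char) (0x80 | (cp & 0x3F));
      n = 4;
    }
  else
    return 0;
  out[n] = '\0';
  return n;
}

/* Print each code point in [FIRST, LAST] whose UTF-8 encoding matches
   PATTERN (a POSIX extended regex) under LOCALE_NAME.  Changes the
   process-wide locale.  Returns the number of matches, or -1 if the
   locale cannot be selected or the pattern does not compile.  */
long
list_matches (const char *locale_name, const char *pattern,
              unsigned long first, unsigned long last, FILE *stream)
{
  regex_t re;
  long count = 0;

  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (unsigned long cp = first; cp <= last; cp++)
    {
      char buf[5];
      if (encode_utf8 (cp, buf) == 0)
        continue;
      if (regexec (&re, buf, 0, NULL, 0) == 0)
        {
          fprintf (stream, "U+%06lX [[%s]]\n", cp, buf);
          count++;
        }
    }
  regfree (&re);
  return count;
}

#ifdef EQUIV_DEMO_MAIN          /* Build with: cc -DEQUIV_DEMO_MAIN file.c  */
int
main (void)
{
  /* Anchored so that only single-character strings can match.  */
  if (list_matches ("en_US.utf8", "^[[=a=]]$", 1, 0x10FFFF, stdout) < 0)
    {
      fputs ("locale missing or pattern rejected\n", stderr);
      return 1;
    }
  return 0;
}
#endif
```

Changing the pattern to "^[[=á=]]$" (with the source file saved as UTF-8)
should give the corresponding list for the accented character.  The output
depends on the installed glibc version and its locale data, so no expected
output is shown here.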

