The UTF-8 charset definition and the i18n locale data (which holds the default LC_CTYPE definitions used by almost all locales for toupper, tolower, isalpha, etc.) are quite outdated: both still use Unicode 3.2 data. There are 3,868 new characters between Unicode 3.2 and Unicode 5.0. Attached are a patch for UTF-8 and a complete updated i18n file (a patch for that one would be bigger than the file itself).
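To illustrate the kind of gap described above, here is a minimal sketch in Python rather than against the locale files themselves. It relies on the standard library's unicodedata module, which ships a frozen Unicode 3.2 database (unicodedata.ucd_3_2_0) next to its current one. The helper name added_since_3_2 and the sample code point are chosen for illustration only and are not part of the patch; note that Python's current database is newer than Unicode 5.0, so this shows "assigned after 3.2", not "assigned in 5.0" specifically.

```python
import unicodedata

# Frozen Unicode 3.2 database shipped with Python's standard library.
old_ucd = unicodedata.ucd_3_2_0


def added_since_3_2(ch: str) -> bool:
    """Return True if ch is unassigned (category Cn) in Unicode 3.2
    but assigned in the database Python currently uses."""
    return old_ucd.category(ch) == "Cn" and unicodedata.category(ch) != "Cn"


# Hypothetical sample: U+0242 LATIN SMALL LETTER GLOTTAL STOP,
# a letter that does not exist in the Unicode 3.2 data.
sample = "\u0242"
print(added_since_3_2(sample))       # unknown to the 3.2 data?
print(unicodedata.category(sample))  # category in the current data
print(added_since_3_2("A"))          # a character present in both
```

A locale whose LC_CTYPE tables are built from 3.2 data has no entry for such a character, so classification functions like isalpha and case mappings like toupper would presumably treat it as unknown until the tables are regenerated.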
Created an attachment (id=1506) patch for the UTF-8 file, adding newly defined characters as of Unicode 5.0
Created an attachment (id=1507) updated i18n, with the definitions updated for Unicode 5.0
Created an attachment (id=1508) well, if you prefer a diff, here it is
Applied.