[Bug localedata/13096] New: fi_FI collation: [vwåäöþ] and [ÐÜ] are in wrong ranges
lauri.kentta at gmail dot com
sourceware-bugzilla@sourceware.org
Tue Aug 16 13:30:00 GMT 2011
http://sourceware.org/bugzilla/show_bug.cgi?id=13096
Bug #: 13096
Summary: fi_FI collation: [vwåäöþ] and [ÐÜ] are in wrong ranges
Product: glibc
Version: 2.14
Status: NEW
Severity: minor
Priority: P2
Component: localedata
AssignedTo: libc-locales@sources.redhat.com
ReportedBy: lauri.kentta@gmail.com
Classification: Unclassified
Created attachment 5900
--> http://sourceware.org/bugzilla/attachment.cgi?id=5900
Proposed fix
In the Finnish locale (fi_FI), several lower-case letters (namely [vwåäöþ])
collate among the upper-case letters. Conversely, the upper-case letters
Ð and Ü collate among the lower-case letters. This causes unexpected results
in grep, for example:
export LC_COLLATE=fi_FI.UTF-8
echo v | grep -E '[a-z]' # actual: empty, expected: "v"
echo v | grep -E '[A-Z]' # actual: "v", expected: empty
echo x | grep -E '[a-z]' # actual: "x", expected: "x"
echo x | grep -E '[A-Z]' # actual: empty, expected: empty
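To check all of the letters mentioned above in one go, something like the following sketch could be used. The helper name classify_char is made up for illustration, and the loop assumes that fi_FI.UTF-8 is installed on the system (if it is not, grep silently falls back to the C locale). No output is shown, since it depends on the glibc and grep versions in use.

```shell
#!/bin/sh
# Hypothetical helper (not part of glibc or grep): report which bracket
# range(s) a single character matches under a given locale.
# Usage: classify_char LOCALE CHAR
classify_char() {
    locale_name=$1
    ch=$2
    result=
    # LC_ALL overrides LC_COLLATE and LC_CTYPE for this one grep call only.
    if printf '%s\n' "$ch" | LC_ALL="$locale_name" grep -qE '^[a-z]$'; then
        result="lower"
    fi
    if printf '%s\n' "$ch" | LC_ALL="$locale_name" grep -qE '^[A-Z]$'; then
        # Append to any earlier result, giving "lower+upper" if both match.
        result="${result:+$result+}upper"
    fi
    printf '%s\n' "${result:-neither}"
}

# Check every letter named in this report, plus x and X as a reference.
# Assumption: the fi_FI.UTF-8 locale is installed.
for ch in v w å ä ö þ Ð Ü x X; do
    printf '%s: %s\n' "$ch" "$(classify_char fi_FI.UTF-8 "$ch")"
done
```

With a correct locale definition, each lower-case letter should be reported as "lower" and each upper-case letter as "upper".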
I'm aware that the locales don't guarantee much about character ranges, but
this behaviour is clearly illogical, serves no purpose and might break
somebody's scripts.
This was fixed in Debian years ago.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=441026
If I read their bug report correctly, this was correct in the past (glibc
2.3.6) and was probably broken by mistake.
Proposed fix attached.