This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug localedata/13547] New: Different strings collate as equal in Hungarian
- From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: Tue, 03 Jan 2012 00:17:49 +0000
- Subject: [Bug localedata/13547] New: Different strings collate as equal in Hungarian
- Auto-submitted: auto-generated
http://sourceware.org/bugzilla/show_bug.cgi?id=13547
Bug #: 13547
Summary: Different strings collate as equal in Hungarian
Product: glibc
Version: 2.14
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: libc-locales@sources.redhat.com
ReportedBy: egmont@gmail.com
Classification: Unclassified
Created attachment 6139
--> http://sourceware.org/bugzilla/attachment.cgi?id=6139
collate fix for Hungarian
Please apply the attached patch to the Hungarian locale definition.
With the current definition, certain distinct strings collate as equal; for
example, strcoll("ccs", "cscs") returns zero. This confuses programs such as
sort (the relative order of such lines is undefined and might vary from run to
run) and uniq (which reports different lines as duplicates).
The attached patch fixes this so that such strings collate as different,
without changing the sort order of valid Hungarian words.
The problem in more detail:
Hungarian has compound letters (similar to "sh" in English), for example "cs".
When such a letter is pronounced long, it is written with the shorthand "ccs"
(only the first character is doubled) rather than "cscs".
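The spelling rule can be sketched as follows. write_long is a hypothetical
helper that assumes the rule exactly as stated here (double only the first
character); Hungarian's other multi-character letters (dz, dzs, gy, ly, ny, sz,
ty, zs) follow the same pattern, so "sz" becomes "ssz".

```python
def write_long(letter):
    """Spell a long (geminated) consonant using the shorthand described above.

    Only the first character is doubled, so a multi-character letter such as
    "cs" becomes "ccs" rather than "cscs"; a single letter is simply doubled.
    """
    return letter[0] + letter


for letter in ["t", "cs", "sz", "gy", "dzs"]:
    print(letter, "->", write_long(letter))
```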
Currently "ccs" is tokenized as <cs><cs>, which is correct, but "cscs" (which
does not occur in valid Hungarian words but might still appear in text files)
is also tokenized as <cs><cs>, so the two strings collate as equal.
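A toy tokenizer reproduces the collision. tokenize_old is hypothetical and
models only the <cs> letter (every other character becomes its own token); it
is not the glibc locale source, just an illustration of why both spellings end
up with identical token sequences.

```python
def tokenize_old(s):
    """Toy model of the current tokenization: both spellings yield <cs><cs>.

    Only the <cs> letter is modelled; every other character is its own token.
    """
    tokens = []
    i = 0
    while i < len(s):
        if s.startswith("ccs", i):
            tokens += ["cs", "cs"]  # shorthand for a long <cs>
            i += 3
        elif s.startswith("cs", i):
            tokens.append("cs")
            i += 2
        else:
            tokens.append(s[i])
            i += 1
    return tokens


print(tokenize_old("ccs"), tokenize_old("cscs"))  # both ['cs', 'cs']
```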
The solution is to tokenize "ccs" as <c_or_cs><cs> and to place the new token
in the collation order as follows:
<a> <b> <c> <c_or_cs> <cs> <d> ...
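One way such an ordering can separate the two strings while keeping valid words
in place is sketched below. This is an assumption, not a description of the
attached patch: <c_or_cs> is given the same primary weight as <cs> and sorts
before it only as a tie-breaker. ORDER, PRIMARY, tokenize_new and hu_key are
hypothetical names, and the alphabet is reduced to plain Latin letters plus
<c_or_cs> and <cs>.

```python
# Simplified collation order: plain Latin letters with <c_or_cs> and <cs> after <c>.
# Other Hungarian letters (gy, sz, accented vowels, ...) are ignored in this toy.
ORDER = ["a", "b", "c", "c_or_cs", "cs"] + list("defghijklmnopqrstuvwxyz")
RANK = {token: rank for rank, token in enumerate(ORDER)}

# Assumption (not taken from the patch): <c_or_cs> has the primary weight of <cs>.
PRIMARY = {"c_or_cs": "cs"}


def tokenize_new(s):
    """Split s into tokens, spelling the shorthand "ccs" as <c_or_cs><cs>."""
    tokens = []
    i = 0
    while i < len(s):
        if s.startswith("ccs", i):
            tokens += ["c_or_cs", "cs"]
            i += 3
        elif s.startswith("cs", i):
            tokens.append("cs")
            i += 2
        else:
            tokens.append(s[i])
            i += 1
    return tokens


def hu_key(s):
    """Two-level sort key: primary weights first, exact tokens as tie-breaker."""
    tokens = tokenize_new(s)
    primary = tuple(RANK[PRIMARY.get(t, t)] for t in tokens)
    secondary = tuple(RANK[t] for t in tokens)
    return (primary, secondary)


print(sorted(["cscs", "csa", "ccsa", "ccs"], key=hu_key))
# ['csa', 'ccs', 'cscs', 'ccsa']
```

In this toy model, ranking <c_or_cs> below <cs> at the primary level would also
make "ccs" and "cscs" differ, but it would move "ccsa" ahead of "csa"; keeping
the distinction to the tie-breaker level avoids that reordering.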
The problem was originally discovered at http://hup.hu/node/110267 (forum in
Hungarian).