This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.

[Bug localedata/18587] New: Minor collate issues in Hungarian locale

From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Tue, 23 Jun 2015 22:05:26 +0000
Subject: [Bug localedata/18587] New: Minor collate issues in Hungarian locale
Auto-submitted: auto-generated

https://sourceware.org/bugzilla/show_bug.cgi?id=18587

Bug ID: 18587
Summary: Minor collate issues in Hungarian locale
Product: glibc
Version: 2.21
Status: NEW
Severity: minor
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: egmont at gmail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---

Created attachment 8385
--> https://sourceware.org/bugzilla/attachment.cgi?id=8385&action=edit
Fix

There are two minor issues with the Hungarian locale when sorting strings that
only differ in their case. Please apply the attached patch to fix them.

Issue 1:

In most cases the lowercase variant is sorted before the uppercase one.
This does not hold for the double consonants, though: "CS" currently sorts
before "Cs", and the same happens for all the other double consonants
(dz, gy, ..., there are 8 of them in total).

To test:

LC_ALL=hu_HU.UTF-8 sort -k 1,1 -s << END
cs 1
cS 2
Cs 3
CS 4
END

Expected output: in the order of the numbers (1 2 3 4). Current output: in
the order 1 2 4 3.
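
To make the check above repeatable, here is a minimal sketch of a helper
that runs the same stable sort and reports whether the numbers come out in
ascending order. The function name check_numbered_order and the SORT_LOCALE
variable are made up for illustration and are not part of the patch; the
sketch assumes the hu_HU.UTF-8 locale is installed (see "locale -a").

```shell
# Hypothetical helper, not part of the bug report or the attached patch.
# Reads "word number" lines on stdin, sorts them on the first field with a
# stable sort (same options as the test above), and reports whether the
# numbers in the second field come out in ascending order.
# SORT_LOCALE is an assumed variable name; it defaults to hu_HU.UTF-8.
check_numbered_order() {
    LC_ALL="${SORT_LOCALE:-hu_HU.UTF-8}" sort -k 1,1 -s |
        awk '{ n = $2 + 0; if (NR > 1 && n < prev) bad = 1; prev = n }
             END { print (bad ? "out of order" : "in order") }'
}

# Usage:
#   printf '%s\n' 'cs 1' 'cS 2' 'Cs 3' 'CS 4' | check_numbered_order
```

With the behaviour described above, the Issue 1 input should be reported as
"out of order" before the fix and "in order" after it.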

The fix follows the pattern already used for the only triple consonant,
"dzs": it uses the new <MIN-MIN> or <CAP-CAP> instead of <MIN> or <CAP> to
explicitly denote the case of both code points in the compound letter. This
also makes the file's layout more neatly tabulated and easier to read.

Issue 2:

When the only triple letter, "dzs", is pronounced long, it is spelled
"ddzs". Due to obvious typos, where <CAP-x-y> was used instead of <MIN-x-y>
(I might have introduced this mistake myself a long time ago, I can't
remember), the case of the second "d" is ignored instead of lowercase being
sorted before uppercase.

To test:

LC_ALL=hu_HU.UTF-8 sort -k 1,1 -s << END
DDzs 2
Ddzs 1
DDzs 3
END

Expected output: in the order of the numbers. Actual output: the input order
is left unchanged, which shows that all three lines compare equal.
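
Another way to see whether keys compare equal is to let sort collapse them
with -u, which keeps only one line per set of equal keys. The sketch below
counts how many lines survive; the function name count_distinct_keys and
the SORT_LOCALE variable are hypothetical, and the hu_HU.UTF-8 locale is
assumed to be installed.

```shell
# Hypothetical helper, not part of the bug report or the attached patch.
# Counts how many distinct sort keys the first field produces: with -u,
# sort suppresses all but one line in each set of lines whose keys
# compare equal.
# SORT_LOCALE is an assumed variable name; it defaults to hu_HU.UTF-8.
count_distinct_keys() {
    LC_ALL="${SORT_LOCALE:-hu_HU.UTF-8}" sort -k 1,1 -u | wc -l | tr -d ' '
}

# Usage:
#   printf '%s\n' 'DDzs 2' 'Ddzs 1' 'DDzs 3' | count_distinct_keys
```

For the Issue 2 input, one would expect a count of 1 while the bug is
present (all keys compare equal) and 2 once the case of the second "d" is
taken into account, since two of the three lines use the same word.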

On a slightly related note: the new version of the Hungarian spelling rules is
planned to be released this September [1], replacing the current 30-year-old
version. The old version's section about alphabetical sorting doesn't say what
to do when only the case differs. Allegedly the new version will specify that
lowercase is to be sorted first, followed by uppercase: [2] -> "arany, Arany",
which is what the current locale already implements, apart from these bugs.
So this patch is also in preparation for the new rules.

[1]
http://mta.hu/mta_hirei/szeptemberben-jelenik-meg-a-magyar-helyesiras-szabalyai-tizenkettedik-kiadasa-136386/
[2] http://www.nyest.hu/hirek/mi-ujsag-a-helyesirasban
