Bug 18587 - Minor collate issues in Hungarian locale
Description Egmont Koblinger 2015-06-23 22:05:26 UTC
Created attachment 8385

There are two minor issues with the Hungarian locale when sorting strings that only differ in their case. Please apply the attached patch to fix them.

Issue 1:

Most of the time the lowercase variant is sorted before the uppercase one; however, this is not the case for the digraphs: "CS" currently sorts before "Cs", and the same happens with all the other double consonants (dz, gy, ..., there are 8 of them in total).

To test:

LC_ALL=hu_HU.UTF-8 sort -k 1,1 -s << END
cs 1
cS 2
Cs 3
CS 4
END

Expected output: the lines in the order of their numbers (1 2 3 4). Current output: in the order 1 2 4 3.
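
The check above can also be scripted so that it does not have to be read by eye. The following is only a minimal sketch: `check_order` is a hypothetical helper name (it is not part of the attached patch or of any existing tool), and it assumes input lines of the form "key number" as in the test above.

```shell
# check_order LOCALE  (hypothetical helper, for illustration only)
# Reads "key number" lines on stdin, sorts them stably on the key under
# LOCALE, and reports whether the numbers come out in ascending order.
check_order() {
    # -k 1,1 restricts the sort key to the first field;
    # -s keeps the input order of keys that compare equal.
    LC_ALL=$1 sort -k 1,1 -s |
    awk '
        { seq = seq (NR > 1 ? " " : "") $2       # collect numbers in output order
          if (NR > 1 && $2 + 0 < prev) bad = 1   # a number went backwards
          prev = $2 + 0 }
        END { status = bad ? "MISMATCH" : "ok"
              print status ": " seq
              exit bad + 0 }'
}
```

Feeding the four test lines above to `check_order hu_HU.UTF-8` should, going by the output reported here, print "MISMATCH: 1 2 4 3" before the fix and "ok: 1 2 3 4" after it. This assumes the hu_HU.UTF-8 locale is actually installed (`locale -a` lists the available ones); otherwise sort may silently fall back to another collation order.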

The fix copies the pattern already used for the only triple consonant "dzs": it uses the new <MIN-MIN> or <CAP-CAP> instead of <MIN> or <CAP> to explicitly denote the case of both codepoints in the compound letter. This also makes the file's layout more nicely tabulated and easier to read.

Issue 2:

When the only triple letter "dzs" is pronounced long, it is spelled "ddzs". However, due to obvious typos that use <CAP-x-y> instead of <MIN-x-y> (I might have introduced this mistake myself a long time ago, I can't remember), the case of the second "d" is ignored rather than lowercase being sorted before uppercase.

To test:

LC_ALL=hu_HU.UTF-8 sort -k 1,1 -s << END
DDzs 2
Ddzs 1
DDzs 3
END

Expected output: according to the numbers. Actual output: unchanged order, proving that they all compare equal.
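
Another way to look at "they all compare equal" is to count how many keys survive a unique sort. This is just a rough sketch under assumptions: `count_distinct_keys` is a made-up helper name, and it relies on `sort -u` keeping only the first line of each run of keys that compare equal under the given locale.

```shell
# count_distinct_keys LOCALE  (hypothetical helper, for illustration only)
# Reads "key number" lines on stdin and prints how many first-field keys
# remain once keys that collate as equal under LOCALE are merged.
count_distinct_keys() {
    # -k 1,1 compares the first field only; -u drops all but the first
    # line of each run of equal keys. tr strips the padding that some
    # wc implementations put in front of the count.
    LC_ALL=$1 sort -k 1,1 -u | wc -l | tr -d ' '
}
```

For the three test lines above, `count_distinct_keys C` prints 2, since plain byte comparison tells "DDzs" and "Ddzs" apart. If the keys really do compare equal in the unpatched Hungarian locale, as described here, `count_distinct_keys hu_HU.UTF-8` would be expected to print 1, and 2 once the case of the second "d" is taken into account.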

On a slightly related note: the new version of the Hungarian spelling rules is planned to be released this September [1], replacing the current 30-year-old version. The old version's section about alphabetical sorting doesn't say what to do when only the case differs. Allegedly the new version will specify that lowercase is to be sorted first, followed by uppercase: [2] -> "arany, Arany", which is what the current locale already implements, apart from these bugs. So this patch is also in preparation for the new rules.

[1] http://mta.hu/mta_hirei/szeptemberben-jelenik-meg-a-magyar-helyesiras-szabalyai-tizenkettedik-kiadasa-136386/
[2] http://www.nyest.hu/hirek/mi-ujsag-a-helyesirasban
Comment 1 Egmont Koblinger 2015-09-08 08:38:43 UTC
I discovered other bugs as well, and created a patch that not only addresses all of them but also adds extensive test coverage. I didn't want to pollute this bug by squeezing new ones in, so I decided to file a new one.

Let's mark this bug as obsoleted by bug 18934.

*** This bug has been marked as a duplicate of bug 18934 ***
