Please revert libc/localedata/locales/hu_HU revision 1.18, "Better collation". It is not better, it is worse.

According to the Hungarian rules, aacute, eacute, iacute, oacute and uacute must be treated the same as their unaccented counterparts, and vowels with a diaeresis must be treated the same as their counterparts with double acutes. In other words:

a = á < e = é < i = í < o = ó < ö = ő < u = ú < ü = ű

For example, the following is a correct alphabetical order:

ablak
állat
apa
áru
az

The vowels within one equivalence class only make a difference if they are the only letters that differ, e.g.:

Eger
egér
éger
eget
éget

This was perfectly implemented in the previous version, and it is also described in some comment lines within this file (the comment is still there although it doesn't correspond to what's implemented right now).

I don't know who suggested the modifications of 1.18 or why, but he was surely wrong. If needed, I can scan some pages of dictionaries or phone books and upload them to prove these sorting rules.

If someone just happens to prefer sorting this way, then he is of course absolutely free to create his own locale, or set LC_COLLATE=C or something similar, but there's hardly any place for that work in glibc. Glibc should follow the national rules, and r1.18 was a move against them.

Ulrich, if I recall correctly, some years ago it was you to whom I sent the hu_HU sorting rules which fixed some bugs. Then you asked me to manually sort a lot of words you had previously received from some other Hungarian guy and test whether glibc sorts them in the same order. Glibc with those Hungarian collating rules passed that test, but the new rules would obviously fail it. Do you happen to still have that file? (I don't think I have it, but I'll take a look.) I guess it would be a really wise move to put such sorted files into glibc's source and add a sorting test case for them.
Ps1: a and á, as well as e and é, are different sounds, so it's often argued whether it's logical to put them in the same group; this is a tradition rather than a logical decision. On the other hand, i and í, o and ó, ö and ő, u and ú, and finally ü and ű are the same sounds, with the latter of each pair pronounced longer. Crosswords and similar stuff treat a and á, and e and é differently, while the other pairs are interchangeable there. But alphabetical sorting uses different rules.

Ps2: All the words above in the examples are real Hungarian words.
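The rule described in the report (accented vowels equal to their partners at the first level, with the accent only breaking ties) can be illustrated with a small multi-level sort key. This is only a sketch in Python and not how glibc implements LC_COLLATE. The names `PRIMARY_CLASSES`, `hu_sort_key` and `is_sorted` are made up for this example. It assumes that the short vowel sorts before its long partner on a tie and that lowercase sorts before uppercase at the last level, and it ignores the Hungarian digraph letters, which the report does not discuss.

```python
# Illustrative sketch only, not glibc's implementation.
# Assumption: digraph letters (cs, gy, sz, ...) are ignored here.

# Primary classes in alphabetical order. Letters sharing one string are
# equal at the primary level; their index inside the string is the
# secondary (tie-break) weight.
# Assumption: the short vowel sorts before its long partner on a tie.
PRIMARY_CLASSES = [
    "aá", "b", "c", "d", "eé", "f", "g", "h", "ií", "j", "k", "l", "m",
    "n", "oó", "öő", "p", "q", "r", "s", "t", "uú", "üű", "v", "w", "x",
    "y", "z",
]

# Map each lowercase letter to (primary weight, secondary weight).
WEIGHTS = {
    ch: (primary, secondary)
    for primary, letters in enumerate(PRIMARY_CLASSES)
    for secondary, ch in enumerate(letters)
}


def hu_sort_key(word):
    """Return a (primary, secondary, tertiary) key for the given word."""
    primary, secondary, tertiary = [], [], []
    for ch in word:
        low = ch.lower()
        # Characters outside the table sort after all known letters.
        p, s = WEIGHTS.get(low, (len(PRIMARY_CLASSES) + ord(low), 0))
        primary.append(p)
        secondary.append(s)
        # Assumption: lowercase before uppercase at the last level.
        tertiary.append(ch.isupper())
    # Tuples compare level by level: accents matter only when the
    # primary weights of two words are identical.
    return (tuple(primary), tuple(secondary), tuple(tertiary))


def is_sorted(words):
    """Check that a pre-sorted word list is in order under hu_sort_key."""
    keys = [hu_sort_key(w) for w in words]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    print(sorted(["az", "áru", "apa", "állat", "ablak"], key=hu_sort_key))
    print(sorted(["éget", "eget", "éger", "egér", "Eger"], key=hu_sort_key))
```

A check like `is_sorted` applied to a manually sorted word file is roughly what a regression test for the collation rules could look like, although a real glibc test would compare against the locale itself rather than a hand-written key.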
Discuss this with the other reporter and get back with the result. I have no reason to believe anyone over somebody else.
Who is the other reporter? Please give me some contact info, I couldn't find such an entry in this bugzilla.
2005-07-26  Ulrich Drepper  <drepper@redhat.com>

	* locales/hu_HU: Better collation.
	Patch by Gyuro Lehel <lehel@freemail.hu>.
I received a reply from Lehel. He writes (in Hungarian) that he doesn't want to create an account in bugzilla because he has been receiving twice as much spam since he registered in redhat bugzilla. On the other hand, he asked me to copy/paste this text here:

Well, I do not argue the point, it was just the customers at my old job who did not really like this kind of sorting. Maybe the solution could be to add a locale that contains the alphabetical sorting and let the users choose their preferred one.
> Maybe the solution could be to add a locale that contains the alphabetical
> sorting and let the users choose their preferred one.

No, creating variant locales is not an option. There is one and only one locale.
> No, creating variant locales is not an option.

I perfectly agree, and this is also what I answered him. (If there are 2 choices then in a few minutes there'll be requests for about 2^N choices, where N keeps on growing forever...)
No response in 6+ months. Closing.
No response to what? Sorry, but I think that _I_'ve been waiting 6+ months for _you_ to fix this bug. I told you that Lehel agreed in private mail that he was wrong and I am right. Unfortunately I couldn't get him to comment here in bugzilla, so I cannot prove this, but I hope you do not think I'm lying; and it's not my fault that he is not as co-operative as he should be.

In the original report I said "If needed, I can scan some pages of dictionaries..." It's not easy for me to find access to a scanner, but I am happily willing to do this _if_ I know that it's needed to get this bug fixed. But I still don't know if that would make you happy; you haven't replied with anything like "yes, scanning those pages would be cool".

I'll be back shortly with some scanned pages. If that's not enough then please, please let me know what to do to prove I'm right.
Created attachment 999
dictionary

A random page scanned from a Hungarian-German dictionary. Words beginning with e and é appear in mixed order.
Created attachment 1000
phonebook

The page where ö and ő start, scanned from a quite recent phonebook.
I modified the scanned pictures due to potential privacy or legal problems. I can send the unmodified versions in private e-mail, if required. If you need any other proof, please let me know.
No response in 6+ months. Last time you closed this bug with this justification. Now _you_ haven't replied in half a year, so please let me increase the severity (as requested in the help pages of this bugzilla -- though I admit this is not a critical bug at all, but somehow I'd like to draw your attention to it, and anyway your docs say I should do this). It is a regression anyway (now already present in 2 consecutive official releases), and I see no reason why it couldn't be fixed quickly. I hope that regression bugs are handled with higher priority (as is the case with many other software projects). In the meantime I also changed the summary according to the docs, HTH too.
I reverted the patch.