This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug localedata/17750] wrong collation order of diacritics in most locales


https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #20 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Florian Weimer from comment #19)

> I expect that many languages/scripts have multiple collation rules,
> depending on use, particularly when it comes to sorting foreign languages
> using the same base script.

Let's not forget that most languages written in Latin script use accents
regularly. I don't think glibc allows different diacritic ordering for "own"
accents and "foreign" accents, e.g. in the case of Finnish, forward diacritic
ordering for ä and ö but backward diacritic ordering for é and û (and what if
they're mixed?).
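
To make the forward/backward distinction concrete, here is a minimal sketch in
Python. It is a simplified two-level model and not glibc's actual collation
code: the function name collation_key and its backward_accents parameter are
made up for illustration, and the accent weights are just Unicode code points.
Note that the direction applies to the whole accent level, not to individual
accents, which is why "own" and "foreign" accents cannot easily be treated
differently.

```python
import unicodedata

def collation_key(word, backward_accents=False):
    """Two-level sort key: base letters first, accents as a tie-breaker.

    Simplified illustration only; real locale collation uses weight tables.
    """
    base = []     # primary level: letters with accents stripped
    accents = []  # secondary level: combining marks attached to each letter
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if accents:  # ignore a stray combining mark with no base letter
                accents[-1] += (ord(ch),)
        else:
            base.append(ch.lower())
            accents.append(())
    if backward_accents:
        # Backward ordering: accent differences nearer the END of the word
        # are compared first.
        accents.reverse()
    return (base, accents)

words = ["côté", "coté", "cote", "côte"]
forward = sorted(words, key=collation_key)
backward = sorted(words, key=lambda w: collation_key(w, backward_accents=True))
print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```

All four words share the same base letters, so only the accent level decides.
Forward ordering looks at the first accent difference from the left, backward
ordering at the first one from the right.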

So the question is not how to sort _foreign_ words within the language; the
question is how to sort the language's _own_ words. That determines the
diacritic ordering, and foreign words follow the same rule.

If a list to be sorted is composed solely of foreign words from a particular
language, e.g. solely French words in an otherwise Finnish environment, it
might be reasonable to sort using the rules of that language, e.g. French in
this case. This can be achieved by setting LC_COLLATE=fr_FR.UTF-8.
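
For a program that wants to do this for a single list rather than through the
environment, the same effect can be sketched with Python's standard locale
module. The helper name sort_with_collation is made up for this example, and
it assumes the requested locale (e.g. fr_FR.UTF-8) is actually installed on
the system; if it is not, setlocale raises locale.Error. The resulting order
depends entirely on the installed locale data.

```python
import locale

def sort_with_collation(words, locale_name):
    """Sort words by the LC_COLLATE rules of locale_name.

    The previous LC_COLLATE setting is restored afterwards.
    Raises locale.Error if locale_name is not available.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares according
        # to the active collation rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

if __name__ == "__main__":
    french_words = ["côté", "coté", "cote", "côte"]
    try:
        print(sort_with_collation(french_words, "fr_FR.UTF-8"))
    except locale.Error:
        print("fr_FR.UTF-8 is not installed on this system")
```

Note that setlocale changes process-wide state, so this sketch is not safe to
call from several threads at once.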

In my opinion, the only valid question is what to do with English in
territories where French is by far the second most popular language: is it
reasonable to go with backward diacritic ordering there?
