This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/17750] wrong collation order of diacritics in most locales


https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #25 from keld at keldix dot com <keld at keldix dot com> ---
On Thu, Nov 30, 2017 at 09:09:25AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> 
> --- Comment #24 from Egmont Koblinger <egmont at gmail dot com> ---
> (In reply to keld@keldix.com from comment #21)
> 
> > Well, in Finnish and other Nordic languages like Danish, Swedish and
> > Norwegian, ö and ä etc. are not considered accented letters, but genuine
> > separate letters, so that is why there are few strings with more than one
> > accented letter.
> 
> To clarify: If they sort German words containing ä and ö, they're sorted among
> the same letters of their own language, right? And what about French accents,
> are they on the other hand mixed together with their unaccented counterparts?

Yes, German ö and ä are treated exactly like the Swedish letters.
And French accented letters like é and è are treated as 'e', but with an accent.
é is actually used a lot in Swedish itself.

> > German umlaut letters are much the same in Finnish (and Swedish) and ä and ö
> > are then the same as the genuine Finnish/Swedish letters.
> 
> What about German ü?

ü is treated as a y AFAIK, but with an accent. Danish æ and ø are treated
as ä and ö, but as if they had an accent.
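
To illustrate the rules described above, here is a minimal Python sketch. It is
not how glibc implements collation; the names ALPHABET, EQUIVALENTS and
nordic_key are made up for this example, and the Swedish alphabet order
(a to z, then å, ä, ö) is assumed. Native letters get their own primary
position, while foreign letters are filed under a native letter and only
differ at the secondary (accent) level.

```python
import unicodedata

# Assumed Swedish alphabet order: å, ä, ö are separate letters after z.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
# Foreign letters filed under a native letter, as described in this message.
EQUIVALENTS = {"ü": "y", "æ": "ä", "ø": "ö"}


def nordic_key(word):
    """Return a (primary, secondary) sort key for a Swedish-like ordering."""
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in ALPHABET:
            # Native letter: own primary weight, no accent weight.
            primary.append(ALPHABET.index(ch))
            secondary.append(0)
        elif ch in EQUIVALENTS:
            # Foreign letter: primary weight of the native letter, plus accent.
            primary.append(ALPHABET.index(EQUIVALENTS[ch]))
            secondary.append(1)
        else:
            # Anything else (é, è, ...): strip the accent to find the base.
            base = unicodedata.normalize("NFD", ch)[0]
            known = base in ALPHABET
            primary.append(ALPHABET.index(base) if known else len(ALPHABET))
            secondary.append(1)
    return (primary, secondary)


print(sorted(["öl", "zebra", "ärm", "orm", "øl"], key=nordic_key))
# ['orm', 'zebra', 'ärm', 'öl', 'øl']
```

With this key "idé" sorts right after "ide", and "müsli" right after "mysli",
because they only differ at the accent level.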

> (In reply to keld@keldix.com from comment #22)
> 
> > [...] Then there is a spec from Danish Standard
> > that is more elaborate [...] with the backwards diacrit spec.
> 
> I'm shocked to hear that there's not only one language but more languages that
> use backwards diacritics, something that IMO no sane man with any tiny bit of
> common sense would ever decide on :-)

Well, it is because the last accented character in French is more important
for pronunciation. I agree that it is a bit counter-intuitive, but I do favour
the actual habits in the real world over what is logical.
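
The difference between forward and backward diacrits can be sketched in a few
lines of Python. This is only an illustration under simplifying assumptions
(two levels only, accent weights taken straight from the code points of the
combining marks), not the glibc or ISO 14651 implementation; the function name
collation_key is made up for this example.

```python
import unicodedata


def collation_key(word, backward=False):
    """Two-level key: base letters first, then accents as a tie-breaker."""
    primary = []    # base letters with accents stripped
    secondary = []  # one tuple of accent weights per base letter
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the combining mark to the preceding base letter.
            if secondary:
                secondary[-1] += (ord(ch),)
        else:
            primary.append(ch.lower())
            secondary.append(())
    if backward:
        # Backward diacrits: compare accents starting from the end of the word.
        secondary.reverse()
    return ("".join(primary), tuple(secondary))


words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=collation_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: collation_key(w, backward=True)))
# ['cote', 'côte', 'coté', 'côté']
```

The only change between the two orders is the direction in which the accent
level is scanned; the base letters are compared the same way in both.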

> (In reply to keld@keldix.com from comment #23)
> 
> > That is what I am suggesting, at least for Canada.
> > The same reasoning could be done for Dutch in Belgium, and then also the
> > Netherlands.
> 
> If this is indeed what's correct for these languages / what people living there
> prefer then it's okay for me. I'm just hoping that the kinda de-facto standard
> en_US will stay with forward diacrits. I _guess_ Spanish is more frequently
> used there than French, plus again, I can't imagine how anyone ever could have
> come up with this braindamaged idea of backward diacrit sorting so I'd
> personally prefer en_US not to have this craziness :-)

The kind of de facto i18n locale has forward diacrits. i18n is the standard
locale of ISO TR 30112.
I think both Spanish and German need forward diacrits, and since Spanish is a
bigger language than French, we should use forward diacrits as the default.

Best regards
Keld

