This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


[Bug localedata/22469] pl_PL LC_COLLATE does not use i18n

From: "digitalfreak at lingonborough dot com" <sourceware-bugzilla at sourceware dot org>
To: libc-locales at sourceware dot org
Date: Fri, 01 Dec 2017 01:09:14 +0000
Subject: [Bug localedata/22469] pl_PL LC_COLLATE does not use i18n
Auto-submitted: auto-generated
References: <bug-22469-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=22469

--- Comment #5 from Rafal Luzynski <digitalfreak at lingonborough dot com> ---
For the record and for future reference: Polish alphabetical sorting is
standardized by PN-80/N-01223 (issued by the Polish Committee for
Standardization). Some of its rules:

1. Alphabetical order must follow the Polish alphabet with the letters q, v,
and x added.
2. Non-Polish diacritical characters are ignored, e.g. Hašek < Hass.
2a. It is also allowed to ignore Polish diacritical characters (although nobody
seems to apply this rule; Polish diacritical characters are always respected).
3. Spaces and punctuation characters sort before letters, e.g. "mur z cegły"
< "murawa".
4. A lowercase letter sorts before its uppercase form, e.g. arab < Arab.
5. Numbers (including spelled-out ones) must be sorted by their numerical value
and placed before the letters, e.g. 1 < 5 < ósmy < trzynaście < 17 < XXI <
Agnieszka < Antoni ... (This rule is difficult to implement, so let's skip it.)
6. The placement of the Icelandic letter Þ (Thorn) is not regulated, but the
Icelandic alphabet places it at the end, after Z. We are encouraged to follow
this rule as well, e.g. X < Y < Z < Þ.

Source: https://pl.wikipedia.org/wiki/Porz%C4%85dek_alfabetyczny
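
For illustration only, rules 1-4 and 6 above could be sketched as a Python
sort key. This is not how the glibc locale source expresses collation; the
names ALPHABET, RANK and polish_sort_key are hypothetical, placing Þ after ż
(rather than just after z) is an assumption, and rule 5 is skipped, so digits
and other non-letters are simply ordered before all letters.

```python
import unicodedata

# Rule 1: Polish alphabet with q, v, x added.
# Rule 6 (assumption): Thorn is appended at the very end.
ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźżþ"
RANK = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def _base_letters(ch):
    """Lowercase ch; strip diacritics unless it is a Polish letter (rule 2)."""
    low = ch.lower()
    if low in RANK:
        return low
    decomposed = unicodedata.normalize("NFD", low)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def polish_sort_key(text):
    primary = []  # letter order; non-letters get rank 0 (rule 3)
    case = []     # tie-breaker: lowercase before uppercase (rule 4)
    for ch in text:
        for base in _base_letters(ch):
            if base in RANK:
                primary.append((RANK[base], 0))
            else:
                # Spaces, punctuation, digits: before every letter,
                # ordered among themselves by code point (simplification).
                primary.append((0, ord(base)))
        case.append(1 if ch.isupper() else 0)
    return (primary, case)


words = ["murawa", "Arab", "Hass", "mur z cegły", "arab", "Hašek"]
print(sorted(words, key=polish_sort_key))
# ['arab', 'Arab', 'Hašek', 'Hass', 'mur z cegły', 'murawa']
```

The case information is kept in a separate list so that it only matters when
the letters themselves compare equal, which is what the arab < Arab example
requires.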

Another scholarly source says that Polish has two sorting rules: in
dictionaries, spaces and punctuation characters are ignored (letter-by-letter
order), but in encyclopedias they are not (word-by-word order). Thanks to the
letter-by-letter rule, people who don't know whether the correct spelling is
„na pewno” or „napewno” will still find the word ("na pewno" == "napewno").
On the other hand, in encyclopedias all monarchs named Jan are grouped
together: "Jan III Sobieski" < "Jan XXIII" < "Janina". We can't implement two
different rules; here we have implemented the word-by-word rule, and it is
correct. The same has been requested in bug 388.
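
The difference between the two orders can be seen in a small Python sketch.
The function names are hypothetical, and for brevity the keys compare plain
code points of ASCII text instead of using the Polish alphabet.

```python
def letter_by_letter_key(text):
    # Dictionary order: ignore spaces and punctuation entirely.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def word_by_word_key(text):
    # Encyclopedia order: keep the spaces; a space (U+0020) compares
    # lower than any letter, so a shorter first word sorts first.
    return " ".join(text.lower().split())


names = ["Janina", "Jan XXIII", "Jan III Sobieski"]

print(sorted(names, key=word_by_word_key))
# ['Jan III Sobieski', 'Jan XXIII', 'Janina']

print(sorted(names, key=letter_by_letter_key))
# ['Jan III Sobieski', 'Janina', 'Jan XXIII']

print(letter_by_letter_key("na pewno") == letter_by_letter_key("napewno"))
# True
```

With the letter-by-letter key, "Janina" lands between the two monarchs, which
is the grouping problem the word-by-word rule avoids. The Roman numerals are
compared as ordinary letters here, not by numerical value.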

Source:
https://sjp.pwn.pl/poradnia/haslo/porzadek-alfabetyczny-ale-jaki;16226.html

One more source saying that non-Polish diacritical characters should be
ignored: https://sjp.pwn.pl/poradnia/haslo/porzadek-alfabetyczny;4208.html


References:
- [Bug localedata/22469] New: pl_PL LC_COLLATE does not use i18n
  - From: maiku.fabian at gmail dot com
