This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug locale/18927] Different strings should never collate as equal
- From: "stephane.chazelas+sourceware at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Sat, 24 Feb 2018 22:31:20 +0000
- Subject: [Bug locale/18927] Different strings should never collate as equal
- Auto-submitted: auto-generated
- References: <bug-18927-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=18927
--- Comment #16 from Stephane Chazelas <stephane.chazelas+sourceware at gmail dot com> ---
Note that there are thousands of characters whose sorting order is not
defined and which end up sorting the same, like the ①②③④⑤⑥⑦ mentioned earlier:
$ expr ① = ②
1
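For contrast, here is a minimal check, assuming GNU coreutils `expr` and `od` and UTF-8 encoded input. In glibc's C locale, strcoll() reduces to plain byte comparison, so the same two characters do not compare as equal:

```shell
# In the C (POSIX) locale, collation is plain byte order, so distinct strings
# never collate as equal. expr prints 0 and exits with status 1 when the
# comparison is false, hence "|| true" for scripts running under set -e.
LC_ALL=C expr ① = ② || true    # prints 0

# The underlying UTF-8 byte sequences differ only in their last byte.
printf '%s' ① | od -An -tx1     # e2 91 a0
printf '%s' ② | od -An -tx1     # e2 91 a1
```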
And there are several characters and even collating sequences that have
identical weights. For instance, Ǝ, Ə and Ɛ are explicitly defined as having
the same collation order, which makes no sense.
$ printf '%s\n' Ǝ Ə Ɛ | sort | uniq -c
3 Ǝ
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/iso14651_t1_common;h=eb0fe9ec9d813cbbff78c1ea66b8271f2b018b99;hb=HEAD#l5526
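Below is a sketch of the same pipeline with byte-wise collation, assuming GNU coreutils. When both `sort` and `uniq` run in the C locale, the three letters stay apart, because U+018E, U+018F and U+0190 encode as c6 8e, c6 8f and c6 90 in UTF-8:

```shell
# Run both commands in the C locale so neither one uses the locale's
# collation table; each letter is then counted once, in code point order.
(
  export LC_ALL=C
  printf '%s\n' Ǝ Ə Ɛ | sort | uniq -c
)
# prints three lines, each with a count of 1: Ǝ, then Ə, then Ɛ
```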
Even having é (U+00E9) sort the same as é (e followed by U+0301) would not be
desirable IMO.
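Comparing the bytes shows that these really are different strings. The minimal sketch below uses POSIX `printf` octal escapes, so it depends neither on bash's `$'\u…'` support nor on the current locale. U+00E9 encodes as c3 a9 in UTF-8, while e followed by U+0301 encodes as 65 cc 81:

```shell
# Precomposed é: U+00E9 LATIN SMALL LETTER E WITH ACUTE
printf '\303\251' | od -An -tx1     # c3 a9
# Decomposed é: U+0065 followed by U+0301 COMBINING ACUTE ACCENT
printf 'e\314\201' | od -An -tx1    # 65 cc 81
```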
Though it would be more useful if the two forms shared their first few weights,
as is the case on some systems. Instead, in GNU locales, the collating order of
U+0301 (the combining acute accent) and that of a few other (but not all)
combining diacritics is not defined. So, for instance:
$ (set -x; expr $'e\u301' = $'e\u302')
+ expr é '=' ê
1
While:
$ (set -x; expr $'\ue9' = $'\uea')
+ expr é '=' ê
0
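When an exact answer is needed regardless of the locale, the `=` operator of the shell's `test`/`[` compares strings byte for byte (POSIX defines it as true only if the strings are identical). `expr`, by contrast, goes through the locale's collation. The variable names in this minimal sketch are made up for illustration:

```shell
# Build both decomposed strings with portable octal escapes:
# U+0301 is cc 81 and U+0302 is cc 82 in UTF-8.
acute=$(printf 'e\314\201')        # e + COMBINING ACUTE ACCENT
circumflex=$(printf 'e\314\202')   # e + COMBINING CIRCUMFLEX ACCENT

# "=" in [ ] is an exact string comparison, independent of LC_COLLATE.
if [ "$acute" = "$circumflex" ]; then
  echo same
else
  echo different    # this branch is taken
fi
```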