This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug locale/18927] Different strings should never collate as equal


https://sourceware.org/bugzilla/show_bug.cgi?id=18927

--- Comment #16 from Stephane Chazelas <stephane.chazelas+sourceware at gmail dot com> ---
Note that there are thousands of characters whose sorting order is not
defined and which therefore all sort the same, like the ①②③④⑤⑥⑦ mentioned
earlier:

$ expr ① = ②
1

There are also several characters, and even collating sequences, that have
identical weights. For instance, Ǝ, Ə and Ɛ are explicitly defined as having
the same collation order, which makes no sense.

$ printf '%s\n' Ǝ Ə Ɛ | sort | uniq -c
      3 Ǝ

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/iso14651_t1_common;h=eb0fe9ec9d813cbbff78c1ea66b8271f2b018b99;hb=HEAD#l5526
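
For contrast, here is a minimal sketch of the same pipeline with both sort and
uniq forced to the C locale, where comparison is done on raw bytes. The helper
name count_distinct_bytewise is made up for illustration, and the example
assumes the script is saved as UTF-8:

```shell
# Hypothetical helper: count how many of the arguments remain distinct
# when sort and uniq compare raw bytes (C locale) instead of using the
# locale's collation weights.
count_distinct_bytewise() {
    printf '%s\n' "$@" | LC_ALL=C sort | LC_ALL=C uniq | wc -l | tr -d ' '
}

count_distinct_bytewise Ǝ Ə Ɛ    # prints 3
```

The three characters have different UTF-8 encodings, so a byte-wise comparison
keeps them apart; it is only the locale's collation weights that merge them.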

Even having é (U+00E9) sort the same as é (e followed by U+0301) would not be
desirable IMO.

It would be more useful, though, if their first few weights were the same, as
is the case on some systems. Instead, in GNU locales, the collation order of
U+0301 (the combining acute accent) and of a few other (but not all) combining
diacritics is not defined. So for instance:

$ (set -x; expr $'e\u301' = $'e\u302')
+ expr é '=' ê
1

Whereas with the precomposed characters:

$ (set -x; expr $'\ue9' = $'\uea')
+ expr é '=' ê
0
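
The two traces look identical on screen because the decomposed and precomposed
forms render the same. Dumping the bytes makes the difference visible. This is
only a sketch assuming UTF-8 encoding; hex_bytes is a made-up helper name, and
octal escapes are used so that it does not rely on the shell supporting
$'\u...':

```shell
# Hypothetical helper: print the bytes of a string as space-separated hex.
# xargs (running its default echo) just normalizes od's whitespace.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | xargs
}

precomposed=$(printf '\303\251')    # U+00E9, UTF-8 bytes c3 a9
decomposed=$(printf 'e\314\201')    # e followed by U+0301, UTF-8 bytes 65 cc 81

hex_bytes "$precomposed"    # prints: c3 a9
hex_bytes "$decomposed"     # prints: 65 cc 81
```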
