This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug localedata/23421] Strange collation rules for A and space with UTF-8 locale when other characters appended
- From: "b.cama at kerlink dot fr" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 18 Jul 2018 08:10:40 +0000
- Subject: [Bug localedata/23421] Strange collation rules for A and space with UTF-8 locale when other characters appended
- Auto-submitted: auto-generated
- References: <bug-23421-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=23421
Benjamin Cama <b.cama at kerlink dot fr> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---
--- Comment #5 from Benjamin Cama <b.cama at kerlink dot fr> ---
Thanks again for the clarification. I understand that this is POSIX-defined
behavior and that I cannot do much about it. Thanks for the example describing a
situation where using the C locale is mandated.
I know I cannot convince anyone to change POSIX, but here is one last *real*
example of “weird” sorting:
ENCR_DES DES_CBC
ENCR_DES_ECB DES_ECB
ENCR_DES_IV32 DES_IV32
ENCR_DES_IV64 DES_IV64
ENCR_IDEA IDEA_CBC
ENCR_NULL_AUTH_AES_GMAC NULL_AES_GMAC
ENCR_NULL NULL
The “usual” rule that a shorter string sorts before the longer strings that
extend it does not hold here: ENCR_NULL* is sorted the opposite way from
ENCR_DES*. (By “usual” I mean “historically in Unix, which for a long time used
the C/POSIX locale everywhere”; I am speaking of 2000s kind of old, not the
1980s, but I am old enough to have lived through the Unicode transition in
Debian.) This is with tabs instead of spaces (which seem to follow the same
ordering rule), so it stands out more.
It is even stranger in this made-up example:
% printf "A\tA\nAA\tA\nA\tD\n"|sort
A A
AA A
A D
From now on I will try to remember to set the right collation locale before
expecting the C sorting behavior. I hope not to be bitten again.
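For anyone reading this later, here is a minimal sketch of what "setting the right collation rule" could look like in practice. The helper name `sort_c` is hypothetical (it is not part of coreutils or glibc); it only relies on the standard `LC_ALL` environment variable, which overrides `LANG` and every other `LC_*` setting for the command it prefixes.

```shell
# Hypothetical helper (not from the bug report): run sort with byte-wise
# C/POSIX collation, whatever LANG / LC_COLLATE the caller has set.
sort_c() {
    LC_ALL=C sort "$@"
}

# The made-up example from this comment; printf expands \t and \n
# in its format string.
printf 'A\tA\nAA\tA\nA\tD\n' | sort_c
```

In the C locale, lines are compared byte by byte, and a tab (0x09) compares lower than "A" (0x41), so this should print "A<TAB>A", then "A<TAB>D", then "AA<TAB>A". Setting only `LC_COLLATE=C` would also work, but only when `LC_ALL` is not already set in the environment, which is why the sketch uses `LC_ALL`.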
Sorry for the noise and thanks again.