This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Collation of INFINITY vs. EMPTY SET?
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Mike Fabian <mfabian at redhat dot com>, Pravin Satpute <psatpute at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Mike Frysinger <vapier at gentoo dot org>
- Date: Tue, 17 May 2016 01:07:11 -0400
- Subject: Collation of INFINITY vs. EMPTY SET?
In en_US we use localedata/locales/iso14651_t1_common
for collation.
A recent Fedora bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
shows that we don't have code-point-based collation for
elements which are not defined in the collation source
files (the locale files themselves and the files
they include to define their LC_COLLATE).
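One way to check this locally is to compare the locale's answer with
plain code point order for the two characters from the subject line.
The following is only a minimal sketch: the helper name compare() is
made up for illustration, and it assumes the en_US.UTF-8 locale is
installed (it returns None for the collation result when it is not).

```python
import locale


def compare(a, b, loc="en_US.UTF-8"):
    """Return (collation_sign, code_point_sign) for two single characters.

    Each sign is -1, 0 or 1. collation_sign is None when the requested
    locale is not available on this system.
    """
    def sign(n):
        return (n > 0) - (n < 0)

    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        coll = None  # locale not installed; only code point order is known
    else:
        coll = sign(locale.strcoll(a, b))
    return coll, sign(ord(a) - ord(b))


if __name__ == "__main__":
    # U+221E INFINITY vs. U+2205 EMPTY SET
    print(compare("\u221e", "\u2205"))
```

If the two values in the returned tuple differ, the locale is not
ordering that pair by code point.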
In localedata/locales/iso14651_t1_common we have the
following comment:
4827 # Any character not precisely specified will be considered as a special
4828 # character and considered only at the last level.
4829 # <U0000>......<U7FFFFFFF> IGNORE;IGNORE;IGNORE;<U0000>......<U7FFFFFFF>
... and then:
5001 # The comment at the beginning of this section mentions characters which
5002 # are not otherwise covered. But this description cannot express this.
5003 # Therefore we add here a few entries which are used in older implementations
5004 # to be compatible. --drepper
I always thought we would fall back to code point
order (as the former comment implies), but Drepper's
comment makes it seem like that's not true. The Fedora
bug also makes it seem like that's not true.
Why might we not want code-point-based sorting for
entries that are not defined?
Is the solution to write automation to create iso14651_t1_common
and list all the unspecified elements in code point order?
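To make the automation idea concrete, here is a rough sketch of what
such a script could look like. It is not an existing glibc tool. It
assumes that a defined element is any line starting with a <UXXXX>
symbol, that code points above U+FFFF are written with eight hex
digits, and that the fallback entry should take the per-code-point
form of the commented-out range line quoted above. The function names
are hypothetical.

```python
import re

# Assumption: an element counts as "defined" if a line starts with <UXXXX>.
DEFINED = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s")


def ucs_symbol(cp):
    """Format a code point as <UXXXX>, or <UXXXXXXXX> above U+FFFF."""
    return "<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp


def defined_code_points(lines):
    """Collect the code points that already have an entry."""
    found = set()
    for line in lines:
        m = DEFINED.match(line)
        if m:
            found.add(int(m.group(1), 16))
    return found


def fallback_entries(lines, first, last):
    """Yield last-level-only entries, in code point order, for every
    code point in [first, last] that has no entry in `lines`."""
    known = defined_code_points(lines)
    for cp in range(first, last + 1):
        if cp not in known:
            sym = ucs_symbol(cp)
            yield "%s IGNORE;IGNORE;IGNORE;%s" % (sym, sym)


if __name__ == "__main__":
    # Made-up sample input, not copied from iso14651_t1_common.
    sample = ["<U2205> IGNORE;IGNORE;IGNORE;<U2205> # EMPTY SET"]
    for entry in fallback_entries(sample, 0x2205, 0x2207):
        print(entry)
```

A real generator would also need to skip unassigned code points and
surrogates, which this sketch does not attempt.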
--
Cheers,
Carlos.