This is the mail archive of the mailing list for the glibc project.


Collation of INFINITY vs. EMPTY SET?

In en_US we use localedata/locales/iso14651_t1_common
for collation.

A recent Fedora bug:

It shows that we don't have code-point-based collation for
elements which are not defined in the collation source
files (the locale files themselves and the files
they include to define their LC_COLLATE).
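
For anyone who wants to poke at this locally, here is a minimal
sketch (not taken from the bug report) that compares the two
characters from the subject line, once by plain code point and once
through strcoll under a locale. The helper names and the default
locale name "en_US.UTF-8" are my own assumptions; the locale result
depends entirely on the installed glibc and locale data, so no
expected output is given for it.

```python
import locale

EMPTY_SET = "\u2205"  # U+2205 EMPTY SET
INFINITY = "\u221e"   # U+221E INFINITY


def sign(n):
    """Reduce any integer to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def codepoint_order(a, b):
    """Compare two strings purely by code point: -1, 0 or 1."""
    return (a > b) - (a < b)


def locale_order(a, b, name="en_US.UTF-8"):
    """Compare two strings with strcoll under the named locale.

    Returns None if that locale is not installed on this system.
    The locale name is an assumption; adjust it as needed.
    """
    old = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sign(locale.strcoll(a, b))
    finally:
        locale.setlocale(locale.LC_COLLATE, old)  # restore previous locale


if __name__ == "__main__":
    print("code point order:", codepoint_order(EMPTY_SET, INFINITY))
    print("locale order:    ", locale_order(EMPTY_SET, INFINITY))
```

If the two results differ, or strcoll reports the characters as
equal, then the locale is not falling back to code point order for
these elements.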

In localedata/locales/iso14651_t1_common we have the
following comment:

# Any character not precisely specified will be considered as a special
# character and considered only at the last level.
# <U0000>......<U7FFFFFFF> IGNORE;IGNORE;IGNORE;<U0000>......<U7FFFFFFF>

... and then:

# The comment at the beginning of this section mentions characters which
# are not otherwise covered.  But this description cannot express this.
# Therefore we add here a few entries which are used in older implementations
# to be compatible.  --drepper

I always thought we would fall back to code point
order, as the first comment implies, but Drepper's
comment suggests that is not the case. The Fedora bug
suggests the same.

Why might we not want code-point-based sorting for
elements that are not defined in the collation source?

Is the solution to write automation to create iso14651_t1_common
and list all the unspecified elements in code point order?
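
To make the question concrete, here is a rough sketch of what such
automation might look like. It is only an illustration: the function
names are hypothetical, the per-character line format is simply the
range comment quoted above expanded to one entry per code point, and
the <UXXXX> / <UXXXXXXXX> symbol naming is my assumption about what
the locale sources expect. Whether entries of this form would
actually give the desired ordering is exactly what I'm asking.

```python
def symbol(cp):
    """Symbolic name for a code point, e.g. <U221E> or <U0001F600>.

    Assumes 4 hex digits inside the BMP and 8 above it.
    """
    return "<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp


def fallback_entries(defined, first, last):
    """Yield one collation line for each code point in [first, last]
    that is not in `defined`, in ascending code point order.

    `defined` is the set of code points the collation source already
    covers; building that set from the real file is not shown here.
    """
    for cp in range(first, last + 1):
        # Skip anything already specified, and the surrogate range.
        if cp in defined or 0xD800 <= cp <= 0xDFFF:
            continue
        name = symbol(cp)
        yield "%s IGNORE;IGNORE;IGNORE;%s" % (name, name)


if __name__ == "__main__":
    # Pretend U+2206 is already defined; emit the neighbours around it.
    for line in fallback_entries({0x2206}, 0x2205, 0x2207):
        print(line)
```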

