This is the mail archive of the mailing list for the glibc project.

Re: Collation of INFINITY vs. EMPTY SET?

"Carlos O'Donell" <> wrote:

> In en_US we use localedata/locales/iso14651_t1_common
> for collation.
> A recent Fedora bug:
> Shows we don't have code-point-based collation for
> elements which are not defined in the collation source
> files (the locale files themselves and the files
> they include to define their LC_COLLATE).

I reported a similar bug a while ago:

> In localedata/locales/iso14651_t1_common we have the
> following comment:
> 4827 # Any character not precisely specified will be considered as a special
> 4828 # character and considered only at the last level.
> 4829 # <U0000>......<U7FFFFFFF> IGNORE;IGNORE;IGNORE;<U0000>......<U7FFFFFFF>
> ... and then:
> 5001 # The comment at the beginning of this section mentions characters which
> 5002 # are not otherwise covered.  But this description cannot express this.
> 5003 # Therefore we add here a few entries which are used in older implementations
> 5004 # to be compatible.  --drepper
> I always thought we would fall back to code point
> order (former comment implies), but Drepper's comment
> makes it seem like that's not true? The Fedora bug
> also makes it seem like that's not true.
> Why might we not want code-point-based sorting for
> entries not defined?
> Is the solution to write automation to create iso14651_t1_common
> and list all the unspecified elements in code point order?

opengroup> The symbol UNDEFINED shall be interpreted as including all
opengroup> coded character set values not specified explicitly or via
opengroup> the ellipsis symbol. Such characters shall be inserted in
opengroup> the character collation order at the point indicated by the
opengroup> symbol, and in ascending order according to their coded
opengroup> character set values. If no UNDEFINED symbol is specified,
opengroup> and the current coded character set contains characters not
opengroup> specified in this section, the utility shall issue a warning
opengroup> message and place such characters at the end of the
opengroup> character collation order.

If this UNDEFINED symbol worked as specified, we could easily use code
point order as a fallback for entries not defined in the collation
order by inserting the UNDEFINED symbol in the LC_COLLATE definition
of the locale sources at an appropriate place.
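
To make the quoted rule concrete, here is a minimal sketch of how UNDEFINED is supposed to behave according to the specification text above. It is a single-level, single-character model only, not glibc's implementation; the names `UNDEFINED`, `make_key` and the three-letter `order` list are hypothetical and exist only for illustration.

```python
# Hypothetical stand-in for the UNDEFINED keyword in an LC_COLLATE section.
UNDEFINED = object()


def make_key(order):
    """Build a sort key from an explicit collation order containing UNDEFINED.

    Characters listed in `order` sort by their position in the list.
    All other characters are inserted at the position of UNDEFINED,
    in ascending order of their code point values.
    """
    slot = order.index(UNDEFINED)
    position = {ch: i for i, ch in enumerate(order) if ch is not UNDEFINED}

    def key(ch):
        if ch in position:
            return (position[ch], 0)
        # Not specified explicitly: fall back to code point order
        # inside the slot reserved by UNDEFINED.
        return (slot, ord(ch))

    return key


# Toy collation order: a, b, then everything undefined, then c.
order = ["a", "b", UNDEFINED, "c"]
chars = ["c", "\u221e", "b", "\u2205", "a"]  # U+221E INFINITY, U+2205 EMPTY SET
sorted_chars = sorted(chars, key=make_key(order))
```

With this toy order, EMPTY SET (U+2205) and INFINITY (U+221E) both land between "b" and "c", and EMPTY SET comes first because its code point is lower.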

Unfortunately, UNDEFINED does not work as specified: some locale
sources use it, but it does not have the effect described above.
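
One way to observe what an installed locale actually does with the two characters from the subject line is to compare the result of `strcoll` with plain code point order. This is only a sketch: the helper name `collation_sign` is hypothetical, the locale name `en_US.UTF-8` is an assumption about what is installed, and the result depends on the glibc version and locale data on the machine.

```python
import locale


def collation_sign(a, b, locale_name):
    """Return -1, 0 or 1 for strcoll(a, b) under locale_name.

    Returns None if the locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        result = locale.strcoll(a, b)
        return (result > 0) - (result < 0)
    finally:
        # Always restore the previous collation locale.
        locale.setlocale(locale.LC_COLLATE, saved)


INFINITY, EMPTY_SET = "\u221e", "\u2205"

# Sign of the comparison in pure code point order.
code_point_sign = (INFINITY > EMPTY_SET) - (INFINITY < EMPTY_SET)

# Sign of the comparison according to the locale (None if not installed).
locale_sign = collation_sign(INFINITY, EMPTY_SET, "en_US.UTF-8")

print("code point order:", code_point_sign)
print("en_US.UTF-8 strcoll:", locale_sign)
```

If the two signs differ, or if `strcoll` reports the two characters as equal, the locale is not falling back to code point order for these characters.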

Mike FABIAN <>
