This is the mail archive of the
libc-locales@sources.redhat.com
mailing list for the GNU libc locales project.
Re: Collating bug with period?
On Wed, Apr 06, 2005 at 10:50:43AM +0200, Danilo Segan wrote:
> Today at 10:06, Ole Laursen wrote:
>
> > The Danish locale is apparently dictated by a standard. But why do you
> > think this is a locale bug? I would expect the locale to support
> > dictionary-like sorting, not filename sorting.
>
> Of course. So, what you want is da_DK@filenamesorting instead, where
> a dot would be a sortable character. You can try playing with that
> yourself, by doing 'copy "da_DK"' inside LC_COLLATE section, and then
> modifying the table to suit your needs. Or perhaps we want a
> completely new locale based on iso14651_t1 suitable for filename
> sorting? But, this again wouldn't work for all locales, so it's still
> doomed.
>
> Of course, if there are any problems, they lie in the poor scalability
> of the POSIX locale system, not in any particular application making use of
> it. Whatever you're thinking of is simply a special case of the more
> general problem, and it needs a more general solution (like having
> LC_FILENAME_COLLATE). Though, I don't think this is necessary, since
> you either prefer the dictionary collation, or you don't.
>
> What you need to think about is: where do I want dictionary collation
> and why, and where do I not want it and why? I suspect that you'll
> end up wanting only one most of the time, though I'm of course only
> guessing.
>
> > Another common option is whether to treat punctuation (including
> > spaces) as base characters or treat them as a level 4 difference.
> >
> > Doesn't this support the idea that an application may need a slightly
> > different sort order?
>
> Yes, I thought of UCA only as a reference on where all of these issues
> are discussed. *How* these customizations are done depends entirely on
> the system in use.
>
> Unfortunately, POSIX (and by extension, GNU libc) doesn't support it
> that easily (it doesn't "scale" on that dimension): you need to have a
> separate locale for that, even though most of the data inside
> LC_COLLATE can be reused, and only weights on a few elements need to
> be reassigned.
What do you mean by "not scaling"? Glibc has a "reorder-after" mechanism
that can build on an existing LC_COLLATE spec and then reorder just a
few characters, like the PERIOD character. This functionality is also
included in ISO 14651 and TR 14652.
Best regards
keld