Collating bug with period?

Ole Laursen olau@cs.aau.dk
Thu Apr 7 07:55:00 GMT 2005


Danilo Segan <dsegan@gmx.net> writes:

> Yesterday at 13:44, Ole Laursen wrote:
> 
> > Ole Laursen <olau@cs.aau.dk> writes:
> >
> >> Yeah, it is merely a convention. I think if you split at the first
> >> dot, you will get a good approximation that also mimics what you
> >> would get
> >> with the C locale. It does not catch "sample.ramble.tar.gz", but such
> >> constructions are rare.
> >
> > I accidentally hit C-c C-c before writing "hopefully rare". The idea
> > is that it won't work if you actively work against the system, but as
> > long as you obey a "the first dot signifies the extension" rule, it
> > would work fine, just as it does in the C locale.
> 
> But you don't have to "actively work against the system".  Files with
> version numbers (look at e.g. /usr/bin/python-2.4, emacs-21.3, ...) and
> tarballs such as something.tar.gz are very common.  It doesn't work
> that way, and this approximation is very bad. 

It will work perfectly with tar files if you split at the first dot
(from the left, mind you):

  event.gz
  event.tar.gz
  event.zip
  event-1.tar.gz
  eventgenerator.tar.gz
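
A minimal sketch of the rule in Python, assuming each part is compared
by plain code point (a real implementation would presumably use the
locale's own ordering for the parts). The name first_dot_key is a
hypothetical helper for illustration, not anything in glibc:

```python
def first_dot_key(name):
    # Split at the first dot from the left:
    # "event.tar.gz" -> ("event", "tar.gz"), "event" -> ("event", "").
    stem, _, ext = name.partition(".")
    # Compare the stem first, then everything after the first dot.
    return (stem, ext)


names = [
    "eventgenerator.tar.gz",
    "event.zip",
    "event-1.tar.gz",
    "event.tar.gz",
    "event.gz",
]

ordered = sorted(names, key=first_dot_key)
for name in ordered:
    print(name)
```

All three "event" files share the stem "event", so they stay together
and are ordered by extension before any longer stem is considered.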

Version-numbered files are uncommon for anything other than system
files, but it will work OK for them too, AFAICS. You get

  python
  python-1.5
  python-1.6
  python-2.12   <- multi-digit numbers are of course bad, as always
  python-2.3
  python-2.4
  python-3.0

And anyway, what you get is no worse than what the C locale gives; it
is actually a bit better, since you get event-1.tar.gz after
event.tar.gz (in the C locale "-" sorts before ".", so it comes first).
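
To illustrate the comparison, here is a small Python sketch that sorts
the same names twice: once by raw bytes, which is how the C locale
orders ASCII names, and once with the first-dot rule. The helper name
first_dot_key is hypothetical, and code-point comparison of the parts
is an assumption made to keep the example short:

```python
def first_dot_key(name):
    # Hypothetical helper: split at the first dot from the left.
    stem, _, ext = name.partition(".")
    return (stem, ext)


names = ["event.gz", "event.tar.gz", "event.zip", "event-1.tar.gz"]

# Byte-wise comparison, as strcmp() would do in the C locale.
c_like = sorted(names, key=lambda n: n.encode("ascii"))

# First-dot rule: stem first, then the rest.
first_dot = sorted(names, key=first_dot_key)

print("C-like:   ", c_like)
print("first dot:", first_dot)
```

In the byte-wise order "-" (0x2D) sorts before "." (0x2E), so
event-1.tar.gz lands ahead of all the plain "event" files; with the
first-dot rule it follows them.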


> You certainly want a new collation table.

Why do you think a collation table would help? It can't do magic; it
still needs to use some kind of algorithm. Do you have something
particular in mind?

-- 
Ole Laursen
http://www.cs.aau.dk/~olau/


