This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][BZ #16009] Memory handling in strxfrm_l()

On Mon, Nov 24, 2014 at 10:40:51AM -0800, Paul Eggert wrote:
> On 11/24/2014 01:26 AM, Leonhard Holz wrote:
> >strxfrm is a function for pre-computing things that need to be
> >fast somewhere else
> That may be true in theory, but in practice strxfrm is pretty much
> useless everywhere.  When I tried to use it in GNU sort, I gave up
> in frustration, as strxfrm was soooo sllooooow that it was much
> better to simply use strcoll and be smart about it.  I don't know of
> any practical application that uses glibc strxfrm in a real way, and
> my advice for optimizating strxfrm is to not bother, and to focus
> one's efforts on making strcoll go faster.
> Unless perhaps you're thinking of rewriting strxfrm from scratch, in
> which case we can talk about what's needed.

A strxfrm is flawed design as most of time one needs to know just first
ten characters to do comparison. As one cannot generate small prefix
because strxfrm needs to report size of entire string and content is
unspecificed by standard it needs to do lot of unnecessary work.

With strcoll improvement that I posted earlier a possible strcoll speedup 
is quite big, when I tested a time of 'find | sort' command sort runs
around ten times faster. However is a use case with long common prefixes
which gives that speedup, it would be less when sorting english strings.

What would help is add another function, say strtrim. That function
would delete characters ignored by comparison by string. That would fix
weakness of my approach where comparing long prefixes that differ only
in ignored characters is still slow.

As sorting goes we should add a dedicated algorithm for strcoll, one
wants to use radixsort which is possible but it needs to access locale
specification to get weigths and deal with specific cases.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]