This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC] further speedup of strcoll

From: "Carlos O'Donell" <carlos at redhat dot com>
To: Leonhard Holz <leonhard dot holz at web dot de>, libc-alpha at sourceware dot org
Date: Wed, 11 Feb 2015 13:16:39 -0500
Subject: Re: [RFC] further speedup of strcoll
Authentication-results: sourceware.org; auth=none
References: <54DB22FA dot 3030208 at web dot de>

On 02/11/2015 04:38 AM, Leonhard Holz wrote:
> The strcoll method can probably be made faster by fast-forwarding to
> the position of the first difference of the two strings via bytewise
> comparison so that the expensive location-aware comparison has to
> deal with the interesting part of the strings only. But this has some
> implications so I ask for your feedback.
> 
> First issue is that I do not know of a function which returns the
> position of the first difference of two strings. This functionality
> is actually included in strcmp but not accessible. I could implement
> a new __ function in C and later carve out platform dependent
> versions from strcmp.

It's always OK to start with an internal function.
 
> Then it needs to be encoding aware as for anything but 8bit encodings
> the bytewise comparison can stop in the middle of a multi-byte value.
> Actually there is no easy way get the encoding type (one could check
> for "UTF-8" in the locale name, but this is way to expensive) and
> probably __locale_t has to be extended with an int encoding_type that
> e.g. states 0 for other, 1 for 8-bit, 2 for UTF-8 and so on. Is this
> feasible?

The size and alignment of locale_t must not change. That's the hard
requirement from the ABI perspective.

> And finally it is necessary to check if the difference is inside a
> backward sequence since then there could be a difference after the
> found difference that is logically before it. And since there is no
> way to find the start of a backward sequence from within the
> comparison has to start from the beginning of the strings and the
> effort to get the position of the first difference is wasted. But
> this should be overweighted by the speedup in the other cases,
> especially if the strings are actually equal.

We'll need real data and benchmarking to validate that.

Cheers,
Carlos.

References:
- [RFC] further speedup of strcoll
  - From: Leonhard Holz

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]