This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] further speedup of strcoll

On 02/11/2015 04:38 AM, Leonhard Holz wrote:
> The strcoll method can probably be made faster by fast-forwarding to
> the position of the first difference of the two strings via bytewise
> comparison so that the expensive location-aware comparison has to
> deal with the interesting part of the strings only. But this has some
> implications so I ask for your feedback.
> First issue is that I do not know of a function which returns the
> position of the first difference of two strings. This functionality
> is actually included in strcmp but not accessible. I could implement
> a new __ function in C and later carve out platform dependent
> versions from strcmp.

It's always OK to start with an internal function.
> Then it needs to be encoding aware as for anything but 8bit encodings
> the bytewise comparison can stop in the middle of a multi-byte value.
> Actually there is no easy way get the encoding type (one could check
> for "UTF-8" in the locale name, but this is way to expensive) and
> probably __locale_t has to be extended with an int encoding_type that
> e.g. states 0 for other, 1 for 8-bit, 2 for UTF-8 and so on. Is this
> feasible?

The size and alignment of locale_t must not change. That's the hard
requirement from the ABI perspective.

> And finally it is necessary to check if the difference is inside a
> backward sequence since then there could be a difference after the
> found difference that is logically before it. And since there is no
> way to find the start of a backward sequence from within the
> comparison has to start from the beginning of the strings and the
> effort to get the position of the first difference is wasted. But
> this should be overweighted by the speedup in the other cases,
> especially if the strings are actually equal.

We'll need real data and benchmarking to validate that.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]