This is the mail archive of the mailing list for the glibc project.


Re: [RFC] Add fast path for strcoll and strcasecmp

On 24.11.2014 00:47, Ondřej Bílka wrote:
On Sun, Nov 23, 2014 at 11:52:06PM +0100, Leonhard Holz wrote:
Hi Ondřej,

as far as I understood, the current strcoll implementation scans
both strings for collation sequences and compares the weights of
them, whereby a collation sequence can be multiple bytes long. So
whatever strcmp_l returns as index, you would need a general way of
finding the start of the collation sequence this index is in.
Unfortunately I cannot tell if or how this can be done.

As I wrote below, you do not have to do that. Just precompute a table that
is zero for characters that are part of some collation sequence, and use the
old method when one of the compared characters has a zero entry in that table.
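
A minimal sketch of the table idea, not the glibc implementation. All names
here (simple_weight, init_simple_weight, fast_strcoll, slow_path_calls) are
hypothetical, and the table contents are a toy assumption: printable ASCII
bytes get their byte value as weight, while 'c' and 'h' (imagined members of
a "ch" collating element) and all bytes above 7F are zero. It also assumes a
single-level ordering, which real locale data does not guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table: nonzero means the byte is a standalone collation
   element whose weight can be compared directly; zero means the byte may
   be part of a collation sequence, so the old method must be used.  */
static unsigned char simple_weight[256];

/* Counts fallbacks to the old method, only to make the sketch observable.  */
static unsigned long slow_path_calls;

/* Toy table contents (an assumption, not real locale data).  */
static void init_simple_weight(void)
{
  memset(simple_weight, 0, sizeof simple_weight);
  for (int c = 0x20; c < 0x7f; c++)
    simple_weight[c] = (unsigned char) c;
  simple_weight['c'] = 0;   /* pretend "ch" is a collating element */
  simple_weight['h'] = 0;
}

static int fast_strcoll(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);

  /* Skip the common prefix, as a strcmp-style scan would.  */
  size_t i = 0;
  while (a[i] != '\0' && a[i] == b[i])
    i++;

  /* Byte-identical strings collate equal.  */
  if (a[i] == b[i])
    return 0;

  unsigned char wa = simple_weight[(unsigned char) a[i]];
  unsigned char wb = simple_weight[(unsigned char) b[i]];

  /* Fast path: both differing bytes are simple and their weights differ.  */
  if (wa != 0 && wb != 0 && wa != wb)
    return (wa > wb) - (wa < wb);

  /* Otherwise (zero entry, end of string, equal weights): old method.  */
  slow_path_calls++;
  return strcoll(a, b);
}
```

With this toy table, comparing "data" and "dots" is decided on the fast path,
while "ab" against "ac" falls back to strcoll because 'c' has a zero entry.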

OK, I understand the idea, and it would be great if it worked. BTW, do you know how UTF-8 chars above 7F are handled?

From a performance perspective these are not a problem, as they should be
infrequent enough. Ignored characters are worse, as they could make otherwise
identical long prefixes different.

BTW, I have implemented a benchmark for strcoll that is
not yet pushed, because I didn't manage to patch the bench-tests
Makefile to generate the additionally needed locales.

I can send you the test files if you like.

