This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: [RFC] Add fast path for strcoll and strcasecmp
- From: Leonhard Holz <leonhard dot holz at web dot de>
- To: libc-alpha at sourceware dot org
- Date: Tue, 25 Nov 2014 21:04:04 +0100
- Subject: Re: [RFC] Add fast path for strcoll and strcasecmp
- Authentication-results: sourceware.org; auth=none
- References: <20141123214718 dot GA28222 at domone> <54726516 dot 5060409 at web dot de> <20141123234728 dot GA31572 at domone>
On 24.11.2014 00:47, Ondřej Bílka wrote:
On Sun, Nov 23, 2014 at 11:52:06PM +0100, Leonhard Holz wrote:
As far as I understood, the current strcoll implementation scans
both strings for collation sequences and compares the weights of
them, whereby a collation sequence can be multiple bytes long. So
whatever strcmp_l returns as index, you would need a general way of
finding the start of the collation sequence this index is in.
Unfortunately I cannot tell if or how this can be done.
As I wrote below, you do not have to do that. Just precompute a table that
is zero for characters that are part of some collation sequence, and use
the old method when one of the compared characters is in that table.
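
A minimal sketch of the table idea described above, in C. Everything here is
an assumption for illustration: the names simple_weight, init_simple_weight
and fast_strcoll are hypothetical and not glibc code, and the table is filled
with placeholder ASCII values rather than derived from a locale's collation
data. It only shows the control flow: skip the common prefix, decide on the
first differing bytes when both have a nonzero table entry, and otherwise hand
the whole comparison to strcoll (the "old method").

```c
#include <string.h>

/* Hypothetical table: 0 means "this byte may be part of a collation
   sequence (or otherwise needs the full algorithm)"; any other value is
   a simple single-byte weight. */
static unsigned char simple_weight[256];

/* Placeholder fill, assumed for illustration only: printable ASCII bytes
   get their own value as weight, every other byte (including all bytes
   above 0x7F and the terminator) is marked as "needs the old method". */
static void init_simple_weight(void)
{
    memset(simple_weight, 0, sizeof simple_weight);
    for (int c = 0x20; c <= 0x7e; c++)
        simple_weight[c] = (unsigned char) c;
}

static int fast_strcoll(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;

    /* Skip the common prefix, as strcmp would. */
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    if (*p == *q)
        return 0;               /* both at the terminator: identical */

    unsigned char wp = simple_weight[*p];
    unsigned char wq = simple_weight[*q];

    /* Fast path: both differing bytes have a simple weight. */
    if (wp != 0 && wq != 0)
        return (wp > wq) - (wp < wq);

    /* Slow path: at least one byte is in the table, use the old method. */
    return strcoll(a, b);
}
```

In a real locale the table would have to be built from the collation
definition, and the shortcut is only valid where the first differing bytes
alone decide the ordering; ignored characters and multi-level weights would
need extra care.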
Ok, I understand the idea and it would be great if it worked. BTW, do you
know how UTF-8 chars above 0x7F are handled?
From a performance perspective these are not a problem, as they should be
infrequent enough. Ignored characters are worse, as they could make
otherwise identical long prefixes different.
BTW, I have implemented a benchmark for strcoll that is not yet pushed
because I didn't manage to patch the bench-tests Makefile to generate
the additionally needed locales.
I can send you the test files if you like.