This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [RFC] Add fast path for strcoll and strcasecmp
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Leonhard Holz <leonhard dot holz at web dot de>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 27 Nov 2014 16:25:27 +0100
- Subject: Re: [RFC] Add fast path for strcoll and strcasecmp
- References: <20141123214718 dot GA28222 at domone> <54726516 dot 5060409 at web dot de> <20141123234728 dot GA31572 at domone> <5474E0B4 dot 9020908 at web dot de> <20141125203612 dot GA21077 at domone> <54759849 dot 80109 at web dot de>
On Wed, Nov 26, 2014 at 10:07:21AM +0100, Leonhard Holz wrote:
>
>
> On 25.11.2014 21:36, Ondřej Bílka wrote:
> >On Tue, Nov 25, 2014 at 09:04:04PM +0100, Leonhard Holz wrote:
> >>On 24.11.2014 00:47, Ondřej Bílka wrote:
> >>>On Sun, Nov 23, 2014 at 11:52:06PM +0100, Leonhard Holz wrote:
> >>>>Hi Ondřej,
> >>>>
> >>>>as far as I understood, the current strcoll implementation scans
> >>>>both strings for collation sequences and compares the weights of
> >>>>them, whereby a collation sequence can be multiple bytes long. So
> >>>>whatever strcmp_l returns as index, you would need a general way of
> >>>>finding the start of the collation sequence this index is in.
> >>>>Unfortunately I cannot tell if or how this can be done.
> >>>>
> >>>As I wrote below, you do not have to do that. Just precompute a table
> >>>that is zero for characters that are part of some collation sequence,
> >>>and use the old method when one of the compared characters has a zero
> >>>entry in that table.
> >>>
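A minimal sketch of the table-based fast path suggested above, assuming a single-byte locale. The names `fast_strcoll`, `rank` and the `slow` callback are hypothetical, not glibc internals. The thread only mentions zeroing bytes that belong to a collation sequence; this sketch additionally assumes the table is zero for ignorable bytes and for bytes that share a primary weight with another byte, and that every nonzero entry is a distinct primary rank with `rank[0] == 1` so the terminator sorts first. Under those assumptions the first mismatching byte decides the order; anything else falls back to the old method.

```c
#include <string.h> /* strcoll, a natural choice for the slow path */

/* Hypothetical signature of the existing full collation routine. */
typedef int (*slow_cmp_fn)(const char *, const char *);

/* rank[c] == 0: byte c cannot be decided locally (assumed: part of a
   contraction, ignorable, or sharing its primary weight with another byte).
   rank[c] != 0: distinct primary rank of c; rank[0] is assumed to be 1. */
int fast_strcoll(const char *a, const char *b,
                 const unsigned char rank[256], slow_cmp_fn slow)
{
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;

    /* Skip the common prefix byte by byte, as strcmp does. */
    while (*p == *q && *p != '\0') {
        p++;
        q++;
    }
    if (*p == *q) /* both strings ended together: identical */
        return 0;

    /* First mismatch: if both bytes are "simple", their ranks decide. */
    if (rank[*p] != 0 && rank[*q] != 0)
        return rank[*p] < rank[*q] ? -1 : 1;

    /* Otherwise run the old (slow) method on the whole strings. */
    return slow(a, b);
}
```

Passing `strcoll` itself as `slow` means the full locale-aware comparison is only paid for when the mismatch lands on a byte the table cannot decide.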
> >>
> >>OK, I understand the idea, and it would be great if it worked. BTW, do
> >>you know how UTF-8 characters above 0x7F are handled?
> >>
> >A multibyte UTF-8 character consists of a leading byte larger than 0xbf
> >followed by continuation bytes in the 0x80-0xbf range, see
> >
> >http://en.wikipedia.org/wiki/UTF-8
> >
>
> Sorry for the confusion. I meant to ask how the algorithm handles
> them, e.g. what to do when strcmp stops at a byte with value 0x81.
Check which of the (at most three) bytes before it is the starting byte.
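A minimal sketch of that back-up step, assuming valid UTF-8 input. The helper names `is_utf8_continuation` and `utf8_char_start` are hypothetical. It relies only on the byte layout described earlier in the thread: bytes in 0x80-0xbf are continuation bytes, and a character is at most four bytes long, so at most three bytes before the stop position need checking.

```c
#include <stddef.h>

/* A byte in 0x80-0xbf is a UTF-8 continuation byte; any other byte
   (ASCII 0x00-0x7f or a lead byte >= 0xc0) starts a character. */
static int is_utf8_continuation(unsigned char c)
{
    return (c & 0xc0) == 0x80;
}

/* Given the index i where a byte-wise comparison stopped, return the
   index of the first byte of the character containing s[i].
   Assumes s is valid UTF-8, so at most three steps back are needed. */
size_t utf8_char_start(const char *s, size_t i)
{
    const unsigned char *u = (const unsigned char *) s;
    size_t steps = 0;

    while (i > 0 && steps < 3 && is_utf8_continuation(u[i])) {
        i--;
        steps++;
    }
    return i;
}
```

Because the bytes before the stop position are identical in both strings, for valid UTF-8 either string yields the same start index, so the slower comparison can resume there for both.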