This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC] Add fast path for strcoll and strcasecmp

From: Ondřej Bílka <neleai at seznam dot cz>
To: Leonhard Holz <leonhard dot holz at web dot de>
Cc: libc-alpha at sourceware dot org
Date: Tue, 25 Nov 2014 21:36:12 +0100
Subject: Re: [RFC] Add fast path for strcoll and strcasecmp
Authentication-results: sourceware.org; auth=none
References: <20141123214718 dot GA28222 at domone> <54726516 dot 5060409 at web dot de> <20141123234728 dot GA31572 at domone> <5474E0B4 dot 9020908 at web dot de>

On Tue, Nov 25, 2014 at 09:04:04PM +0100, Leonhard Holz wrote:
> On 24.11.2014 00:47, Ondřej Bílka wrote:
> >On Sun, Nov 23, 2014 at 11:52:06PM +0100, Leonhard Holz wrote:
> >>Hi Ondřej,
> >>
> >>as far as I understood, the current strcoll implementation scans
> >>both strings for collation sequences and compares the weights of
> >>them, whereby a collation sequence can be multiple bytes long. So
> >>whatever strcmp_l returns as index, you would need a general way of
> >>finding the start of the collation sequence this index is in.
> >>Unfortunately I cannot tell if or how this can be done.
> >>
> >As I wrote below, you do not have to do that. Just precompute a table that
> >is zero for characters that are part of some collation sequence and use
> >the old method when one of the compared characters is in that table.
> >
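
A minimal sketch of the table idea quoted above, not the patch under
discussion. The names weight, init_weights and fast_coll are hypothetical,
and the table is filled with a toy assumption (ASCII bytes collate in byte
order) rather than real locale data. A zero entry marks a byte that may
belong to a multi-byte collation sequence, so the comparison falls back to
plain strcoll. It also ignores multi-level weights, which a real
implementation would have to handle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table, not glibc data: weight[b] != 0 means byte b is a
   stand-alone collation element with that weight; weight[b] == 0 means
   "may be part of a collation sequence, use the old method".  */
static unsigned char weight[256];

static void init_weights(void)
{
    /* Toy assumption: ASCII bytes 0x01..0x7f collate in byte order.
       Bytes >= 0x80 and the terminating NUL keep weight 0.  */
    for (int b = 1; b < 0x80; b++)
        weight[b] = (unsigned char) b;
}

static int fast_coll(const char *a, const char *b)
{
    const unsigned char *s = (const unsigned char *) a;
    const unsigned char *t = (const unsigned char *) b;
    size_t i = 0;

    /* Skip the common prefix, as a strcmp-like scan would.  */
    while (s[i] != '\0' && s[i] == t[i])
        i++;

    /* Both strings ended at the same place: they are identical.  */
    if (s[i] == t[i])
        return 0;

    /* If either differing byte is in the "slow" set (this includes the
       end of one string), use the existing full comparison.  */
    if (weight[s[i]] == 0 || weight[t[i]] == 0)
        return strcoll(a, b);

    /* Both bytes are stand-alone elements: compare their weights.  */
    return (int) weight[s[i]] - (int) weight[t[i]];
}
```

With this layout the common case (a difference at a plain character) costs
one scan plus two table lookups, and anything unusual is left to the
existing code.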
> 
> Ok, I understand the idea and it would be great if it worked. BTW do
> you know how UTF-8 chars above 7F are handled?
>
A multi-byte UTF-8 character consists of a leading byte larger than 0xbf
followed by continuation bytes in the 0x80-0xbf range, see

http://en.wikipedia.org/wiki/UTF-8
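
To illustrate the byte ranges, here is a small sketch that classifies
bytes and steps back from an arbitrary index to the start of the character
containing it. The helper names are hypothetical and the input is assumed
to be valid UTF-8.

```c
#include <stddef.h>

/* Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xbf.  */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}

/* Leading bytes of multi-byte characters are larger than 0xbf.
   (In well-formed UTF-8 they lie in 0xc2..0xf4.)  */
static int utf8_is_lead(unsigned char b)
{
    return b > 0xbf;
}

/* Hypothetical helper: return the index of the first byte of the
   character that contains byte i, assuming s is valid UTF-8.  */
static size_t utf8_char_start(const char *s, size_t i)
{
    while (i > 0 && utf8_is_continuation((unsigned char) s[i]))
        i--;
    return i;
}
```

Since at most three continuation bytes follow a leading byte, the loop
takes at most three steps on valid input.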

 
> >From a performance perspective these are not a problem as they should be
> >infrequent enough. Ignored ones are worse as they could make otherwise
> >identical long prefixes different.
> >
> >
> >>BTW I have implemented a benchmark for strcoll that is
> >>not-yet-pushed because I didn't manage to patch the bench-tests
> >>Makefile to generate additionally needed locales
> >>(https://sourceware.org/ml/libc-alpha/2014-10/msg00431.html).
> >>
> 
> I can send you the test files if you like.
> 
> Leonhard

Follow-Ups:
- Re: [RFC] Add fast path for strcoll and strcasecmp
  - From: Leonhard Holz

References:
- [RFC] Add fast path for strcoll and strcasecmp
  - From: Ondřej Bílka
- Re: [RFC] Add fast path for strcoll and strcasecmp
  - From: Leonhard Holz
- Re: [RFC] Add fast path for strcoll and strcasecmp
  - From: Ondřej Bílka
- Re: [RFC] Add fast path for strcoll and strcasecmp
  - From: Leonhard Holz
