This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Thoughts on bug 15884

From: Leonhard Holz <leonhard dot holz at web dot de>
To: libc-alpha at sourceware dot org
Date: Mon, 15 Sep 2014 10:41:10 +0200
Subject: Thoughts on bug 15884
Authentication-results: sourceware.org; auth=none

Hi everybody,

since I'm new to a the list: My name is Leonhard Holz and I'm aGerman/Berliner software developer with some free time to spend onhunting bugs in open source software.

Bug 15884 (https://sourceware.org/bugzilla/show_bug.cgi?id=15884)complains about bad runtime performance of the strcoll function. Thisprobably has some relevance beyond this bug report as e.g. databases usethis function to sort string values. Previous investigators figured outthat the strlen function called to calculate the size of the cache forthe collation sequences is dominating the measured time of strcoll.


Thoughts on this:

* the strlen function is not needed by the comparison algorithm itself,so it could be removed if

	* the cache is removed
	* the cache is resized during comparison as needed

* the first pass / ruleset is used to fill the cache but does notbenefit from it* two strings have to be the same size and very similar (e.g. vary onlyin accents or capitalization) to get more than one pass (please correctif I got this wrong)-> in most cases the cache will not be used (notice the difference tostrxfrm() which basicly uses the same algorithm)

* caching adds overhead to parsing & comparison

* caching could cause memory problems - it tries to allocate five timesthe length of both strings

So I wonder if it's not the best option to completly remove the cachefeature. It will make some scenarios slower but probably most real worldscenarios a lot faster.


BTW: Does anybody have a benchmark script for strcoll()?

Kind regards,
Leonhard

Follow-Ups:
- Re: Thoughts on bug 15884
  - From: Carlos O'Donell

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]