This is the mail archive of the
mailing list for the glibc project.
Re: Thoughts on bug 15884
- From: Rich Felker <dalias at libc dot org>
- To: Paul Eggert <eggert at cs dot ucla dot edu>
- Cc: Leonhard Holz <leonhard dot holz at web dot de>, libc-alpha at sourceware dot org
- Date: Tue, 16 Sep 2014 12:20:47 -0400
- Subject: Re: Thoughts on bug 15884
- Authentication-results: sourceware.org; auth=none
- References: <5416A626 dot 3050501 at web dot de> <54171E87 dot 8050604 at redhat dot com> <541723E1 dot 3040705 at cs dot ucla dot edu>
On Mon, Sep 15, 2014 at 10:37:37AM -0700, Paul Eggert wrote:
> Thanks, Leonhard, for taking this on. Carlos's suggestions are good
> ones. One other idea is to see how GNU sort's performance is
> affected, as strcoll has long been a bottleneck in its performance.
> The fun part is that GNU sort is multithreaded, and perhaps that'll
> give you some ideas for your microbenchmark.
> Quick war story: I once tried to optimize GNU sort by using strxfrm
> instead of strcoll. It made performance waaayy worse. Oops. To
> this day I don't know what strxfrm is good for in practice.
Obviously the case where it matters is where each of your strings is
in the form of a common large prefix and a tiny difference at the end.
Here strcoll will perform pathologically bad (at least without some
kind of memoization) and strxfrm solves the problem. Of course strxfrm
pessimizes the case where each string is long but they all differ in
the first few characters, since strcoll then never has to inspect or
convert anything but the first few characters, and strxfrom performs
huge useless conversions.