This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[BZ #18441] strcoll performance regression


Hello,

The trigger for the regression is that the locale has no information about the
sort order of the characters given. With the th_TH locale it is quite fast:

"strcoll": {
   "wikipedia-th#th_TH.UTF-8": {
    "duration": 4.31123e+06,
    "iterations": 16,
    "mean": 269452
   }
}
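
For anyone who wants to compare locales outside the benchmark suite, strcoll can
be timed directly. This is only a minimal sketch: the helper name
mean_strcoll_ns and the TIME_STRCOLL_DEMO macro are made up for illustration and
are not part of glibc or its benchtests, it times a single pair of strings
rather than sorting a word list, and the locale passed in must be installed on
the system.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not glibc code): mean CPU time in nanoseconds of one
   strcoll(a, b) call under the given LC_COLLATE locale.  Returns -1.0 if
   the locale cannot be selected or iterations is not positive.  */
double mean_strcoll_ns(const char *locale_name, const char *a, const char *b,
                       long iterations)
{
    if (iterations <= 0 || setlocale(LC_COLLATE, locale_name) == NULL)
        return -1.0;

    volatile int sink = 0;      /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink = strcoll(a, b);
    clock_t end = clock();
    (void)sink;

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iterations;
}

#ifdef TIME_STRCOLL_DEMO
int main(void)
{
    /* The "C" locale is always available; substitute e.g. "th_TH.UTF-8"
       or "en_US.UTF-8" if those locales are installed.  */
    double ns = mean_strcoll_ns("C", "abc", "abd", 100000);
    assert(ns >= 0.0);
    printf("mean: %.1f ns per call\n", ns);
    return 0;
}
#endif
```

Built with -DTIME_STRCOLL_DEMO this prints one mean value; calling the helper
once per locale with the same pair of Thai words should make the gap visible.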

The English locale needs four passes to determine the sort order. In the first three
passes it reports a recognized sequence length of zero, regardless of which Thai
word is given. At the fourth level it recognizes the characters, which are all
considered equal, so the string length actually determines the sort order.
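
The behaviour described above can be modelled with a toy comparison. This is
not the glibc implementation: toy_weights, toy_next, toy_collate and the
TOY_COLLATE_DEMO macro are hypothetical names, and the model simply assumes
every input byte is a character the locale does not know, ignored at the first
three levels and given one identical weight at the fourth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_LEVELS 4

/* Hypothetical weight lookup for a character the locale has no ordering
   for: a zero-length sequence at levels 0..2, one constant weight at the
   last level.  Returns the number of weights produced.  */
static size_t toy_weights(unsigned char c, int level, int *weight)
{
    (void)c;
    if (level < TOY_LEVELS - 1)
        return 0;               /* nothing to compare at this level */
    *weight = 1;                /* all unknown characters weigh the same */
    return 1;
}

/* Advance *s to the next byte that has a weight at this level.
   Returns 1 and stores the weight, or 0 at the end of the string.  */
static int toy_next(const unsigned char **s, int level, int *weight)
{
    while (**s != '\0') {
        size_t n = toy_weights(**s, level, weight);
        (*s)++;
        if (n > 0)
            return 1;
    }
    return 0;
}

/* Multi-pass comparison: each level walks both strings from the start.  */
int toy_collate(const char *a, const char *b)
{
    for (int level = 0; level < TOY_LEVELS; level++) {
        const unsigned char *pa = (const unsigned char *)a;
        const unsigned char *pb = (const unsigned char *)b;
        for (;;) {
            int wa = 0, wb = 0;
            int have_a = toy_next(&pa, level, &wa);
            int have_b = toy_next(&pb, level, &wb);
            if (!have_a && !have_b)
                break;          /* tie at this level, try the next one */
            if (!have_a)
                return -1;      /* a ran out of weights first */
            if (!have_b)
                return 1;
            if (wa != wb)
                return wa < wb ? -1 : 1;
        }
    }
    return 0;
}

#ifdef TOY_COLLATE_DEMO
int main(void)
{
    assert(toy_collate("ab", "abc") < 0);   /* shorter sorts first */
    assert(toy_collate("xy", "ab") == 0);   /* same length: equal */
    puts("ok");
    return 0;
}
#endif
```

Note that the first three levels still scan both strings completely without
producing a single weight, and only the last level decides, purely by length
(counted in bytes in this toy model).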

The former version had a cache that avoided lookups in the locale data tables for
passes > 1, which probably helped in this scenario (but slowed down all others).

Anyhow, the huge difference is astonishing. Next I will investigate how exactly the
sequence lookup works to figure out why it takes so long. But if anyone has an
idea and can point me in the right direction, please comment.

Best,
Leonhard

