This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.

Index Nav: Message Nav: [Date Index] [Subject Index] [Author Index] [Thread Index] [Date Prev] [Date Next] [Thread Prev] [Thread Next] [Raw text]

# Re: Patricia Trie pseudo-test

• From: John Moser <john dot r dot moser at gmail dot com>
• To: libc-alpha at sourceware dot org
• Cc: binutils <binutils at sourceware dot org>
• Date: Tue, 22 Aug 2006 02:44:08 -0400
• Subject: Re: Patricia Trie pseudo-test
• References: <1156225627.5373.83.camel@localhost>

```On Tue, 2006-08-22 at 01:47 -0400, John Moser wrote:

[...]
> I'm still thinking this is probably a win
> since we're unlikely to branch 8 times from a common prefix and even if
> we do we're suddenly faced with a situation where we CAN assume we're
> winning something*; Drepper are you around, I could use your input.

So I had some math run.

You have N string elements each sharing a common prefix of X characters.

If you arrange these in a trie, you have N branches to follow after the
prefix.

Each direction tested in a branch is identical in operation to a
character comparison.  (I designed the storage format that way
specifically.)

Every string is searched for.  The linker eventually goes through all
symbols, taking every possible search path.

If the strings are in a linked list, you compare the common prefixes 1+2
+...+N times, making X comparisons each time, totaling (1+2+...+N)*X or
X*((N+1) * N)/2 character comparisons.

If the strings are in a trie, you compare the common prefixes N times
and compare each branch direction ((N+1)*N)/2 times, totaling N*X + ((N
+1)*N)/2 branches.

These simplify to (N*(1 + N)*X)/2 (linked list) and (N*(1 + N))/2 + N*X
(patricia trie).

A little help from #math (who helped me figure out the simplified
versions) and we get:

<+Cale> % Reduce[{x*((n + 1)*n)/2 > n*x + ((n + 1)*n)/2, x > 0, n > 0}]
<mbot> Cale: x > 1 && n > (1 + x)/(-1 + x)

Now here's some interesting thoughts:

X = 2; N > 3/1 (3)
X = 3; N > 4/2 (2)
X = 4; N > 5/3 (1.6)

Once we've passed prefixes 4 characters long, any branch will be a gain
because N will be 2 or more.  Also the more directions we branch in, the
more time we save; for example, take a prefix 3 characters long (let's
say namespace std, which is probably like _Z22std anyway) and branch it
3, 5, and 10 ways.
LL     PT
3:  18     15
5:  45     30
10: 165    85

Or let's try a branch between 2 with prefixes 3, 5, 10, 20, and 50 long:
LL     PT
3:  9      9
5:  15     13
10: 30     23
20: 60     43
50: 150    103

In other words, this is harmless if prefixes are longer than 2
characters; and a gain if prefixes are longer than 3 characters OR the
number of symbols in the bucket sharing that number of prefixes is
greater than 2.  This is EXTREMELY likely.

Of course, this is still all in theory.  But the numbers are pretty.
>
--
John Moser <john.r.moser@gmail.com>
```

Attachment: signature.asc
Description: This is a digitally signed message part

Index Nav: Message Nav: [Date Index] [Subject Index] [Author Index] [Thread Index] [Date Prev] [Date Next] [Thread Prev] [Thread Next]