key64_t? ino64_t?

Charles Wilson
Thu May 15 00:58:00 GMT 2003

On 15 May 2003 09:41:50 +1000, "Robert Collins" <>
> Hmm. Trees depend on '<'. Its the minimum needed to build a tree
> structure. Tries don't depend on '<', but they do depend on a bit-string
> as a key.

Thinko.  I was thinking specifically of a trie (the bit-string thing).

> No - vtables, ctor and dtor don't apply *at all* unless we deliberately
> use those features. And operator < doesn't require any of vtables, ctor
> or dtor.

I'll take your word for it.

> > Why "must" we? 
> Because if we have a conflict, and we don't resolve it do deliver a
> unique key to the client, then a client may ask for a shm segment X, and
> get Y instead. Limits to the number of keys are less destructive than
> having a shm area that *cannot* be accessed because of a collision.

So returning key = -1 (or other BADVAL) to indicate out-of-keys is better
than returning a valid key -- which is actually invalid because of
clashing.   I can see that.

> You can make it impossible for a collision to occur. My psuedo code does
> exactly that.

Until you run out of keys, and then (I think) your code *doesn't* return
BADVAL (and it should).  When keybits are exhausted, wouldn't it just
start over, handing out the 0th key again?

But, that's really a moot point.  You've got to have one heck of a
machine to allocate 2^32 + 1 FTOKNode objects...

> > I don't like this.  Without the full implementation, I'm just whistling
> > in the dark here -- but there are a lot of gotchas in this code.  All of
> > them can be coded around, within the Tree/Trie implementation.  But more
> > code == more complexity.
> What gotchas? Limits to number of keys - sure. What other ones?

Implementation details of FTOKNode::operator++.  The general
thread-safety issues with "singleton" data structures (colloguial; no
lectures about strict definitions from GOF, thanks...) -- which you are
obviously aware of since you guarded ftok().  But is that the only place
that needs guards?  Etc.  

As I said, it's all details; if you're careful they won't bite you.  But
'careful' means 'more code' (usually).

> > * very low: you make an extremely unlikely event even more unlikely --
> > but still not impossible.  I *can* construct a scenario in which this
> > tree-based thing fails.
> Please, do so. AFAIK such a scenario doesn't exist within the bounds of
> valid (according to SuSv2) parameters to ftok().

for i = 0 .. 2^32+1
  s = num2str(i);

However, as I pointed out above, you'll run out of memory long before
then.  However, "Given a ..., Construct a ..." is a formula for a
theoretical case -- a worst case failure mode.  So, even on a machine
with infinite memory, the above code snippet breaks the "contract" for
ftok's unique return value.

> > Is the value worth the cost (complexity)?
> > 
> > I don't think so, but that's just my opinion.
> Well, I've also put forward my opinion. Until I get time to dig up
> SUSv2's reference... 

Your Posix P1003.1 reference pasted below, to consolidate this thread. 
See way below.

> > > Should give a reasonably performing implementation with space for
> > > 16.7Million ftok() calls on unique paths. It will use space linear to
> > > the path length of each unique path presented to ftok().
> > > 
> > > This is what I meant by a lookaside table.
> > 
> > See, to me this is a tree-based name hash.  You say 'lookaside' and I
> > think L2 cache... <g> 

Sigh.  I was using the word 'hash' in its colloquial,
non-computer-science-lingo form.  That is, the bit-string for a given
token in a Trie *is* a hash (in the colloquial sense), just as a gzipped
(or better, lossy-compressed) version of the token is a 'hash' of sorts
-- but not in the CS pedantic sense.  Sorry for the confusion.

> Firstly, neither a trie or a tree is a hash. hash's have collisions and
> approaches to resolving that. Storing a hash in a user-visible value
> means that the user has to resolve collisions: but they can't, a key is
> opaque to the user.

Yes, yes, I know: more CS pedanticism...

> Secondly, this is one implementation of a lookaside table. It's not the
> only possible one - you can do a lookaside table using a hash if you
> want to. The point is that the key handed out is *not* dependent on the
> node in in the table, but on data within that node.

Ah -- more obviousness until the left curve at the end there.  It was
hard (impossible?) to tell from your pseudo code, but I thought that the
key was *solely* dependent on the node position in the tree (which is
not, in general, directly related to the data in the node -- although
many tree structures do have a relationship between the node data and
node position)

OTOH, if the key is solely dependent on the **node data** and not node
position at all, then that removes one of my objections (deterministic
key <-> node value, independent of 'insertion' order).

> Finally, it's different from a cache: We're not optimising performance
> for data available elsewhere, we're creating and  storing unique
> associations.

I know, I know -- it was simply more colloquialism.  Remember, I'm a
hardware engineer at heart.  When I say L2 cache, I think transistors,
registers, pipelines, VLSI layout.  And 'lookaside' *really* *does*
*mean* "looking aside" -- the cache is physically "over there" in "that"
part of the chip; the main memory is "straight ahead," past the I/O
drivers and off the chip.  So, if you get a cache hit, you actually
collect data from the side.  Literally and physically.

So when you say "lookaside table" my unalterable instant mental picture
is a block diagram of a CPU/TLB cache.  Forgive the poor hardware
engineer in his struggles to translate "mental image of TLB cache" into
"Trie-structured GUID assignment scheme based on the value of a string". 
Worlds collide...

Now, about your quote from P1003.1:

> "The ftok() function shall return the same key value for all paths that
> name the same file, when called with the same id value, and return
> different key values when called with different id values or with paths
> that name differnet files existing on the same file system at the same
> time."

(a) this is Posix, not SuS (minor nit).
(b) it says nothing about what happens when the key size is exhausted
(which in a theoretical sense is sort of a major ommission: how can you
guarantee uniqueness within a finite range space, if you don't describe
the boundary conditions?)


(c) it does require absolute uniqueness (provided key_t is not 'full') on
a single file system, and 
(d) exhausting key_t is pretty darn unlikely; you'll hit memory full,

> Key things to note: sus doesn't require uniqueness across file systems 
> - but it's a good idea if we can do that for low or no extra cost/
> It *does* require uniqueness within a file system.

*POSIX* has those requirements, at least given your quote above.  But
that's good enough for me.

So, we've established that the premier open source operating system does
not comply with POSIX (SuS?) in this regard.  Why not?  If this is so
damn important, why doesn't Linux follow the rules?  Surely you're not
the first person to notice this...  Where are the massive failure
reports, streaming across the Net, declaring the IPC is fundamentally
broken on Linux.  Where's a real-world, in-the-wild example of this
conformance breach causing a failure?

Perhaps the Linux people deliberately decided not to follow those
requirements?  If so, why?  (IMO, probably because they judged the
overhead of rigid adherence -- requiring something like your Trie
proposal, or some other mechanism -- to be too great, with too much
impact on performance, for too little gain.)

- - -

But back to a minor point: how can both of the following (which you have
asserted concerning a Tree/Trie implementation) be true?

(1) returned key value is solely dependent on the data in a node.  

Since the data in the node is the pathname, which can be (at most) a
PATHMAX array of chars (where each char -- let's be charitable and leave
out diacritics -- can have one of about 100 different values.  That's a
domain space of 100^PATHMAX == 10^(2PATHMAX)  Now, the output range space
is 2^32 ~ 10^9 which is MUCH smaller than the domain.  So, IFF the output
key is dependent SOLELY on the input data, then we WILL hash from 10^512
down to 10^9.

(2) returned keys will be unique, as long as less than MAXKEYS (2^32)
keys are issued.

Logically, these can't be true -- because (2) implies that the returned
key depends on node position.

I just don't get it.  :-(

Just for the sake of argument, suppose we have two different machines: A
and B.  On each machine, there exists a filesystem with 2^32 + 1
different unique pathnames, which are identical on A and B.  On both
machines, we call ftok() on the first 2^32 - 1 pathnames, and get 2^32 -
1 unique key_t values back.  Further, because "key value is dependent
solely on node data", we also know that each returned value of key was
the same on A as on B, for each identical pair of pathnames.

After this is done, on both machines, each cygserver has only a single
remaining "free" key value left in its Tree/Trie.  And, because cygserver
on A and on B has been returning the same values for the same pathnames
all along, we KNOW that the only remaining key value on both A and on B
is the same.

On A, we call ftok() with the 2^32'th path, and on B we call ftok() with
the 2^32+1'th path -- which are different.  Yet, because the cygservers
have no choice, cygserverA returns its last key value for the 2^32'th
path, while cygserverB returnes its last key value (the SAME value as
A's) for the 2^32+1'th path.

That is, in this case, the returned value is not dependent on the node
data AT ALL -- you get the last key value regardless of what your
pathname or node data is.

Again, I just don't get it. :-(

  Charles Wilson
  cygwin at removespam cwilson dot fastmail dot fm

More information about the Cygwin-developers mailing list