key64_t? ino64_t?

Robert Collins rbcollins@cygwin.com
Thu May 15 01:39:00 GMT 2003


On Thu, 2003-05-15 at 10:58, Charles Wilson wrote:

> > You can make it impossible for a collision to occur. My psuedo code does
> > exactly that.
> 
> Until you run out of keys, and then (I think) your code *doesn't* return
> BADVAL (and it should).  When keybits are exhausted, wouldn't it just
> start over, handing out the 0th key again?

Well, thats why it was pseudo code :}. Yes, it should return -1 when it
hits the wall.

> But, that's really a moot point.  You've got to have one heck of a
> machine to allocate 2^32 + 1 FTOKNode objects...


> Implementation details of FTOKNode::operator++.  

We don't need such an operator - each node is created once. We have a
static counter which is simply long (NextKey).

> The general
> thread-safety issues with "singleton" data structures (colloguial; no
> lectures about strict definitions from GOF, thanks...) -- which you are
> obviously aware of since you guarded ftok().  But is that the only place
> that needs guards?  Etc.  

Correct, it's the only place that needs guarding, as everything else
accesses it via ftok().

> As I said, it's all details; if you're careful they won't bite you.  But
> 'careful' means 'more code' (usually).
> 
> > > * very low: you make an extremely unlikely event even more unlikely --
> > > but still not impossible.  I *can* construct a scenario in which this
> > > tree-based thing fails.
> > 
> > Please, do so. AFAIK such a scenario doesn't exist within the bounds of
> > valid (according to SuSv2) parameters to ftok().
> 
> for i = 0 .. 2^32+1
>   s = num2str(i);
>   ftok(s,1);
> end
> 
> However, as I pointed out above, you'll run out of memory long before
> then.  However, "Given a ..., Construct a ..." is a formula for a
> theoretical case -- a worst case failure mode.  So, even on a machine
> with infinite memory, the above code snippet breaks the "contract" for
> ftok's unique return value.

Well, another tweak to the pseudo code is needed - the canonical_path
(which is a little theorectical, there is a routine that will do the
job, but I don't recall it's name offhand) should have it's result
checked for statability(). Then you'll need 1.6Million unique files -
which should be more of a problem :}.

> > > See, to me this is a tree-based name hash.  You say 'lookaside' and I
> > > think L2 cache... <g> 
> 
> Sigh.  I was using the word 'hash' in its colloquial,
> non-computer-science-lingo form.  That is, the bit-string for a given
> token in a Trie *is* a hash (in the colloquial sense), just as a gzipped
> (or better, lossy-compressed) version of the token is a 'hash' of sorts
> -- but not in the CS pedantic sense.  Sorry for the confusion.

Whoa. I never imagined that there was a non-CS version of hash, other
than 'hash brown'.

> > Secondly, this is one implementation of a lookaside table. It's not the
> > only possible one - you can do a lookaside table using a hash if you
> > want to. The point is that the key handed out is *not* dependent on the
> > node in in the table, but on data within that node.
> 
> Ah -- more obviousness until the left curve at the end there.  It was
> hard (impossible?) to tell from your pseudo code, but I thought that the
> key was *solely* dependent on the node position in the tree (which is
> not, in general, directly related to the data in the node -- although
> many tree structures do have a relationship between the node data and
> node position)
> 
> OTOH, if the key is solely dependent on the **node data** and not node
> position at all, then that removes one of my objections (deterministic
> key <-> node value, independent of 'insertion' order).

The key is in the FTOKNode, assigned in FTOKNode::FTOKNode.

> > Finally, it's different from a cache: We're not optimising performance
> > for data available elsewhere, we're creating and  storing unique
> > associations.

...
> So, if you get a cache hit, you actually
> collect data from the side.  Literally and physically.

Gotcha.
...
> Now, about your quote from P1003.1:
> 
> > "The ftok() function shall return the same key value for all paths that
> > name the same file, when called with the same id value, and return
> > different key values when called with different id values or with paths
> > that name differnet files existing on the same file system at the same
> > time."
> 
> (a) this is Posix, not SuS (minor nit).
> (b) it says nothing about what happens when the key size is exhausted
> (which in a theoretical sense is sort of a major ommission: how can you
> guarantee uniqueness within a finite range space, if you don't describe
> the boundary conditions?)

I am quoting a subsection, to move the discussion to the spec, not to
the linux man page. The return value "(key_t) -1" is reserved for such a
condition.

...
> *POSIX* has those requirements, at least given your quote above.  But
> that's good enough for me.
> 
> So, we've established that the premier open source operating system does
> not comply with POSIX (SuS?) in this regard.  Why not?  If this is so
> damn important, why doesn't Linux follow the rules?  Surely you're not
> the first person to notice this...  Where are the massive failure
> reports, streaming across the Net, declaring the IPC is fundamentally
> broken on Linux.  Where's a real-world, in-the-wild example of this
> conformance breach causing a failure?

I've no idea. It may be that the needed conditions are so low it doesn't
matter in a pragmatic sense. OTOH, care to grab the solaris source and
do some clean room engineering? I'd expect it to accomodate hard links
better than Linux for this...

> Perhaps the Linux people deliberately decided not to follow those
> requirements?  If so, why?  (IMO, probably because they judged the
> overhead of rigid adherence -- requiring something like your Trie
> proposal, or some other mechanism -- to be too great, with too much
> impact on performance, for too little gain.)
> 
> - - -
> 
> But back to a minor point: how can both of the following (which you have
> asserted concerning a Tree/Trie implementation) be true?
> 
> (1) returned key value is solely dependent on the data in a node.  
> 
> Since the data in the node is the pathname, which can be (at most) a
> PATHMAX array of chars (where each char -- let's be charitable and leave
> out diacritics -- can have one of about 100 different values.  That's a
> domain space of 100^PATHMAX == 10^(2PATHMAX)  Now, the output range space
> is 2^32 ~ 10^9 which is MUCH smaller than the domain.  So, IFF the output
> key is dependent SOLELY on the input data, then we WILL hash from 10^512
> down to 10^9.

Are you using hash literally there, or as a placeholder for some other
concept? Because we aren't hashing. 

> (2) returned keys will be unique, as long as less than MAXKEYS (2^32)
> keys are issued.
> 
> Logically, these can't be true -- because (2) implies that the returned
> key depends on node position.

(2) implies that the returned key for previously unseen pathss depends
on the number of previously seen paths - it does not imply that the
returned key depends on the position of the node in our internal storage
structure.

> I just don't get it.  :-(

> Just for the sake of argument, suppose we have two different machines: A
> and B.  On each machine, there exists a filesystem with 2^32 + 1
> different unique pathnames, which are identical on A and B.  On both
> machines, we call ftok() on the first 2^32 - 1 pathnames, and get 2^32 -
> 1 unique key_t values back.  Further, because "key value is dependent
> solely on node data", we also know that each returned value of key was
> the same on A as on B, for each identical pair of pathnames.

Yes.

> After this is done, on both machines, each cygserver has only a single
> remaining "free" key value left in its Tree/Trie.  And, because cygserver
> on A and on B has been returning the same values for the same pathnames
> all along, we KNOW that the only remaining key value on both A and on B
> is the same.

Yes.

> On A, we call ftok() with the 2^32'th path, and on B we call ftok() with
> the 2^32+1'th path -- which are different.  Yet, because the cygservers
> have no choice, cygserverA returns its last key value for the 2^32'th
> path, while cygserverB returnes its last key value (the SAME value as
> A's) for the 2^32+1'th path.

Yes.

> That is, in this case, the returned value is not dependent on the node
> data AT ALL -- you get the last key value regardless of what your
> pathname or node data is.

The returned value is 100% dependent on the node data. The node data is
dependent on creation order, not on storage location.

Thats the key to this:
If you return your storage data (be that hash value, array index, node
pointer), then your system is fragile if that value is either mutable or
subject to collisions from the domain.

So what I am suggesting we do is:
Assign a unique key prefix 24 bits long each time a new node is needed.
Store that node *any old how* so we can find it again in an efficient
manner.
Return the key prefix from the node matching the requested path || the
requested 8 bit id.

Rob

-- 
GPG key available at: <http://users.bigpond.net.au/robertc/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://cygwin.com/pipermail/cygwin-developers/attachments/20030515/a76c3f90/attachment.sig>


More information about the Cygwin-developers mailing list