3.5.2 Symbol table format

From mmixal.w (or really, the generated mmixal.tex) in http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz: “Symbols are stored and retrieved by means of a `ternary search trie', following ideas of Bentley and Sedgewick. (See ACM–SIAM Symp. on Discrete Algorithms `8' (1997), 360–369; R. Sedgewick, `Algorithms in C' (Reading, Mass. Addison–Wesley, 1998), `15.4'.) Each trie node stores a character, and there are branches to subtries for the cases where a given character is less than, equal to, or greater than the character in the trie. There also is a pointer to a symbol table entry if a symbol ends at the current node.”
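
As an illustration of the in-memory structure the quotation describes, here is a minimal Python sketch of a ternary search trie. It is not taken from mmixal.w; the names Node, insert and lookup are hypothetical and chosen for this example only, and names are assumed to be non-empty.

```python
class Node:
    """One trie node: a character, three branches, and an optional entry."""

    def __init__(self, ch):
        self.ch = ch
        self.left = None    # subtrie for characters less than ch
        self.mid = None     # subtrie for the character following a match
        self.right = None   # subtrie for characters greater than ch
        self.entry = None   # symbol table entry if a symbol ends here


def insert(node, name, entry, i=0):
    """Insert name (non-empty) below node; return the possibly new root."""
    ch = name[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.left = insert(node.left, name, entry, i)
    elif ch > node.ch:
        node.right = insert(node.right, name, entry, i)
    elif i + 1 < len(name):
        node.mid = insert(node.mid, name, entry, i + 1)
    else:
        node.entry = entry
    return node


def lookup(node, name):
    """Return the entry stored for name, or None if there is none."""
    i = 0
    while node is not None:
        ch = name[i]
        if ch < node.ch:
            node = node.left
        elif ch > node.ch:
            node = node.right
        elif i + 1 < len(name):
            node = node.mid
            i += 1
        else:
            return node.entry
    return None
```

A prefix of a stored name, such as ":Mai" when only ":Main" was inserted, reaches a node whose entry is empty, so lookup returns None for it.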

In the file, this trie is encoded as a stream of bytes. The stream acts on a single virtual global symbol name: it appends and removes characters and signals the points at which a complete symbol has been formed. Here, we read the stream and create a symbol at each completion point.

Each node starts with a control byte m. For each mask listed below, taken in the listed order, if any of the masked bits in m is set, we perform the action described to its right:

      (MMO3_LEFT)
      0x40 - Traverse left trie.
             (Read a new command byte and recurse.)
     
      (MMO3_SYMBITS)
      0x2f - Read the next byte as a character and store it in the
             current character position; increment character position.
             Test the bits of m:
     
             (MMO3_WCHAR)
             0x80 - The character is 16-bit (so read another byte and
                    merge it into the current character).
     
             (MMO3_TYPEBITS)
             0xf  - We have a complete symbol; parse the type, value
                    and serial number and do what should be done
                    with a symbol.  The type and length information
                    is in j = (m & 0xf).
     
                    (MMO3_REGQUAL_BITS)
                    j == 0xf: A register variable.  The following
                              byte tells which register.
                    j <= 8:   An absolute symbol.  Read j bytes as the
                              big-endian number the symbol equals.
                              A j = 2 with two zero bytes denotes an
                              unknown symbol.
                    j > 8:    As with j <= 8, but add (0x20 << 56)
                              to the value in the following j - 8
                              bytes.
     
                    Then comes the serial number, in a variant of
                    uleb128 that would be better named ubeb128:
                    for each byte read, shift the previous value left
                    7 bits (multiply by 128) and add in the new byte;
                    stop after a byte that has bit 7 set.  The serial
                    number is the computed value minus 128.
     
             (MMO3_MIDDLE)
             0x20 - Traverse middle trie.  (Read a new command byte
                    and recurse.)  Decrement character position.
     
      (MMO3_RIGHT)
      0x10 - Traverse right trie.  (Read a new command byte and
             recurse.)
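
The two leaf-level decoding steps, the value selected by j and the serial number, can be sketched in Python as follows. This illustrates the rules listed above and is not the code in BFD; the function names read_value and read_serial and the kind labels they return are made up for this example.

```python
def read_value(j, data, pos):
    """Decode the value selected by j = m & 0xf.

    Returns (kind, value, new_pos); the kind strings are illustrative labels.
    """
    if j == 0xF:
        # Register variable: the following byte tells which register.
        return "register", data[pos], pos + 1
    n = j if j <= 8 else j - 8            # number of big-endian value bytes
    value = int.from_bytes(data[pos:pos + n], "big")
    if j > 8:
        return "data", value + (0x20 << 56), pos + n
    if j == 2 and value == 0:
        return "unknown", 0, pos + n      # j = 2 with two zero bytes
    return "absolute", value, pos + n


def read_serial(data, pos):
    """Decode a "ubeb128" serial number; returns (serial, new_pos)."""
    serial = 0
    while True:
        b = data[pos]
        pos += 1
        serial = (serial << 7) + b        # the terminating byte adds 128 too
        if b & 0x80:                      # bit 7 set: this was the last byte
            return serial - 128, pos
```

For example, the byte 0x81 alone decodes to serial number 1, and the two bytes 0x01 0x80 decode to 128, since (1 << 7) + 0x80 - 128 = 128.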

Let's look again at the lop_stab for the trivial file (see File layout).

      0x980b0000 - lop_stab for ":Main" = 0, serial 1.
      0x203a4040
      0x10404020
      0x4d206120
      0x69016e00
      0x81000000

This forms the trivial trie (note that the path between “:” and “M” is redundant):

      203a     ":"
      40       /
      40      /
      10      \
      40      /
      40     /
      204d  "M"
      2061  "a"
      2069  "i"
      016e  "n" is the last character of a complete symbol,
            whose value is represented in one byte.
      00    The value is 0.
      81    The serial number is 1.
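
To check this walk-through, the following Python sketch decodes such a stream recursively, following the command-byte rules given earlier in this section. It is an illustration under stated assumptions, not the BFD implementation: decode_symbols is a hypothetical name, the character position is decremented whether or not a middle trie is present, 16-bit characters are treated as plain code points, and register symbols are not distinguished from other symbols in the result.

```python
def decode_symbols(data):
    """Walk the byte-encoded trie; return a list of (name, value, serial)."""
    pos = 0
    name = []      # the single virtual global symbol being built up
    found = []

    def byte():
        nonlocal pos
        b = data[pos]
        pos += 1
        return b

    def walk():
        m = byte()                          # control byte
        if m & 0x40:                        # MMO3_LEFT
            walk()
        if m & 0x2F:                        # MMO3_SYMBITS
            c = byte()
            if m & 0x80:                    # MMO3_WCHAR: 16-bit character
                c = (c << 8) | byte()
            name.append(chr(c))
            j = m & 0x0F                    # MMO3_TYPEBITS
            if j:                           # a complete symbol ends here
                if j == 0xF:                # register variable
                    value = byte()
                else:
                    value = 0
                    for _ in range(j if j <= 8 else j - 8):
                        value = (value << 8) | byte()
                    if j > 8:
                        value += 0x20 << 56
                serial = 0
                while True:                 # "ubeb128" serial number
                    b = byte()
                    serial = (serial << 7) + b
                    if b & 0x80:
                        break
                found.append(("".join(name), value, serial - 128))
            if m & 0x20:                    # MMO3_MIDDLE
                walk()
            name.pop()                      # decrement character position
        if m & 0x10:                        # MMO3_RIGHT
            walk()

    walk()
    return found


# The bytes following the lop_stab word in the example, without padding.
stream = bytes.fromhex("203a4040104040204d20612069016e0081")
print(decode_symbols(stream))  # [(':Main', 0, 1)]
```

The redundant 40, 40, 10, 40, 40 commands between ":" and "M" only cause nested calls that read no characters, which is why they leave the decoded name unchanged.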