Compactness improvements

These changes almost all apply to CTFv4 or more likely CTFv5 only, and will be removed when BTF is emitted (though some might be suggested as future BTF improvements, with libctf given as an existence proof and as evidence of effectiveness).

Tail-merge the strtab

(Applies to BTF too.)

This is trivially done by sorting twice: first on the reversed strings in the strtab, then turning all possible strings into tail pointers (easy now that they're all adjacent), then sorting the new, smaller set lexicographically as now. (Not sure if the complexity is worth it after chunking is in force, but it's worth a try for the sake of BTF if nothing else: and if chunking doesn't work out, it's definitely worth it.)
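A minimal sketch of the tail-merge under the scheme above (function names invented here; a real implementation would operate on the atoms table): after sorting by reversed string, any string that is a suffix of its successor is adjacent to it, can become a tail pointer into it, and costs no storage of its own.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two strings as if reversed, shorter string first on ties,
   so that every string sorts immediately before its superstrings.  */
static int
cmp_reversed (const void *a, const void *b)
{
  const char *sa = *(const char *const *) a;
  const char *sb = *(const char *const *) b;
  size_t la = strlen (sa), lb = strlen (sb);
  size_t n = la < lb ? la : lb;

  for (size_t i = 1; i <= n; i++)
    {
      unsigned char ca = sa[la - i], cb = sb[lb - i];
      if (ca != cb)
	return ca - cb;
    }
  return (la > lb) - (la < lb);
}

/* Return the number of strtab bytes saved by tail-merging STRS (NUL
   terminators included): each string that is a suffix of the next
   sorted string needs no storage.  Sorts STRS in place.  */
size_t
tail_merge_saving (const char **strs, size_t nstrs)
{
  size_t saved = 0;

  qsort (strs, nstrs, sizeof (*strs), cmp_reversed);
  for (size_t i = 0; i + 1 < nstrs; i++)
    {
      size_t li = strlen (strs[i]), ln = strlen (strs[i + 1]);

      if (li <= ln && strcmp (strs[i], strs[i + 1] + (ln - li)) == 0)
	saved += li + 1;	/* The string plus its NUL.  */
    }
  return saved;
}
```

With { "pointer", "inter", "ter", "foo" }, "ter" becomes a tail pointer into "inter" and "inter" into "pointer", so only "pointer" and "foo" need storage.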

Try to revive string chunking

(Significant compactness improvement, as in, uncompressed file sizes 1/3 of what they were. Difficulty: low because I already wrote it: this will be an early experiment.)

Before I do anything else I'm going to try to revive the string chunking work, which tried to express as many strings as possible as references to a chunk table of references to strings, divided up at _ and lower-to-uppercase changes. This was ruined by the comparative incompressibility of the chunk indexes, so that the compressed dicts were often bigger than they were before. We can try to bring that back by moving the chunk index section to after the string section and compressing everything but the chunk index section (conceptually dividing dicts into compressed and non-compressed regions, and moving the chunk index into the non-compressed region). This might give us better compression overall: or worse. But it's easy to try, mostly already exists, and might give big wins, so let's see.

Another possible enhancement rolled in from the strtab chunking code, which again may hurt compression and never be implemented (will test it, since this code is already written): once we know the size of the internal and external string tables (at link time), we can shrink the offsets of all string references in the CTF format correspondingly. Doing it on a bit-tight basis is ridiculous, but we can certainly arrange to use 24-bit offsets if the strtabs do not exceed 2^23 bytes in size, and 16-bit offsets if the strtabs do not exceed 2^15. This saves one byte per type, structure member, etc. for nearly all CTF dictionaries (even without the structure member optimizations below), and two bytes per type for smaller ones.
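A tiny illustration of the width choice (helper name invented here; the limits are 2^15 and 2^23 rather than 2^16 and 2^24 because one offset bit is reserved for the internal-vs-external strtab flag):

```c
#include <stdint.h>

/* Hypothetical sketch: pick the narrowest string-offset width in
   bytes that the given combined strtab length permits.  */
static unsigned int
strtab_offset_bytes (uint64_t strtab_len)
{
  if (strtab_len <= (1ULL << 15))
    return 2;			/* 16-bit offsets suffice.  */
  if (strtab_len <= (1ULL << 23))
    return 3;			/* 24-bit offsets suffice.  */
  return 4;			/* Full-width offsets as now.  */
}
```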

Properly denote the names of sections used in CTF dicts

(Format flexibility against future changes of a sort which have already happened in the past. Priority: high. Difficulty: low since we're breaking the format already.)

Right now, we are forced to guess whether the string and symbol table sections the CTF consumer uses are the same as those used by the producer, via a header flag in the preamble. Replace this with header fields giving the section names (which will probably have to be stored in the CTF strtab simply because they'll be needed to get hold of the external strtab), so we can be sure of always agreeing between producer and consumer.

Compression improvements

(Major compactness improvements. Priority: high. Difficulty: moderate, entirely because of compatibility concerns on weird platforms.)

Obviously likely to have major dict size wins.

Introduce LZMA compression and zstd compression (easy, but what happens if the consumers do not have a libctf linked against the compressor in question? dlopen it! Fail if we don't have it at all. What happens with .gnu_debugdata, which is defined to be LZMA-compressed?)

Making zstd the standard compressor for CTF if not BTF seems likely to be a major win (not in binutils, but *very* common by now, so if we dlopen it if necessary we can almost always rely on its being around: a linker option to turn on zlib or lzma instead seems likely to be good enough).

We might also try a Burrows-Wheeler transform of everything (or of just the strtab) before compression. Given that Burrows-Wheeler helps almost any compressor (bzip2 was nothing but Burrows-Wheeler and RLE), it might help a lot (but the time cost might be too high).

Introduce a "trial serialization" phase, where we make trial decisions, try compressing, then adjust them and see if the compression improves. Relatively expensive, but might have considerable gains sometimes. (Current decisions we might adjust: the index / pad tradeoff for indexed symtab sections.)

Type section compactness improvements

Bring back the size benefits of overlapping structures, which we mostly lost with format v3 when the space needed for a ctf_id_t grew so much. This is really a regression fix :)

Introduce new ctf_ttype_t with new info word, which saves four bytes per suitable type:

               |--------------------|
   ctt_tinfo:  | kind | tiny | vlen |
               |--------------------|
               15     11    10      0

typedef struct ctf_ttype
{
  uint32_t ctt_name;            /* Reference to name in string table.  */
  uint16_t ctt_info;            /* Encoded kind, variant length (see below).  */
  union
  {
    uint16_t ctt_size;          /* Size of entire type in bytes.  */
    uint16_t ctt_type;          /* Reference to another type.  */
  };
} ctf_ttype_t;

This is like the old ctf_stype_t except that one bit is taken from the vlen in ctt_info to indicate that this type is encoded in a ttype. Only root-visible types whose kind fits in the reduced ctt_info and whose ctt_size/ctt_type fits within 16 bits can use this representation. The savings are significant for most dicts (since most frequently-referenced types have low type IDs, and we can easily sort things at dedup time to make this even more the case), and dropping alignment requirements allows further flexibility elsewhere. But saving the most space with this trick requires a new format for structure/union members.
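The packing might look something like this (macro names invented here), following the ctt_tinfo diagram above: kind in bits 15-11, the tiny bit at bit 10, vlen in bits 9-0:

```c
#include <stdint.h>

/* Hypothetical accessors for the ctt_tinfo word of a ctf_ttype_t:
   5-bit kind, 1-bit tiny flag, 10-bit vlen.  */
#define CTF_TTYPE_INFO(kind, tiny, vlen)				\
  ((uint16_t) ((((kind) & 0x1f) << 11)					\
	       | (((tiny) & 1) << 10)					\
	       | ((vlen) & 0x3ff)))
#define CTF_TTYPE_KIND(info)   (((info) >> 11) & 0x1f)
#define CTF_TTYPE_ISTINY(info) (((info) >> 10) & 1)
#define CTF_TTYPE_VLEN(info)   ((info) & 0x3ff)
```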

We can also do a more compact representation for structures and unions. These have a single word of variant data at the start, before the structure array:

               ---------------------------------
   str_info:   | short | shortstr | strtab_off |
               ---------------------------------
               31      30         29           0

The 'short' bit indicates whether all types and offsets used by this structure's members fit within 2^16, when we can use this revised ctf_member_t instead of a ctf_lmember_t:

typedef struct ctf_member
{
  uint16_t ctm_type;            /* Reference to type of member.  */
  uint16_t ctm_offset;          /* Offset of this member in bits.  */
  uint32_t ctm_name;            /* Reference to name in string table.  */
} ctf_member_t;

This saves four bytes per suitable member.

If the 'shortstr' bit is set, all structure member name strings lie within a 2^16 offset of the strtab_off value, allowing us to use this even smaller member structure for all members:

typedef struct ctf_smember
{
  uint16_t ctm_type;            /* Reference to type of member.  */
  uint16_t ctm_offset;          /* Offset of this member in bits.  */
  uint16_t ctm_name;            /* Reference to name in string table.  */
} ctf_smember_t;

(This structure is not naturally aligned, but we now have code changes in place to allow for non-aligned type arrays, so that's fine. We need it for ctf_ttype_t above regardless.)

The format above will usually only be possible when all structure member names appear within the internal string table, but this is the common case anyway. Because the strtab is sorted, this will usually also work only if the members have conventional C member names with a constant prefix within a structure, so that they sort close together.
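A sketch of the representation choice this implies (all names here are invented; the real encoder would work over the dynamic type being serialized): scan the members once, fall back to ctf_lmember_t if any type or offset overflows 16 bits, and use the smallest form when all name offsets cluster within 2^16 of a common base:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory member record, before serialization.  */
typedef struct member_info
{
  uint32_t type;		/* Type ID of the member.  */
  uint32_t bit_offset;		/* Offset of the member, in bits.  */
  uint32_t name_off;		/* Strtab offset of the member's name.  */
} member_info_t;

enum member_rep { REP_LMEMBER, REP_MEMBER, REP_SMEMBER };

/* Pick the smallest member representation usable for all N members M.
   On REP_SMEMBER, *STRTAB_OFF is the base stored in str_info.  */
static enum member_rep
pick_member_rep (const member_info_t *m, size_t n, uint32_t *strtab_off)
{
  uint32_t min_name = UINT32_MAX, max_name = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (m[i].type >= (1u << 16) || m[i].bit_offset >= (1u << 16))
	return REP_LMEMBER;	/* The 'short' bit cannot be set.  */
      if (m[i].name_off < min_name)
	min_name = m[i].name_off;
      if (m[i].name_off > max_name)
	max_name = m[i].name_off;
    }

  /* 'shortstr': all names within 2^16 of a common base offset.  */
  if (n > 0 && max_name - min_name < (1u << 16))
    {
      *strtab_off = min_name;
      return REP_SMEMBER;
    }
  return REP_MEMBER;
}
```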

With all these optimizations simultaneously in force, structure members and generic type table entries drop to half their present size.

(A non-format-changing future enhancement to the strtab writer may attempt to keep strings that encode a single structure's members together to increase the chance that this optimization is possible, but it is possible this will hurt compression more than it gains in compactness. It can be done by grouping strings in the atoms table into 'generations', incrementing the generation counter every time a structure is started and more than 2^15 bytes of strings have been written out in this generation, and emitting strtab entries in N passes, one generation at a time.)

Drop CTF archives inside .ctf sections

(Reduction in format complexity, space saving, possible further improvements in compactness in future: priority: medium. Difficulty: medium-high because of the need to avoid libctf API breaks.)

Space saving due to elimination of redundant headers, duplicated strings in child dict CTF sections, and duplicated types in child dicts, which in this new scheme can copy structure from existing types in other child dicts.

New scheme (still under design) allowing the linker to cease producing CTF archives inside ELF objects when conflicting types are present: instead, labels gain the new semantics of imposing namespaces over the types they contain, and each TU gets a labelled region of type space of its own. With a bit of trickery we can make this transparent to users (except that they don't need to ctf_import() the parent container any more, but that's OK, that is compatibly signalled by not giving the child ctf_file_t's a parent label.)

Implementation maunderings, likely outdated, here for future rethinking

We still retain CTF archives and parent/child relationships, both because they might be useful for use cases like the kernel (with huge numbers of rarely-accessed sections), but in ELF objects by default we use labels as namespaces.

Data structure revisions for this are quite painful. Labelled-namespace CTF dicts appear like transparent ctf_archive_t's (with ctfi_archive set in the ctf_archive_internal), but we *also* return (and populate ctfi_dicts with) ctf_dict_namespace's. A struct ctf_namespace is the new internal name for what is returned to users as a ctf_dict_t: it contains a struct ctf_dict_ir_t pointer (ctf_dict_ir_t is what is now called struct ctf_dict), a refcount, and a current namespace string. Internally we use this to look up things in the right namespace in the ctf_dict_ir_t as needed.

The ctf_{structs,unions,enums,name} and ctf_lookups fields in the ctf_dict_ir_t turn into a ctf_dynhash_t of namespace labels pointing to a struct of the above; the default namespace is ".ctf", of course. This means we can add namespaces in any order we like. We also have a range table (i.e. a btree, in array form) letting us map from type ID to namespace label. We add namespaces via ctf_label_set() (which is currently declared but not defined), which sets the label assigned to new additions from now on: existing labels cannot be reused, which ensures the set of labels still tiles the type ID space. Labels gain a *parent label* (and ctf_label_set is augmented accordingly: not an API break since this function doesn't actually exist yet), so that in time we can have hierarchies more than two deep :) The label section contents are augmented to a (label, parent, type) triplet accordingly.

Linking using labels is easy enough -- we link exactly like we do now, then right after structure member emission we go over all child dicts in ASCIIbetical order by name, set the current label in the parent to the child's cuname, remember the next type ID to be assigned in the parent at this point (call it the *base*), and emit their contents into the parent one by one: all types encountered that reference types in the child namespace are rewritten so that instead of being 0x80000000+N they are base+N. Then we drop the child dicts. Doing this post facto means we don't need to worry about the fact that we don't know the number of types in any child TU when we start type emission (and thus we don't know at what type ID later child TUs will start).
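The rewrite step above is just arithmetic on the child-reference bit (function name invented here):

```c
#include <stdint.h>

/* Child type references of the form 0x80000000 + N become BASE + N
   once the child's types are emitted into the parent at BASE;
   references to parent types pass through unchanged.  */
static uint32_t
rewrite_child_id (uint32_t id, uint32_t base)
{
  if (id & 0x80000000u)
    return base + (id & 0x7fffffffu);
  return id;
}
```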

We do structure delta assignment (see CTF_K_DELTA below) right after this.

Some new type kinds

CTF_K_DELTA

This reference kind refers to an existing type (so far, structs, unions, or enums seem useful), but instead of a membership list it has a set of instructions that describes how to mutate that type ID's member list to yield a new type (they're so simple this hardly counts as a language, even a little one: it has no state or flow control and is just a list of mutations to apply). Like CTF_K_SLICE, it is not visible to the lookup API: only the results of its application are visible.

It seems sufficient to encode a delta in terms of a count of ctf_struct_delta elements followed by an array of them:

struct ctf_struct_delta
{
  int op;				/* SKIP and/or ADD: see below.  */
  union
  {
    size_t skip_count;			/* SKIP: number of members to skip.  */
    struct ctf_member new_member;	/* ADD: the member to add.  */
  };
};

where the op is a bitfield with three defined bits, among them SKIP for skipping members in the other struct and/or ADD for adding new ones (new_member can be all-NULL/0 for "end early").

Something similar should suffice for enums.
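The op semantics above are only sketched, so the following is necessarily a speculative reading, not the design: since only SKIP and ADD are named but unchanged members must be carried across somehow, this invents a third KEEP op (purely hypothetical, one conceivable use of the third defined bit) that reuses skip_count as a copy count. All names besides struct ctf_member's fields are invented here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct ctf_member		/* As defined above.  */
{
  uint16_t ctm_type;
  uint16_t ctm_offset;
  uint32_t ctm_name;
} ctf_member_t;

#define DELTA_SKIP 0x1		/* Omit skip_count base members.  */
#define DELTA_ADD  0x2		/* Insert new_member here.  */
#define DELTA_KEEP 0x4		/* (Invented.)  Copy skip_count members.  */

struct ctf_struct_delta
{
  int op;
  union
  {
    size_t skip_count;
    ctf_member_t new_member;
  };
};

/* Apply NOPS delta ops to the NBASE-long BASE member list, writing to
   OUT (assumed large enough); return the number of members produced.
   Base members left over when the ops run out are copied unchanged;
   an all-zero ADD ends early, dropping the rest of the base.  */
static size_t
delta_apply (const ctf_member_t *base, size_t nbase,
	     const struct ctf_struct_delta *ops, size_t nops,
	     ctf_member_t *out)
{
  size_t bi = 0, oi = 0;

  for (size_t i = 0; i < nops; i++)
    {
      if (ops[i].op & DELTA_SKIP)
	bi += ops[i].skip_count;
      else if (ops[i].op & DELTA_KEEP)
	for (size_t j = 0; j < ops[i].skip_count && bi < nbase; j++)
	  out[oi++] = base[bi++];
      else if (ops[i].op & DELTA_ADD)
	{
	  ctf_member_t zero = { 0, 0, 0 };
	  if (memcmp (&ops[i].new_member, &zero, sizeof zero) == 0)
	    return oi;		/* All-zero member: end early.  */
	  out[oi++] = ops[i].new_member;
	}
    }
  while (bi < nbase)		/* Ops exhausted: copy the rest.  */
    out[oi++] = base[bi++];
  return oi;
}
```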

Identifying what to do is easy: walk over all structs in turn, comparing their members, and drop structs as they turn out to be different from the one we're trying to add; the comparison process can "run ahead" to try to find later members with the same name as the current one, but will still terminate if it is more different than any other currently being compared. This should be nearly O(n) in the number of members since the drop-off rate should be extremely high and we'll soon be comparing only one or two other structs.

CTF_K_ALIAS

CTF_K_ALIAS is similar to CTF_K_DELTA but much simpler and works with all types, not just structs and unions: it just lets you say "this type is just like that one named in my ctt_type, look over there". In combination with the label namespace stuff above, this lets children get deduplicated against types in other children (later on: right now the deduplicator doesn't even try to do that, but this makes it possible in future).

Later possible string chunking work

(Possible compactness improvements. Priority: low. Probably format v5.)

Another roll-in from the strtab chunking work, but not fully implemented so may be done much, much later: if we arrange to move the external strtab bit into the LSB rather than the MSB on little-endian hosts, we can determine whether a string is in the external strtab by inspecting the first byte of all string offsets. (The endian-swapping code can flip this easily.) We can then track the sizes of the external and internal strtabs independently and use different offset sizes depending on which table this is a reference to! We can also reduce the average size of most offsets by constructing a mapping from offset to index at open time (treating the strtabs as arrays) and storing indexes instead: as this costs time, we indicate it with a flag in the header (or rather two, one per strtab) and only do it if it will reduce the offset size for this dictionary.
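The offset-to-index mapping might be as simple as this (names invented here, and the external-strtab flag bit is ignored for simplicity): build the table of string start offsets once at open time, then binary-search it per lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Treat the strtab as an array of strings: STARTS holds the byte
   offset at which each of the NSTRS strings begins, in ascending
   order.  Return the index of the string starting at OFFSET, so
   references can store the small index rather than the byte offset.  */
static size_t
strtab_offset_to_index (const uint32_t *starts, size_t nstrs,
			uint32_t offset)
{
  size_t lo = 0, hi = nstrs;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (starts[mid] < offset)
	lo = mid + 1;
      else
	hi = mid;
    }
  return lo;
}
```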

CTF/Todo/Compactness (last edited 2024-07-15 10:52:32 by NickAlcock)

All content (C) 2008 Free Software Foundation. For terms of use, redistribution, and modification, please see the WikiLicense page.