CTFv4

The general plan here is to migrate to a superset of BTF, with a write-time option allowing libctf to emit BTF instead.

Doing this makes it easy to use the same representation internally for both BTF and CTFv4 (a representation targeted at CTFv4, lowered to BTF as needed); so libctf will be able to deduplicate BTF with other BTF, deduplicate BTF against CTF, etc.

This needs various different pieces of work done first. (All of which are now done.)

Prerequisites (implementation) (complete, April 2024)

Most of these are simplifications, ripping out changes that turned out to be ill-advised and are getting in the way now.

Move back to a model where types are read-only if in dicts that were ctf_opened, which also lets us move to having only one hashtab for such dicts.
Fix up ctf_string implementation to not use 'pending refs' any more, instead keeping already-written strtabs unchanged, reading them into the atoms table to allow dedup against them and sorting only newly-added strings into alphabetical order.
The two changes above allow ctf_serialize to shift to a scheme where it emits new dicts rather than modifying the existing dict, making it possible to emit BTF dicts without changing the existing dict to BTF (squashing out slices, etc).

CTFv4 work

Version bump (done on users/nalcock/road-to-ctfv4, July 2024)

Bump the version to v4 and adjust the ctf.h structures (retaining existing ones as _v3 as needed); an open-time converter from v3 is written in ctf_open.c.

Share parent dict strtabs (done on users/nalcock/road-to-ctfv4, July 2024)

Change the definition of string offsets such that strtab offsets 0...N where N is the max seen in the parent dict relate to the parent dict, and later ones relate to the child: this is the BTF definition (much like the definition of types already is). Child dicts can use this to share their strings with other child dicts, deduplicating strings independently of types. Enormous savings are expected, since any duplicated structs will reuse a great many strings over and over again.
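A minimal sketch of the offset rule above. It assumes the boundary is the parent's strtab length (as in split BTF); strtab_view and resolve_child_str are hypothetical names for illustration, not libctf internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one dict's string table.  */
struct strtab_view
{
  const char *strs;   /* String data.  */
  uint32_t len;       /* Length in bytes.  */
};

/* Resolve OFFSET as seen from a child dict.  Offsets below the parent's
   strtab length refer to the parent; later offsets refer to the child,
   biased by the parent's length.  Returns NULL if out of range.  */
static const char *
resolve_child_str (const struct strtab_view *parent,
                   const struct strtab_view *child, uint32_t offset)
{
  assert (parent != NULL && child != NULL);

  if (offset < parent->len)
    return parent->strs + offset;

  offset -= parent->len;
  if (offset >= child->len)
    return NULL;
  return child->strs + offset;
}
```

With a 10-byte parent strtab, offset 1 resolves in the parent and offset 10 is the first byte of the child's strtab.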

There is some difficulty here because of the implications for what you can now do with child dicts after opening them (not much!). At least one thing must be left un-deduplicated: the parent dict name (or how can you tell which dict to import to complete the strtab?). We'll keep the cuname un-imported too, so that all the strings in the header can be initialized and read before ctf_import() (which is costless because by definition the cuname is going to be different in every child).

Share duplicated strings (done on users/nalcock/road-to-ctfv4, July 2024)

Right now, if two child dicts use the same string "foo", and that string is not used by the parent, the string is duplicated in each child. After the change above is done, this is needlessly inefficient. The deduplicator should track individual string usage and arrange to put duplicated strings into the parent, not the child, so they can be shared.

Not technically required by BTF, but so much easier than the previous bit that doing it at the same time is obviously important.

Consecutive type table entries

Modify the type table to run child type IDs consecutively from parents (with a new header field tracking their ID), as in BTF. (CTFv3 and below set a high bit in children.)

Lookup-side and addition-side changes

Adjusting the lookup- and addition-side stuff for BTF format changes, modifying the open compatibility functions at the same time.

Dumper

We do the dumper at this stage simply to make debugging practical. No attempt to make it look like the BTF dump format: this is just to keep ctf_dump() working.

Serialization

Adjust to serialize the new format properly, lowering to BTF if requested and taking advantage of the prereq serializer improvements. (libctf has way too many write functions already, so we probably want just a btf_write_mem to start with: the rest can be added as needed.)

Deduplication

Finally adjust the deduplicator, which should be trivial after all the rest (particularly given that strtab dedup is already done).

Possible future enhancements

(maybe in v5, may move out of this section): a couple of relevant simplifications which save a bunch of pointless complexity and move CTF a bit closer to BTF:

Make the variable section redundant (optional: not yet)

(Reduction in format complexity. Priority: moderate. Difficulty: moderate.)

This section is strictly unnecessary now that we have the data object section. Unifying it with the function info section as well (see below) means we gain the ability to reference static variables and static functions from e.g. the backtrace section without having to duplicate the work to do so.

It can be made redundant to the data object section given one format change, a new header field that gives an index to add to all indexes in the object index section. You can then have a single data object section that contains both indexes less than the boundary (in the symtab), and indexes above the boundary (covered by the object index). Sort the object index section into ascending order by name and migrate variable section entries into it, and you have supplanted the variable section. Do the same with the function info and its index and you have the same feature for functions.

(Reduction in format complexity. Priority: low. Difficulty: moderate.)

Symbols are symbols, no matter what kind they are. CTF only kept data and function symbols in two different sections because it was using two different formats for them. This is no longer true, so we can unify the two without changing API or losing functionality.

Not only do these sections have names it is downright impossible to keep straight, they are redundant: now we store all function prototypes in the type section as CTF_K_FUNCTION symbols, a function symbol is in all useful respects just a data symbol with a different type, and storing them in separate sections just makes everything more complex to no end.

Unify them into one overarching symtab section.

Now some more detail on the format changes needed (still provisional).

Format changes

Major differences between CTFv3 and BTF

forwards encoded differently (will just stick with the BTF encoding, it's nasty but survivable)
unknown kinds are type 0: CTF still supports this, though it's deprecated in favour of CTF_KIND_UNKNOWN, so it should be easy to carry forward
the vlen is shorter, the size is shorter: smaller limits on number of struct members, total size of types etc.
bitfields encoded differently
a bunch of useless/unused int/float formats gone
enum64 supported
decl and type tags supported (are these GCC attributes?)
in-type-section var/datasec kinds (we just want to reflect datasec in the API, CTF won't want it; var we can support but for CTFv4 we probably want to prefer the symtypetabs or variable section if we don't manage to rejig the symtypetabs to hold variables)
named functions supported (BTF_KIND_FUNC), separately from the actual type signature (BTF_KIND_FUNC_PROTO)
Alan Maguire's layout_off is supported
no symtypetab (CTFv4 will retain one, but it might be replaced with something better if BTF grows an addrs section)
References from some extra sections (linker-relevant)

Major expected differences between BTF and CTFv4

Most of the differences below serve to extend BTF so that it can encode all types CTFv3 can encode, making reliable conversion from v3 to v4 possible. (Mostly this applies to enormous types that will never be seen in any sane kernel but can definitely appear in large userspace programs.)

New header fields

        /* New entries in CTFv4, not in BTF. */
        uint64_t ctf_magic;     /* magic number (48 bits), version number (16) */
        uint64_t ctf_flags;
        uint32_t cu_name;       /* strtab offset: random in parent */
        uint32_t cu_parent_name; /* cuname of parent */
        uint32_t cu_parent_ntypes; /* count of types in parent */
        uint32_t cu_parent_strtab_len; /* strtab_len in parent */
        uint32_t layout_off;    /* offset of layout section */
        uint32_t layout_len;
        uint32_t objt_off;
        uint32_t objt_len;
        uint32_t func_off;
        uint32_t func_len;
        uint32_t objtidx_off;
        uint32_t objtidx_len;
        uint32_t funcidx_off;
        uint32_t funcidx_len;
        uint32_t addrs_off;     /* offset of addrs section */
        uint32_t addrs_len;

The magic number is treated a little oddly for BTF compatibility. The start of the header has a BTF magic number, naturally, and a BTF-compatible hdr_len, since BTF uses that field as its version: but immediately following that is a 64-bit field containing 48 bits of CTF-specific magic number (not yet defined, I'll pick one at random), followed by 16 bits of version number (currently 4, for CTFv4, since we can break compatibility with the existing rather strange v3-and-below compatible version number at this point). The entire CTF-specific header lies in the range (hdr_len...type_off), so BTF-compatible clients never see it.
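A sketch of how the combined 64-bit field might be packed and unpacked. Two things here are assumptions: that the magic occupies the high 48 bits with the version in the low 16, and the magic value itself, which is a placeholder since the real one is not yet defined. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder only: the real 48-bit magic has not been chosen.  */
#define CTF4_PLACEHOLDER_MAGIC UINT64_C (0x123456789abc)
#define CTF4_VERSION 4

/* Pack a 48-bit magic and 16-bit version into one 64-bit field,
   magic in the high bits (an assumption about the layout).  */
static uint64_t
ctf4_pack_magic (uint64_t magic48, uint16_t version)
{
  assert ((magic48 >> 48) == 0);
  return (magic48 << 16) | version;
}

static uint64_t
ctf4_magic (uint64_t field)
{
  return field >> 16;
}

static uint16_t
ctf4_version (uint64_t field)
{
  return (uint16_t) (field & 0xffff);
}
```

A reader would first check the BTF magic and hdr_len as usual, then test this field to decide whether the CTF-specific header fields are present.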

A CTF-specific flags word follows, probably denoting compression (zlib or zstd), sortedness of symtypetab sections, and any bug workarounds, as in v3 (though all v3 bug workaround flags can of course be dropped at this point).

If cu_parent_name is nonzero, this is a child dict. The nature of BTF type IDs means that we cannot usefully map indexes to type IDs until we know how many types the parent has: to make it possible to do it early, cu_parent_ntypes contains the number of types in the parent at the time the child was written out. (We have to change libctf to assign type IDs late enough that modifying the parent while the child is changing is permitted, because the deduplicator does this and frankly other users will expect this to work as well. This is a SMOP: probably we'll represent type IDs in not-yet-written-out dicts differently, via a high bit on or something, and update them at the last minute before writeout, much as we already do with strtab offsets. Modifying the parent after the child is written out is a bad idea, but it's already a bad idea: write out all your related dicts at once! Caveat: this rule wastes memory: we might be able to adjust it so writeouts over time work, but I'm not sure how.)

Similarly, cu_parent_strtab_len lets us look up at least some strings before importing the parent: it also solves some problems with reserializing CTFv3 dicts (see below) and provides another check that we've imported the right dict and it hasn't changed since our dict was written.

In parents, cu_name is randomly assigned: this makes it possible to detect importing of children to the wrong parents, and even automatically import in future.

New sections

layout_off is Alan's type kind description section (format to be decided). It is affected by prefix kinds: see below.

addrs_off is the replacement for the objtoff/funcoff/objtidxoff/funcidxoff sections in CTFv3: probably a triplet of (address, strtab offset, type ID) mappings. Note: I assume here that the strtab offsets can retain the high-bit-set semantics of CTF, namely that something with the high bit set uses an "external string table", which for ELF is the dynstrtab. This is essential to avoid massive space wastage for userspace binaries. (The string chunking scheme, to be described later, will steal another high bit for chunked strings.)
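The high-bit convention for string references can be sketched as follows. The macro and function names are hypothetical, the chunked-string bit is not modelled, and no bounds checking is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High bit set: the reference points into the external string table.  */
#define STR_EXTERNAL_BIT 0x80000000u

/* Pick the right table for REF and return the string it names.
   INTERNAL is the dict's own strtab, EXTERNAL the external one
   (for ELF, the dynstrtab).  No bounds checks in this sketch.  */
static const char *
str_resolve (uint32_t ref, const char *internal, const char *external)
{
  assert (internal != NULL && external != NULL);

  const char *tab = (ref & STR_EXTERNAL_BIT) ? external : internal;
  return tab + (ref & ~STR_EXTERNAL_BIT);
}
```

Symbol names that already exist in the ELF string table are then referenced in place and never copied into the dict.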

It is possible that the addrs_off thing can get away with an (address, type ID) mapping if we can rely on always being able to look up the name of the symbol efficiently: I haven't thought about that. Regardless, sorting by address is clearly essential so it can be bsearched.
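To show why sorting by address matters, here is a sketch of a lookup over a sorted table using the standard bsearch(). The entry layout is an assumption (the section format is not decided), and addr_entry and addrs_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One hypothetical addrs entry: (address, strtab offset, type ID).  */
struct addr_entry
{
  uint64_t addr;
  uint32_t name_off;
  uint32_t type_id;
};

/* bsearch comparator: KEY points to a uint64_t address.  */
static int
cmp_addr (const void *key, const void *elem)
{
  uint64_t k = *(const uint64_t *) key;
  const struct addr_entry *e = elem;

  return (k > e->addr) - (k < e->addr);
}

/* Find the entry for ADDR in TAB (N entries, sorted by ascending address).
   Returns NULL if there is no exact match.  */
static const struct addr_entry *
addrs_lookup (const struct addr_entry *tab, size_t n, uint64_t addr)
{
  assert (tab != NULL);
  return bsearch (&addr, tab, n, sizeof (*tab), cmp_addr);
}
```

Dropping name_off from the entry, as suggested above, would not change the lookup: only the address participates in the comparison.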

Alternatively, we might stick with the objt/func sections: we have to stick with them before symtabs are available anyway. If funcidx_len is greater than func_len, this means that the "extra space" on the end stores sorted (strtab, type ID) pairs (the old variable section). (Alternatively, we could keep the variable section -- that's starting to seem like a better idea to me, honestly: they do have a different structure, after all.)

All these new sections will probably be located before the type_off in the actual file: there is no requirement that these things are in ascending order, unlike in CTFv3.

btf_type changes

For reference, btf_type looks like this:

struct btf_type {
        __u32 name_off;
        /* "info" bits arrangement
         * bits  0-15: vlen (e.g. # of struct's members)
         * bits 16-23: unused
         * bits 24-28: kind (e.g. int, ptr, array...etc)
         * bits 29-30: unused
         * bit     31: kind_flag, currently used by
         *             struct, union, enum, fwd and enum64
         */
        __u32 info;
        /* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC and ENUM64.
         * "size" tells the size of the type it is describing.
         *
         * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
         * FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
         * "type" is a type_id referring to another type.
         */
        union {
                __u32 size;
                __u32 type;
        };
};

(But see prefix kinds, below.)
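The info word can be decoded with a few shifts and masks, following the bit layout in the comment above. The helper names are hypothetical; the kernel's UAPI header provides equivalent macros.

```c
#include <assert.h>
#include <stdint.h>

/* bits 0-15: vlen.  */
static uint32_t
info_vlen (uint32_t info)
{
  return info & 0xffff;
}

/* bits 24-28: kind.  */
static uint32_t
info_kind (uint32_t info)
{
  return (info >> 24) & 0x1f;
}

/* bit 31: kind_flag.  */
static uint32_t
info_kflag (uint32_t info)
{
  return info >> 31;
}

/* Build an info word from its parts; unused bits are left zero.  */
static uint32_t
info_make (uint32_t kind, uint32_t kflag, uint32_t vlen)
{
  assert (kind <= 0x1f && kflag <= 1 && vlen <= 0xffff);
  return (kflag << 31) | (kind << 24) | vlen;
}
```

The 16-bit vlen and 32-bit size are the limits that CTF types can exceed.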

There is one semantic extension to btf_type. Instead of CTFv3's isroot flag to indicate hidden types (that do not appear in the name table), we indicate one simply by setting its name to 0 (""). Obviously you can't look up nameless types in the name table! (If we need a way to tell which non-hidden type a hidden type is a variant of, we can add another new reference kind to tie the two together, CTF_KIND_ALIAS or something.)

New type kinds

These type kinds are exclusive to CTFv4 and do not exist in BTF: they all need lowering to BTF when BTF is emitted.

Prefix kinds

Most of CTFv4's extensions to BTF are opaque to the user: so they should not appear as a new type kind with a new type ID that points to some other type because they are not modelled in C as reference types ("a big type" is not the same as "a small type pointed to by a big-qual"). So we'd like to model this sort of thing in some other way. We do this with *prefix kinds*, which are new type kinds whose vlen contains *another type*: so the prefix kind is a prefix to the other type, and modifies it in some way.

The following prefix kinds are defined:

CTF_KIND_BIG

The struct btf_type cannot, unfortunately, express all types CTF can: the vlen is too short, and so is the size.

CTF handles this with a big variant of ctf_type, but we can avoid introducing a new variant of btf_type by adding a new type kind, CTF_KIND_BIG. This has a real type in its vlen: its vlen provides a high 16 bits to the vlen of the prefixed type, giving 32 bits: the size provides a high 32 bits to its size, giving 64 bits. The other properties of the type come from the immediately appended real_type and its attached vlen.

These are definitely enough (larger than CTFv3 permitted), and the result leaves the real_type nicely aligned.

I expect CTF_KIND_BIG to be quite rare, but not that rare: I know of a number of userspace programs with really big machine-generated structures in them that would overflow a BTF vlen and size.

Things big enough to need this cannot be encoded by BTF anyway, so (as long as the layout section can figure out how to say "the child's vlen is determined by this in its parent") BTF readers could just skip this.

(Internally we do store everything using a bigger ctf_type since it's more convenient to work with: at emission time this is either lowered to a btf_type or converted to a ctf_big / type pair. If the latter is needed and we are emitting BTF, we can just fail emission.)

API implications: the API should probably return sizes as uint64_ts, hiding ctf_big entirely unless requested. (See API enhancements for more.)

CTF_KIND_CONFLICTING

This prefixed type indicates that the type it prefixes is found in the named translation unit: it will act as if the hidden bit was turned on, but a new ctf_type_cuname() function will return the CU given in the prefix (ctf_type_aname() etc return the name of the suffixed type, if any).

This is much bigger than CTFv3's one root-visible bit, but hidden types are rare, and this also lets us record all the info we could otherwise record in child dicts (if somewhat irregularly and a bit inconveniently for users).

Hidden (non-root-visible) types created via ctf_add_*()'s ridiculously-named 'flag' parameter now create CTF_KIND_CONFLICTING with no name: you can add a name later, after creation, via ctf_set_type_cuname(), and the deduplicator does so as needed.

This completes the prefixed type kinds in CTFv4: the other new type kinds are either new in BTF (not CTF), or not prefixed.

(Note: most conflicting types will not have CTF_KIND_CONFLICTING: this is specifically for conflicting types whose conflictedness cannot be represented in any other way, usually because they are already in child dicts lumped together from multiple translation units via the cu-mapping mechanism. Maybe CTF_KIND_CONFLICTING is a bad name: suggestions for a better name solicited. Alternatively we might have a flag for kernel builds which encodes *all* conflicting types this way, on the grounds that nearly all of them will be anyway because we already use child dicts to represent something else: kernel modules.)

CTF_KIND_UNKNOWN

We may be able to use BTF_KIND_UNKN for this, but I seem to recall that it has insane semantics. At any rate we need something since type ID 0 definitely cannot mean unknown in CTFv4: it is "void" as in BTF.

CTF_KIND_SLICE: identical to CTFv3

typedef struct ctf_slice
{
  uint32_t cts_type;
  unsigned short cts_offset;
  unsigned short cts_bits;
} ctf_slice_t;

One restriction is lifted: libctf no longer errors when cts_offset or cts_bits is > 255. Any width is allowed.

Kind-by-kind changes to existing kinds

If not mentioned, identical to BTF and with no information loss over CTF either.

BTF_KIND_INT (CTF_K_INTEGER)

Unchanged: the old CTFv3 CTF_INT_VARARGS is gone, but it was never generated or used by anything, so no loss.

BTF_KIND_STRUCT (CTF_K_STRUCT), BTF_KIND_UNION (CTF_K_UNION)

CTF prefers use of slices to encode bitfields but I suppose we have to at least be able to read both (we can probably get the deduplicator to always emit slices though).

Otherwise, no change, but see the BIG stuff above.

BTF_KIND_ENUM / BTF_KIND_ENUM64 (CTF_K_ENUM)

As in BTF, including the horrible way of encoding forwards: the API will disguise it. The deduplicator is enhanced to track enumerands and consider enumerators with conflicting enumerands to be conflicting. (This is, of course, not a format change, just a behavioural change over CTFv3.)

ENUM64 is only used if necessary to encode an enumerand.

TODO: neither the format nor the libctf API has a way to return the integral type of either enumerators or their enumerands. These can and will differ: the enumerator must be big enough to encode any enumerand, but the type of each enumerand is defined (by C) to be no more than is necessary to encode that particular enumerand. I doubt GCC encodes the latter, but we should at least be able to report the size of the enumerator and figure out and return the size of the enumerand in libctf.

BTF_KIND_FWD (CTF_K_FORWARD)

No change, but semantically can be used to indicate "conflicting types in child dict", as in CTF.

BTF_KIND_FUNC

Seems to be pretty similar to CTF_K_FUNCTION. CTF_K_FUNCTION adds padding if the number of args is odd: it shouldn't, BTF_KIND_FUNC doesn't, and CTFv4 shouldn't.

BTF_KIND_DECL_TAG, BTF_KIND_TYPE_TAG

I don't really know what these are, but if they are used to encode GCC attributes, CTF wants them too (with a suitable libctf API).

BTF_KIND_VAR, BTF_KIND_DATASEC

Very BTF-specific, presumably we just carry them over. Not sure what the libctf API should be for creation/lookup of these yet, but presumably there needs to be one.

Resolved difficulties

Upgrade complexities

The requirement to read older CTF dicts has an extra wrinkle: if you can open a dict you can write it out again, but we want to only retain code to write out CTFv4 dicts (auto-downgrading to BTF if possible, so most of the time we'll actually be writing out BTF: I'll probably have a new API function to force writing one or the other, largely for kernel use, where it matters that we use BTF in particular).

But this runs into problems where ID-related properties of the old format vary from the new one. There are three such problems currently:

Format v1 had a parent/child type ID boundary nailed at 0x7fff versus v2 and v3's 0x7fffffff, but that boundary is recorded nowhere: so v1 is actually upgraded on writeout to a new format type, CTF_VERSION_1_UPGRADED_3 which is just like CTF_VERSION_3 except its parent/child boundary is different. For v4 this will not do. Since v1 dicts are very rare and reserializing dicts is almost always done only for parent dicts with no children, I propose to simply refuse to reserialize child v1 dicts. I bet nobody will ever notice and it means we can ignore this problem in future.
Formats v2 and v3 start child dict type IDs at 0x80000000. This can be trivially encoded in CTFv4 by setting cu_parent_ntypes to 0x7fffffff. Since this header field only exists in CTFv4, not BTF, child CTFv[23] dicts are always reserialized as CTFv4. (Normally, we complain at import time if cu_parent_ntypes differs from the actual number of types in the parent: we can add a special case for CTFv4 and this one specific value.)
The change in strtab representation means that strtab offsets in CTFv[23] will be invalid in v4. I'd like to avoid having to traverse and fix up every string in read-in v2/v3 dicts (we just got rid of code that did that). We can use the new cu_parent_strtab_len header field for that. Since BTF doesn't have one of those, all string lookups in child BTF dicts are prohibited until the parent is imported (and some work is put off until import time as a result).

Unresolved difficulties

Header flags

CTFv4 drops most of the header flags used by CTFv3, which were indicators of bug workarounds. But we still have a use for CTF_F_COMPRESS (needs a decompressor, possibly with a new flag indicating *which* decompressor) and CTF_F_IDXSORTED (indicating that the symtypetab index sections are already sorted and there's no need to waste time resorting them: the compiler doesn't sort them, so we need to flag that we already did it). But we can't use the header flags field because BTF might reuse those flags for something else! So we need a new flags field.

But even with that flags field in place (easy enough), we have the problem that BTF appears not to flag that its data is compressed *at all* (its flags field appears entirely unused). Are we seriously meant to tell whether BTF is compressed by just blindly feeding it to various decompressors and seeing if any work?!