Infinity Note Format
This document is OUT OF DATE and retained for historical value only.
See https://infinitynotes.org/wiki/Note_format for the current version.
Basics
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Please familiarize yourself with LEB128.
Note providers are executables or shared libraries containing Infinity notes.
Note consumers are tools that access Infinity notes from note providers.
This document specifies three reasons to reject notes. Note consumers SHOULD differentiate between these in error messages, etc.
Note Rejection Reasons
- A CORRUPT note is one that cannot be decoded (either fully or at all) because it does not follow the specification.
- An UNHANDLED note is one that requires a feature that the note consumer does not implement.
- An INVALID note is one that is both decodable by and within the capabilities of the note consumer, but it cannot be processed because it is in some other way unusable.
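One way a note consumer might keep the three rejection reasons distinct in its error messages is a small exception hierarchy. This is only a sketch; the class names are hypothetical and not part of the format.

```python
class NoteRejected(Exception):
    """Base class for the three rejection reasons (hypothetical name)."""
    reason = "REJECTED"

    def __str__(self):
        # Prefix the message with the reason so errors are distinguishable.
        detail = super().__str__()
        return f"{self.reason}: {detail}" if detail else self.reason


class CorruptNote(NoteRejected):
    """The note cannot be decoded because it does not follow the format."""
    reason = "CORRUPT"


class UnhandledNote(NoteRejected):
    """The note needs a feature this consumer does not implement."""
    reason = "UNHANDLED"


class InvalidNote(NoteRejected):
    """The note is decodable and supported but otherwise unusable."""
    reason = "INVALID"
```

A consumer could then catch NoteRejected to skip a note while still reporting which of the three reasons applied.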
Outer Format
Infinity notes are embedded into executables or shared libraries, so part of the note format depends on the format of the containing file. The only currently supported containing file format is ELF. If more formats are supported they should be added here.
For ELF files
Each Infinity note is contained within an ELF PT_NOTE with a note name of "GNU\0" and a note type of NT_GNU_INFINITY. The contents of the desc field are as described in Inner Format below. Each ELF note with a note name of "GNU\0" and a note type of NT_GNU_INFINITY MUST contain exactly one Infinity note.
NT_GNU_INFINITY is currently defined as 5, though this may change before Infinity becomes final.
Some information on ELF notes may be found here: http://www.netbsd.org/docs/kernel/elf-notes.html
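As an illustration, the following sketch walks the raw bytes of a PT_NOTE segment and yields the desc field of each Infinity note. It assumes the common ELF note layout (three 32-bit words namesz, descsz and type, with name and desc each padded to a 4-byte boundary) and takes the byte order as a parameter. The function name is hypothetical, and locating the PT_NOTE segment inside the ELF file is left out.

```python
import struct

NT_GNU_INFINITY = 5  # value given in this document; it may change


def _align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def iter_infinity_descs(segment, byteorder="<"):
    """Yield the desc bytes of every Infinity note in a PT_NOTE segment.

    Assumes 32-bit header words and 4-byte padding of name and desc.
    """
    header = struct.Struct(byteorder + "III")
    offset = 0
    while offset + header.size <= len(segment):
        namesz, descsz, note_type = header.unpack_from(segment, offset)
        offset += header.size
        name = segment[offset:offset + namesz]
        offset += _align4(namesz)
        desc = segment[offset:offset + descsz]
        offset += _align4(descsz)
        if len(name) != namesz or len(desc) != descsz:
            raise ValueError("ELF note truncated")
        if name == b"GNU\0" and note_type == NT_GNU_INFINITY:
            yield desc
```

Notes with other names or types (build IDs, for example) are skipped rather than treated as errors, since a PT_NOTE segment may carry unrelated notes.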
Inner Format
Each Infinity note is built up from chunks. The format of each chunk is as follows:
uleb128/ur  type_id
uleb128     version
uleb128     size
byte[size]  data
- Note consumers MUST cope with chunks in any order.
- Note consumers MUST skip over chunks they don't understand or don't wish to process.
- Notes SHOULD NOT contain chunks of zero size. Note consumers MUST treat chunks of zero size as if they were not present.
- If a chunk is truncated then note consumers SHOULD reject the note as CORRUPT.
- If a note consumer needs to process a particular chunk but the chunk's version is not supported then the consumer SHOULD reject the note as UNHANDLED.
- If a note does not contain a chunk the consumer requires then the consumer SHOULD reject the note as UNHANDLED.
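The chunk layout and the rules above can be sketched as a small splitter. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and errors are raised as ValueError with the rejection reason in the message.

```python
def read_uleb128(data, offset):
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("CORRUPT: truncated LEB128 value")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:              # high bit clear marks the last byte
            return value, offset
        shift += 7


def split_chunks(note):
    """Return (type_id, version, data) for every non-empty chunk, in file order."""
    chunks = []
    offset = 0
    while offset < len(note):
        type_id, offset = read_uleb128(note, offset)
        version, offset = read_uleb128(note, offset)
        size, offset = read_uleb128(note, offset)
        if offset + size > len(note):
            raise ValueError("CORRUPT: truncated chunk")
        if size:  # zero-size chunks are treated as if they were not present
            chunks.append((type_id, version, note[offset:offset + size]))
        offset += size
    return chunks
```

Because every chunk carries its own size, a consumer can pick out the type IDs it understands from the returned list and ignore the rest, regardless of the order in which they appear.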
The type_id field has user ranges:
- Notes MUST NOT contain chunks with Type IDs in user ranges.
- Note consumers SHOULD reject notes with Type IDs in user ranges as INVALID.
Chunks
Info Chunks
Each note SHOULD have exactly one info chunk. Info chunks have a type_id of I8_CHUNK_INFO == 1. Info chunks are variable-length, so if the chunk ends before a field then the note does not have that field. Currently defined fields are as follows:
uleb128  provider_offset_in_stringtable
uleb128  name_offset_in_stringtable
uleb128  encoded_paramtypes_offset_in_stringtable
uleb128  encoded_returntypes_offset_in_stringtable
uleb128  max_stack
The four string table offsets reference strings in the note's string table and are used to construct the function's signature. Provider names starting with the string "i8" are reserved and are invalid here. max_stack is the maximum stack depth this function's code can generate. Info chunks SHOULD contain up to and including the max_stack field.
- Note consumers SHOULD reject notes without exactly one info chunk as UNHANDLED.
- Note consumers SHOULD reject notes as CORRUPT if an info chunk is present but truncated in the middle of a field.
Note consumers SHOULD process provider, name, encoded_paramtypes and encoded_returntypes as detailed in Function Signatures and Encoded Type Lists below.
- Note consumers MUST ignore any data in info chunks beyond the fields they understand.
- Note consumers SHOULD reject notes as UNHANDLED if the chunk does not extend to a field they require.
Note consumers SHOULD reject notes as INVALID if the provider specified by the info chunk starts with the string "i8".
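The variable-length behaviour of info chunks might be handled as in the sketch below, which decodes as many of the known fields as the chunk contains. The function name and the shortened dictionary keys are hypothetical; resolving the offsets against the string table and the "i8" provider check are not shown.

```python
INFO_FIELDS = (
    "provider_offset",
    "name_offset",
    "encoded_paramtypes_offset",
    "encoded_returntypes_offset",
    "max_stack",
)


def read_uleb128(data, offset):
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("CORRUPT: info chunk truncated inside a field")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def decode_info_chunk(data):
    """Return a dict holding only the fields present in the chunk."""
    fields = {}
    offset = 0
    for name in INFO_FIELDS:
        if offset == len(data):
            break  # the chunk ended before this field, so the note lacks it
        fields[name], offset = read_uleb128(data, offset)
    return fields  # any data after max_stack is ignored
```

A consumer that requires max_stack, for example, would check for that key in the result and reject the note as UNHANDLED if it is missing.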
Code Chunks
Each note SHOULD have exactly 0 or 1 code chunks. Code chunks have a type_id of I8_CHUNK_CODE == 2. The format of a code chunk is as follows:
2byte                byte_order_mark
byte[rest_of_chunk]  bytecode
The byte order mark is the number 26936 (0x6938), encoded in the same byte order as non-LEB128 multi-byte values in the bytecode. The bytecode is a serialized DWARF expression as described in InfinityBytecode.
- Note consumers SHOULD reject notes with more than one code chunk as UNHANDLED.
- Note consumers SHOULD reject notes as INVALID if the code chunk contains a byte order mark other than 0x6938 or its byte-swapped form 0x3869.
- The byte order of both the byte order mark and all non-LEB128 multi-byte values in the bytecode MUST be the same.
- If the note-containing file has a defined byte order, the byte order of both the byte order mark and all non-LEB128 multi-byte values in the bytecode SHOULD match it.
- Note consumers MAY reject notes as UNHANDLED if the code chunk and containing file have different byte orders.
- Notes SHOULD NOT contain code chunks with bytecode of zero length.
- Notes with no code chunk MUST be handled as if they have zero length bytecode.
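A consumer could determine the byte order of a code chunk by reading the first two bytes both ways and seeing which reading gives 0x6938, as in this sketch. The function name is hypothetical, and treating a chunk shorter than two bytes as CORRUPT is an assumption, since the text does not cover that case.

```python
BYTE_ORDER_MARK = 0x6938  # 26936


def split_code_chunk(chunk):
    """Return (byte_order, bytecode) for a code chunk's data.

    Pass None when the note has no code chunk; the result then has
    zero length bytecode and no byte order.
    """
    if chunk is None:
        return None, b""
    if len(chunk) < 2:
        # Assumption: too short to hold a byte order mark means CORRUPT.
        raise ValueError("CORRUPT: code chunk has no byte order mark")
    for order in ("little", "big"):
        if int.from_bytes(chunk[:2], order) == BYTE_ORDER_MARK:
            return order, chunk[2:]
    raise ValueError("INVALID: unrecognized byte order mark")
```

The returned order string can be passed straight to int.from_bytes when decoding non-LEB128 multi-byte operands in the bytecode, or compared with the byte order of the containing file.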
Externals Table Chunks
Each note SHOULD have exactly 0 or 1 externals table chunks. Externals table chunks have a type_id of I8_CHUNK_ETAB == 3. An externals table contains one or more externals table entries concatenated together.
- Note consumers SHOULD reject notes with more than one externals table chunk as UNHANDLED.
- Note consumers SHOULD reject notes as CORRUPT if an externals table chunk is present but truncated in the middle of an entry.
- Note consumers SHOULD reject notes as UNHANDLED if an externals table contains an entry that is not of a known type.
Function Reference Externals Table Entries
The format of a function reference external is as follows:
byte     'f'
uleb128  provider_offset_in_stringtable
uleb128  name_offset_in_stringtable
uleb128  encoded_paramtypes_offset_in_stringtable
uleb128  encoded_returntypes_offset_in_stringtable
The four fields have the same meaning as the fields of the same name in the info chunk and SHOULD be processed in the same way with the exception that providers starting with "i8" are allowed here.
Unrelocated Address Externals Table Entries
The format of an unrelocated address external is as follows:
byte     'x'
uleb128  address
The meaning of the address field is dependent on the format of the containing file.
For ELF files
The value stored in address MUST satisfy sh_addr <= address < sh_addr + sh_size for exactly one section of the containing ELF file. The value stored in address will be relocated in the same way as its containing section.
Note consumers SHOULD reject notes as INVALID if address does not map to exactly one section of the containing ELF file.
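The two entry types described above could be decoded along these lines. The sketch uses hypothetical function names, represents each entry as a plain tuple, and takes ELF sections as (sh_addr, sh_size) pairs that the caller is assumed to have read from the section headers.

```python
def read_uleb128(data, offset):
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("CORRUPT: externals table truncated inside an entry")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def decode_externals(data):
    """Return a list of ('f', (four offsets)) and ('x', address) entries."""
    entries = []
    offset = 0
    while offset < len(data):
        kind = data[offset:offset + 1]
        offset += 1
        if kind == b"f":
            offsets = []
            for _ in range(4):  # provider, name, paramtypes, returntypes
                value, offset = read_uleb128(data, offset)
                offsets.append(value)
            entries.append(("f", tuple(offsets)))
        elif kind == b"x":
            address, offset = read_uleb128(data, offset)
            entries.append(("x", address))
        else:
            raise ValueError("UNHANDLED: unknown externals table entry type")
    return entries


def address_in_one_section(address, sections):
    """True if address lies in exactly one (sh_addr, sh_size) range."""
    hits = sum(1 for addr, size in sections if addr <= address < addr + size)
    return hits == 1
```

A consumer would reject the note as INVALID when address_in_one_section returns False for any 'x' entry.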
String Table Chunks
Each note SHOULD have exactly one string table chunk. String table chunks have a type_id of I8_CHUNK_STAB == 4. A string table contains one or more NUL-terminated strings concatenated together. Strings are encoded in Modified UTF-8 format to allow embedded NULs. Fields in other chunks reference strings by their offset from the start of the string table. Note that any offset into the table that yields a NUL-terminated Modified UTF-8 string is permitted, so in an I8_CHUNK_STAB chunk whose data field contains:
example\0string\xC0\x80table\0
The obvious strings are "example" at offset 0 and "string\0table" at offset 8, but note that an offset of 16 will yield the string "table" and offsets of 7 or 21 will both yield the empty string. (The embedded NUL is encoded as the two bytes \xC0\x80, which occupy offsets 14 and 15.)
- Note consumers SHOULD reject notes with more than one string table chunk as UNHANDLED.
- Note consumers SHOULD reject notes that reference strings but do not contain a string table chunk as UNHANDLED.
- String tables SHOULD end with a terminal NUL. Note consumers MAY reject notes containing string tables that do not end with a terminal NUL as CORRUPT.
- If a note contains a reference into the string table beyond the final NUL in the string table then the note consumer SHOULD reject the note as CORRUPT.
Note that no current use of strings in Infinity allows characters outside of A-Za-z0-9()_ (i.e. nobody needs to write a Modified UTF-8 decoder just yet!)
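String lookup by offset can be sketched as follows. The function names are hypothetical, and decode_simple is deliberately not a full Modified UTF-8 decoder: it only understands ASCII plus the two-byte encoding of an embedded NUL.

```python
def get_string(table, offset):
    """Return the raw bytes of the string at offset, without its terminator."""
    end = table.find(b"\0", offset)
    if end < 0:
        # Either the offset lies beyond the final NUL or the table lacks a
        # terminal NUL; both leave the string unterminated.
        raise ValueError("CORRUPT: string reference beyond the final NUL")
    return table[offset:end]


def decode_simple(raw):
    """Decode ASCII text, mapping the bytes C0 80 to an embedded NUL."""
    return raw.replace(b"\xc0\x80", b"\0").decode("ascii")


table = b"example\0string\xc0\x80table\0"
first = decode_simple(get_string(table, 0))   # "example"
second = decode_simple(get_string(table, 8))  # "string\0table"
empty = decode_simple(get_string(table, 7))   # ""
```

Searching for a literal zero byte is safe here because Modified UTF-8 never uses the byte 0x00 inside a string; that is exactly why embedded NULs are written as C0 80.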
Encoded Type Lists
Lists of types (e.g. parameter types, return types) are encoded as strings. The basic types int, ptr and opaque are encoded as "i", "p" and "o" respectively. Function types are encoded as:
"F" + encoded return types + "(" + encoded parameter types + ")"
Examples:
A function that accepts one ptr parameter and returns two int values has an encoded type of "Fii(p)".
A function that accepts two parameters, 1) a function with an opaque parameter followed by an int parameter that returns an int and a ptr, and 2) an opaque parameter, that returns two ptr values has an encoded type of "Fpp(Fip(oi)o)".
A function that accepts one int parameter and returns a function that accepts one ptr parameter and returns two int values has an encoded type of "FFii(p)(i)".
Processing:
- Note consumers SHOULD reject notes containing encoded types lists containing characters other than ipoF() as UNHANDLED.
- Note consumers SHOULD reject notes containing otherwise undecodable encoded types lists as CORRUPT.
- Note consumers SHOULD NOT attempt to handle notes with invalid encoded types lists in any way other than trivial things like displaying information.
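Because function types nest, a decoder for encoded type lists is naturally recursive. The sketch below is one possible shape, with hypothetical names: basic types become the strings "int", "ptr" and "opaque", and a function type becomes a tuple ("func", parameter_types, return_types).

```python
_BASIC = {"i": "int", "p": "ptr", "o": "opaque"}


def _parse_list(encoded, pos, stop):
    """Parse types from pos until the stop character; return (types, pos).

    stop is "(" after a function's return types, ")" after its parameter
    types, and "" (end of string) at the top level.
    """
    types = []
    while True:
        char = encoded[pos] if pos < len(encoded) else ""
        if char == stop:
            return types, pos
        if char == "":
            raise ValueError("CORRUPT: encoded type list ends too early")
        if char in _BASIC:
            types.append(_BASIC[char])
            pos += 1
        elif char == "F":
            returns, pos = _parse_list(encoded, pos + 1, "(")
            params, pos = _parse_list(encoded, pos + 1, ")")
            types.append(("func", params, returns))
            pos += 1  # step over the closing ")"
        elif char in "()":
            raise ValueError("CORRUPT: misplaced parenthesis")
        else:
            raise ValueError("UNHANDLED: unexpected character %r" % char)


def parse_type_list(encoded):
    """Decode a whole encoded type list into a Python list."""
    types, _ = _parse_list(encoded, 0, "")
    return types
```

For instance, parse_type_list("Fii(p)") gives [("func", ["ptr"], ["int", "int"])], matching the first example above.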
Function Signatures
Infinity functions are referenced by their signature. A function's signature is constructed from its provider, its name, and its encoded parameter and return types lists as follows:
provider + "::" + name + "(" + encoded_paramtypes + ")" + encoded_returntypes
A function called "a_function" with provider "example_provider" that accepts one ptr parameter and returns two int values has a signature of "example_provider::a_function(p)ii".
Note consumers SHOULD reject notes as UNHANDLED if either provider or name are the empty string.
Note consumers SHOULD reject notes as UNHANDLED if the first characters of either provider or name are not in the range A-Za-z_.
Note consumers SHOULD reject notes as UNHANDLED if any subsequent characters of either provider or name are not in the range A-Za-z0-9_.
Note consumers SHOULD NOT attempt to handle notes with invalid values of provider or name in any way other than trivial things like displaying information.
Note consumers SHOULD process both encoded_paramtypes and encoded_returntypes as detailed in Encoded Type Lists above.
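The construction rule and the character checks above might be combined as in this sketch. The function name is hypothetical, and the encoded type lists are passed through unchecked here because their validation is a separate step.

```python
import re

# First character A-Za-z_, any following characters A-Za-z0-9_.
# fullmatch never matches the empty string, so empty values are rejected too.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def make_signature(provider, name, encoded_paramtypes, encoded_returntypes):
    """Build a function signature, rejecting bad provider or name values."""
    for label, value in (("provider", provider), ("name", name)):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError("UNHANDLED: bad %s %r" % (label, value))
    return "%s::%s(%s)%s" % (
        provider, name, encoded_paramtypes, encoded_returntypes)


signature = make_signature("example_provider", "a_function", "p", "ii")
# signature == "example_provider::a_function(p)ii"
```

Note that in a signature the parameter types sit inside the parentheses and the return types follow them, which is the reverse of the order used inside an encoded function type.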