DRAFT: Full Featured Printf Hooks Design
Scope
The intention of this page is to serve as a starting point for identifying the scope of the printf-hooks extension design.
Useful Definitions
A format string is a combination of valid format specifier characters. In total it is a directive which tells printf how to display an argument in an argument list as a string, e.g.
"%0.16llx" - the argument is an unsigned long long int that should be output in hexadecimal format with lower case letters. The precision of 16 means at least 16 digits are output, padded with leading zeros. Since no '-' flag is present, the output is right justified.
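As a quick check of how such a format string behaves, here is a minimal sketch using only standard snprintf. The helper name formats_as is hypothetical and exists only for this illustration; the format is passed as a parameter so several variants can be compared.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if formatting VALUE with FMT yields exactly EXPECTED.
   FMT must contain a single conversion taking an unsigned long long. */
static int formats_as(const char *fmt, unsigned long long value,
                      const char *expected)
{
    char buf[64];
    int n;

    assert(fmt != NULL && expected != NULL);
    n = snprintf(buf, sizeof buf, fmt, value);
    /* Reject encoding errors and truncated results before comparing. */
    return n >= 0 && (size_t)n < sizeof buf && strcmp(buf, expected) == 0;
}
```

With this helper, formats_as("%0.16llx", 0xdeadbeefULL, "00000000deadbeef") holds, and left justification only appears once a '-' flag and a wider field are given, as in "%-18.16llx".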
A format specifier is a generic term for one or more characters which make up an operable grouping in a printf format string, e.g.
'DD', and 'e' separately in "%DDe".
A conversion specification is a format specifier which identifies how to convert a data-type into a string. An overridden conversion specification may or may not be tied to an overridden length modifier.
A length modifier is a format specifier which identifies the data-type of an argument. A length modifier that is overridden is practically useless without an accompanying overridden conversion specification.
A flag character is a format specifier which affects the output of the string in various ways, e.g. it may affect justification, padding, minimum column width, etc.
Desires
Support overridden length modifiers.
- Support single or multibyte format specifier, e.g. %H, %DD, %llv
- Override arginfo function marks flags indicating the argument data-type.
- Consumes zero through n arguments.
I'm not sure what zero would indicate.
Support overridden conversion specifications.
- Support single or multibyte format specifier, e.g. 'e' in %DDe, (no example for multi-byte).
- Override arginfo function reads flags to detect operable data-type.
- Override either operates on a data type or doesn't as indicated by arginfo function callback return value '0' or '1'.
- Perform va_arg peeling.
- Invoke override_fn for matching length-modifiers and conversion-specifications.
Support other overridden flag characters.
- DFP, VSX, VMX (Altivec), and AVX don't require these as far as I can tell so they're a lower priority.
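To make the single versus multibyte specifier requirement concrete, the following is a rough sketch of how a parser might pick a registered specifier out of a format string. Everything in it (spec_kind, spec_entry, match_spec, and the longest-match rule itself) is an assumption made for illustration and is not part of the proposed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of format specifier, mirroring the terms above. */
enum spec_kind { SPEC_LENGTH_MODIFIER, SPEC_CONVERSION };

struct spec_entry {
    const char *chars;      /* e.g. "DD", "H", "e" */
    enum spec_kind kind;
};

/* Return the longest entry of KIND whose characters prefix S,
   or NULL if no registered entry of that kind matches. */
static const struct spec_entry *
match_spec(const struct spec_entry *table, size_t n,
           enum spec_kind kind, const char *s)
{
    const struct spec_entry *best = NULL;
    size_t best_len = 0;

    assert(table != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(table[i].chars);
        if (table[i].kind == kind && len > best_len
            && strncmp(s, table[i].chars, len) == 0) {
            best = &table[i];
            best_len = len;
        }
    }
    return best;
}
```

For the text following '%' in "%DDe", a table holding both "D" and "DD" as length modifiers would yield "DD" first, after which the remaining "e" is matched as a conversion specification.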
Design Preclusions
There are a number of preclusions which dictate the direction of the design. They are either definite or questionable. Questionable design preclusions should be finalized before this design document leaves DRAFT.
Definite
- Performance of the fast-path shall not be impacted by the hooks override.
- Positional arguments indicate positional-path, i.e. non-fast-path branch.
- Overrides take positional-path and must account for positional arguments.
- Format identifier overrides may consume zero or more arguments.
  zero: conversion-specification will not operate and default will take over.
  one: Most likely case for most data-type specifications, e.g. DFP, Altivec.
  n: I'm not sure of the origin of this ability.
- Allocations for overrides should only happen if an override is registered.
- Tests for overrides should have branch prediction:
use __builtin_expect(<test>,0)
- Introduction of new data-types and unknown ABI issues which prevent type-punning require the registration of a user va_arg function callback for a conversion specification override, e.g.
- The PowerPC ABI indicates that _Decimal128 data-types be stored in even-odd register pairs, e.g. f2-f3, f4-f5, f6-f7.
- IBM long double 128, a congruently sized data-type, does not have such a register requirement.
    long double userarg = va_arg(*ap,long double);
  where the list contains a _Decimal128 stored in f2-f3 may result in f1-f2 being stored into userarg erroneously.
- Data type size congruence is not consistent and therefore type punning may not work. For instance, when long double is double:
    sizeof(_Decimal128) == 16
    sizeof(long double) == 8
- Overridden conversion specifications should still work for all data-types in the fast-path if the overridden conversion specification doesn't detect an overridden length-modifier, e.g.
    register: length-modifier "DD" : marks flag as DECIMAL128
    register: conversion-specification "e" : looks for flag DECIMAL128
    printf("%DDe\n",data) - operate on "data" as a _Decimal128 : processing "DD" set flag DECIMAL128. Processing 'e' found flag DECIMAL128.
    printf("%e\n",data) - operate on "data" as a "double" since flag does not indicate DECIMAL128.
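The fast-path and allocation requirements in the list above could be met along the following lines. This is only a sketch: override_table, register_override, lookup_override and MAX_SPEC are hypothetical names, the override record is treated as an opaque pointer, and __builtin_expect is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SPEC 128   /* hypothetical: one slot per 7-bit ASCII character */

/* Stays NULL until the first registration, so a program that never
   registers an override never pays for the table. */
static void **override_table = NULL;

/* Register the opaque override record REC for single-character SPEC.
   Returns 0 on success, -1 on a bad argument or allocation failure. */
static int register_override(int spec, void *rec)
{
    if (spec < 0 || spec >= MAX_SPEC || rec == NULL)
        return -1;
    if (override_table == NULL) {
        override_table = calloc(MAX_SPEC, sizeof *override_table);
        if (override_table == NULL)
            return -1;
    }
    override_table[spec] = rec;
    return 0;
}

/* Fast-path test: the compiler is told that a registered table is the
   unlikely case, so the common path falls straight through. */
static void *lookup_override(int spec)
{
    assert(spec >= 0 && spec < MAX_SPEC);
    if (__builtin_expect(override_table != NULL, 0))
        return override_table[spec];
    return NULL;
}
```

Until register_override is called once, lookup_override costs a single predicted-not-taken branch and no memory is allocated.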
Questionable
- Structure definitions should not change if possible.
- Format specifiers should only be ASCII range 32 to 127 (all inclusive). This would imply retrograde disabling of existing wchar_t 'spec' character in 'struct printf_info'.
Preconditions
- An arginfo function may indicate that a data-type consumes zero or more arguments. Zero arguments consumed indicates that the override has chosen to not operate on an argument for any number of reasons.
- Length modifiers do not cause action by the printf internals; they are simply a way to mark an argument and allocate space for va_arg peeling.
- Conversion specifications indicate how a data-type is to be converted into a string. They cause the actual action by the printf internals. An accompanying arginfo_fn will look at the identification flags marked for an argument which identify a data-type. A va_arg function will actually peel said data-type off of an argument list and store it into storage indicated by the printf internals. Finally the printf internals will call the override_fn that was registered along with the conversion specification in order to convert the data-type to a string in the appropriate manner.
- A sizeof parameter should be passed when registering a length modifier so that the printf internals know how much space to allocate for each consumed argument.
What kind of bounds checking should this perform, i.e. max size?
Introduction of a user member to struct printf_info will require the following:
Since struct printf_info is passed in as const, the length-modifier arginfo_fn override sets length flags into the __argtypes in-out argument of arginfo_fn.
The printf internals will copy the length flags to the struct printf_info::user member.
When an override_fn is invoked for a conversion-specification it can read the flags out of struct printf_info::user to determine what data-type it is operating on.
- You must be able to have multiple registrations to the override functions. The reason being that you may want your runtime to support both VMX and VSX data-types.
- The overridden conversion-specifications should not get in the way of the default operability, e.g. the following should work just fine:
    double d = 1.234;
    _Decimal128 d128 = 3.45DL;
    printf("%e\n",d);
    printf("%DDe\n",d128);
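The flag hand-off described in these preconditions can be modelled in a few lines. The sketch below does not use the real struct printf_info; info_sketch, FLAG_DECIMAL128, dd_arginfo, copy_user_flags and e_arginfo are hypothetical stand-ins, and the return values (1 for "operates", 0 for "leave it to the default") are an assumption based on the '0' or '1' convention mentioned under Desires.

```c
#include <assert.h>
#include <stddef.h>

#define PA_USER_MASK    0xffff0000u  /* mask from the draft interface */
#define FLAG_DECIMAL128 0x00010000u  /* hypothetical user flag set by "DD" */

/* Stand-in for struct printf_info; only the `user' member is modelled. */
struct info_sketch { unsigned int user; };

/* Length-modifier arginfo for "DD": INFO is const, so the flag is
   reported through the in-out ARGTYPE word instead. */
static int dd_arginfo(const struct info_sketch *info, unsigned int *argtype)
{
    assert(info != NULL && argtype != NULL);
    *argtype |= FLAG_DECIMAL128;
    return 1;
}

/* What the printf internals would do after a length-modifier callback:
   copy the user bits into the (non-const) info structure. */
static void copy_user_flags(struct info_sketch *info, unsigned int argtype)
{
    info->user = argtype & PA_USER_MASK;
}

/* Conversion "e" arginfo: 1 = the override operates on a _Decimal128,
   0 = no flag seen, so the default "%e" handling for double applies. */
static int e_arginfo(const struct info_sketch *info)
{
    assert(info != NULL);
    return (info->user & FLAG_DECIMAL128) != 0;
}
```

For "%e" no length modifier runs, user stays zero and e_arginfo declines; for "%DDe" the "DD" callback sets the flag first, so the same 'e' override accepts the argument.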
Interface
printf.h
    struct printf_info
    {
      int prec;                      /* Precision.  */
      int width;                     /* Width.  */
      wchar_t spec;                  /* Format letter.  */
      unsigned int is_long_double:1; /* L flag.  */
      unsigned int is_short:1;       /* h flag.  */
      unsigned int is_long:1;        /* l flag.  */
      unsigned int alt:1;            /* # flag.  */
      unsigned int space:1;          /* Space flag.  */
      unsigned int left:1;           /* - flag.  */
      unsigned int showsign:1;       /* + flag.  */
      unsigned int group:1;          /* ' flag.  */
      unsigned int extra:1;          /* For special use.  */
      unsigned int is_char:1;        /* hh flag.  */
      unsigned int wide:1;           /* Nonzero for wide character streams.  */
      unsigned int i18n:1;           /* I flag.  */
      wchar_t pad;                   /* Padding character.  */
      unsigned int user;             /* 'flag character' or 'length modifier' override flags.  */
    };

    struct printf_overrides
    {
      /* flag-character: Unknown.  */
      /* length-modifier: Used for setting user data-type flags.  */
      /* conversion-specification: Used for checking for user data-type flags.  */
      printf_arginfo_function *arginfo_fn;

      /* flag-character: Unknown.  */
      /* length-modifier: sizeof(data-type).  Indicates how much space will be allocated
       * prior to a va_arg call-back invocation from a companion conv spec.  */
      /* conversion-specification: Un-used.  */
      size_t size;

      /* flag-character: Unknown.  */
      /* length-modifier: Un-used.  */
      /* conversion-specification: Used to peel user data-type from argument list.  */
      printf_va_arg_function *va_arg_fn;

      /* flag-character: Unknown.  */
      /* length-modifier: Un-used.  */
      /* conversion-specification: Invoked to convert user data-type to string.  */
      printf_function *override_fn;
    };

    /* List of supported format overrides.  */
    enum
    {
      PF_NONE,                 /* Don't use.  */
      PF_LENGTH_MODIFIER,
      PF_CONVERSION_SPECIFIER,
      PF_FLAG_CHARACTER,
      PF_LAST                  /* Don't use.  This is a place holder.  */
    };

    /* Flag bits that can be set by a 'flag character' or 'length modifier' override.
     * Corresponding bits are set into the arginfo function's __argtypes parameter
     * and are copied into the `struct printf_info::user' member after a valid override
     * is detected.
     */
    #define PA_USER_MASK 0xffff0000

    /* SPEC_CHARS: string of characters denoting the 'format specifier' that is to be overridden.
     * NCHARS: the number of characters in the 'format specifier'.
     * TYPE: the type of 'format specifier' as indicated by the enums enumerated above.
     * PFO: a table of override data (which may or may not be applicable to a
     *      particular 'format specifier') and data-type size (if applicable).  */
    extern int register_printf_override (int *spec_chars, int nchars, int type,
                                         struct printf_overrides *pfo);
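To illustrate how the size and va_arg_fn members are meant to cooperate, here is a self-contained sketch of va_arg peeling into storage sized by the registration. It assumes the callback receives a destination pointer and a pointer to the va_list; pair64, va_arg_fn_sketch, peel_pair64 and peel_one are hypothetical names, and a plain struct stands in for a type such as _Decimal128.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user data-type standing in for _Decimal128 or a vector. */
struct pair64 { long long hi, lo; };

/* Assumed callback shape: peel one argument from AP and store it in MEM. */
typedef void va_arg_fn_sketch(void *mem, va_list *ap);

/* User-supplied callback: only the user knows the true type, so the
   va_arg with the correct type (and ABI treatment) happens here. */
static void peel_pair64(void *mem, va_list *ap)
{
    *(struct pair64 *)mem = va_arg(*ap, struct pair64);
}

/* Stand-in for the printf internals: allocate SIZE bytes (the value given
   at length-modifier registration), then let the callback do the va_arg.
   Returns malloc'd storage the caller must free, or NULL on failure. */
static void *peel_one(size_t size, va_arg_fn_sketch *fn, ...)
{
    va_list ap;
    void *mem;

    assert(size > 0 && fn != NULL);
    mem = malloc(size);
    if (mem != NULL) {
        va_start(ap, fn);
        fn(mem, &ap);
        va_end(ap);
    }
    return mem;
}
```

Because the internals never name the user type, they cannot type-pun it through a congruently sized built-in type, which is the failure mode described for _Decimal128 versus long double.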
Issues and Questions
- Will we preserve the existing printf hooks registration method?
Would changing the definition of struct printf_info by extending it with a flags 'word' and changing spec from wchar_t to int require a struct versioning interface for the registration functions?
Due to the __const label on struct printf_info when calling the length-modifier arginfo_fn, the user cannot set the struct printf_info::flags member directly, but must set the length modifier flags into int * __argtypes, and the printf_parsemb internals must copy the user flags to the struct printf_info::flags member. Since the user flags must lie in the mask 0xffff0000, do we want to copy them directly (with &) into struct printf_info::flags, or do we want to shift them right by 16 bits?
By only allowing user flags in 0xffff0000 we limit the number of user flags to 16. This is probably not adequate. Perhaps the addition of an "int *__user" parameter to the arginfo function would work.
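The two placement options raised in this section, and the 16-flag limit, can be compared with a little bit arithmetic. The sketch assumes 32-bit unsigned int; PA_USER_SHIFT and the three function names are hypothetical.

```c
#include <assert.h>

#define PA_USER_MASK  0xffff0000u
#define PA_USER_SHIFT 16           /* hypothetical name for the shift count */

/* Option 1: keep the user bits where they are in the argtype word. */
static unsigned int user_flags_in_place(unsigned int argtype)
{
    return argtype & PA_USER_MASK;
}

/* Option 2: shift the user bits down so the first user flag is bit 0. */
static unsigned int user_flags_shifted(unsigned int argtype)
{
    /* Sanity check that the mask and the shift count agree. */
    assert((PA_USER_MASK >> PA_USER_SHIFT) == 0xffffu);
    return (argtype & PA_USER_MASK) >> PA_USER_SHIFT;
}

/* Number of distinct single-bit flags the mask can carry. */
static int user_flag_capacity(void)
{
    unsigned int m = PA_USER_MASK;
    int n = 0;

    while (m != 0) {
        n += (int)(m & 1u);
        m >>= 1;
    }
    return n;
}
```

Either option carries the same 16 bits; the shift only changes how override functions spell their flag constants, so it does nothing for the capacity concern.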