From the 'language features' brainstorm:

# Statistical aggregates

These can be implemented with a BPF_MAP_TYPE_PERCPU_ARRAY storing elements of type struct stat_data. Multiple aggregates can be stored in the same array, one element per aggregate. Then __stp_stat_add can be implemented in kernel space as non-looping, non-locking eBPF code, while all other functions (reading the stat value, histogram printing) can be implemented as userspace helpers that aggregate data from all CPUs.

As far as I can tell, it is not strictly necessary to lock a statistical aggregate when reading its value. The kernel-module backend locks in order to guarantee a time-consistent snapshot of the different CPUs' values, whereas without locking the result might be approximate.

# {TODO} More complex structures: arrays of statistical aggregates

Still investigating whether we can do this.

(0) There is no per-CPU version of BPF_HASH.

(1) A BPF_MAP_TYPE_PERCPU_ARRAY would be a contiguously indexed, preallocated array of aggregates, so a BPF_MAP_TYPE_HASH would be needed to map from sparse keys to indices into the BPF_MAP_TYPE_PERCPU_ARRAY. However, without synchronization, there is no way to allocate slots in the BPF_MAP_TYPE_PERCPU_ARRAY.

(2) /usr/include/linux/bpf.h mentions BPF_MAP_TYPE_HASH_OF_MAPS, but it's currently undocumented. Still need to read the code and investigate whether it works for our purposes.
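
To make the kernel/userspace split concrete, here is a minimal sketch in plain C that simulates the per-CPU slots with an ordinary array. The field layout of struct stat_data and the names stat_add and stat_aggregate are assumptions for illustration only; the real struct and helpers may look different, and no actual BPF map calls are shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed field layout; the real struct stat_data may differ. */
struct stat_data {
    int64_t count;
    int64_t sum;
    int64_t min;
    int64_t max;
};

/* Kernel-side update in the style of __stp_stat_add (hypothetical name):
 * it touches only the calling CPU's slot, so it needs no loop and no lock. */
static void stat_add(struct stat_data *slot, int64_t value)
{
    if (slot->count == 0 || value < slot->min)
        slot->min = value;
    if (slot->count == 0 || value > slot->max)
        slot->max = value;
    slot->count += 1;
    slot->sum += value;
}

/* Userspace-side read: fold the per-CPU slots into one result.
 * Without locking, the slots may be read at slightly different times,
 * so the total is only approximately consistent. */
static struct stat_data stat_aggregate(const struct stat_data *percpu,
                                       size_t ncpus)
{
    struct stat_data total = {0, 0, 0, 0};
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        const struct stat_data *sd = &percpu[cpu];
        if (sd->count == 0)
            continue; /* this CPU never recorded a value */
        if (total.count == 0 || sd->min < total.min)
            total.min = sd->min;
        if (total.count == 0 || sd->max > total.max)
            total.max = sd->max;
        total.count += sd->count;
        total.sum += sd->sum;
    }
    return total;
}
```

In this sketch all looping lives in stat_aggregate, which is why the reading and printing side is a natural fit for userspace helpers.
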
Correction to point (0) of the prior note: there is BPF_MAP_TYPE_PERCPU_HASH as of kernel 4.6.0. The code I'm currently working on will use this to represent the stat_data structures for an array of statistical aggregates.

Currently deciding between:

- option (a): one PERCPU_HASH per field. Not clear what to do with the histogram.
- option (b): multiplex -- use [array_key+field_id] as the key for the per-CPU hash. The histogram is [array_key+hist+bucket_id]. Such combined keys can be encoded in BPF code without any more difficulty than a strcpy().

Thus far option (b) looks like the better choice, pending experimental confirmation. Such multiplexing could also be adapted to solve PR23478, although the stack might end up stretched to its limits.
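
A rough sketch of what an option (b) combined key could look like, written as plain C rather than verifier-ready BPF. Everything here is hypothetical: the struct stat_key layout, the KEY_STR_LEN limit, the stat_field enum, and make_stat_key are illustrative names, not the encoding used on the branch.

```c
#include <stdint.h>
#include <string.h>

#define KEY_STR_LEN 64   /* hypothetical maximum length of an array key */

/* Hypothetical field identifiers for the multiplexed map. */
enum stat_field { FIELD_COUNT, FIELD_SUM, FIELD_MIN, FIELD_MAX, FIELD_HIST };

/* Hypothetical combined key: [array_key + field_id (+ bucket_id)]. */
struct stat_key {
    char array_key[KEY_STR_LEN];
    uint32_t field_id;
    uint32_t bucket_id;   /* only meaningful when field_id == FIELD_HIST */
};

static void make_stat_key(struct stat_key *k, const char *array_key,
                          uint32_t field_id, uint32_t bucket_id)
{
    /* Hash lookups compare raw key bytes, so clear every byte first;
     * otherwise stale bytes after the string would create distinct keys. */
    memset(k, 0, sizeof(*k));
    /* Bounded copy; the final byte stays NUL from the memset above. */
    strncpy(k->array_key, array_key, KEY_STR_LEN - 1);
    k->field_id = field_id;
    k->bucket_id = bucket_id;
}
```

With this layout, the sum for element "foo" would be looked up under the key built from ("foo", FIELD_SUM, 0), and histogram bucket 3 under ("foo", FIELD_HIST, 3). The fixed-size key is also where the stack pressure comes from, since each key has to be built on the BPF stack.
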
Pushed some more commits to the branch. What's left before closing this PR:

- handle delete operation for statistical aggregates
- stress test for @variance calculation (higher values of N as shown in stat1.stp testcase) -- there is some divergence here which I suspect relates to lack of locking (related to PR22312?)
- omit calculations for unused extractor functions
Previously on PR23476:

> - option (a): one PERCPU_HASH per field. Not clear what to do with histogram.
> - option (b): multiplex -- use [array_key+field_id] as the key for the per-CPU hash. The histogram is [array_key+hist+bucket_id]. Such combined keys can be encoded in BPF code without any more difficulty than a strcpy().

Worth noting that I went with option (a) for most fields but (for future PR24424 work) option (b) for histogram data.