23476 – bpf statistical aggregates

Bug 23476 - bpf statistical aggregates

Summary: bpf statistical aggregates

Status:	NEW

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	bpf (show other bugs)
Version:	unspecified

Importance:	P1 normal
Target Milestone:	---
Assignee:	Serhei Makarov

URL:
Keywords:

Depends on:
Blocks:	22329 24424
	Show dependency tree / graph

Reported:	2018-08-01 18:50 UTC by Serhei Makarov
Modified:	2019-06-26 19:43 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:	2019-06-26 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Serhei Makarov 2018-08-01 18:50:19 UTC

From the 'language features' brainstorm:

# Statistical aggregates

These can be implemented with BPF_MAP_TYPE_PERCPU storing elements of type struct stat_data. Multiple aggregates can be stored in the same array, one per aggregate.

Then __stp_stat_add can be implemented in kernel-space as non-looping, non-locking eBPF code, while all other functions (reading stat value, histogram printing) can be implemented as userspace helpers that aggregate data from all CPUs.

As far as I can tell, it is not strictly necessary to lock a statistical aggregate when reading its value -- the kernel-module backend does this to guarantee a time-consistent snapshot of the different CPU's values, whereas without locking the result might be approximate.

# {TODO} More complex structures: arrays of statistical aggregates

Still investigating whether we can do this.

(0) There is no per-CPU version of BPF_HASH.

(1) A BPF_MAP_TYPE_PERCPU would be a contiguously indexed, preallocated array of aggregates, so a BPF_MAP_TYPE_HASH would be needed to map from sparse keys to indices into the BPF_MAP_TYPE_PERCPU. However, without synchronization, there is no way to allocate slots in the BPF_MAP_TYPE_PERCPU.

(2) /usr/include/linux/bpf.h mentions BPF_MAP_TYPE_HASH_OF_MAPS, but it's currently undocumented. Still need to read the code and investigate if it works for our purposes.

Comment 1 Serhei Makarov 2019-03-26 21:07:00 UTC

Correction to point (0) of the prior note:

There is BPF_MAP_TYPE_PERCPU_HASH as of kernel 4.6.0. The code I'm currently working on will use this to represent the stat_data structures for an array of statistical aggregates. Currently deciding between:

- option (a): one PERCPU_HASH per field. Not clear what to do with histogram.
- option (b): multiplex -- use [array_key+field_id] as the key for the per-CPU hash. The histogram is [array_key+hist+bucket_id]. Such combined keys can be encoded in BPF code without any more difficulty than a strcpy().

Thus far option (b) is looking like the best, pending experimental confirmation. Such multiplexing could also be adapted to solve PR23478, although the stack might end up stretched to its limits.

Comment 2 Serhei Makarov 2019-06-26 19:38:53 UTC

Pushed some more commits to the branch.

What's left before closing this PR:
- handle delete operation for statistical aggregates
- stress test for @variance calculation (higher values of N as shown in stat1.stp testcase) -- there is some divergence here which I suspect relates to lack of locking (related to PR22312?)
- omit calculations for unused extractor functions

Comment 3 Serhei Makarov 2019-06-26 19:42:37 UTC

Previously on PR23476:
> - option (a): one PERCPU_HASH per field. Not clear what to do with histogram.
> - option (b): multiplex -- use [array_key+field_id] as the key for the per-CPU hash. The histogram is [array_key+hist+bucket_id]. Such combined keys can be encoded in BPF code without any more difficulty than a strcpy().

Worth noting that I went with option (a) for most fields but (for future PR24424 work) option (b) for histogram data.