Bug 25218 - stapbpf loses perf_events when writing to interactive terminal
Status: NEW
Alias: None
Product: systemtap
Classification: Unclassified
Component: bpf
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
 
Reported: 2019-11-22 23:13 UTC by Sagar Patel
Modified: 2019-11-25 18:22 UTC

Description Sagar Patel 2019-11-22 23:13:25 UTC
When a script produces frequent output, stapbpf appears to lose some perf_events, which then causes it to abort. Consider the script below.

global count

probe kernel.function("vfs_read") {

    count++

    printf("Beginning: %d\n", count)
    printf("Middle: %d\n", count)
    printf("End: %d\n", count)

}

Output:

...
...
Beginning: 13700
Middle: 13700
End: 13700
Beginning: 13701
Middle: 13701
End: 13701
WARNING: lost 359 perf_events on cpu 2
bpfinterp.cxx:421: printf already started
WARNING: /home/sapatel/stap_head/install/bin/stapbpf exited with signal: 6 (Aborted)

However, if the output is redirected to a file instead, the perf_events are not lost, and the script proceeds as expected.
Comment 1 Serhei Makarov 2019-11-22 23:46:34 UTC
My suspicion is that the perf_events buffer used to receive transport messages fills up while the stapbpf process is delayed writing output. This results in transport messages being overwritten and the 'lost perf_events' warning being generated.

When stapbpf output is redirected to a file, there is no I/O delay, and perf_events are consumed before the buffer overflows.

If my suspicion is true, this could be solved by receiving transport messages in a separate thread.
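To make the separate-thread idea concrete, here is a minimal sketch in C++17 (with thread support). It is not stapbpf code: `MessageQueue` and `run_pipeline` are hypothetical names, and the `events` vector merely stands in for the perf_events buffer. A reader thread moves messages into an unbounded in-process queue as fast as it can, so a slow writer delays only the output and not the draining of the buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unbounded, thread-safe queue placed between the
// perf_events reader and the (possibly slow) output writer.
class MessageQueue {
public:
  void push(std::string msg) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push_back(std::move(msg));
    }
    cv_.notify_one();
  }

  // Signal that no further messages will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until a message is available. Returns std::nullopt once the
  // queue has been closed and fully drained.
  std::optional<std::string> pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !q_.empty() || closed_; });
    assert(!q_.empty() || closed_);
    if (q_.empty()) return std::nullopt;
    std::string msg = std::move(q_.front());
    q_.pop_front();
    return msg;
  }

private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::string> q_;
  bool closed_ = false;
};

// Assumption: `events` simulates the kernel-side buffer. The reader
// thread drains it immediately; the calling thread plays the writer.
std::vector<std::string> run_pipeline(const std::vector<std::string>& events) {
  MessageQueue queue;
  std::thread reader([&queue, &events] {
    for (const auto& e : events) queue.push(e);
    queue.close();
  });

  std::vector<std::string> written;
  while (auto msg = queue.pop()) {
    written.push_back(*msg);  // stand-in for a slow write to the terminal
  }
  reader.join();
  return written;
}
```

The trade-off in this sketch is memory: the in-process queue grows without bound if the terminal stays slower than the event rate, so a real implementation would probably need a cap or a drop policy of its own.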
Comment 2 Serhei Makarov 2019-11-25 18:22:46 UTC
One thing that I should definitely fix, aside from tackling the speed disparity in console/file output: when perf_events are lost, unpaired PRINTF_{START,END} messages should not cause a fatal error for stapbpf.

That is, a sequence such as

PRINTF_START PRINTF_END <lost events> PRINTF_END

should cause the trailing, unpaired PRINTF_END to be dropped along with the lost events.
Currently it causes stapbpf to terminate with an error message.
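The tolerant behaviour described above could look roughly like the following sketch. It is an illustration only, not the code in bpfinterp.cxx: the `Msg` enum, the `Stats` struct, and `process` are hypothetical, and lost events are modelled as an explicit `Lost` marker in the message stream. Unpaired messages are counted and discarded instead of aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical message kinds; Lost marks the point where events were dropped.
enum class Msg { PrintfStart, PrintfEnd, Lost };

struct Stats {
  int completed = 0;  // printfs with a matching start and end
  int dropped = 0;    // partial or unpaired printfs that were discarded
};

// Walk the message stream, tolerating unpaired START/END messages.
Stats process(const std::vector<Msg>& msgs) {
  Stats s;
  bool in_printf = false;
  for (Msg m : msgs) {
    switch (m) {
    case Msg::PrintfStart:
      // A START while one is already open means the earlier END was lost:
      // discard the partial printf and begin a new one.
      if (in_printf) ++s.dropped;
      in_printf = true;
      break;
    case Msg::PrintfEnd:
      if (in_printf) {
        ++s.completed;
        in_printf = false;
      } else {
        ++s.dropped;  // END without START: drop it, do not abort
      }
      break;
    case Msg::Lost:
      // Anything in progress can no longer be trusted.
      if (in_printf) {
        ++s.dropped;
        in_printf = false;
      }
      break;
    }
  }
  // Each message bumps at most one counter.
  assert(static_cast<std::size_t>(s.completed + s.dropped) <= msgs.size());
  return s;
}
```

For the sequence quoted in the comment (`PRINTF_START PRINTF_END <lost events> PRINTF_END`), this sketch completes one printf and drops the trailing PRINTF_END. The same state machine also covers the case where a START arrives while another printf is still open.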