Bug 29801

Summary: --monitor mode gets 2 hits to procfs("monitor_control").write for every 1 write
Product: systemtap Reporter: Ryan Goldberg <ryan.s.goldberg>
Component: translator Assignee: Unassigned <systemtap>
Severity: normal CC: fche
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description Ryan Goldberg 2022-11-17 19:19:56 UTC
When running in monitor mode, it looks like every write_command call produces 2 hits to the procfs("monitor_control").write probe. This isn't really an issue for commands like resume and pause, but it is causing an issue with quit, making the module lock up.

So it turns out that elaborate.cxx::monitor_mode_write derives the procfs write probe and has it join (and init) the procfs_derived_probe_group, which enrolls the probe (adding it to the write_probes set).

Then, in semantic_pass_optimize1, the probe is added to the group (and enrolled) again, which adds a second copy of the probe to write_probes (leading to the above issue).
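To make the double enrollment concrete, here is a minimal toy model of the sequence described above. It is only a sketch, not the systemtap code: ToyProbe, ToyProcfsGroup, enroll, on_write and hits_per_write are hypothetical names, and write_probes is assumed to be a container that accepts duplicates.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; not the real systemtap classes.
struct ToyProbe {
    int hits = 0;  // number of times the write handler has run
};

struct ToyProcfsGroup {
    // Assumption: the container accepts the same probe more than once.
    std::vector<ToyProbe*> write_probes;

    void enroll(ToyProbe* p) {
        assert(p != nullptr);
        write_probes.push_back(p);
    }

    // A single write to the procfs file runs every enrolled handler.
    void on_write() {
        for (ToyProbe* p : write_probes) {
            p->hits++;
        }
    }
};

// Returns how many handler hits one write produces.
// join_early models the join_group call made where the synthetic probe is derived.
int hits_per_write(bool join_early) {
    ToyProbe probe;
    ToyProcfsGroup group;
    if (join_early) {
        group.enroll(&probe);  // models the join in monitor_mode_write
    }
    group.enroll(&probe);      // models the later join in semantic_pass_optimize1
    group.on_write();
    return probe.hits;
}
```

In this model, hits_per_write(true) counts two hits for one write, while hits_per_write(false) counts one, which matches the symptom seen with monitor mode.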

So to fix this I can either remove the join_group in monitor_mode_write (or add some conditional inside join_group to avoid repeats), or do a non-null check on groups before joining them in semantic_pass_optimize1. The second option would be the widest-covering solution, but it could also cause new issues for probes that rely on the current behavior.
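As a rough illustration of the "conditional to avoid repeats" option, the sketch below makes enrollment idempotent in a toy group. It is not a patch against systemtap: SketchProbe, SketchGroup, enroll_once and enrolled_after_two_joins are hypothetical names, and the real join_group may need a different kind of check.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; not the real systemtap classes.
struct SketchProbe {
    int hits = 0;
};

struct SketchGroup {
    std::vector<SketchProbe*> write_probes;

    // Enroll the probe only if it is not already present.
    // Returns true when the probe was added, false when it was a repeat.
    bool enroll_once(SketchProbe* p) {
        assert(p != nullptr);
        if (std::find(write_probes.begin(), write_probes.end(), p)
                != write_probes.end()) {
            return false;
        }
        write_probes.push_back(p);
        return true;
    }
};

// Joins the same probe twice and reports how many entries the group holds.
int enrolled_after_two_joins() {
    SketchProbe probe;
    SketchGroup group;
    group.enroll_once(&probe);  // first join: probe is added
    group.enroll_once(&probe);  // second join: detected as a repeat and ignored
    return static_cast<int>(group.write_probes.size());
}
```

The guard hides the duplicate join rather than removing it, so it is the more defensive of the two variants of the first option.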

Is there a preference for which fix I go with?
Comment 1 Frank Ch. Eigler 2022-11-17 19:38:55 UTC
Nice analysis.

I suspect the problem originates from the setup_timeout() code that implements the stap -T NNN function.  Its join_group etc. stuff was probably buggy, and was copy-pasted into the monitor-related synthetic probe stuff.

I think the right fix there is to drop the dp->join_group(s); in each of those clones, because synthetic derived_probe objects are joined at the other area you found.
Comment 2 Ryan Goldberg 2022-11-18 15:58:47 UTC
Closed with commit 92e8ecb3329820992756bbbd3decd3d2ef4f490a