Optimizing scripts with skipped probes due to global variable locks

One common reason for skipped probes is contention amongst probes for script-level global variables. When a systemtap probe handler starts, it tries to acquire shared or exclusive locks on all the script-level globals it uses. If any of those variables is already locked by another probe handler running on another CPU, this probe handler will spin for a while. (The length of this time is set by the MAXTRYLOCK and TRYLOCKDELAY compile parameters.) If the spin times out, the probe will exit and mark itself as skipped. If enough probes are skipped (MAXSKIPPED), the script as a whole will exit.

This can occur if data acquisition probes write into ordinary global arrays very frequently, and if periodically a long-winded reporting function reads those arrays. Or a busy multiprocessor may have many acquisition probes running at once. What can one do in these cases?

One technique is to shorten handlers. Do less work on average, especially costly stuff like backtracing, so that the handlers finish faster and release their locks earlier.
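
As a rough illustration of doing less work on average, the sketch below records a cheap statistic on every hit but takes a backtrace only for a small random sample of hits. The probe point (vfs_read), the array name, and the 1-in-1000 sampling rate are assumptions chosen for the example, not part of any particular script.

global reads

probe kernel.function("vfs_read") {
        # Cheap work on every hit: one statistics update (shared lock only)
        reads[execname()] <<< 1
        # Costly work only occasionally: roughly 1 hit in 1000 (assumed rate)
        if (randint(1000) == 0)
                print_backtrace()
}

Most invocations of this handler then finish quickly, so the lock on reads is held only briefly in the common case.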

Another technique is to minimize the number of globals. Consider this piece of an older version of "nettop.stp":

global ifxmit, /* */ ifdevs, ifpid, execname, user

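probe netdev.transmit {
        p = pid()
        execname[p] = execname()         /* overwritten on every packet */
        user[p] = uid()                  /* overwritten on every packet */
        ifdevs[p, dev_name] = dev_name
        ifxmit[p, dev_name] <<< length   /* the only statistics array */
        ifpid[p, dev_name] ++
}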

All the arrays except for ifxmit are unhelpful here. The execname, user, and ifpid arrays were simply meant to record some auxiliary data about processes, so that more than just a pid number will be printed during the reporting function. However, this requires several extra arrays, which cost a lot of extra memory and each need their own exclusive lock. Note too that we're overwriting the same elements over and over.

What to do instead? Abracadabra:

global ifxmit

probe netdev.transmit {
        ifxmit[pid(), dev_name, execname(), uid()] <<< length
}

Now we have just one array, and since it's a statistics one (written-to with "<<<"), it'll only require a shared lock while this probe is running! The execname etc. data has become a set of additional index columns. That works just fine, since we only wanted the values for the same rows (process-id/dev-name) anyway.

Then there is the reporting function. If one needs to iterate over this statistic array, it will take time, and it will take an exclusive lock. That will lock out the data-gathering probes.
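
A minimal sketch of such a reporting probe, kept deliberately short so that the exclusive lock is held as briefly as possible, might look like this. The 5-second interval, the 20-row limit, the output layout, and the local variable names are all assumptions made for the example.

global ifxmit

probe netdev.transmit {
        ifxmit[pid(), dev_name, execname(), uid()] <<< length
}

probe timer.s(5) {
        /* "-" sorts in descending order (by @count for a statistics array);
           "limit" caps how many rows are printed while the lock is held */
        foreach ([p, dev, name, u] in ifxmit- limit 20) {
                printf("%5d %5d %-8s %7d %9d %s\n", p, u, dev,
                       @count(ifxmit[p, dev, name, u]),
                       @sum(ifxmit[p, dev, name, u]), name)
        }
        /* start the next interval with an empty array */
        delete ifxmit
}

Even in this short form, the netdev.transmit probes on other CPUs have to wait (or be skipped) while the timer probe iterates, so the less the report does per row, the better.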

TBC ... once conditional probes are fully implemented, we'll be able to express "data gathering" and "report generating" phases, which would avoid contention by explicitly alternating between being enabled and disabled.


None: TipSkippedProbesOptimization (last edited 2009-06-13 22:03:52 by FChE)