task_finder_vma: rewrite using RCU to fix performance issues
The use of a single global rwlock to protect this file's hash table
results in significantly degraded performance when there are many
processes using the vma tracker in flight. A lot of time is spent
spinning on the rwlock when this happens. For example, the rwlock
accounts for most of the CPU time in the following kernel-space CPU
flame graph:
Other code paths take the same lock as well, for example
_stp_umodule_relocate().
To remedy this, make the hash table RCU safe so we'll never block upon
reading a hash list.
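
The read-side idea can be sketched in user space as follows. This is
only an illustration, not the code in this patch: the demo_* names are
hypothetical, C11 atomics stand in for the kernel primitives usually
used for this pattern (rcu_read_lock(), hlist_for_each_entry_rcu(),
hlist_add_head_rcu()), and deferred freeing after a grace period is
left out entirely.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical entry type; the real table stores per-task vma data. */
struct demo_entry {
	const void *key;                  /* e.g. a task pointer */
	int value;
	struct demo_entry *_Atomic next;
};

struct demo_bucket {
	struct demo_entry *_Atomic head;
};

/*
 * Writer side: fully initialise the entry, then publish it with a
 * release store so readers never see a half-built node. Writers are
 * assumed to serialise among themselves (e.g. with a per-bucket lock).
 */
static void demo_insert(struct demo_bucket *b, struct demo_entry *e)
{
	struct demo_entry *old;

	old = atomic_load_explicit(&b->head, memory_order_relaxed);
	atomic_store_explicit(&e->next, old, memory_order_relaxed);
	atomic_store_explicit(&b->head, e, memory_order_release);
}

/*
 * Reader side: walks the list without taking any lock, so it never
 * spins or blocks behind a writer. Removal and reclamation of entries
 * are not modelled here.
 */
static struct demo_entry *demo_lookup(struct demo_bucket *b, const void *key)
{
	struct demo_entry *e;

	e = atomic_load_explicit(&b->head, memory_order_acquire);
	while (e) {
		if (e->key == key)
			return e;
		e = atomic_load_explicit(&e->next, memory_order_acquire);
	}
	return NULL;
}
```

The point of the sketch is that demo_lookup() contains no lock
acquisition at all, which is what removes the contention seen with the
global rwlock.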
Hashes are now generated with the hash_ptr() function, and the task
pointers themselves are hashed instead of their PIDs for reliability,
since a PID is not a stable anchor point to a task struct.
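
As a rough illustration of pointer hashing, here is a user-space
stand-in for what hash_ptr() does on a 64-bit kernel (multiply by a
golden-ratio constant and keep the top bits). demo_hash_ptr() is a
hypothetical name, and the sketch assumes 64-bit pointers; it is not
the kernel implementation itself.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit golden-ratio multiplier, as used by the kernel's hash_64(). */
#define DEMO_GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Fold a pointer value into a bucket index of `bits` bits by
 * multiplying and keeping the high-order bits of the product.
 */
static inline uint32_t demo_hash_ptr(const void *ptr, unsigned int bits)
{
	uint64_t val = (uint64_t)(uintptr_t)ptr;

	assert(bits >= 1 && bits <= 32);
	return (uint32_t)((val * DEMO_GOLDEN_RATIO_64) >> (64 - bits));
}
```

With an 8-bit table, demo_hash_ptr(task, 8) always yields an index in
the range 0..255, and the same pointer always maps to the same bucket.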
While we're at it, clean up the rest of this file to bring it up to
current Linux kernel coding standards as well.
This leads to a dramatic reduction in CPU time when
1. the current system has a lot of running processes, or
2. some processes have a lot of DSO dependencies, and
3. also -x PID is not used for stap or staprun, and
4. there are quite a few CPU cores.
For a typical test run, we have the following CPU utilization changes: