This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: Request fore review of performance counter access proposal
- From: William Cohen <wcohen at redhat dot com>
- To: Carl Love <cel at us dot ibm dot com>
- Cc: systemtap at sources dot redhat dot com
- Date: Thu, 30 Mar 2006 10:31:26 -0500
- Subject: Re: Request fore review of performance counter access proposal
- References: <1143150385.5350.13.camel@dyn9047017113.beaverton.ibm.com>
Carl Love wrote:
I read over this proposal. If I understand it correctly, you are
talking about using the perfmon2 interface to access the
performance counters. So a prerequisite for implementing this
functionality in systemtap is that the perfmon2 interface be
accepted into the kernel. Additionally, perfmon2 must have
POWER support as well.
Yes, the idea is to use an existing interface rather than have yet another
device driver touch the raw hardware. For x86 processors there are already
several drivers that touch that hardware
(oprofile/perfmon/perfctr), and they don't coordinate in any way.
Perfmon2 does have some reservation logic to prevent the
different modules from stepping on each other.
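To illustrate what that kind of reservation logic has to track, here is a minimal sketch in C. It assumes a simple per-counter owner table with all-or-nothing reservation; the names (reserve_counters, release_counters, the owner IDs) are hypothetical and are not perfmon2's actual code or API, and a real in-kernel version would also need locking.

```c
#include <assert.h>

#define NUM_COUNTERS 8
#define OWNER_NONE   0

/* Hypothetical owner IDs for the drivers mentioned above. */
enum owner { OWNER_OPROFILE = 1, OWNER_PERFCTR = 2, OWNER_SYSTEMTAP = 3 };

/* Who currently owns each physical counter (OWNER_NONE = free). */
static int counter_owner[NUM_COUNTERS];

/* Try to reserve every counter in 'mask' for 'owner'.
 * All-or-nothing: if any requested counter is held by a different owner,
 * nothing is reserved and -1 is returned. Not thread-safe. */
static int reserve_counters(unsigned mask, int owner)
{
    assert(owner != OWNER_NONE);
    for (int i = 0; i < NUM_COUNTERS; i++)
        if ((mask & (1u << i)) && counter_owner[i] != OWNER_NONE &&
            counter_owner[i] != owner)
            return -1;
    for (int i = 0; i < NUM_COUNTERS; i++)
        if (mask & (1u << i))
            counter_owner[i] = owner;
    return 0;
}

/* Release only those counters in 'mask' that 'owner' actually holds. */
static void release_counters(unsigned mask, int owner)
{
    for (int i = 0; i < NUM_COUNTERS; i++)
        if ((mask & (1u << i)) && counter_owner[i] == owner)
            counter_owner[i] = OWNER_NONE;
}
```

For example, if OProfile holds counters 0 and 1 (mask 0x3), a request for counters 1 and 2 (mask 0x6) fails without reserving anything, while a request for counters 2 and 3 (mask 0xC) succeeds.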
According to the perfmon2 patch from 3/22, there is some POWER5 support,
implemented by David Gibson at IBM. However, I have not verified
that it works; I don't have a POWER5 machine handy.
The prototypes for the functions seem to be very general, in that they
allow any event to be put into any counter. It also looked like there
was an assumption that counters could be started and stopped
individually. Neither assumption holds on POWER. Because of the
architecture of the POWER performance counters, there are a
lot of restrictions. There are three control registers for the six
POWER5 (or eight POWER4) performance counters. Within the control
registers there are bit fields that select the specific event.
There are also bits that define how the multiplexers (MUXes) route the
performance signals to the counters. It is these MUXes that cause
problems: given two events, you may be able to put only one of them in
counter A, because routing its signal conflicts with routing the
signal for the other event. As a result, the hardware team has
defined a set of groups for use with the counters. Each group consists
of six POWER5 (or eight POWER4) events and is defined by the
mmcr0, mmcr1 and mmcra control register settings. These settings
define which event is in which counter. All of the counters are
started and stopped at the same time; i.e., there is a single control
bit to start and stop them. The exception is POWER5, where two of
the counters are always counting and do not respond to the start/stop
bit. That is a complication that the perfmon2 interface must deal with.
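The group model described above can be sketched in C as a table lookup: given the events a user asks for, find one predefined group that contains all of them. This is a hypothetical illustration; the event codes and group contents are invented placeholders, not the real POWER5 groups, and the mmcr0/mmcr1/mmcra values each group would correspond to are left out.

```c
#include <assert.h>
#include <stddef.h>

#define COUNTERS_PER_GROUP 6   /* six counters, as on POWER5 */

/* Hypothetical event codes; real POWER5 event names and codes differ. */
enum event { EV_CYCLES = 1, EV_INST_CMPL, EV_L1_DMISS, EV_BR_MPRED,
             EV_TLB_MISS, EV_FPU_OP, EV_LD_REF, EV_ST_REF };

/* A group fixes which event lands in which counter. On real hardware
 * each group maps to one mmcr0/mmcr1/mmcra setting (omitted here). */
struct pmu_group {
    const char *name;
    int event_in_counter[COUNTERS_PER_GROUP];   /* 0 = unused slot */
};

/* Hypothetical group table standing in for the hardware team's list. */
static const struct pmu_group groups[] = {
    { "hypo_basic",  { EV_CYCLES, EV_INST_CMPL, EV_L1_DMISS, EV_LD_REF, EV_ST_REF, 0 } },
    { "hypo_branch", { EV_CYCLES, EV_INST_CMPL, EV_BR_MPRED, EV_TLB_MISS, 0, 0 } },
    { "hypo_fpu",    { EV_CYCLES, EV_INST_CMPL, EV_FPU_OP, EV_L1_DMISS, 0, 0 } },
};

static int group_has_event(const struct pmu_group *g, int ev)
{
    for (int c = 0; c < COUNTERS_PER_GROUP; c++)
        if (g->event_in_counter[c] == ev)
            return 1;
    return 0;
}

/* Return the index of the first group containing every requested event,
 * or -1 if no single group can count them all at the same time. */
static int find_group(const int *events, size_t n)
{
    assert(events != NULL || n == 0);
    for (size_t g = 0; g < sizeof groups / sizeof groups[0]; g++) {
        size_t i;
        for (i = 0; i < n; i++)
            if (!group_has_event(&groups[g], events[i]))
                break;
        if (i == n)
            return (int)g;
    }
    return -1;
}
```

Asking for branch mispredictions together with cycles selects hypo_branch, while asking for branch mispredictions together with FPU operations returns -1: each event exists in some group, but no single group contains both.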
The thought was to use libpfm to determine whether the configuration is
feasible. However, there doesn't appear to be support in libpfm for PowerPC
or Pentium 4, the processors that need that type of support the most.
Once the performance hardware is running, the events would not be
changed. For sampling, a software bit could determine whether or not a
sample is taken. The constantly free-running counters could present a
problem for the intervals on ppc. The documentation makes it appear that
there are some event selectors that effectively stop the counter from
counting any more events. Is that the expected behavior of the
mechanism? If so, could that be used to implement stopping a counter?
So, you will have a very hard time taking the defined groups for
POWER and making them fit into these very general functions. As I
recall, there are some Intel processors (the 64-bit Itanium processors)
where some of the events can only be programmed into a subset of the
counters. Again, these very general calls will be an issue for
those Intel events.
Carl Love
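When each event may only go into a subset of the counters (the Itanium case above, or a model with a different event list per register), checking feasibility amounts to finding an assignment of events to distinct allowed counters. A minimal backtracking sketch in C follows; the four-counter limit and the bitmask encoding are assumptions for illustration, and real allowed-counter sets would come from the processor documentation.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COUNTERS 4   /* assumed counter count for this sketch */

/* allowed[i] is a bitmask of the counters event i may be placed in.
 * Tries each allowed, still-free counter for event i and recurses;
 * 'used' tracks counters already taken by events 0..i-1. */
static int try_assign(const unsigned *allowed, size_t n, size_t i,
                      unsigned used, int *out)
{
    if (i == n)
        return 1;                       /* every event placed */
    for (int c = 0; c < NUM_COUNTERS; c++) {
        unsigned bit = 1u << c;
        if ((allowed[i] & bit) && !(used & bit)) {
            out[i] = c;
            if (try_assign(allowed, n, i + 1, used | bit, out))
                return 1;
        }
    }
    return 0;                           /* no counter works: backtrack */
}

/* Fill out[i] with a counter index for each event and return 1, or
 * return 0 if the constraints cannot all be met at once (out is then
 * left in an unspecified state). */
static int assign_counters(const unsigned *allowed, size_t n, int *out)
{
    assert(n == 0 || (allowed != NULL && out != NULL));
    if (n > NUM_COUNTERS)
        return 0;
    return try_assign(allowed, n, 0, 0u, out);
}
```

A plain greedy choice can fail where the search succeeds: with masks {0x3, 0x1}, putting event 0 in counter 0 blocks event 1, so the search moves event 0 to counter 1 and puts event 1 in counter 0.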
So far it looks like POWER has the most constraints when programming
the performance monitoring hardware. The Itanium and Pentium 4 have
hardware constraints, but OProfile simplified the model to yield a
set of independent registers. The events available for each register vary.
To some extent, the assumption is that this interface will handle the
common cases and needs of developers, and that some of the performance
monitoring hardware on some processors will not be used.
-Will