This is the mail archive of the sid@sources.redhat.com mailing list for the SID project.



Re: Profiling: --insn-count=1


Hi -

On Thu, Aug 01, 2002 at 05:03:55AM -0700, Scott Dattalo wrote:
> [...]
> Yeah, I suspected as much... If I had time to look into it, I'd try to add
> that feature. The way I'd approach it is I'd partition the time it takes
> an instruction to execute into two parts: the fixed amount of time the CPU
> requires and the (possibly) variable amount that the memory accesses
> require.  

> The fixed portion may be ascertained when the target program is
> first loaded. 

This computation is hard and entirely target-dependent.  See, for example,
the amount of work needed in gcc to model a CPU pipeline in detail
(especially the more precise DFA models).


> The variable portion may too, depending on the address
> accessed.  [...]

SID already does this part.  You can configure memory modules, mappers,
caches, and a few other bits as having latency counts associated with
operations.  The CPU accumulates these as penalties, combines them with
a raw instruction count, and tells the target-time scheduler the sum.
So simulated target time already includes the effect of these parameters.
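
As a rough illustration of the accounting described above, here is a minimal
sketch in Python.  It is not SID code or SID configuration syntax: the names
LatencyModel, CycleAccount, retire and charge are hypothetical, and the sketch
assumes each instruction contributes one unit to the raw count.

```python
from dataclasses import dataclass


@dataclass
class LatencyModel:
    """Hypothetical per-component latencies, in cycles (not SID syntax)."""
    read_latency: int = 0
    write_latency: int = 0


@dataclass
class CycleAccount:
    """Accumulates a raw instruction count plus memory-latency penalties."""
    insn_count: int = 0
    penalty_cycles: int = 0

    def retire(self, n: int = 1) -> None:
        # Assumption: every retired instruction adds one unit to the raw count.
        self.insn_count += n

    def charge(self, model: LatencyModel, is_write: bool = False) -> None:
        # Add the latency of one access through the given component.
        if is_write:
            self.penalty_cycles += model.write_latency
        else:
            self.penalty_cycles += model.read_latency

    def total(self) -> int:
        # The sum that would be reported as elapsed target time.
        return self.insn_count + self.penalty_cycles
```

Dividing total() by the core clock frequency would give an approximate
elapsed time, under the same one-unit-per-instruction assumption.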


> [...]
> The development scenario has thus been:
> 
> 1) make optimizations to the code and test on a Linux box
> 2) debug and go back to step 1 about a hundred times or so.
> 3) Once convinced that an optimization has been correctly made
>    re-target the makefile for an ARM processor
> 4) simulate the code (using sid as the simulator engine, of course)
> 5) analyze the simulation results

(You may also opt to run the Linux and ARM builds in parallel and
cross-check their results for consistency.)
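
One way to do such a cross-check, sketched in Python under the assumption
that both builds write their results as text and that the captured output is
available as strings.  The function name cross_check and the "linux"/"arm"
labels are hypothetical; only the standard difflib module is used.

```python
import difflib


def cross_check(native_output: str, target_output: str) -> list[str]:
    """Return unified-diff lines; an empty list means the two runs agree."""
    return list(
        difflib.unified_diff(
            native_output.splitlines(),
            target_output.splitlines(),
            fromfile="linux",   # label for the native build's output
            tofile="arm",       # label for the simulated build's output
            lineterm="",
        )
    )
```

An empty result means the native and simulated runs produced identical
output; any diff lines point at where the two builds diverge.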


> So far, I've been satisfied knowing the total number of executed
> instructions. Objective results are easily quantified.  However, I'm now
> rapidly approaching the point where the optimizations have been completed. 
> While I know the approximate number of instructions, I still do not know 
> the total number of CPU cycles (and hence the total time).
> [...]

To get the most precise answer, you'd best use hardware running a
profiling-capable OS.  If accounting for approximate memory latencies
is good enough, then SID can be of help.


- FChE


