This is the mail archive of the sid@sources.redhat.com mailing list for the SID project.

Re: Profiling: --insn-count=1

From: "Frank Ch. Eigler" <fche at redhat dot com>
To: Scott Dattalo <scott at dattalo dot com>
Cc: sid at sources dot redhat dot com
Date: Thu, 1 Aug 2002 11:32:03 -0400
Subject: Re: Profiling: --insn-count=1
References: <20020801071823.A7358@redhat.com> <Pine.LNX.4.44.0208010434180.18325-100000@ruckus.brouhaha.com>

Hi -

On Thu, Aug 01, 2002 at 05:03:55AM -0700, Scott Dattalo wrote:
> [...]
> Yeah, I suspected as much... If I had time to look into it, I'd try to add
> that feature. The way I'd approach it is I'd partition the time it takes
> an instruction to execute into two parts: the fixed amount of time the CPU
> requires and the (possibly) variable amount that the memory accesses
> require.  

> The fixed portion may be ascertained when the target program is
> first loaded. 

This computation is hard and totally target-dependent.  See for example
the amount of work needed in gcc to model a CPU pipeline in detail
(especially the more-precise DFA models).

> The variable portion may too, depending on the address
> accessed.  [...]

SID already does this part.  You can configure memory modules, mappers,
caches, and a few other bits as having latency counts associated with
operations.  The CPU accumulates these as penalties, combines them with
a raw instruction count, and tells the target-time scheduler the sum.
So simulated target time already includes the effect of these parameters.
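
As a rough illustration of the accounting described above, here is a minimal Python sketch. It is not SID code: the names MemoryRegion and CycleEstimator, the address ranges, and the latency values are all hypothetical, and it assumes the simplest possible model in which each instruction costs one tick and each memory access adds the fixed latency of the region it lands in.

```python
from dataclasses import dataclass


@dataclass
class MemoryRegion:
    """Hypothetical stand-in for a memory-like component with fixed latencies."""
    base: int
    size: int
    read_latency: int   # extra ticks charged per read (assumed value)
    write_latency: int  # extra ticks charged per write (assumed value)

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class CycleEstimator:
    """Accumulates a raw instruction count plus memory latency penalties."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.insn_count = 0
        self.latency_penalty = 0

    def _region_for(self, addr: int) -> MemoryRegion:
        for region in self.regions:
            if region.contains(addr):
                return region
        raise ValueError(f"unmapped address: {addr:#x}")

    def step(self, accesses=()):
        """Account for one instruction; accesses holds (address, is_write) pairs."""
        self.insn_count += 1
        for addr, is_write in accesses:
            region = self._region_for(addr)
            self.latency_penalty += (
                region.write_latency if is_write else region.read_latency
            )

    @property
    def total_ticks(self) -> int:
        # The sum that would be reported to a target-time scheduler.
        return self.insn_count + self.latency_penalty


# Usage: a fast region at 0x0 and a slower one at 0x8000 (made-up numbers).
est = CycleEstimator([
    MemoryRegion(base=0x0000, size=0x1000, read_latency=1, write_latency=1),
    MemoryRegion(base=0x8000, size=0x1000, read_latency=4, write_latency=6),
])
est.step()                                   # no memory access
est.step([(0x0010, False)])                  # read from the fast region
est.step([(0x8004, True)])                   # write to the slow region
est.step([(0x8000, False), (0x0020, True)])  # slow read plus fast write
print(est.insn_count, est.latency_penalty, est.total_ticks)
```

A model like this captures memory latency only; it says nothing about pipeline effects, which is the hard, target-dependent part.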

> [...]
> The development scenario has thus been:
> 
> 1) make optimizations to the code and test on a Linux box
> 2) debug and go back to step 1 about a hundred times or so.
> 3) Once convinced that an optimization has been correctly made
>    re-target the makefile for an ARM processor
> 4) simulate the code (using sid as the simulator engine, of course)
> 5) analyze the simulation results

(You may also opt to run the Linux and ARM builds in parallel, and
cross-check their results for consistency.)
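
One way to do such a cross-check, sketched in Python under the assumption that both builds emit the same line-oriented text output: compare the two captures and report the first line where they differ. The function name first_mismatch and the sample data are hypothetical.

```python
from itertools import zip_longest


def first_mismatch(host_lines, target_lines):
    """Return (line_number, host_line, target_line) for the first difference.

    Line numbers are 1-based. A missing line (one output is shorter) is
    reported as None. Returns None when the outputs are identical.
    """
    pairs = zip_longest(host_lines, target_lines)
    for number, (host, target) in enumerate(pairs, start=1):
        if host != target:
            return number, host, target
    return None


# Usage with made-up captures from the native and the simulated run.
host_output = ["checksum=1a2b", "frames=128", "done"]
target_output = ["checksum=1a2b", "frames=127", "done"]
print(first_mismatch(host_output, target_output))
```

In practice the two lists would come from reading the saved output of each run, for example with open(path).read().splitlines().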

> So far, I've been satisfied knowing the total number of executed
> instructions. Objective results are easily quantified.  However, I'm now
> rapidly approaching the point where the optimizations have been completed. 
> While I know the approximate number of instructions, I still do not know 
> the total number of CPU cycles (and hence the total time).
> [...]

To get the most precise answer, you'd best use hardware running a
profiling-capable OS.  If accounting for approximate memory latencies
is good enough, then SID can be of help.

- FChE
