This is the mail archive of the sid@sourceware.org mailing list for the SID project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[patch][rfa] --gprof Performance Improvement


Hi,

I'd like to commit the attached patch which improves the performance of the --gprof option by an order of magnitude. One benchmark I tried with

sid --board=<board> --final-insn-count --gprof=<file>,cycles=1

improved from 6963 seconds to 640 seconds. This is on top of the improvement I obtained with my previous patch. The culprit was the sw-profile-gprof component's use of an attribute interface to obtain the pc from each sample. It turns out that this alone completely dominates all other aspects of the simulation.

The patch does two things:

1) Make use of a local reference whenever 'this->stats[current_stats]' is used more than once in a method of gprof_component. 'this->stats' is a vector and this change gave me about a 3% improvement

2) Use a pin interface to provide the pc for each sample to the gprof component. This was the big win.

Rather than have the gprof component obtain and parse the value of an attribute of the cpu for each sample, instead the cpu now drives the pc value on two pins (to handle 64 bit pc's) before driving its sample-gprof pin.

Since this represents an interface change, I didn't want to commit it with some review/approval, however, as far as I can tell, the gprof interface is not used by any existing port.

Comments? OK to commit?

Dave
2006-06-26  Dave Brolley  <brolley@redhat.com>

	* commonCfg.cxx (GprofCfg): Connect the cpu's gprof-pc and gprof-pc-hi
	pins to out pc and p-hi pins respectively.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* sidcpuutil.h (gprof_pc_pin,gprof_pc_hi_pin): New members of
	basic_cpu.
	(sample_gprof): Drive gprof_pc_pin and gprof_pc_hi_pin.
	(get_pc,get_pc_hi): New methods od basic_cpu.
	(read_watchpoint_memory): Use get_pc.
	(basic_cpu): Add gprof-pc and gprof-pc-hi pins.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* gprof.cxx (target_attribute): Removed from gprof_component.
	(pc_pin,pc_hi_pin): Added to gprof_component.
	(bucket_size_set): Use local reference for this->stats[current_stats].
	(accumulate_call): Likewise.
	(accumulate): Likewise. Use pc_pin and pc_hi_pin instead of
	target_attribute to get the pc.
	(gprof_component): Add pc and pc-hi pins. Don't add value-attribute
	attribute. Initialize the driven value of pc_hi_pin with 0.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* xstormy16.h (get_pc): New member of xstormy16_cpu.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* mt.h (get_pc): New member of mt_cpu.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* m32rbf.h (get_pc): New member of m32rbf_cpu.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* arm7f.h (get_pc): New member of arm7f_cpu.
	* hw-cpu-arm7t.txt: Regenerated.

2006-06-26  Dave Brolley  <brolley@redhat.com>

	* common-xml/behavior.xml (profiling): New behavior documented.
	* common-xml/interface.xml (pins): Document gprof-pc, gprof-pc-hi
	and sample-gprof pins.


Index: sid/component/cgen-cpu/arm7t/arm7f.h
===================================================================
RCS file: /cvs/src/src/sid/component/cgen-cpu/arm7t/arm7f.h,v
retrieving revision 1.4
diff -c -p -r1.4 arm7f.h
*** sid/component/cgen-cpu/arm7t/arm7f.h	4 Aug 2004 15:42:00 -0000	1.4
--- sid/component/cgen-cpu/arm7t/arm7f.h	26 Jun 2006 19:26:09 -0000
*************** private:
*** 130,135 ****
--- 130,141 ----
        this->h_pc_set ((PCADDR) v);
      }
  
+   host_int_4
+   get_pc ()
+     {
+       return this->h_pc_get ();
+     }
+ 
    // debug support routines
    string dbg_get_reg (host_int_4 n);
    component::status dbg_set_reg (host_int_4 n, const string& s);
Index: sid/component/cgen-cpu/common-xml/behavior.xml
===================================================================
RCS file: /cvs/src/src/sid/component/cgen-cpu/common-xml/behavior.xml,v
retrieving revision 1.5
diff -c -p -r1.5 behavior.xml
*** sid/component/cgen-cpu/common-xml/behavior.xml	22 Mar 2004 21:38:38 -0000	1.5
--- sid/component/cgen-cpu/common-xml/behavior.xml	26 Jun 2006 19:26:09 -0000
***************
*** 79,84 ****
--- 79,98 ----
        stream.</p>
      </behavior>
  
+     <behavior name="profiling">
+ 
+       <p>The component can be configured to provide samples for use by the
+       <component>sw-profile-gprof</component> component in creating
+       a gmon output file. A sample is provided each time the
+       <pin>sample-gprof</pin> is driven. Each time, before the
+       <pin>sample-gprof</pin> is driven, the <pin>gprof-pc</pin> and
+       <pin>gprof-pc-hi</pin> pins will be driven in order to provide the
+       current program counter. <pin>gprof-pc</pin> represents the low order
+       32 bits of the pc and <pin>gprof-pc-hi</pin> represents the high order
+       32 bits of the pc (if any).
+       </p>
+     </behavior>
+ 
      <behavior name="exceptions/traps">
  
        <p>When encountering exception/trap conditions such as memory
Index: sid/component/cgen-cpu/common-xml/interface.xml
===================================================================
RCS file: /cvs/src/src/sid/component/cgen-cpu/common-xml/interface.xml,v
retrieving revision 1.7
diff -c -p -r1.7 interface.xml
*** sid/component/cgen-cpu/common-xml/interface.xml	22 Mar 2004 21:38:38 -0000	1.7
--- sid/component/cgen-cpu/common-xml/interface.xml	26 Jun 2006 19:26:09 -0000
***************
*** 13,18 ****
--- 13,21 ----
      <defpin name="flush-icache" direction="in" legalvalues="any" behaviors="execution" />
      <defpin name="print-insn-summary!" direction="in" legalvalues="any" behaviors="tracing" />
      <defpin name="trace" direction="in" legalvalues="any" behaviors="tracing" />
+     <defpin name="gprof-pc" direction="out" legalvalues="integer" behaviors="profiling" />
+     <defpin name="gprof-pc-hi" direction="out" legalvalues="integer" behaviors="profiling" />
+     <defpin name="sample-gprof" direction="out" legalvalues="positive integer" behaviors="profiling" />
  
      <!-- accessors -->
      <defaccessor name="data-memory" accesses="any" behaviors="execution" />
Index: sid/component/cgen-cpu/m32r/m32rbf.h
===================================================================
RCS file: /cvs/src/src/sid/component/cgen-cpu/m32r/m32rbf.h,v
retrieving revision 1.2
diff -c -p -r1.2 m32rbf.h
*** sid/component/cgen-cpu/m32r/m32rbf.h	3 Aug 2001 06:02:43 -0000	1.2
--- sid/component/cgen-cpu/m32r/m32rbf.h	26 Jun 2006 19:26:09 -0000
*************** public:
*** 56,61 ****
--- 56,67 ----
        this->hardware.h_pc = (PCADDR) v; 
      }
  
+   host_int_4
+   get_pc ()
+     {
+       return this->hardware.h_pc;
+     }
+ 
    // Called by semantic code to perform branches.
    inline void
    branch (PCADDR new_pc, PCADDR& npc, sem_status& status)
Index: sid/component/cgen-cpu/mt/mt.h
===================================================================
RCS file: /cvs/src/src/sid/component/cgen-cpu/mt/mt.h,v
retrieving revision 1.2
diff -c -p -r1.2 mt.h
*** sid/component/cgen-cpu/mt/mt.h	16 Dec 2005 10:23:13 -0000	1.2
--- sid/component/cgen-cpu/mt/mt.h	26 Jun 2006 19:26:09 -0000
*************** namespace mt
*** 128,133 ****
--- 128,138 ----
  	  this->hardware.h_pc = (PCADDR) v; 
  	}
        
+       host_int_4 get_pc ()
+         {
+ 	  return this->hardware.h_pc;
+ 	}
+ 
        // Called by semantic code to perform branches.
        inline void
        branch (PCADDR new_pc, PCADDR& npc, sem_status& status)
Index: sid/component/cgen-cpu/xstormy16/xstormy16.h
===================================================================
RCS file: /cvs/src/src/sid/component/cgen-cpu/xstormy16/xstormy16.h,v
retrieving revision 1.2
diff -c -p -r1.2 xstormy16.h
*** sid/component/cgen-cpu/xstormy16/xstormy16.h	11 Jan 2002 07:25:03 -0000	1.2
--- sid/component/cgen-cpu/xstormy16/xstormy16.h	26 Jun 2006 19:26:09 -0000
*************** namespace xstormy16
*** 48,53 ****
--- 48,58 ----
  	  this->hardware.h_pc = (PCADDR) v; 
  	}
  
+       host_int_4 get_pc ()
+         {
+ 	  return this->hardware.h_pc;
+ 	}
+ 
        // syscall temporary registers
        USI syscall_arg0;
        USI syscall_arg1;
Index: sid/component/profiling/gprof.cxx
===================================================================
RCS file: /cvs/src/src/sid/component/profiling/gprof.cxx,v
retrieving revision 1.16
diff -c -p -r1.16 gprof.cxx
*** sid/component/profiling/gprof.cxx	14 Jun 2006 20:38:56 -0000	1.16
--- sid/component/profiling/gprof.cxx	26 Jun 2006 19:26:10 -0000
*************** namespace profiling_components
*** 135,147 ****
      vector<statistics> stats;
      unsigned current_stats;
  
-     string target_attribute;
      component* target_component;
  
      endian output_file_format;
  
      callback_pin<gprof_component> accumulate_pin;
  
      input_pin cg_caller_hi_pin;
      input_pin cg_caller_pin;
      input_pin cg_callee_hi_pin;
--- 135,148 ----
      vector<statistics> stats;
      unsigned current_stats;
  
      component* target_component;
  
      endian output_file_format;
  
      callback_pin<gprof_component> accumulate_pin;
  
+     input_pin pc_hi_pin;
+     input_pin pc_pin;
      input_pin cg_caller_hi_pin;
      input_pin cg_caller_pin;
      input_pin cg_callee_hi_pin;
*************** namespace profiling_components
*** 214,221 ****
  	if (s != component::ok) return s;
  
  	// Reject change if we already have samples 
! 	if ((this->stats[current_stats].value_count != 0) &&
! 	    (this->stats[current_stats].bucket_size != new_bucket_size))
  	  {
  	    cerr << "sw-profile-gprof: invalid time to change bucket size" << endl;
  	    return component::bad_value;
--- 215,223 ----
  	if (s != component::ok) return s;
  
  	// Reject change if we already have samples 
! 	statistics &st = this->stats[current_stats];
! 	if ((st.value_count != 0) &&
! 	    (st.bucket_size != new_bucket_size))
  	  {
  	    cerr << "sw-profile-gprof: invalid time to change bucket size" << endl;
  	    return component::bad_value;
*************** namespace profiling_components
*** 228,234 ****
  	    return component::bad_value;
  	  }
  
! 	this->stats[current_stats].bucket_size = new_bucket_size;
  	return component::ok;
        }
  
--- 230,236 ----
  	    return component::bad_value;
  	  }
  
! 	st.bucket_size = new_bucket_size;
  	return component::ok;
        }
  
*************** namespace profiling_components
*** 273,303 ****
        {
  	if (! this->target_component) return;
  
! 	string value_str = this->target_component->attribute_value (this->target_attribute);
! 	host_int_8 value;
! 	component::status s = parse_attribute (value_str, value);
! 	if (s != component::ok) return;
! 
! 	value_str = this->target_component->attribute_value (this->target_attribute + "-hi");
! 	host_int_8 value_hi;
! 	s = parse_attribute (value_str, value_hi);
! 	if (s != component::ok)
! 	  value_hi = 0;
! 
! 	value = (value_hi << 32) | (value & 0xffffffff);
  
  	//	std::cout << "sampled at 0x" << std::hex << value << std::dec << " for " << stats[current_stats].output_file << endl;
  	// Reject out-of-bounds samples
! 	if (value < this->stats[current_stats].limit_min || value > this->stats[current_stats].limit_max) return;
  
! 	stats[current_stats].value_count ++;
  
! 	assert (this->stats[current_stats].bucket_size != 0);
! 	host_int_8 quantized = (value / this->stats[current_stats].bucket_size) * this->stats[current_stats].bucket_size;
  
! 	if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized;
! 	if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized;
! 	this->stats[current_stats].value_hitcount_map [quantized] += ticks;
        }
  
      void accumulate_call (host_int_4 selfpc_low)
--- 275,295 ----
        {
  	if (! this->target_component) return;
  
! 	host_int_8 value = (((host_int_8)this->pc_hi_pin.sense ()) << 32) | (this->pc_pin.sense () & 0xffffffff);
  
  	//	std::cout << "sampled at 0x" << std::hex << value << std::dec << " for " << stats[current_stats].output_file << endl;
  	// Reject out-of-bounds samples
! 	statistics &st = this->stats[current_stats];
! 	if (value < st.limit_min || value > st.limit_max) return;
  
! 	st.value_count ++;
  
! 	assert (st.bucket_size != 0);
! 	host_int_8 quantized = (value / st.bucket_size) * st.bucket_size;
  
! 	if (quantized < st.value_min) st.value_min = quantized;
! 	if (quantized > st.value_max) st.value_max = quantized;
! 	st.value_hitcount_map [quantized] += ticks;
        }
  
      void accumulate_call (host_int_4 selfpc_low)
*************** namespace profiling_components
*** 306,326 ****
  	host_int_8 callerpc = (((host_int_8)this->cg_caller_hi_pin.sense ()) << 32) | (this->cg_caller_pin.sense () & 0xffffffff);
  
  	// Reject out-of-bounds samples
! 	if (selfpc < this->stats[current_stats].limit_min || selfpc > this->stats[current_stats].limit_max) return;
! 	if (callerpc < this->stats[current_stats].limit_min || callerpc > this->stats[current_stats].limit_max) return;
! 
! 	stats[current_stats].value_count ++;
! 
! 	assert (this->stats[current_stats].bucket_size != 0);
! 	host_int_8 c_quantized = (callerpc / this->stats[current_stats].bucket_size) * this->stats[current_stats].bucket_size;
! 	host_int_8 s_quantized = (selfpc / this->stats[current_stats].bucket_size) * this->stats[current_stats].bucket_size;
! 
! 	if (c_quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = c_quantized;
! 	if (s_quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = s_quantized;
! 	if (c_quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = c_quantized;
! 	if (s_quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = s_quantized;
  
! 	this->stats[current_stats].cg_count_map [make_pair(c_quantized,s_quantized)] ++;
        }
  
  
--- 298,319 ----
  	host_int_8 callerpc = (((host_int_8)this->cg_caller_hi_pin.sense ()) << 32) | (this->cg_caller_pin.sense () & 0xffffffff);
  
  	// Reject out-of-bounds samples
! 	statistics &st = this->stats[current_stats];
! 	if (selfpc < st.limit_min || selfpc > st.limit_max) return;
! 	if (callerpc < st.limit_min || callerpc > st.limit_max) return;
! 
! 	st.value_count ++;
! 
! 	assert (st.bucket_size != 0);
! 	host_int_8 c_quantized = (callerpc / st.bucket_size) * st.bucket_size;
! 	host_int_8 s_quantized = (selfpc / st.bucket_size) * st.bucket_size;
! 
! 	if (c_quantized < st.value_min) st.value_min = c_quantized;
! 	if (s_quantized < st.value_min) st.value_min = s_quantized;
! 	if (c_quantized > st.value_max) st.value_max = c_quantized;
! 	if (s_quantized > st.value_max) st.value_max = s_quantized;
  
! 	st.cg_count_map [make_pair(c_quantized,s_quantized)] ++;
        }
  
  
*************** namespace profiling_components
*** 563,569 ****
  
    public:
      gprof_component ():
-       target_attribute ("pc"),
        target_component (0),
        output_file_format (endian_unknown),
        accumulate_pin (this, & gprof_component::accumulate),
--- 556,561 ----
*************** namespace profiling_components
*** 577,582 ****
--- 569,578 ----
  
  	add_pin ("sample", & this->accumulate_pin);
  	add_attribute ("sample", & this->accumulate_pin, "pin");
+ 	add_pin ("pc", & this->pc_pin);
+ 	add_attribute ("pc", & this->pc_pin, "pin");
+ 	add_pin ("pc-hi", & this->pc_hi_pin);
+ 	add_attribute ("pc-hi", & this->pc_hi_pin, "pin");
  	add_pin ("cg-caller", & this->cg_caller_pin);
  	add_attribute ("cg-caller", & this->cg_caller_pin, "pin");
  	add_pin ("cg-caller-hi", & this->cg_caller_hi_pin);
*************** namespace profiling_components
*** 617,623 ****
  			       & gprof_component::limit_max_get,
  			       & gprof_component::limit_max_set,
  			       "setting");
- 	add_attribute ("value-attribute", & this->target_attribute, "setting");
  	add_attribute_virtual ("output-file", this,
  			       & gprof_component::output_file_get,
  			       & gprof_component::output_file_set,
--- 613,618 ----
*************** namespace profiling_components
*** 625,630 ****
--- 620,626 ----
  	add_attribute ("output-file-endianness", & this->output_file_format, "setting");
  	add_uni_relation ("target-component", & this->target_component);
  
+ 	pc_hi_pin.driven (0);
  	cg_caller_hi_pin.driven (0);
  	cg_callee_hi_pin.driven (0);
        }
Index: sid/include/sidcpuutil.h
===================================================================
RCS file: /cvs/src/src/sid/include/sidcpuutil.h,v
retrieving revision 1.37
diff -c -p -r1.37 sidcpuutil.h
*** sid/include/sidcpuutil.h	20 Jun 2006 18:13:45 -0000	1.37
--- sid/include/sidcpuutil.h	26 Jun 2006 19:26:10 -0000
*************** namespace sidutil
*** 253,258 ****
--- 253,260 ----
      output_pin cg_callee_pin;
      output_pin cg_jump_pin;
      output_pin cg_return_pin;
+     output_pin gprof_pc_hi_pin;
+     output_pin gprof_pc_pin;
      output_pin sample_gprof_pin;
     
      // tracing
*************** namespace sidutil
*** 417,422 ****
--- 419,426 ----
        gprof_counter += current_cycle - this->gprof_prev_cycle;
        this->gprof_prev_cycle = current_cycle;
  
+       this->gprof_pc_hi_pin.drive (this->get_pc_hi ());
+       this->gprof_pc_pin.drive (this->get_pc ());
        if (this->gprof_cycles == 0)
  	{
  	  // Sample for gprof in insn-count mode.
*************** namespace sidutil
*** 514,519 ****
--- 518,525 ----
    private:
      callback_pin<basic_cpu> pc_set_pin;
      virtual void set_pc(sid::host_int_4) = 0;
+     virtual sid::host_int_4 get_pc() = 0;
+     virtual sid::host_int_4 get_pc_hi() { return 0; }
      void pc_set_pin_handler(sid::host_int_4 v) { this->set_pc (v); }
  
      // Set the initial endianness after reset
*************** namespace sidutil
*** 860,869 ****
  	sid::host_int_4 length  = addr_and_length.second;
  
  	// We'll need the current pc.
! 	std::string pc_str = this->attribute_value ("pc");
! 	sid::host_int_4 pc;
! 	sid::component::status rc = sidutil::parse_attribute (pc_str, pc);
! 	assert (rc == sid::component::ok);
  
  	// Just read from insn memory by default
  	switch (length)
--- 866,872 ----
  	sid::host_int_4 length  = addr_and_length.second;
  
  	// We'll need the current pc.
! 	sid::host_int_4 pc = get_pc ();
  
  	// Just read from insn memory by default
  	switch (length)
*************** public:
*** 951,956 ****
--- 954,961 ----
  	add_pin ("cg-jump", & this->cg_jump_pin);
  	add_pin ("print-insn-summary!", & this->print_insn_summary_pin);
  	add_pin ("sample-gprof", &sample_gprof_pin);
+ 	add_pin ("gprof-pc-hi", &gprof_pc_hi_pin);
+ 	add_pin ("gprof-pc", &gprof_pc_pin);
  	add_pin ("endian-set!", & this->endian_set_pin);
  	add_pin ("eflags-set!", & this->eflags_set_pin);
  	add_watchable_pin ("trap", & this->trap_type_pin); // output side
Index: sid/main/dynamic/commonCfg.cxx
===================================================================
RCS file: /cvs/src/src/sid/main/dynamic/commonCfg.cxx,v
retrieving revision 1.14
diff -c -p -r1.14 commonCfg.cxx
*** sid/main/dynamic/commonCfg.cxx	20 Jun 2006 18:40:20 -0000	1.14
--- sid/main/dynamic/commonCfg.cxx	26 Jun 2006 19:26:10 -0000
*************** GprofCfg::GprofCfg (const string name, 
*** 1009,1015 ****
    relate (this, "target-component", cpu);
    conn_pin (cpu, "cg-caller", this, "cg-caller");
    conn_pin (cpu, "cg-callee", this, "cg-callee");
!   set (this, "value-attribute", "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
    set (this, "output-file", filename);
  }
--- 1009,1016 ----
    relate (this, "target-component", cpu);
    conn_pin (cpu, "cg-caller", this, "cg-caller");
    conn_pin (cpu, "cg-callee", this, "cg-callee");
!   conn_pin (cpu, "gprof-pc-hi", this, "pc-hi");
!   conn_pin (cpu, "gprof-pc", this, "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
    set (this, "output-file", filename);
  }
*************** GprofCfg::GprofCfg (const string name,
*** 1028,1034 ****
  
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
-   set (this, "value-attribute", "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
  }
  
--- 1029,1034 ----

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]