This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
[rx-sim]: add cycle accuracy
- From: DJ Delorie <dj at redhat dot com>
- To: gdb-patches at sourceware dot org
- Date: Tue, 27 Jul 2010 22:06:42 -0400
- Subject: [rx-sim]: add cycle accuracy
This is a rather large but single-directory patch which makes the RX
simulator cycle-accurate. Well, mostly cycle accurate anyway - it's
within a small fraction of a percent compared to real hardware, on
large benchmarks. There are some speedups and documentation included
too. OK to commit?
I also built it with -Wall -Werror.
There doesn't seem to be an rx-specific sim maintainer. Since I wrote
it, should I be the maintainer?
* README.txt: New.
* config.in (CYCLE_ACCURATE, CYCLE_STATS): New.
* configure.in (--enable-cycle-accurate, --enable-cycle-stats):
New. Default to enabled.
* configure: Regenerate.
* cpu.h (regs_type): Add cycle tracking info.
(reset_pipeline_stats): Declare.
(halt_pipeline_stats): Declare.
(pipeline_stats): Declare.
* main.c (done): Call pipeline_stats().
* mem.h (rx_mem_ptr): Moved to here ...
* mem.c (mem_ptr): ... from here. Rename throughout.
(mem_put_byte): Move LEDs to Port A. Add Port B to control cycle
statistics. Move UART to SCI4.
(mem_put_hi): Add TPU 1-2. TPU 1 and 2 count CPU cycles.
* reg.c (init_regs): Set Rt reg to -1 (no reg).
* rx.c: Add cycle counting and statistics throughout.
(rx_get_byte): Optimize for speed.
(decode_opcode): Likewise.
(reset_pipeline_stats): New.
(halt_pipeline_stats): New.
(pipeline_stats): New.
* trace.c (sim_disasm_one): Print cycle count.
Index: README.txt
===================================================================
RCS file: README.txt
diff -N README.txt
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ README.txt 28 Jul 2010 02:00:19 -0000
@@ -0,0 +1,121 @@
+The RX simulator offers two rx-specific configure options:
+
+--enable-cycle-accurate (default)
+--disable-cycle-accurate
+
+If enabled, the simulator will keep track of how many cycles each
+instruction takes. While not 100% accurate, it is very close,
+including modelling fetch stalls and register latency.
+
+--enable-cycle-stats (default)
+--disable-cycle-stats
+
+If enabled, specifying "-v" twice on the simulator command line causes
+the simulator to print statistics on how much time was used by each
+type of opcode, and what pairs of opcodes tend to happen most
+frequently, as well as how many times various pipeline stalls
+happened.
+
+
+
+The RX simulator offers many command line options:
+
+-v - verbose output. This prints some information about where the
+program is being loaded and its starting address, as well as
+information about how much memory was used and how many instructions
+were executed during the run. If specified twice, pipeline and cycle
+information are added to the report.
+
+-d - disassemble output. Each instruction executed is printed.
+
+-t - trace output. Causes a *lot* of printed information about what
+ every instruction is doing, from math results down to register
+ changes.
+
+--ignore-*
+--warn-*
+--error-*
+
+ The RX simulator can detect certain types of memory corruption, and
+ either ignore them, warn the user about them, or error and exit.
+ Note that valid GCC code may trigger some of these; for example,
+ writing a bitfield involves reading the existing value, which may
+ not have been set yet. The options for * are:
+
+ null-deref - memory access to address zero. You must modify your
+ linker script to avoid putting anything at location zero, of
+ course.
+
+ unwritten-pages - attempts to read a page of memory (see below)
+ before it is written. This is much faster than the next option.
+
+ unwritten-bytes - attempts to read individual bytes before they're
+ written.
+
+ corrupt-stack - On return from a subroutine, the memory location
+ where $pc was stored is checked to see if anything other than
+ $pc had been written to it most recently.
+
+-i -w -e - these three options change the settings for all of the
+ above. For example, "-i" tells the simulator to ignore all memory
+ corruption.
+
+-E - end of options. Any remaining options (after the program name)
+ are considered to be options for the simulated program, although
+ such functionality is not supported.
+
+
+
+The RX simulator simulates a small number of peripherals, mostly in
+order to provide I/O capabilities for testing and such. The supported
+peripherals, and their limitations, are documented here.
+
+Memory
+
+Memory for the simulator is stored in a hierarchical tree, much like
+the i386's page directory and page tables. The simulator can allocate
+memory to individual pages as needed, allowing the simulated program
+to act as if it had a full 4 GB of RAM at its disposal, without
+actually allocating more memory from the host operating system than
+the simulated program actually uses. Note that for each page of
+memory, there's a corresponding page of memory *types* (for tracking
+memory corruption). Memory is initially filled with all zeros.
+
+GPIO Port A
+
+PA.DR is configured as an output-only port (regardless of PA.DDR).
+When written to, a row of colored @ and * symbols is printed,
+reflecting a row of eight LEDs being either on or off.
+
+GPIO Port B
+
+PB.DR controls the pipeline statistics. Writing a 0 to PB.DR disables
+statistics gathering. Writing a non-0 to PB.DR resets all counters
+and enables (even if already enabled) statistics gathering. The
+simulator starts with statistics enabled, so writing to PB.DR is not
+needed if you want statistics on the entire program's run.
+
+SCI4
+
+SCI4.TDR is connected to the simulator's stdout. Any byte written to
+SCI4.TDR is written to stdout. If the simulated program writes the
+bytes 3, 3, and N in sequence, the simulator exits with an exit value
+of N.
+
+SCI4.SSR always returns "transmitter empty".
+
+
+TPU1.TCNT
+TPU2.TCNT
+
+TPU1 and TPU2 are configured as a chained 32-bit counter which counts
+machine cycles. It always runs at "ICLK speed", regardless of the
+clock control settings. Writing to either of these 16-bit registers
+zeros the counter, regardless of the value written. Reading from
+these registers returns the elapsed cycle count, with TPU1 holding the
+most significant word and TPU2 holding the least significant word.
+
+Note that, much like the hardware, these values may (TPU2.TCNT *will*)
+change between reads, so you must read TPU1.TCNT, then TPU2.TCNT, and
+then TPU1.TCNT again, and only trust the values if both reads of
+TPU1.TCNT were the same.
Index: config.in
===================================================================
RCS file: /cvs/src/src/sim/rx/config.in,v
retrieving revision 1.2
diff -p -U3 -r1.2 config.in
--- config.in 14 Feb 2010 07:37:11 -0000 1.2
+++ config.in 28 Jul 2010 02:00:19 -0000
@@ -105,3 +105,9 @@
/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS
+
+/* --enable-cycle-accurate */
+#undef CYCLE_ACCURATE
+
+/* --enable-cycle-stats */
+#undef CYCLE_STATS
Index: configure.in
===================================================================
RCS file: /cvs/src/src/sim/rx/configure.in,v
retrieving revision 1.3
diff -p -U3 -r1.3 configure.in
--- configure.in 14 Feb 2010 07:37:11 -0000 1.3
+++ configure.in 28 Jul 2010 02:00:19 -0000
@@ -25,6 +25,36 @@ AC_CHECK_HEADERS(getopt.h)
sinclude(../common/aclocal.m4)
+AC_ARG_ENABLE(cycle-accurate,
+[ --disable-cycle-accurate ],
+[case "${enableval}" in
+yes | no) ;;
+*) AC_MSG_ERROR(bad value ${enableval} given for --enable-cycle-accurate option) ;;
+esac])
+
+AC_ARG_ENABLE(cycle-stats,
+[ --disable-cycle-stats ],
+[case "${enableval}" in
+yes | no) ;;
+*) AC_MSG_ERROR(bad value ${enableval} given for --enable-cycle-stats option) ;;
+esac])
+
+echo enable_cycle_accurate is $enable_cycle_accurate
+echo enable_cycle_stats is $enable_cycle_stats
+
+if test "x${enable_cycle_accurate}" != xno; then
+AC_DEFINE([CYCLE_ACCURATE])
+
+ if test "x${enable_cycle_stats}" != xno; then
+ AC_DEFINE([CYCLE_STATS])
+ fi
+else
+ if test "x${enable_cycle_stats}" != xno; then
+ AC_ERROR([cycle-stats not available without cycle-accurate])
+ fi
+fi
+
+
# Bugs in autoconf 2.59 break the call to SIM_AC_COMMON, hack around
# it by inlining the macro's contents.
sinclude(../common/common.m4)
Index: cpu.h
===================================================================
RCS file: /cvs/src/src/sim/rx/cpu.h,v
retrieving revision 1.2
diff -p -U3 -r1.2 cpu.h
--- cpu.h 1 Jan 2010 10:03:33 -0000 1.2
+++ cpu.h 28 Jul 2010 02:00:19 -0000
@@ -76,8 +76,24 @@ typedef struct
SI r_temp;
DI r_acc;
+
+#ifdef CYCLE_ACCURATE
+ /* If set, RTS/RTSD take 2 fewer cycles. */
+ char fast_return;
+ SI link_register;
+
+ unsigned long long cycle_count;
+ /* Bits saying what kind of memory operands the previous insn had. */
+ int m2m;
+ /* Target register for load. */
+ int rt;
+#endif
} regs_type;
+#define M2M_SRC 0x01
+#define M2M_DST 0x02
+#define M2M_BOTH 0x03
+
#define sp 0
#define psw 16
#define pc 17
@@ -219,6 +235,9 @@ extern unsigned int heaptop;
extern unsigned int heapbottom;
extern int decode_opcode (void);
+extern void reset_pipeline_stats (void);
+extern void halt_pipeline_stats (void);
+extern void pipeline_stats (void);
extern void trace_register_changes ();
extern void generate_access_exception (void);
Index: main.c
===================================================================
RCS file: /cvs/src/src/sim/rx/main.c,v
retrieving revision 1.3
diff -p -U3 -r1.3 main.c
--- main.c 14 Feb 2010 07:37:11 -0000 1.3
+++ main.c 28 Jul 2010 02:00:19 -0000
@@ -82,6 +82,8 @@ done (int exit_code)
printf ("insns: %14s\n", comma (rx_cycles));
else
printf ("insns: %u\n", rx_cycles);
+
+ pipeline_stats ();
}
exit (exit_code);
}
Index: mem.c
===================================================================
RCS file: /cvs/src/src/sim/rx/mem.c,v
retrieving revision 1.2
diff -p -U3 -r1.2 mem.c
--- mem.c 1 Jan 2010 10:03:33 -0000 1.2
+++ mem.c 28 Jul 2010 02:00:19 -0000
@@ -25,6 +25,7 @@ along with this program. If not, see <h
1. */
#define RDCHECK 0
+#include "config.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -37,7 +38,7 @@ along with this program. If not, see <h
#define L1_BITS (10)
#define L2_BITS (10)
-#define OFF_BITS (12)
+#define OFF_BITS PAGE_BITS
#define L1_LEN (1 << L1_BITS)
#define L2_LEN (1 << L2_BITS)
@@ -70,15 +71,8 @@ init_mem (void)
memset (mem_counters, 0, sizeof (mem_counters));
}
-enum mem_ptr_action
-{
- MPA_WRITING,
- MPA_READING,
- MPA_CONTENT_TYPE
-};
-
-static unsigned char *
-mem_ptr (unsigned long address, enum mem_ptr_action action)
+unsigned char *
+rx_mem_ptr (unsigned long address, enum mem_ptr_action action)
{
int pt1 = (address >> (L2_BITS + OFF_BITS)) & ((1 << L1_BITS) - 1);
int pt2 = (address >> OFF_BITS) & ((1 << L2_BITS) - 1);
@@ -240,7 +234,7 @@ e ()
static char
mtypec (int address)
{
- unsigned char *cp = mem_ptr (address, MPA_CONTENT_TYPE);
+ unsigned char *cp = rx_mem_ptr (address, MPA_CONTENT_TYPE);
return "udp"[*cp];
}
@@ -254,48 +248,75 @@ mem_put_byte (unsigned int address, unsi
if (trace)
tc = mtypec (address);
- m = mem_ptr (address, MPA_WRITING);
+ m = rx_mem_ptr (address, MPA_WRITING);
if (trace)
printf (" %02x%c", value, tc);
*m = value;
switch (address)
{
- case 0x00e1:
- {
+ case 0x0008c02a: /* PA.DR */
+ {
static int old_led = -1;
- static char *led_on[] =
- { "\033[31m O ", "\033[32m O ", "\033[34m O " };
- static char *led_off[] = { "\033[0m · ", "\033[0m · ", "\033[0m · " };
+ int red_on = 0;
int i;
+
if (old_led != value)
{
- fputs (" ", stdout);
- for (i = 0; i < 3; i++)
+ fputs (" ", stdout);
+ for (i = 0; i < 8; i++)
if (value & (1 << i))
- fputs (led_off[i], stdout);
+ {
+ if (! red_on)
+ {
+ fputs ("\033[31m", stdout);
+ red_on = 1;
+ }
+ fputs (" @", stdout);
+ }
else
- fputs (led_on[i], stdout);
- fputs ("\033[0m\r", stdout);
+ {
+ if (red_on)
+ {
+ fputs ("\033[0m", stdout);
+ red_on = 0;
+ }
+ fputs (" *", stdout);
+ }
+
+ if (red_on)
+ fputs ("\033[0m", stdout);
+
+ fputs ("\r", stdout);
fflush (stdout);
old_led = value;
}
}
break;
- case 0x3aa: /* uart1tx */
+#ifdef CYCLE_STATS
+ case 0x0008c02b: /* PB.DR */
{
- static int pending_exit = 0;
if (value == 0)
+ halt_pipeline_stats ();
+ else
+ reset_pipeline_stats ();
+ }
+#endif
+
+ case 0x00088263: /* SCI4.TDR */
+ {
+ static int pending_exit = 0;
+ if (pending_exit == 2)
{
- if (pending_exit)
- {
- step_result = RX_MAKE_EXITED(value);
- return;
- }
- pending_exit = 1;
+ step_result = RX_MAKE_EXITED(value);
+ longjmp (decode_jmp_buf, 1);
}
+ else if (value == 3)
+ pending_exit ++;
else
- putchar(value);
+ pending_exit = 0;
+
+ putchar(value);
}
break;
@@ -314,19 +335,33 @@ mem_put_qi (int address, unsigned char v
COUNT (1, 1);
}
+static int tpu_base;
+
void
mem_put_hi (int address, unsigned short value)
{
S ("<=");
- if (rx_big_endian)
- {
- mem_put_byte (address, value >> 8);
- mem_put_byte (address + 1, value & 0xff);
- }
- else
+ switch (address)
{
- mem_put_byte (address, value & 0xff);
- mem_put_byte (address + 1, value >> 8);
+#ifdef CYCLE_ACCURATE
+ case 0x00088126: /* TPU1.TCNT */
+ tpu_base = regs.cycle_count;
+ break;
+ case 0x00088136: /* TPU2.TCNT */
+ tpu_base = regs.cycle_count;
+ break;
+#endif
+ default:
+ if (rx_big_endian)
+ {
+ mem_put_byte (address, value >> 8);
+ mem_put_byte (address + 1, value & 0xff);
+ }
+ else
+ {
+ mem_put_byte (address, value & 0xff);
+ mem_put_byte (address + 1, value >> 8);
+ }
}
E ();
COUNT (1, 2);
@@ -388,7 +423,7 @@ mem_put_blk (int address, void *bufptr,
unsigned char
mem_get_pc (int address)
{
- unsigned char *m = mem_ptr (address, MPA_READING);
+ unsigned char *m = rx_mem_ptr (address, MPA_READING);
COUNT (0, 0);
return *m;
}
@@ -399,12 +434,12 @@ mem_get_byte (unsigned int address)
unsigned char *m;
S ("=>");
- m = mem_ptr (address, MPA_READING);
+ m = rx_mem_ptr (address, MPA_READING);
switch (address)
{
- case 0x3ad: /* uart1c1 */
+ case 0x00088264: /* SCI4.SSR */
E();
- return 2; /* transmitter empty */
+ return 0x04; /* transmitter empty */
break;
default:
if (trace)
@@ -433,15 +468,28 @@ mem_get_hi (int address)
{
unsigned short rv;
S ("=>");
- if (rx_big_endian)
- {
- rv = mem_get_byte (address) << 8;
- rv |= mem_get_byte (address + 1);
- }
- else
+ switch (address)
{
- rv = mem_get_byte (address);
- rv |= mem_get_byte (address + 1) << 8;
+#ifdef CYCLE_ACCURATE
+ case 0x00088126: /* TPU1.TCNT */
+ rv = (regs.cycle_count - tpu_base) >> 16;
+ break;
+ case 0x00088136: /* TPU2.TCNT */
+ rv = (regs.cycle_count - tpu_base) >> 0;
+ break;
+#endif
+
+ default:
+ if (rx_big_endian)
+ {
+ rv = mem_get_byte (address) << 8;
+ rv |= mem_get_byte (address + 1);
+ }
+ else
+ {
+ rv = mem_get_byte (address);
+ rv |= mem_get_byte (address + 1) << 8;
+ }
}
COUNT (0, 2);
E ();
@@ -520,7 +568,7 @@ sign_ext (int v, int bits)
void
mem_set_content_type (int address, enum mem_content_type type)
{
- unsigned char *mt = mem_ptr (address, MPA_CONTENT_TYPE);
+ unsigned char *mt = rx_mem_ptr (address, MPA_CONTENT_TYPE);
*mt = type;
}
@@ -537,7 +585,7 @@ mem_set_content_range (int start_address
if (sz + ofs > L1_LEN)
sz = L1_LEN - ofs;
- mt = mem_ptr (start_address, MPA_CONTENT_TYPE);
+ mt = rx_mem_ptr (start_address, MPA_CONTENT_TYPE);
memset (mt, type, sz);
start_address += sz;
@@ -547,6 +595,6 @@ mem_set_content_range (int start_address
enum mem_content_type
mem_get_content_type (int address)
{
- unsigned char *mt = mem_ptr (address, MPA_CONTENT_TYPE);
+ unsigned char *mt = rx_mem_ptr (address, MPA_CONTENT_TYPE);
return *mt;
}
Index: mem.h
===================================================================
RCS file: /cvs/src/src/sim/rx/mem.h,v
retrieving revision 1.2
diff -p -U3 -r1.2 mem.h
--- mem.h 1 Jan 2010 10:03:33 -0000 1.2
+++ mem.h 28 Jul 2010 02:00:19 -0000
@@ -25,10 +25,25 @@ enum mem_content_type {
MC_NUM_TYPES
};
+enum mem_ptr_action
+{
+ MPA_WRITING,
+ MPA_READING,
+ MPA_CONTENT_TYPE
+};
+
void init_mem (void);
void mem_usage_stats (void);
unsigned long mem_usage_cycles (void);
+/* rx_mem_ptr returns a pointer which is valid as long as the address
+ requested remains within the same page. */
+#define PAGE_BITS 12
+#define PAGE_SIZE (1 << PAGE_BITS)
+#define NONPAGE_MASK (~(PAGE_SIZE-1))
+
+unsigned char *rx_mem_ptr (unsigned long address, enum mem_ptr_action action);
+
void mem_put_qi (int address, unsigned char value);
void mem_put_hi (int address, unsigned short value);
void mem_put_psi (int address, unsigned long value);
Index: reg.c
===================================================================
RCS file: /cvs/src/src/sim/rx/reg.c,v
retrieving revision 1.3
diff -p -U3 -r1.3 reg.c
--- reg.c 8 Jun 2010 09:15:17 -0000 1.3
+++ reg.c 28 Jul 2010 02:00:19 -0000
@@ -19,6 +19,7 @@
along with this program. If not, see <http://www.gnu.org/licenses/>. */
+#include "config.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -67,6 +68,11 @@ init_regs (void)
{
memset (&regs, 0, sizeof (regs));
memset (&oldregs, 0, sizeof (oldregs));
+
+#ifdef CYCLE_ACCURATE
+ regs.rt = -1;
+ oldregs.rt = -1;
+#endif
}
static unsigned int
Index: rx.c
===================================================================
RCS file: /cvs/src/src/sim/rx/rx.c,v
retrieving revision 1.4
diff -p -U3 -r1.4 rx.c
--- rx.c 1 Jan 2010 10:03:33 -0000 1.4
+++ rx.c 28 Jul 2010 02:00:19 -0000
@@ -18,6 +18,7 @@ GNU General Public License for more deta
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>. */
+#include "config.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -29,12 +30,254 @@ along with this program. If not, see <h
#include "syscalls.h"
#include "fpu.h"
#include "err.h"
+#include "misc.h"
-#define tprintf if (trace) printf
+#ifdef CYCLE_STATS
+static const char * id_names[] = {
+ "RXO_unknown",
+ "RXO_mov", /* d = s (signed) */
+ "RXO_movbi", /* d = [s,s2] (signed) */
+ "RXO_movbir", /* [s,s2] = d (signed) */
+ "RXO_pushm", /* s..s2 */
+ "RXO_popm", /* s..s2 */
+ "RXO_pusha", /* &s */
+ "RXO_xchg", /* s <-> d */
+ "RXO_stcc", /* d = s if cond(s2) */
+ "RXO_rtsd", /* rtsd, 1=imm, 2-0 = reg if reg type */
+
+ /* These are all either d OP= s or, if s2 is set, d = s OP s2. Note
+ that d may be "None". */
+ "RXO_and",
+ "RXO_or",
+ "RXO_xor",
+ "RXO_add",
+ "RXO_sub",
+ "RXO_mul",
+ "RXO_div",
+ "RXO_divu",
+ "RXO_shll",
+ "RXO_shar",
+ "RXO_shlr",
+
+ "RXO_adc", /* d = d + s + carry */
+ "RXO_sbb", /* d = d - s - ~carry */
+ "RXO_abs", /* d = |s| */
+ "RXO_max", /* d = max(d,s) */
+ "RXO_min", /* d = min(d,s) */
+ "RXO_emul", /* d:64 = d:32 * s */
+ "RXO_emulu", /* d:64 = d:32 * s (unsigned) */
+ "RXO_ediv", /* d:64 / s; d = quot, d+1 = rem */
+ "RXO_edivu", /* d:64 / s; d = quot, d+1 = rem */
+
+ "RXO_rolc", /* d <<= 1 through carry */
+ "RXO_rorc", /* d >>= 1 through carry*/
+ "RXO_rotl", /* d <<= #s without carry */
+ "RXO_rotr", /* d >>= #s without carry*/
+ "RXO_revw", /* d = revw(s) */
+ "RXO_revl", /* d = revl(s) */
+ "RXO_branch", /* pc = d if cond(s) */
+ "RXO_branchrel",/* pc += d if cond(s) */
+ "RXO_jsr", /* pc = d */
+ "RXO_jsrrel", /* pc += d */
+ "RXO_rts",
+ "RXO_nop",
+ "RXO_nop2",
+ "RXO_nop3",
+
+ "RXO_scmpu",
+ "RXO_smovu",
+ "RXO_smovb",
+ "RXO_suntil",
+ "RXO_swhile",
+ "RXO_smovf",
+ "RXO_sstr",
+
+ "RXO_rmpa",
+ "RXO_mulhi",
+ "RXO_mullo",
+ "RXO_machi",
+ "RXO_maclo",
+ "RXO_mvtachi",
+ "RXO_mvtaclo",
+ "RXO_mvfachi",
+ "RXO_mvfacmi",
+ "RXO_mvfaclo",
+ "RXO_racw",
+
+ "RXO_sat", /* sat(d) */
+ "RXO_satr",
+
+ "RXO_fadd", /* d op= s */
+ "RXO_fcmp",
+ "RXO_fsub",
+ "RXO_ftoi",
+ "RXO_fmul",
+ "RXO_fdiv",
+ "RXO_round",
+ "RXO_itof",
+
+ "RXO_bset", /* d |= (1<<s) */
+ "RXO_bclr", /* d &= ~(1<<s) */
+ "RXO_btst", /* s & (1<<s2) */
+ "RXO_bnot", /* d ^= (1<<s) */
+ "RXO_bmcc", /* d<s> = cond(s2) */
+
+ "RXO_clrpsw", /* flag index in d */
+ "RXO_setpsw", /* flag index in d */
+ "RXO_mvtipl", /* new IPL in s */
+
+ "RXO_rtfi",
+ "RXO_rte",
+ "RXO_rtd", /* undocumented */
+ "RXO_brk",
+ "RXO_dbt", /* undocumented */
+ "RXO_int", /* vector id in s */
+ "RXO_stop",
+ "RXO_wait",
+
+ "RXO_sccnd", /* d = cond(s) ? 1 : 0 */
+};
+
+static const char * optype_names[] = {
+ " ",
+ "#Imm", /* #addend */
+ " Rn ", /* Rn */
+ "[Rn]", /* [Rn + addend] */
+ "Ps++", /* [Rn+] */
+ "--Pr", /* [-Rn] */
+ " cc ", /* eq, gtu, etc */
+ "Flag" /* [UIOSZC] */
+};
+
+#define N_RXO (sizeof(id_names)/sizeof(id_names[0]))
+#define N_RXT (sizeof(optype_names)/sizeof(optype_names[0]))
+#define N_MAP 30
+
+static unsigned long long benchmark_start_cycle;
+static unsigned long long benchmark_end_cycle;
+
+static int op_cache[N_RXT][N_RXT][N_RXT];
+static int op_cache_rev[N_MAP];
+static int op_cache_idx = 0;
+
+static int
+op_lookup (int a, int b, int c)
+{
+ if (op_cache[a][b][c])
+ return op_cache[a][b][c];
+ op_cache_idx ++;
+ if (op_cache_idx >= N_MAP)
+ {
+ printf("op_cache_idx exceeds %d\n", N_MAP);
+ exit(1);
+ }
+ op_cache[a][b][c] = op_cache_idx;
+ op_cache_rev[op_cache_idx] = (a<<8) | (b<<4) | c;
+ return op_cache_idx;
+}
+
+static char *
+op_cache_string (int map)
+{
+ static int ci;
+ static char cb[5][20];
+ int a, b, c;
+
+ map = op_cache_rev[map];
+ a = (map >> 8) & 15;
+ b = (map >> 4) & 15;
+ c = (map >> 0) & 15;
+ ci = (ci + 1) % 5;
+ sprintf(cb[ci], "%s %s %s", optype_names[a], optype_names[b], optype_names[c]);
+ return cb[ci];
+}
+
+static unsigned long long cycles_per_id[N_RXO][N_MAP];
+static unsigned long long times_per_id[N_RXO][N_MAP];
+static unsigned long long memory_stalls;
+static unsigned long long register_stalls;
+static unsigned long long branch_stalls;
+static unsigned long long branch_alignment_stalls;
+static unsigned long long fast_returns;
+
+static unsigned long times_per_pair[N_RXO][N_MAP][N_RXO][N_MAP];
+static int prev_opcode_id = RXO_unknown;
+static int po0;
+
+#define STATS(x) x
+
+#else
+#define STATS(x)
+#endif /* CYCLE_STATS */
+
+
+#ifdef CYCLE_ACCURATE
+
+static int new_rt = -1;
+
+/* Number of cycles to add if an insn spans an 8-byte boundary. */
+static int branch_alignment_penalty = 0;
+
+#endif
+
+static int running_benchmark = 1;
+
+#define tprintf if (trace && running_benchmark) printf
jmp_buf decode_jmp_buf;
unsigned int rx_cycles = 0;
+#ifdef CYCLE_ACCURATE
+/* If nonzero, memory was read at some point and cycle latency might
+ take effect. */
+static int memory_source = 0;
+/* If nonzero, memory was written and extra cycles might be
+ needed. */
+static int memory_dest = 0;
+
+static void
+cycles (int throughput)
+{
+ tprintf("%d cycles\n", throughput);
+ regs.cycle_count += throughput;
+}
+
+/* Number of execution (E) cycles the op uses. For memory sources, we
+ include the load micro-op stall as two extra E cycles. */
+#define E(c) cycles (memory_source ? c + 2 : c)
+#define E1 cycles (1)
+#define E2 cycles (2)
+#define EBIT cycles (memory_source ? 2 : 1)
+
+/* Check to see if a read latency must be applied for a given register. */
+#define RL(r) \
+ if (regs.rt == r ) \
+ { \
+ tprintf("register %d load stall\n", r); \
+ regs.cycle_count ++; \
+ STATS(register_stalls ++); \
+ regs.rt = -1; \
+ }
+
+#define RLD(r) \
+ if (memory_source) \
+ { \
+ tprintf ("Rt now %d\n", r); \
+ new_rt = r; \
+ }
+
+#else /* !CYCLE_ACCURATE */
+
+#define cycles(t)
+#define E(c)
+#define E1
+#define E2
+#define EBIT
+#define RL(r)
+#define RLD(r)
+
+#endif /* else CYCLE_ACCURATE */
+
static int size2bytes[] = {
4, 1, 1, 1, 2, 2, 2, 3, 4
};
@@ -53,24 +296,28 @@ _rx_abort (const char *file, int line)
abort();
}
+static unsigned char *get_byte_base;
+static SI get_byte_page;
+
+/* This gets called a *lot* so optimize it. */
static int
rx_get_byte (void *vdata)
{
- int saved_trace = trace;
- unsigned char rv;
-
- if (trace == 1)
- trace = 0;
-
RX_Data *rx_data = (RX_Data *)vdata;
+ SI tpc = rx_data->dpc;
+
+ /* See load.c for an explanation of this. */
if (rx_big_endian)
- /* See load.c for an explanation of this. */
- rv = mem_get_pc (rx_data->dpc ^ 3);
- else
- rv = mem_get_pc (rx_data->dpc);
+ tpc ^= 3;
+
+ if (((tpc ^ get_byte_page) & NONPAGE_MASK) || enable_counting)
+ {
+ get_byte_page = tpc & NONPAGE_MASK;
+ get_byte_base = rx_mem_ptr (get_byte_page, MPA_READING) - get_byte_page;
+ }
+
rx_data->dpc ++;
- trace = saved_trace;
- return rv;
+ return get_byte_base [tpc];
}
static int
@@ -88,6 +335,7 @@ get_op (RX_Opcode_Decoded *rd, int i)
return o->addend;
case RX_Operand_Register: /* Rn */
+ RL (o->reg);
rv = get_reg (o->reg);
break;
@@ -96,6 +344,21 @@ get_op (RX_Opcode_Decoded *rd, int i)
/* fall through */
case RX_Operand_Postinc: /* [Rn+] */
case RX_Operand_Indirect: /* [Rn + addend] */
+#ifdef CYCLE_ACCURATE
+ RL (o->reg);
+ regs.rt = -1;
+ if (regs.m2m == M2M_BOTH)
+ {
+ tprintf("src memory stall\n");
+#ifdef CYCLE_STATS
+ memory_stalls ++;
+#endif
+ regs.cycle_count ++;
+ regs.m2m = 0;
+ }
+
+ memory_source = 1;
+#endif
addr = get_reg (o->reg) + o->addend;
switch (o->size)
@@ -234,6 +497,7 @@ put_op (RX_Opcode_Decoded *rd, int i, in
case RX_Operand_Register: /* Rn */
put_reg (o->reg, v);
+ RLD (o->reg);
break;
case RX_Operand_Predec: /* [-Rn] */
@@ -242,6 +506,19 @@ put_op (RX_Opcode_Decoded *rd, int i, in
case RX_Operand_Postinc: /* [Rn+] */
case RX_Operand_Indirect: /* [Rn + addend] */
+#ifdef CYCLE_ACCURATE
+ if (regs.m2m == M2M_BOTH)
+ {
+ tprintf("dst memory stall\n");
+ regs.cycle_count ++;
+#ifdef CYCLE_STATS
+ memory_stalls ++;
+#endif
+ regs.m2m = 0;
+ }
+ memory_dest = 1;
+#endif
+
addr = get_reg (o->reg) + o->addend;
switch (o->size)
{
@@ -345,8 +622,8 @@ poppc()
#define MATH_OP(vop,c) \
{ \
- uma = US1(); \
umb = US2(); \
+ uma = US1(); \
ll = (unsigned long long) uma vop (unsigned long long) umb vop c; \
tprintf ("0x%x " #vop " 0x%x " #vop " 0x%x = 0x%llx\n", uma, umb, c, ll); \
ma = sign_ext (uma, DSZ() * 8); \
@@ -355,23 +632,25 @@ poppc()
tprintf ("%d " #vop " %d " #vop " %d = %lld\n", ma, mb, c, sll); \
set_oszc (sll, DSZ(), (long long) ll > ((1 vop 1) ? (long long) b2mask[DSZ()] : (long long) -1)); \
PD (sll); \
+ E (1); \
}
#define LOGIC_OP(vop) \
{ \
- ma = US1(); \
mb = US2(); \
+ ma = US1(); \
v = ma vop mb; \
tprintf("0x%x " #vop " 0x%x = 0x%x\n", ma, mb, v); \
set_sz (v, DSZ()); \
PD(v); \
+ E (1); \
}
#define SHIFT_OP(val, type, count, OP, carry_mask) \
{ \
int i, c=0; \
- val = (type)US1(); \
count = US2(); \
+ val = (type)US1(); \
tprintf("%lld " #OP " %d\n", val, count); \
for (i = 0; i < count; i ++) \
{ \
@@ -443,8 +722,8 @@ fop_fsub (fp_t s1, fp_t s2, fp_t *d)
int do_store; \
fp_t fa, fb, fc; \
FPCLEAR(); \
- fa = GD (); \
fb = GS (); \
+ fa = GD (); \
do_store = fop_##func (fa, fb, &fc); \
tprintf("%g " #func " %g = %g %08x\n", int2float(fa), int2float(fb), int2float(fc), fc); \
FPCHECK(); \
@@ -549,6 +828,21 @@ do_fp_exception (unsigned long opcode_pc
return RX_MAKE_STEPPED ();
}
+static int
+op_is_memory (RX_Opcode_Decoded *rd, int i)
+{
+ switch (rd->op[i].type)
+ {
+ case RX_Operand_Predec:
+ case RX_Operand_Postinc:
+ case RX_Operand_Indirect:
+ return 1;
+ default:
+ return 0;
+ }
+}
+#define OM(i) op_is_memory (&opcode, i)
+
int
decode_opcode ()
{
@@ -561,14 +855,46 @@ decode_opcode ()
RX_Data rx_data;
RX_Opcode_Decoded opcode;
int rv;
+#ifdef CYCLE_STATS
+ unsigned long long prev_cycle_count;
+#endif
+#ifdef CYCLE_ACCURATE
+ int tx;
+#endif
if ((rv = setjmp (decode_jmp_buf)))
return rv;
+#ifdef CYCLE_STATS
+ prev_cycle_count = regs.cycle_count;
+#endif
+
+#ifdef CYCLE_ACCURATE
+ memory_source = 0;
+ memory_dest = 0;
+#endif
+
rx_cycles ++;
rx_data.dpc = opcode_pc = regs.r_pc;
+ memset (&opcode, 0, sizeof(opcode));
opcode_size = rx_decode_opcode (opcode_pc, &opcode, rx_get_byte, &rx_data);
+
+#ifdef CYCLE_ACCURATE
+ if (branch_alignment_penalty)
+ {
+ if ((regs.r_pc ^ (regs.r_pc + opcode_size - 1)) & ~7)
+ {
+ tprintf("1 cycle branch alignment penalty\n");
+ cycles (branch_alignment_penalty);
+#ifdef CYCLE_STATS
+ branch_alignment_stalls ++;
+#endif
+ }
+ branch_alignment_penalty = 0;
+ }
+#endif
+
regs.r_pc += opcode_size;
rx_flagmask = opcode.flags_s;
@@ -585,6 +911,7 @@ decode_opcode ()
tprintf("%lld\n", sll);
PD (sll);
set_osz (sll, 4);
+ E (1);
break;
case RXO_adc:
@@ -608,6 +935,7 @@ decode_opcode ()
mb &= 0x07;
ma &= ~(1 << mb);
PD (ma);
+ EBIT;
break;
case RXO_bmcc:
@@ -622,6 +950,7 @@ decode_opcode ()
else
ma &= ~(1 << mb);
PD (ma);
+ EBIT;
break;
case RXO_bnot:
@@ -633,16 +962,71 @@ decode_opcode ()
mb &= 0x07;
ma ^= (1 << mb);
PD (ma);
+ EBIT;
break;
case RXO_branch:
if (GS())
- regs.r_pc = GD();
+ {
+#ifdef CYCLE_ACCURATE
+ SI old_pc = regs.r_pc;
+ int delta;
+#endif
+ regs.r_pc = GD();
+#ifdef CYCLE_ACCURATE
+ delta = regs.r_pc - old_pc;
+ if (delta >= 0 && delta < 16
+ && opcode_size > 1)
+ {
+ tprintf("near forward branch bonus\n");
+ cycles (2);
+ }
+ else
+ {
+ cycles (3);
+ branch_alignment_penalty = 1;
+ }
+#ifdef CYCLE_STATS
+ branch_stalls ++;
+ /* This is just for statistics */
+ if (opcode.op[1].reg == 14)
+ opcode.op[1].type = RX_Operand_None;
+#endif
+#endif
+ }
+#ifdef CYCLE_ACCURATE
+ else
+ cycles (1);
+#endif
break;
case RXO_branchrel:
if (GS())
- regs.r_pc += GD();
+ {
+ int delta = GD();
+ regs.r_pc += delta;
+#ifdef CYCLE_ACCURATE
+ /* Note: specs say 3, chip says 2. */
+ if (delta >= 0 && delta < 16
+ && opcode_size > 1)
+ {
+ tprintf("near forward branch bonus\n");
+ cycles (2);
+ }
+ else
+ {
+ cycles (3);
+ branch_alignment_penalty = 1;
+ }
+#ifdef CYCLE_STATS
+ branch_stalls ++;
+#endif
+#endif
+ }
+#ifdef CYCLE_ACCURATE
+ else
+ cycles (1);
+#endif
break;
case RXO_brk:
@@ -659,6 +1043,7 @@ decode_opcode ()
pushpc (old_psw);
pushpc (regs.r_pc);
regs.r_pc = mem_get_si (regs.r_intb);
+ cycles(6);
}
break;
@@ -671,6 +1056,7 @@ decode_opcode ()
mb &= 0x07;
ma |= (1 << mb);
PD (ma);
+ EBIT;
break;
case RXO_btst:
@@ -682,6 +1068,7 @@ decode_opcode ()
mb &= 0x07;
umb = ma & (1 << mb);
set_zc (! umb, umb);
+ EBIT;
break;
case RXO_clrpsw:
@@ -691,6 +1078,7 @@ decode_opcode ()
|| v == FLAGBIT_U))
break;
regs.r_psw &= ~v;
+ cycles (1);
break;
case RXO_div: /* d = d / s */
@@ -709,6 +1097,8 @@ decode_opcode ()
set_flags (FLAGBIT_O, 0);
PD (v);
}
+ /* Note: spec says 3 to 22 cycles, we are pessimistic. */
+ cycles (22);
break;
case RXO_divu: /* d = d / s */
@@ -727,6 +1117,8 @@ decode_opcode ()
set_flags (FLAGBIT_O, 0);
PD (v);
}
+ /* Note: spec says 2 to 20 cycles, we are pessimistic. */
+ cycles (20);
break;
case RXO_ediv:
@@ -748,6 +1140,8 @@ decode_opcode ()
opcode.op[0].reg ++;
PD (mb);
}
+ /* Note: spec says 3 to 22 cycles, we are pessimistic. */
+ cycles (22);
break;
case RXO_edivu:
@@ -769,6 +1163,8 @@ decode_opcode ()
opcode.op[0].reg ++;
PD (umb);
}
+ /* Note: spec says 2 to 20 cycles, we are pessimistic. */
+ cycles (20);
break;
case RXO_emul:
@@ -779,6 +1175,7 @@ decode_opcode ()
PD (sll);
opcode.op[0].reg ++;
PD (sll >> 32);
+ E2;
break;
case RXO_emulu:
@@ -789,10 +1186,12 @@ decode_opcode ()
PD (ll);
opcode.op[0].reg ++;
PD (ll >> 32);
+ E2;
break;
case RXO_fadd:
FLOAT_OP (fadd);
+ E (4);
break;
case RXO_fcmp:
@@ -801,24 +1200,32 @@ decode_opcode ()
FPCLEAR ();
rxfp_cmp (ma, mb);
FPCHECK ();
+ E (1);
break;
case RXO_fdiv:
FLOAT_OP (fdiv);
+ E (16);
break;
case RXO_fmul:
FLOAT_OP (fmul);
+ E (3);
break;
case RXO_rtfi:
PRIVILEDGED ();
regs.r_psw = regs.r_bpsw;
regs.r_pc = regs.r_bpc;
+#ifdef CYCLE_ACCURATE
+ regs.fast_return = 0;
+ cycles(3);
+#endif
break;
case RXO_fsub:
FLOAT_OP (fsub);
+ E (4);
break;
case RXO_ftoi:
@@ -829,6 +1236,7 @@ decode_opcode ()
PD (mb);
tprintf("(int) %g = %d\n", int2float(ma), mb);
set_sz (mb, 4);
+ E (2);
break;
case RXO_int:
@@ -845,6 +1253,7 @@ decode_opcode ()
pushpc (regs.r_pc);
regs.r_pc = mem_get_si (regs.r_intb + 4 * v);
}
+ cycles (6);
break;
case RXO_itof:
@@ -855,49 +1264,87 @@ decode_opcode ()
tprintf("(float) %d = %x\n", ma, mb);
PD (mb);
set_sz (ma, 4);
+ E (2);
break;
case RXO_jsr:
case RXO_jsrrel:
- v = GD ();
- pushpc (get_reg (pc));
- if (opcode.id == RXO_jsrrel)
- v += regs.r_pc;
- put_reg (pc, v);
+ {
+#ifdef CYCLE_ACCURATE
+ int delta;
+ regs.m2m = 0;
+#endif
+ v = GD ();
+#ifdef CYCLE_ACCURATE
+ regs.link_register = regs.r_pc;
+#endif
+ pushpc (get_reg (pc));
+ if (opcode.id == RXO_jsrrel)
+ v += regs.r_pc;
+#ifdef CYCLE_ACCURATE
+ delta = v - regs.r_pc;
+#endif
+ put_reg (pc, v);
+#ifdef CYCLE_ACCURATE
+ /* Note: docs say 3, chip says 2. */
+ if (delta >= 0 && delta < 16)
+ {
+ tprintf ("near forward jsr bonus\n");
+ cycles (2);
+ }
+ else
+ {
+ branch_alignment_penalty = 1;
+ cycles (3);
+ }
+ regs.fast_return = 1;
+#endif
+ }
break;
case RXO_machi:
ll = (long long)(signed short)(GS() >> 16) * (long long)(signed short)(GS2 () >> 16);
ll <<= 16;
put_reg64 (acc64, ll + regs.r_acc);
+ E1;
break;
case RXO_maclo:
ll = (long long)(signed short)(GS()) * (long long)(signed short)(GS2 ());
ll <<= 16;
put_reg64 (acc64, ll + regs.r_acc);
+ E1;
break;
case RXO_max:
- ma = GD();
mb = GS();
+ ma = GD();
if (ma > mb)
PD (ma);
else
PD (mb);
+ E (1);
+#ifdef CYCLE_STATS
+ if (opcode.op[0].type == RX_Operand_Register
+ && opcode.op[1].type == RX_Operand_Register
+ && opcode.op[0].reg == opcode.op[1].reg)
+ opcode.id = RXO_nop3;
+#endif
break;
case RXO_min:
- ma = GD();
mb = GS();
+ ma = GD();
if (ma < mb)
PD (ma);
else
PD (mb);
+ E (1);
break;
case RXO_mov:
v = GS ();
+
if (opcode.op[0].type == RX_Operand_Register
&& opcode.op[0].reg == 16 /* PSW */)
{
@@ -927,8 +1374,32 @@ decode_opcode ()
/* These are ignored. */
break;
}
+ if (OM(0) && OM(1))
+ cycles (2);
+ else
+ cycles (1);
+
PD (v);
+
+#ifdef CYCLE_ACCURATE
+ if ((opcode.op[0].type == RX_Operand_Predec
+ && opcode.op[1].type == RX_Operand_Register)
+ || (opcode.op[0].type == RX_Operand_Postinc
+ && opcode.op[1].type == RX_Operand_Register))
+ {
+ /* Special case: push reg doesn't cause a memory stall. */
+ memory_dest = 0;
+ tprintf("push special case\n");
+ }
+#endif
+
set_sz (v, DSZ());
+#ifdef CYCLE_STATS
+ if (opcode.op[0].type == RX_Operand_Register
+ && opcode.op[1].type == RX_Operand_Register
+ && opcode.op[0].reg == opcode.op[1].reg)
+ opcode.id = RXO_nop2;
+#endif
break;
case RXO_movbi:
@@ -939,6 +1410,7 @@ decode_opcode ()
opcode.op[1].type = RX_Operand_Indirect;
opcode.op[1].addend = 0;
PD (GS ());
+ cycles (1);
break;
case RXO_movbir:
@@ -949,51 +1421,65 @@ decode_opcode ()
opcode.op[1].type = RX_Operand_Indirect;
opcode.op[1].addend = 0;
PS (GD ());
+ cycles (1);
break;
case RXO_mul:
- ll = (unsigned long long) US1() * (unsigned long long) US2();
+ v = US2 ();
+ ll = (unsigned long long) US1() * (unsigned long long) v;
PD(ll);
+ E (1);
break;
case RXO_mulhi:
- ll = (long long)(signed short)(GS() >> 16) * (long long)(signed short)(GS2 () >> 16);
+ v = GS2 ();
+ ll = (long long)(signed short)(GS() >> 16) * (long long)(signed short)(v >> 16);
ll <<= 16;
put_reg64 (acc64, ll);
+ E1;
break;
case RXO_mullo:
- ll = (long long)(signed short)(GS()) * (long long)(signed short)(GS2 ());
+ v = GS2 ();
+ ll = (long long)(signed short)(GS()) * (long long)(signed short)(v);
ll <<= 16;
put_reg64 (acc64, ll);
+ E1;
break;
case RXO_mvfachi:
PD (get_reg (acchi));
+ E1;
break;
case RXO_mvfaclo:
PD (get_reg (acclo));
+ E1;
break;
case RXO_mvfacmi:
PD (get_reg (accmi));
+ E1;
break;
case RXO_mvtachi:
put_reg (acchi, GS ());
+ E1;
break;
case RXO_mvtaclo:
put_reg (acclo, GS ());
+ E1;
break;
case RXO_mvtipl:
regs.r_psw &= ~ FLAGBITS_IPL;
regs.r_psw |= (GS () << FLAGSHIFT_IPL) & FLAGBITS_IPL;
+ E1;
break;
case RXO_nop:
+ E1;
break;
case RXO_or:
@@ -1010,11 +1496,11 @@ decode_opcode ()
return RX_MAKE_STOPPED (SIGILL);
}
for (v = opcode.op[1].reg; v <= opcode.op[2].reg; v++)
- put_reg (v, pop ());
- break;
-
- case RXO_pusha:
- push (get_reg (opcode.op[1].reg) + opcode.op[1].addend);
+ {
+ cycles (1);
+ RLD (v);
+ put_reg (v, pop ());
+ }
break;
case RXO_pushm:
@@ -1027,7 +1513,11 @@ decode_opcode ()
return RX_MAKE_STOPPED (SIGILL);
}
for (v = opcode.op[2].reg; v >= opcode.op[1].reg; v--)
- push (get_reg (v));
+ {
+ RL (v);
+ push (get_reg (v));
+ }
+ cycles (opcode.op[2].reg - opcode.op[1].reg + 1);
break;
case RXO_racw:
@@ -1040,6 +1530,7 @@ decode_opcode ()
else
ll &= 0xffffffff00000000ULL;
put_reg64 (acc64, ll);
+ E1;
break;
case RXO_rte:
@@ -1048,6 +1539,10 @@ decode_opcode ()
regs.r_psw = poppc ();
if (FLAG_PM)
regs.r_psw |= FLAGBIT_U;
+#ifdef CYCLE_ACCURATE
+ regs.fast_return = 0;
+ cycles (6);
+#endif
break;
case RXO_revl:
@@ -1057,6 +1552,7 @@ decode_opcode ()
| ((uma << 8) & 0xff0000)
| ((uma << 24) & 0xff000000UL));
PD (umb);
+ E1;
break;
case RXO_revw:
@@ -1064,9 +1560,16 @@ decode_opcode ()
umb = (((uma >> 8) & 0x00ff00ff)
| ((uma << 8) & 0xff00ff00UL));
PD (umb);
+ E1;
break;
case RXO_rmpa:
+ RL(4);
+ RL(5);
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
+
while (regs.r[3] != 0)
{
long long tmp;
@@ -1124,6 +1627,22 @@ decode_opcode ()
set_flags (FLAGBIT_O|FLAGBIT_S, ma | FLAGBIT_O);
else
set_flags (FLAGBIT_O|FLAGBIT_S, ma);
+#ifdef CYCLE_ACCURATE
+ switch (opcode.size)
+ {
+ case RX_Long:
+ cycles (6 + 4 * tx);
+ break;
+ case RX_Word:
+ cycles (6 + 5 * (tx / 2) + 4 * (tx % 2));
+ break;
+ case RX_Byte:
+ cycles (6 + 7 * (tx / 4) + 4 * (tx % 4));
+ break;
+ default:
+ abort ();
+ }
+#endif
break;
case RXO_rolc:
@@ -1133,6 +1652,7 @@ decode_opcode ()
v |= carry;
set_szc (v, 4, ma);
PD (v);
+ E1;
break;
case RXO_rorc:
@@ -1142,6 +1662,7 @@ decode_opcode ()
uma |= (carry ? 0x80000000UL : 0);
set_szc (uma, 4, mb);
PD (uma);
+ E1;
break;
case RXO_rotl:
@@ -1154,6 +1675,7 @@ decode_opcode ()
}
set_szc (uma, 4, mb);
PD (uma);
+ E1;
break;
case RXO_rotr:
@@ -1166,6 +1688,7 @@ decode_opcode ()
}
set_szc (uma, 4, mb);
PD (uma);
+ E1;
break;
case RXO_round:
@@ -1176,10 +1699,30 @@ decode_opcode ()
PD (mb);
tprintf("(int) %g = %d\n", int2float(ma), mb);
set_sz (mb, 4);
+ E (2);
break;
case RXO_rts:
- regs.r_pc = poppc ();
+ {
+#ifdef CYCLE_ACCURATE
+ int cyc = 5;
+#endif
+ regs.r_pc = poppc ();
+#ifdef CYCLE_ACCURATE
+ /* Note: specs say 5, chip says 3. */
+ if (regs.fast_return && regs.link_register == regs.r_pc)
+ {
+#ifdef CYCLE_STATS
+ fast_returns ++;
+#endif
+ tprintf("fast return bonus\n");
+ cyc -= 2;
+ }
+ cycles (cyc);
+ regs.fast_return = 0;
+ branch_alignment_penalty = 1;
+#endif
+ }
break;
case RXO_rtsd:
@@ -1190,12 +1733,39 @@ decode_opcode ()
put_reg (0, get_reg (0) + GS() - (opcode.op[0].reg-opcode.op[2].reg+1)*4);
if (opcode.op[2].reg == 0)
EXCEPTION (EX_UNDEFINED);
+#ifdef CYCLE_ACCURATE
+ tx = opcode.op[0].reg - opcode.op[2].reg + 1;
+#endif
for (i = opcode.op[2].reg; i <= opcode.op[0].reg; i ++)
- put_reg (i, pop ());
+ {
+ RLD (i);
+ put_reg (i, pop ());
+ }
}
else
- put_reg (0, get_reg (0) + GS());
- put_reg (pc, poppc ());
+ {
+#ifdef CYCLE_ACCURATE
+ tx = 0;
+#endif
+ put_reg (0, get_reg (0) + GS());
+ }
+ put_reg (pc, poppc());
+#ifdef CYCLE_ACCURATE
+ if (regs.fast_return && regs.link_register == regs.r_pc)
+ {
+ tprintf("fast return bonus\n");
+#ifdef CYCLE_STATS
+ fast_returns ++;
+#endif
+ cycles (tx < 3 ? 3 : tx + 1);
+ }
+ else
+ {
+ cycles (tx < 5 ? 5 : tx + 1);
+ }
+ regs.fast_return = 0;
+ branch_alignment_penalty = 1;
+#endif
break;
case RXO_sat:
@@ -1203,6 +1773,7 @@ decode_opcode ()
PD (0x7fffffffUL);
else if (FLAG_O && ! FLAG_S)
PD (0x80000000UL);
+ E1;
break;
case RXO_sbb:
@@ -1214,9 +1785,13 @@ decode_opcode ()
PD (1);
else
PD (0);
+ E1;
break;
case RXO_scmpu:
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
while (regs.r[3] != 0)
{
uma = mem_get_qi (regs.r[1] ++);
@@ -1229,6 +1804,7 @@ decode_opcode ()
set_zc (1, 1);
else
set_zc (0, ((int)uma - (int)umb) >= 0);
+ cycles (2 + 4 * (tx / 4) + 4 * (tx % 4));
break;
case RXO_setpsw:
@@ -1238,24 +1814,40 @@ decode_opcode ()
|| v == FLAGBIT_U))
break;
regs.r_psw |= v;
+ cycles (1);
break;
case RXO_smovb:
+ RL (3);
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
while (regs.r[3])
{
uma = mem_get_qi (regs.r[2] --);
mem_put_qi (regs.r[1]--, uma);
regs.r[3] --;
}
+#ifdef CYCLE_ACCURATE
+ if (tx > 3)
+ cycles (6 + 3 * (tx / 4) + 3 * (tx % 4));
+ else
+ cycles (2 + 3 * (tx % 4));
+#endif
break;
case RXO_smovf:
+ RL (3);
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
while (regs.r[3])
{
uma = mem_get_qi (regs.r[2] ++);
mem_put_qi (regs.r[1]++, uma);
regs.r[3] --;
}
+ cycles (2 + 3 * (int)(tx / 4) + 3 * (tx % 4));
break;
case RXO_smovu:
@@ -1271,17 +1863,24 @@ decode_opcode ()
case RXO_shar: /* d = ma >> mb */
SHIFT_OP (sll, int, mb, >>=, 1);
+ E (1);
break;
case RXO_shll: /* d = ma << mb */
SHIFT_OP (ll, int, mb, <<=, 0x80000000UL);
+ E (1);
break;
case RXO_shlr: /* d = ma >> mb */
SHIFT_OP (ll, unsigned int, mb, >>=, 1);
+ E (1);
break;
case RXO_sstr:
+ RL (3);
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
switch (opcode.size)
{
case RX_Long:
@@ -1291,6 +1890,7 @@ decode_opcode ()
regs.r[1] += 4;
regs.r[3] --;
}
+ cycles (2 + tx);
break;
case RX_Word:
while (regs.r[3] != 0)
@@ -1299,6 +1899,7 @@ decode_opcode ()
regs.r[1] += 2;
regs.r[3] --;
}
+ cycles (2 + (int)(tx / 2) + tx % 2);
break;
case RX_Byte:
while (regs.r[3] != 0)
@@ -1307,6 +1908,7 @@ decode_opcode ()
regs.r[1] ++;
regs.r[3] --;
}
+ cycles (2 + (int)(tx / 4) + tx % 4);
break;
default:
abort ();
@@ -1316,6 +1918,7 @@ decode_opcode ()
case RXO_stcc:
if (GS2())
PD (GS ());
+ E1;
break;
case RXO_stop:
@@ -1328,8 +1931,15 @@ decode_opcode ()
break;
case RXO_suntil:
+ RL(3);
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
if (regs.r[3] == 0)
- break;
+ {
+ cycles (3);
+ break;
+ }
switch (opcode.size)
{
case RX_Long:
@@ -1342,6 +1952,7 @@ decode_opcode ()
if (umb == uma)
break;
}
+ cycles (3 + 3 * tx);
break;
case RX_Word:
uma = get_reg (2) & 0xffff;
@@ -1353,6 +1964,7 @@ decode_opcode ()
if (umb == uma)
break;
}
+ cycles (3 + 3 * (tx / 2) + 3 * (tx % 2));
break;
case RX_Byte:
uma = get_reg (2) & 0xff;
@@ -1364,6 +1976,7 @@ decode_opcode ()
if (umb == uma)
break;
}
+ cycles (3 + 3 * (tx / 4) + 3 * (tx % 4));
break;
default:
abort();
@@ -1375,6 +1988,10 @@ decode_opcode ()
break;
case RXO_swhile:
+ RL(3);
+#ifdef CYCLE_ACCURATE
+ tx = regs.r[3];
+#endif
if (regs.r[3] == 0)
break;
switch (opcode.size)
@@ -1389,6 +2006,7 @@ decode_opcode ()
if (umb != uma)
break;
}
+ cycles (3 + 3 * tx);
break;
case RX_Word:
uma = get_reg (2) & 0xffff;
@@ -1400,6 +2018,7 @@ decode_opcode ()
if (umb != uma)
break;
}
+ cycles (3 + 3 * (tx / 2) + 3 * (tx % 2));
break;
case RX_Byte:
uma = get_reg (2) & 0xff;
@@ -1411,6 +2030,7 @@ decode_opcode ()
if (umb != uma)
break;
}
+ cycles (3 + 3 * (tx / 4) + 3 * (tx % 4));
break;
default:
abort();
@@ -1427,9 +2047,18 @@ decode_opcode ()
return RX_MAKE_STOPPED(0);
case RXO_xchg:
+#ifdef CYCLE_ACCURATE
+ regs.m2m = 0;
+#endif
v = GS (); /* This is the memory operand, if any. */
PS (GD ()); /* and this may change the address register. */
PD (v);
+ E2;
+#ifdef CYCLE_ACCURATE
+ /* All M cycles happen during xchg's cycles. */
+ memory_dest = 0;
+ memory_source = 0;
+#endif
break;
case RXO_xor:
@@ -1440,5 +2069,122 @@ decode_opcode ()
EXCEPTION (EX_UNDEFINED);
}
+#ifdef CYCLE_ACCURATE
+ regs.m2m = 0;
+ if (memory_source)
+ regs.m2m |= M2M_SRC;
+ if (memory_dest)
+ regs.m2m |= M2M_DST;
+
+ regs.rt = new_rt;
+ new_rt = -1;
+#endif
+
+#ifdef CYCLE_STATS
+ if (prev_cycle_count == regs.cycle_count)
+ {
+ printf("Cycle count not updated! id %s\n", id_names[opcode.id]);
+ abort ();
+ }
+#endif
+
+#ifdef CYCLE_STATS
+ if (running_benchmark)
+ {
+ int omap = op_lookup (opcode.op[0].type, opcode.op[1].type, opcode.op[2].type);
+
+
+ cycles_per_id[opcode.id][omap] += regs.cycle_count - prev_cycle_count;
+ times_per_id[opcode.id][omap] ++;
+
+ times_per_pair[prev_opcode_id][po0][opcode.id][omap] ++;
+
+ prev_opcode_id = opcode.id;
+ po0 = omap;
+ }
+#endif
+
return RX_MAKE_STEPPED ();
}
+
+#ifdef CYCLE_STATS
+void
+reset_pipeline_stats (void)
+{
+ memset (cycles_per_id, 0, sizeof(cycles_per_id));
+ memset (times_per_id, 0, sizeof(times_per_id));
+ memory_stalls = 0;
+ register_stalls = 0;
+ branch_stalls = 0;
+ branch_alignment_stalls = 0;
+ fast_returns = 0;
+ memset (times_per_pair, 0, sizeof(times_per_pair));
+ running_benchmark = 1;
+
+ benchmark_start_cycle = regs.cycle_count;
+}
+
+void
+halt_pipeline_stats (void)
+{
+ running_benchmark = 0;
+ benchmark_end_cycle = regs.cycle_count;
+}
+#endif
+
+void
+pipeline_stats (void)
+{
+#ifdef CYCLE_STATS
+ int i, o1;
+ int p, p1;
+#endif
+
+#ifdef CYCLE_ACCURATE
+ if (verbose == 1)
+ {
+ printf ("cycles: %llu\n", regs.cycle_count);
+ return;
+ }
+
+ printf ("cycles: %13s\n", comma (regs.cycle_count));
+#endif
+
+#ifdef CYCLE_STATS
+ if (benchmark_start_cycle)
+ printf ("bmark: %13s\n", comma (benchmark_end_cycle - benchmark_start_cycle));
+
+ printf("\n");
+ for (i = 0; i < N_RXO; i++)
+ for (o1 = 0; o1 < N_MAP; o1 ++)
+ if (times_per_id[i][o1])
+ printf("%13s %13s %7.2f %s %s\n",
+ comma (cycles_per_id[i][o1]),
+ comma (times_per_id[i][o1]),
+ (double)cycles_per_id[i][o1] / times_per_id[i][o1],
+ op_cache_string(o1),
+ id_names[i]+4);
+
+ printf("\n");
+ for (p = 0; p < N_RXO; p ++)
+ for (p1 = 0; p1 < N_MAP; p1 ++)
+ for (i = 0; i < N_RXO; i ++)
+ for (o1 = 0; o1 < N_MAP; o1 ++)
+ if (times_per_pair[p][p1][i][o1])
+ {
+ printf("%13s %s %-9s -> %s %s\n",
+ comma (times_per_pair[p][p1][i][o1]),
+ op_cache_string(p1),
+ id_names[p]+4,
+ op_cache_string(o1),
+ id_names[i]+4);
+ }
+
+ printf("\n");
+ printf("%13s memory stalls\n", comma (memory_stalls));
+ printf("%13s register stalls\n", comma (register_stalls));
+ printf("%13s branches taken (non-return)\n", comma (branch_stalls));
+ printf("%13s branch alignment stalls\n", comma (branch_alignment_stalls));
+ printf("%13s fast returns\n", comma (fast_returns));
+#endif
+}
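
Reviewer note (not part of the patch): the string-instruction cycle
estimates added in the rx.c hunks above can be checked outside the
simulator. The following is a minimal standalone sketch that copies the
formulas from the RXO_rmpa, RXO_sstr, RXO_smovf and RXO_smovb cases. The
enum and the function names are hypothetical, and the element count is
assumed to be an unsigned copy of r3 taken before the loop runs (the
patch keeps it in "tx").

```c
/* Hypothetical stand-ins for the simulator's RX_Byte, RX_Word, RX_Long.  */
enum op_size { SIZE_BYTE, SIZE_WORD, SIZE_LONG };

/* RMPA: 6 cycles of overhead plus a per-size cost for N elements,
   as in the RXO_rmpa case of the patch.  */
static unsigned
rmpa_cycles (enum op_size size, unsigned n)
{
  switch (size)
    {
    case SIZE_LONG:
      return 6 + 4 * n;
    case SIZE_WORD:
      return 6 + 5 * (n / 2) + 4 * (n % 2);
    case SIZE_BYTE:
      return 6 + 7 * (n / 4) + 4 * (n % 4);
    }
  return 0;
}

/* SSTR: 2 cycles of overhead, then one cycle per 32-bit store plus
   one cycle per leftover element, as in the RXO_sstr case.  */
static unsigned
sstr_cycles (enum op_size size, unsigned n)
{
  switch (size)
    {
    case SIZE_LONG:
      return 2 + n;
    case SIZE_WORD:
      return 2 + n / 2 + n % 2;
    case SIZE_BYTE:
      return 2 + n / 4 + n % 4;
    }
  return 0;
}

/* SMOVF: forward byte copy, as in the RXO_smovf case.  */
static unsigned
smovf_cycles (unsigned n)
{
  return 2 + 3 * (n / 4) + 3 * (n % 4);
}

/* SMOVB: backward byte copy; the patch charges a larger fixed
   overhead once more than three bytes are moved.  */
static unsigned
smovb_cycles (unsigned n)
{
  if (n > 3)
    return 6 + 3 * (n / 4) + 3 * (n % 4);
  return 2 + 3 * (n % 4);
}
```

For example, rmpa_cycles (SIZE_LONG, 10) evaluates to 46 and
smovf_cycles (9) to 11. Note that under these formulas smovb_cycles (3)
is 11 while smovb_cycles (4) is 9, since a full group of four bytes
costs the same 3 cycles as a single leftover byte.
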
Index: trace.c
===================================================================
RCS file: /cvs/src/src/sim/rx/trace.c,v
retrieving revision 1.2
diff -p -U3 -r1.2 trace.c
--- trace.c 1 Jan 2010 10:03:33 -0000 1.2
+++ trace.c 28 Jul 2010 02:00:19 -0000
@@ -19,6 +19,7 @@ You should have received a copy of the G
along with this program. If not, see <http://www.gnu.org/licenses/>. */
+#include "config.h"
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
@@ -321,7 +322,13 @@ sim_disasm_one (void)
}
opbuf[0] = 0;
- printf ("\033[33m%06x: ", mypc);
+#ifdef CYCLE_ACCURATE
+ printf ("\033[33m %04u %06x: ", (int)(regs.cycle_count % 10000), mypc);
+#else
+ printf ("\033[33m %06x: ", mypc);
+
+#endif
+
max = print_insn_rx (mypc, & info);
for (i = 0; i < max; i++)