This is the mail archive of the
cgen@sources.redhat.com
mailing list for the CGEN project.
exposed pipeline patch (long!)
- From: Ben Elliston <bje at redhat dot com>
- To: cgen at sources dot redhat dot com
- Date: 09 Jan 2003 14:27:08 +1100
- Subject: exposed pipeline patch (long!)
I'm posting this patch on behalf of Graydon Hoare, who write this
exposed pipeline support last year. It's a more generalised form of
the (delay ..) rtx and has been used for a couple of ports already.
Rather than just commit it, I thought I would post it for review.
Okay to commit?
Ben
2001-06-05 graydon hoare <graydon@redhat.com>
* utils.scm (foldl): Define.
(foldr): Define.
(union): Define.
(intersection): Simplify.
* sid.scm : Set APPLICATION to SID-SIMULATOR.
(-op-gen-delayed-set-maybe-trace): Define.
(<operand> 'gen-set-{quiet,trace}): Delegate to
op-gen-delayed-set-quiet etc. Note: this is still a little tangled
up and needs cleaning.
(-with-parallel?): Hardwire with-parallel to #t.
(<operand> 'cxmake-get): Replace with lookahead-aware code
* sid-decode.scm: Remove per-insn writeback fns.
(-gen-idesc-decls): Redefine sem_fn type.
* sid-cpu.scm (gen-write-stack-structure): Replace parexec stuff
with write stack stuff.
(cgen-write.cxx): Replace per-insn writebacks with single write
stack writeback. Add write stack reset function.
(-gen-scache-semantic-fn insn): Replace parexec stuff with write
stack stuff.
* rtl-c.scm (xop): Clone operand into delayed operand if #:delayed
estate attribute set.
(delay): Set #:delayed attribute to calculated delay, update
maximum delay of cpu, check (delay ...) usage.
* operand.scm (<operand>): Add delayed slot to <operand>.
* mach.scm (<cpu>): Add max-delay slot to <cpu>.
* dev.scm (load-sid): Set APPLICATION to SID-SIMULATOR.
* doc/rtl.texi (Expressions): Add section on (delay ...).
Index: utils.scm
===================================================================
RCS file: /cvs/src/src/cgen/utils.scm,v
retrieving revision 1.7
diff -u -p -r1.7 utils.scm
--- utils.scm 7 Jan 2002 08:23:59 -0000 1.7
+++ utils.scm 9 Jan 2003 03:22:12 -0000
@@ -78,6 +78,10 @@
(define (spaces n) (make-string n #\space))
+; simple list-generators
+(define (seq p q) (if (> p q) '() (cons p (seq (+ p 1) q))))
+(define (fill x n) (if (> n 0) (cons x (fill x (- n 1))) '()))
+
; Write N spaces to PORT, or the current output port if elided.
(define (write-spaces n . port)
@@ -471,6 +475,17 @@
(reverse! (list-drop n (reverse l)))
)
+;; left fold
+(define (foldl kons accum lis)
+ (if (null? lis) accum
+ (foldl kons (kons accum (car lis)) (cdr lis))))
+
+;; right fold
+(define (foldr kons knil lis)
+ (if (null? lis) knil
+ (kons (car lis) (foldr kons knil (cdr lis)))))
+
+
; APL's +\ operation on a vector of numbers.
(define (plus-scan l)
@@ -540,12 +555,13 @@
; Return intersection of two lists.
-(define (intersection l1 l2)
- (cond ((null? l1) l1)
- ((null? l2) l2)
- ((memq (car l1) l2) (cons (car l1) (intersection (cdr l1) l2)))
- (else (intersection (cdr l1) l2)))
-)
+(define (intersection a b)
+ (foldl (lambda (l e) (if (memq e a) (cons e l) l)) '() b))
+
+; Return union of two lists.
+
+(define (union a b)
+ (foldl (lambda (l e) (if (memq e l) l (cons e l))) a b))
; Return a count of the number of elements of list L1 that are in list L2.
; Uses memq.
Index: sid.scm
===================================================================
RCS file: /cvs/src/src/cgen/sid.scm,v
retrieving revision 1.7
diff -u -p -r1.7 sid.scm
--- sid.scm 7 Jan 2002 08:23:59 -0000 1.7
+++ sid.scm 9 Jan 2003 03:22:18 -0000
@@ -10,7 +10,7 @@
; [It still does but that's to be fixed.]
; Specify which application.
-(set! APPLICATION 'SIMULATOR)
+(set! APPLICATION 'SID-SIMULATOR)
; Misc. state info.
@@ -118,7 +118,7 @@
; While processing operand reading (or writing), parallel execution support
; needs to be turned off, so it is up to the appropriate cgen-foo.c proc to
; set-with-parallel?! appropriately.
-(define -with-parallel? #f)
+(define -with-parallel? #t)
(define (with-parallel?) -with-parallel?)
(define (set-with-parallel?! flag) (set! -with-parallel? flag))
@@ -924,43 +924,6 @@
(rtl-c++ INT yes? nil #:rtl-cover-fns? #t)))
)
-; For parallel write post-processing, we don't want to defer setting the pc.
-; ??? Not sure anymore.
-;(method-make!
-; <pc> 'gen-set-quiet
-; (lambda (self estate mode index selector newval)
-; (-op-gen-set-quiet self estate mode index selector newval)))
-;(method-make!
-; <pc> 'gen-set-trace
-; (lambda (self estate mode index selector newval)
-; (-op-gen-set-trace self estate mode index selector newval)))
-
-; Name of C macro to access parallel execution operand support.
-
-(define -par-operand-macro "OPRND")
-
-; Return C code to fetch an operand's value and save it away for the
-; semantic handler. This is used to handle parallel execution of several
-; instructions where all inputs of all insns are read before any outputs are
-; written.
-; For operands, the word `read' is only used in this context.
-
-(define (op:read op sfmt)
- (let ((estate (estate-make-for-normal-rtl-c++ nil nil)))
- (send op 'gen-read estate sfmt -par-operand-macro))
-)
-
-; Return C code to write an operand's value.
-; This is used to handle parallel execution of several instructions where all
-; outputs are written to temporary spots first, and then a final
-; post-processing pass is run to update cpu state.
-; For operands, the word `write' is only used in this context.
-
-(define (op:write op sfmt)
- (let ((estate (estate-make-for-normal-rtl-c++ nil nil)))
- (send op 'gen-write estate sfmt -par-operand-macro))
-)
-
; Default gen-read method.
; This is used to help support targets with parallel insns.
; Either this or gen-write (but not both) is used.
@@ -1010,36 +973,46 @@
(method-make!
<operand> 'cxmake-get
(lambda (self estate mode index selector)
- (let ((mode (if (mode:eq? 'DFLT mode)
- (send self 'get-mode)
- mode))
- (index (if index index (op:index self)))
- (selector (if selector selector (op:selector self))))
- ; If the object is marked with the RAW attribute, access the hardware
- ; object directly.
+ (let* ((mode (if (mode:eq? 'DFLT mode)
+ (send self 'get-mode)
+ mode))
+ (hw (op:type self))
+ (index (if index index (op:index self)))
+ (selector (if selector selector (op:selector self)))
+ (delayval (op:delay self))
+ (md (mode:c-type mode))
+ (name (if
+ (eq? (obj:name hw) 'h-memory)
+ (string-append md "_memory")
+ (gen-c-symbol (obj:name hw))))
+ (getter (op:getter self))
+ (def-val (cond ((obj-has-attr? self 'RAW)
+ (send hw 'cxmake-get-raw estate mode index selector))
+ (getter
+ (let ((args (car getter))
+ (expr (cadr getter)))
+ (rtl-c-expr mode expr
+ (if (= (length args) 0) nil
+ (list (list (car args) 'UINT index)))
+ #:rtl-cover-fns? #t
+ #:output-language (estate-output-language estate))))
+ (else
+ (send hw 'cxmake-get estate mode index selector)))))
+
(logit 4 "<operand> cxmake-get self=" (obj:name self) " mode=" (obj:name mode)
" index=" (obj:name index) " selector=" selector "\n")
- (cond ((obj-has-attr? self 'RAW)
- (send (op:type self) 'cxmake-get-raw estate mode index selector))
- ; If the instruction could be parallely executed with others and
- ; we're doing read pre-processing, the operand has already been
- ; fetched, we just have to grab the cached value.
- ((with-parallel-read?)
- (cx:make-with-atlist mode
- (string-append -par-operand-macro
- " (" (gen-sym self) ")")
- nil)) ; FIXME: want CACHED attr if present
- ((op:getter self)
- (let ((args (car (op:getter self)))
- (expr (cadr (op:getter self))))
- (rtl-c-expr mode expr
- (if (= (length args) 0)
- nil
- (list (list (car args) 'UINT index)))
- #:rtl-cover-fns? #t
- #:output-language (estate-output-language estate))))
- (else
- (send (op:type self) 'cxmake-get estate mode index selector)))))
+
+ (if delayval
+ (if (derived-operand? self)
+ (error "delayed derived operands currently unsupported: " self)
+ (let ((idx (if index (string-append ", " (-gen-hw-index index estate)) "")))
+ (cx:make mode (string-append "lookahead ("
+ (number->string delayval)
+ ", tick, "
+ "buf." name "_writes, "
+ (cx:c def-val)
+ idx ")"))))
+ def-val)))
)
@@ -1049,16 +1022,9 @@
(send (op:type op) 'gen-set-quiet estate mode index selector newval)
)
-(define (-op-gen-set-quiet-parallel op estate mode index selector newval)
- (string-append
- (if (op-save-index? op)
- (string-append " " -par-operand-macro " (" (-op-index-name op) ")"
- " = " (-gen-hw-index index estate) ";\n")
- "")
- " "
- -par-operand-macro " (" (gen-sym op) ")"
- " = " (cx:c newval) ";\n")
-)
+(define (-op-gen-delayed-set-quiet op estate mode index selector newval)
+ (-op-gen-delayed-set-maybe-trace op estate mode index selector newval #f))
+
(define (-op-gen-set-trace op estate mode index selector newval)
(string-append
@@ -1079,12 +1045,7 @@
;else
(send (op:type op) 'gen-set-quiet estate mode index selector
(cx:make-with-atlist mode "opval" (cx:atlist newval))))
- (if (and (with-profile?)
- (op:cond? op))
- (string-append " written |= (1ULL << "
- (number->string (op:num op))
- ");\n")
- "")
+
; TRACE_RESULT_<MODE> (cpu, abuf, hwnum, opnum, value);
; For each insn record array of operand numbers [or indices into
; operand instance table].
@@ -1122,21 +1083,41 @@
" }\n")
)
-(define (-op-gen-set-trace-parallel op estate mode index selector newval)
- (string-append
- " {\n"
- " " (mode:c-type mode) " opval = " (cx:c newval) ";\n"
- (if (op-save-index? op)
- (string-append " " -par-operand-macro " (" (-op-index-name op) ")"
- " = " (-gen-hw-index index estate) ";\n")
- "")
- " " -par-operand-macro " (" (gen-sym op) ")"
- " = opval;\n"
- (if (op:cond? op)
- (string-append " written |= (1ULL << "
- (number->string (op:num op))
- ");\n")
- "")
+(define (-op-gen-delayed-set-trace op estate mode index selector newval)
+ (-op-gen-delayed-set-maybe-trace op estate mode index selector newval #t))
+
+(define (-op-gen-delayed-set-maybe-trace op estate mode index selector newval do-trace?)
+ (let* ((pad " ")
+ (hw (op:type op))
+ (delayval (op:delay op))
+ (md (mode:c-type mode))
+ (name (if
+ (eq? (obj:name hw) 'h-memory)
+ (string-append md "_memory")
+ (gen-c-symbol (obj:name hw))))
+ (val (cx:c newval))
+ (idx (if index (-gen-hw-index index estate) ""))
+ (idx-args (if (equal? idx "") "" (string-append ", " idx)))
+ )
+
+ (string-append
+ " {\n"
+
+ (if delayval
+
+ ;; delayed write: push it to the appropriate buffer
+ (string-append
+ pad md " opval = " val ";\n"
+ pad "buf." name "_writes [(tick + " (number->string delayval)
+ ") % @prefix@::pipe_sz].push (@prefix@::write<" md ">(pc, opval" idx-args "));\n")
+
+ ;; else, uh, we should never have been called!
+ (error "-op-gen-delayed-set-maybe-trace called on non-delayed operand"))
+
+
+ (if do-trace?
+
+ (string-append
; TRACE_RESULT_<MODE> (cpu, abuf, hwnum, opnum, value);
; For each insn record array of operand numbers [or indices into
; operand instance table].
@@ -1169,8 +1150,8 @@
""))
"opval << dec << \" \";\n"
" }\n")
-)
-
+ ;; else no tracing is emitted
+ ""))))
; Return C code to set the value of an operand.
; NEWVAL is a <c-expr> object of the value to store.
@@ -1189,8 +1170,8 @@
(selector (if selector selector (op:selector self))))
(cond ((obj-has-attr? self 'RAW)
(send (op:type self) 'gen-set-quiet-raw estate mode index selector newval))
- ((with-parallel-write?)
- (-op-gen-set-quiet-parallel self estate mode index selector newval))
+ ((op:delay self)
+ (-op-gen-delayed-set-quiet self estate mode index selector newval))
(else
(-op-gen-set-quiet self estate mode index selector newval)))))
)
@@ -1212,26 +1193,12 @@
(selector (if selector selector (op:selector self))))
(cond ((obj-has-attr? self 'RAW)
(send (op:type self) 'gen-set-quiet-raw estate mode index selector newval))
- ((with-parallel-write?)
- (-op-gen-set-trace-parallel self estate mode index selector newval))
+ ((op:delay self)
+ (-op-gen-delayed-set-trace self estate mode index selector newval))
(else
(-op-gen-set-trace self estate mode index selector newval)))))
)
-; Define and undefine C macros to tuck away details of instruction format used
-; in the parallel execution functions. See gen-define-field-macro for a
-; similar thing done for extraction/semantic functions.
-
-(define (gen-define-parallel-operand-macro sfmt)
- (string-append "#define " -par-operand-macro "(f) "
- "par_exec->operands."
- (gen-sym sfmt)
- ".f\n")
-)
-
-(define (gen-undef-parallel-operand-macro sfmt)
- (string-append "#undef " -par-operand-macro "\n")
-)
; Operand profiling and parallel execution support.
Index: sid-decode.scm
===================================================================
RCS file: /cvs/src/src/cgen/sid-decode.scm,v
retrieving revision 1.8
diff -u -p -r1.8 sid-decode.scm
--- sid-decode.scm 7 Feb 2002 18:46:19 -0000 1.8
+++ sid-decode.scm 9 Jan 2003 03:22:18 -0000
@@ -47,10 +47,7 @@ bool @prefix@_idesc::idesc_table_initial
(if pbb?
"0, "
(string-append (-gen-sem-fn-name insn) ", "))
- "")
- (if (with-parallel?)
- (string-append (-gen-write-fn-name sfmt) ", ")
- "")
+ "")
"\"" (string-upcase name) "\", "
(gen-cpu-insn-enum (current-cpu) insn)
", "
@@ -131,25 +128,6 @@ bool @prefix@_idesc::idesc_table_initial
)
-;; and the same for writeback functions
-
-(define (-gen-write-fn-name sfmt)
- (string-append "@prefix@_write_" (gen-sym sfmt))
-)
-
-
-(define (-gen-write-fn-decls)
- (string-write
- "// Decls of each writeback fn.\n\n"
- "using @cpu@::@prefix@_write_fn;\n"
- (string-list-map (lambda (sfmt)
- (string-list "extern @prefix@_write_fn "
- (-gen-write-fn-name sfmt)
- ";\n"))
- (current-sfmt-list))
- "\n"
- )
-)
; idesc, argbuf, and scache types
@@ -164,14 +142,9 @@ struct @cpu@_cpu;
struct @prefix@_scache;
"
(if (with-parallel?)
- "struct @prefix@_parexec;\n" "")
- (if (with-parallel?)
- "typedef void (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec);"
+ "typedef void (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, int tick, @prefix@::write_stacks &buf);"
"typedef sem_status (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem);")
"\n"
- (if (with-parallel?)
- "typedef sem_status (@prefix@_write_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec);"
- "")
"\n"
"
// Instruction descriptor.
@@ -192,12 +165,6 @@ struct @prefix@_idesc {
@prefix@_sem_fn* execute;\n\n"
"")
- (if (with-parallel?)
- "\
- // scache write executor for this insn
- @prefix@_write_fn* writeback;\n\n"
- "")
-
"\
const char* insn_name;
enum @prefix@_insn_type sem_index;
@@ -300,15 +267,6 @@ struct @prefix@_scache {
// argument buffer
@prefix@_sem_fields fields;
-" (if (or (with-profile?) (with-parallel-write?))
- (string-append "
- // writeback flags
- // Only used if profiling or parallel execution support enabled during
- // file generation.
- unsigned long long written;
-")
- "") "
-
// decode given instruction
void decode (@cpu@_cpu* current_cpu, PCADDR pc, @prefix@_insn_word base_insn, @prefix@_insn_word entire_insn);
};
@@ -718,6 +676,11 @@ void
#ifndef @PREFIX@_DECODE_H
#define @PREFIX@_DECODE_H
+namespace @prefix@ {
+// forward declaration of struct in -defs.h
+struct write_stacks;
+}
+
namespace @cpu@ {
using namespace cgen;
@@ -739,10 +702,6 @@ typedef UINT @prefix@_insn_word;
; There's no pressing need for it though.
(if (with-scache?)
-gen-sem-fn-decls
- "")
-
- (if (with-parallel?)
- -gen-write-fn-decls
"")
"\
Index: sid-cpu.scm
===================================================================
RCS file: /cvs/src/src/cgen/sid-cpu.scm,v
retrieving revision 1.7
diff -u -p -r1.7 sid-cpu.scm
--- sid-cpu.scm 7 Feb 2002 18:46:19 -0000 1.7
+++ sid-cpu.scm 9 Jan 2003 03:22:23 -0000
@@ -199,6 +199,34 @@ namespace @arch@ {
(-gen-hardware-struct #f (find hw-need-storage? (current-hw-list))))
)
+(define (-gen-hw-stream-and-destream-fns)
+ (let* ((sa string-append)
+ (regs (find hw-need-storage? (current-hw-list)))
+ (reg-dim (lambda (r)
+ (let ((dims (-hw-vector-dims r)))
+ (if (equal? 0 (length dims))
+ "0"
+ (number->string (car dims))))))
+ (stream-reg (lambda (r)
+ (let ((rname (sa "hardware." (gen-c-symbol (obj:name r)))))
+ (if (hw-scalar? r)
+ (sa " ost << " rname " << ' ';\n")
+ (sa " for (int i = 0; i < " (reg-dim r)
+ "; i++)\n ost << " rname "[i] << ' ';\n")))))
+ (destream-reg (lambda (r)
+ (let ((rname (sa "hardware." (gen-c-symbol (obj:name r)))))
+ (if (hw-scalar? r)
+ (sa " ist >> " rname ";\n")
+ (sa " for (int i = 0; i < " (reg-dim r)
+ "; i++)\n ist >> " rname "[i];\n"))))))
+ (sa
+ " void stream_cgen_hardware (std::ostream &ost) const \n {\n"
+ (string-map stream-reg regs)
+ " }\n"
+ " void destream_cgen_hardware (std::istream &ist) \n {\n"
+ (string-map destream-reg regs)
+ " }\n")))
+
; Generate <cpu>-cpu.h
(define (cgen-cpu.h)
@@ -222,6 +250,8 @@ public:
-gen-hardware-types
+ -gen-hw-stream-and-destream-fns
+
" // C++ register access function templates\n"
"#define current_cpu this\n\n"
(lambda ()
@@ -295,68 +325,161 @@ typedef struct {
)
)
-; Utility of gen-parallel-exec-type to generate the definition of one
-; structure in PAREXEC.
-; SFMT is an <sformat> object.
-(define (gen-parallel-exec-elm sfmt)
- (string-append
- " struct { /* " (obj:comment sfmt) " */\n"
- (let ((sem-ops
- ((if (with-parallel-write?) sfmt-out-ops sfmt-in-ops) sfmt)))
- (if (null? sem-ops)
- " int empty;\n"
- (string-map
- (lambda (op)
- (logit 2 "Processing operand " (obj:name op) " of format "
- (obj:name sfmt) " ...\n")
- (if (with-parallel-write?)
- (let ((index-type (and (op-save-index? op)
- (gen-index-type op sfmt))))
- (string-append " " (gen-type op)
- " " (gen-sym op) ";\n"
- (if index-type
- (string-append " " index-type
- " " (gen-sym op) "_idx;\n")
- "")))
- (string-append " "
- (gen-type op)
- " "
- (gen-sym op)
- ";\n")))
- sem-ops)))
- " } " (gen-sym sfmt) ";\n"
- )
-)
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;; begin stack-based write schedule
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define useful-mode-names '(BI QI HI SI DI UQI UHI USI UDI SF DF))
+
+;(define (-calculated-memory-write-buffer-size)
+; (let* ((is-mem? (lambda (op) (eq? (hw-sem-name (op:type op)) 'h-memory)))
+; (count-mem-writes
+; (lambda (sfmt) (length (find is-mem? (sfmt-out-ops sfmt))))))
+; (apply max (append '(0) (map count-mem-writes (current-sfmt-list))))))
+
+
+;; note: this doesn't really correctly approximate the worst case. user-supplied functions
+;; might rewrite the pipeline extensively while it's running.
+;(define (-worst-case-number-of-writes-to hw-name)
+; (let* ((sfmts (current-sfmt-list))
+; (out-ops (map sfmt-out-ops sfmts))
+; (pred (lambda (op) (equal? hw-name (gen-c-symbol (obj:name (op:type op))))))
+; (filtered-ops (map (lambda (ops) (find pred ops)) out-ops)))
+; (apply max (cons 0 (map (lambda (ops) (length ops)) filtered-ops)))))
+
+(define (-hw-gen-write-stack-decl nm mode)
+ (let* (
+; for the time being, we're disabling this size-estimation stuff and just
+; requiring the user to supply a parameter WRITE_BUF_SZ before they include -defs.h
+; (pipe-sz (+ 1 (max-delay (cpu-max-delay (current-cpu)))))
+; (sz (* pipe-sz (-worst-case-number-of-writes-to nm))))
+
+ (mode-pad (spaces (- 4 (string-length mode))))
+ (stack-name (string-append nm "_writes")))
+ (string-append
+ " write_stack< write<" mode "> >" mode-pad "\t" stack-name "\t[pipe_sz];\n")))
+
+
+(define (-hw-gen-write-struct-decl)
+ (let* ((dims (-worst-case-index-dims))
+ (sa string-append)
+ (ns number->string)
+ (idxs (seq 0 (- dims 1)))
+ (ctor (sa "write (PCADDR _pc, MODE _val"
+ (string-map (lambda (x) (sa ", USI _idx" (ns x) "=0")) idxs)
+ ") : pc(_pc), val(_val)"
+ (string-map (lambda (x) (sa ", idx" (ns x) "(_idx" (ns x) ")")) idxs)
+ " {} \n"))
+ (idx-fields (string-map (lambda (x) (sa " USI idx" (ns x) ";\n")) idxs)))
+ (sa
+ "\n\n"
+ " template <typename MODE>\n"
+ " struct write\n"
+ " {\n"
+ " USI pc;\n"
+ " MODE val;\n"
+ idx-fields
+ " " ctor
+ " write() {}\n"
+ " };\n" )))
+
+(define (-hw-vector-dims hw) (elm-get (hw-type hw) 'dimensions))
+(define (-worst-case-index-dims)
+ (apply max
+ (append '(1) ; for memory accesses
+ (map (lambda (hw) (length (-hw-vector-dims hw)))
+ (find (lambda (hw) (not (scalar? hw))) (current-hw-list))))))
+
+(define (-gen-writestacks)
+ (let* ((hw (find register? (current-hw-list)))
+ (modes useful-mode-names)
+ (hw-pairs (map (lambda (h) (list (gen-c-symbol (obj:name h))
+ (obj:name (hw-mode h))))
+ hw))
+ (mem-pairs (map (lambda (m) (list (string-append m "_memory") m))
+ modes))
+ (all-pairs (append mem-pairs hw-pairs))
+
+ (h1 "\n\n// write stacks used in parallel execution\n\n struct write_stacks\n {\n // types of stacks\n\n")
+ (wb (string-append
+ "\n\n // unified writeback function (defined in @prefix@-write.cc)"
+ "\n void writeback (int tick, @cpu@::@cpu@_cpu* current_cpu);"
+ "\n // unified write-stack clearing function (defined in @prefix@-write.cc)"
+ "\n void reset ();"))
+ (zz "\n\n }; // end struct @prefix@::write_stacks \n\n")
+ (st (string-append
+ " std::ostream &operator<< (std::ostream &ost, const @prefix@::write_stacks &s);\n"
+ " std::istream &operator>> (std::istream &ist, @prefix@::write_stacks &s);\n"))
+ )
+ (string-append
+ (-hw-gen-write-struct-decl)
+ (foldl (lambda (s pair) (string-append s (apply -hw-gen-write-stack-decl pair))) h1 all-pairs)
+ wb
+ zz
+ st)))
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;; end stack-based write schedule
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
; Generate the definition of the structure that holds register values, etc.
-; for use during parallel execution. When instructions are executed parallelly
-; either
-; - their inputs are read before their outputs are written. Thus we have to
-; fetch the input values of several instructions before executing any of them.
-; - or their outputs are queued here first and then written out after all insns
-; have executed.
-; The fetched/queued values are stored in an array of PAREXEC structs, one
-; element per instruction.
+; for use during parallel execution.
-(define (gen-parallel-exec-type)
- (logit 2 "Generating PAREXEC type ...\n")
- (string-append
- (if (with-parallel-write?)
- "/* Queued output values of an instruction. */\n"
- "/* Fetched input values of an instruction. */\n")
- "\
+(define (gen-write-stack-structure)
+ (let (;(membuf-sz (-calculated-memory-write-buffer-size))
+ (max-delay (cpu-max-delay (current-cpu))))
+ (logit 2 "Generating write stack structure ...\n")
+ (string-append
+ " static const int max_delay = "
+ (number->string max-delay) ";\n"
+ " static const int pipe_sz = "
+ (number->string (+ 1 max-delay)) "; // max_delay + 1\n"
-struct @prefix@_parexec {
- union {\n"
- (string-map gen-parallel-exec-elm (current-sfmt-list))
- "\
- } operands;
- /* For conditionally written operands, bitmask of which ones were. */
- unsigned long long written;
-};\n\n"
- )
-)
+"
+#ifndef WRITE_BUF_SZ
+#define WRITE_BUF_SZ 1
+#endif
+
+ template <typename ELT>
+ struct write_stack
+ {
+ int t;
+ const int sz;
+ ELT buf[WRITE_BUF_SZ];
+
+ write_stack () : t(-1), sz(WRITE_BUF_SZ) {}
+ inline bool empty () { return (t == -1); }
+ inline void clear () { t = -1; }
+ inline void pop () { assert (t > -1); t--;}
+ inline void push (const ELT &e) { assert (t+1 < sz); buf [++t] = e;}
+ inline ELT &top () { return buf [t>0 ? ( t<sz ? t : sz-1) : 0];}
+ };
+
+ // look ahead for latest write with index = idx, where time of write is
+ // <= dist steps from base (present) in write_stack array st.
+ // returning def if no scheduled write is found.
+
+ template <typename STKS, typename VAL>
+ inline VAL lookahead (int dist, int base, STKS &st, VAL def, int idx=0)
+ {
+ for (; dist > 0; --dist)
+ {
+ write_stack <VAL> &v = st [(base + dist) % pipe_sz];
+ for (int i = v.t; i > 0; --i)
+ if (v.buf [i].idx0 == idx) return v.buf [i];
+ }
+ return def;
+ }
+
+"
+
+ (-gen-writestacks)
+ )))
; Generate the TRACE_RECORD struct definition.
@@ -375,16 +498,26 @@ typedef struct @prefix@_trace_record {
; Generate <cpu>-defs.h
+(define semantics-processed? #f)
+
(define (cgen-defs.h)
(logit 1 "Generating " (gen-cpu-name) " defs.h ...\n")
(assert-keep-one)
-
+
; Turn parallel execution support on if cpu needs it.
(set-with-parallel?! (state-parallel-exec?))
; Initialize rtl->c generation.
(rtl-c-config! #:rtl-cover-fns? #t)
+ (sim-analyze-insns!)
+
+ ; ensure semantc analysis has happened, in time
+ ; for the pipeline size to be calculated
+ (if (and (with-parallel?)
+ (not semantics-processed?))
+ (error "defs.h must be generated after sem.cxx for parallel-execution type CPUs"))
+
(string-write
(gen-copyright "CPU family header for @cpu@ / @prefix@."
copyright-red-hat package-red-hat-simulators)
@@ -392,15 +525,26 @@ typedef struct @prefix@_trace_record {
#ifndef DEFS_@PREFIX@_H
#define DEFS_@PREFIX@_H
+#include <stack>
+#include \"cgen-types.h\"
+
+// forward declaration\n\n
namespace @cpu@ {
+struct @cpu@_cpu;
+}
+
+namespace @prefix@ {
+
+using namespace cgen;
+
\n"
(if (with-parallel?)
- gen-parallel-exec-type
- "")
+ gen-write-stack-structure
+ "// no parallel-execution support\n")
"\
-} // end @cpu@ namespace
+} // end @prefix@ namespace
#endif /* DEFS_@PREFIX@_H */\n"
)
@@ -417,47 +561,132 @@ namespace @cpu@ {
; Return C code to fetch and save all output operands to instructions with
; <sformat> SFMT.
-(define (-gen-write-args sfmt)
- (string-map (lambda (op) (op:write op sfmt))
- (sfmt-out-ops sfmt))
-)
+; Generate <cpu>-write.cxx.
-; Utility of gen-write-fns to generate a writer function for <sformat> SFMT.
-(define (-gen-write-fn sfmt)
- (logit 2 "Processing write function for \"" (obj:name sfmt) "\" ...\n")
- (string-list
- "\nsem_status\n"
- (-gen-write-fn-name sfmt) " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec)\n"
- "{\n"
- (if (with-scache?)
- (gen-define-field-macro sfmt)
- "")
- (gen-define-parallel-operand-macro sfmt)
- " @prefix@_scache* abuf = sem;\n"
- " unsigned long long written = abuf->written;\n"
- " PCADDR pc = abuf->addr;\n"
- " PCADDR npc = 0; // dummy value for branches\n"
- " sem_status status = SEM_STATUS_NORMAL; // ditto\n"
- "\n"
- (-gen-write-args sfmt)
- "\n"
- " return status;\n"
- (gen-undef-parallel-operand-macro sfmt)
- (if (with-scache?)
- (gen-undef-field-macro sfmt)
- "")
- "}\n\n")
-)
+(define (-gen-register-writer nm mode dims)
+ (let* ((pad " ")
+ (sa string-append)
+ (idx-args (string-map (lambda (x) (sa "w.idx" (number->string x) ", "))
+ (seq 0 (- dims 1)))))
+ (sa pad "while (! " nm "_writes[tick].empty())\n"
+ pad "{\n"
+ pad " write<" mode "> &w = " nm "_writes[tick].top();\n"
+ pad " current_cpu->" nm "_set(" idx-args "w.val);\n"
+ pad " " nm "_writes[tick].pop();\n"
+ pad "}\n\n")))
+
+(define (-gen-memory-writer nm mode dims)
+ (let* ((pad " ")
+ (sa string-append)
+ (idx-args (string-map (lambda (x) (sa ", w.idx" (number->string x) ""))
+ (seq 0 (- dims 1)))))
+ (sa pad "while (! " nm "_writes[tick].empty())\n"
+ pad "{\n"
+ pad " write<" mode "> &w = " nm "_writes[tick].top();\n"
+ pad " current_cpu->SETMEM" mode " (w.pc" idx-args ", w.val);\n"
+ pad " " nm "_writes[tick].pop();\n"
+ pad "}\n\n")))
+
+
+(define (-gen-reset-fn)
+ (let* ((sa string-append)
+ (objs (append (map (lambda (h) (gen-c-symbol (obj:name h)))
+ (find register? (current-hw-list)))
+ (map (lambda (m) (sa m "_memory")) useful-mode-names)))
+ (clr (lambda (elt) (sa " clear_stacks (" elt "_writes);\n"))))
+ (sa
+ " template <typename ST> \n"
+ " static void clear_stacks (ST &st)\n"
+ " {\n"
+ " for (int i = 0; i < @prefix@::pipe_sz; i++)\n"
+ " st[i].clear();\n"
+ " }\n\n"
+ " void @prefix@::write_stacks::reset ()\n {\n"
+ (string-map clr objs)
+ " }")))
+
+(define (-gen-unified-write-fn)
+ (let* ((hw (find register? (current-hw-list)))
+ (modes useful-mode-names)
+ (hw-triples (map (lambda (h) (list (gen-c-symbol (obj:name h))
+ (obj:name (hw-mode h))
+ (length (-hw-vector-dims h))))
+ hw))
+ (mem-triples (map (lambda (m) (list (string-append m "_memory") m 1))
+ modes)))
-(define (-gen-write-fns)
- (logit 2 "Processing writer functions ...\n")
- (string-write-map (lambda (sfmt) (-gen-write-fn sfmt))
- (current-sfmt-list))
-)
+ (logit 2 "Generating writer function ...\n")
+ (string-append
+ "
+
+ void @prefix@::write_stacks::writeback (int tick, @cpu@::@cpu@_cpu* current_cpu)
+ {
+"
+ "\n // register writeback loops\n"
+ (string-map (lambda (t) (apply -gen-register-writer t)) hw-triples)
+ "\n // memory writeback loops\n"
+ (string-map (lambda (t) (apply -gen-memory-writer t)) mem-triples)
+"
+ }
+")))
-; Generate <cpu>-write.cxx.
+(define (-gen-stacks-stream-and-destream-fns)
+ (let* ((sa string-append)
+ (regs (find hw-need-storage? (current-hw-list)))
+ (reg-dim (lambda (r)
+ (let ((dims (-hw-vector-dims r)))
+ (if (equal? 0 (length dims))
+ "0"
+ (number->string (car dims))))))
+ (write-stacks
+ (map (lambda (n) (sa n "_writes"))
+ (append (map (lambda (r) (gen-c-symbol (obj:name r))) regs)
+ (map (lambda (m) (sa m "_memory")) useful-mode-names))))
+ (stream-stacks (lambda (s) (sa " stream_stacks ( s." s ", ost);\n")))
+ (destream-stacks (lambda (s) (sa " destream_stacks ( s." s ", ist);\n")))
+ (stack-boilerplate
+ (sa
+ " template <typename ST> \n"
+ " void stream_stacks (const ST &st, std::ostream &ost)\n"
+ " {\n"
+ " for (int i = 0; i < @prefix@::pipe_sz; i++)\n"
+ " {\n"
+ " ost << st[i].t << ' ';\n"
+ " for (int j = 0; j <= st[i].t; j++)\n"
+ " {\n"
+ " ost << st[i].buf[j].pc << ' ';\n"
+ " ost << st[i].buf[j].val << ' ';\n"
+ " ost << st[i].buf[j].idx0 << ' ';\n"
+ " }\n"
+ " }\n"
+ " }\n"
+ " \n"
+ " template <typename ST> \n"
+ " void destream_stacks (ST &st, std::istream &ist)\n"
+ " {\n"
+ " for (int i = 0; i < @prefix@::pipe_sz; i++)\n"
+ " {\n"
+ " ist >> st[i].t;\n"
+ " for (int j = 0; j <= st[i].t; j++)\n"
+ " {\n"
+ " ist >> st[i].buf[j].pc;\n"
+ " ist >> st[i].buf[j].val;\n"
+ " ist >> st[i].buf[j].idx0;\n"
+ " }\n"
+ " }\n"
+ " }\n"
+ " \n")))
+ (sa stack-boilerplate
+ " std::ostream & @prefix@::operator<< (std::ostream &ost, const @prefix@::write_stacks &s)\n {\n"
+ (string-map stream-stacks write-stacks)
+ "\n return ost;\n"
+ " }\n"
+ " std::istream & @prefix@::operator>> (std::istream &ist, @prefix@::write_stacks &s)\n {\n"
+ (string-map destream-stacks write-stacks)
+ "\n return ist;\n"
+ " }\n")))
(define (cgen-write.cxx)
(logit 1 "Generating " (gen-cpu-name) " write.cxx ...\n")
@@ -465,8 +694,8 @@ namespace @cpu@ {
(sim-analyze-insns!)
- ; Turn parallel execution support off.
- (set-with-parallel?! #f)
+ ; Turn parallel execution support on if needed.
+ (set-with-parallel?! (state-parallel-exec?))
; Tell the rtx->c translator we are the simulator.
(rtl-c-config! #:rtl-cover-fns? #t)
@@ -478,12 +707,18 @@ namespace @cpu@ {
"\
#include \"@cpu@.h\"
-using namespace @cpu@;
-
+#include <iostream>
"
- -gen-write-fns
+ (if (with-parallel?)
+ (string-append
+ (-gen-reset-fn)
+ (-gen-unified-write-fn)
+ (-gen-stacks-stream-and-destream-fns))
+
+ "// no write-stack functions required\n")
)
)
+
; ******************
; cgen-semantics.cxx
@@ -521,19 +756,14 @@ using namespace @cpu@;
"sem_status\n")
"@prefix@_sem_" (gen-sym insn)
(if (with-parallel?)
- " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec)\n"
+ (string-append " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, const int tick, \n\t"
+ "@prefix@::write_stacks &buf)\n")
" (@cpu@_cpu* current_cpu, @prefix@_scache* sem)\n")
"{\n"
(gen-define-field-macro (insn-sfmt insn))
- (if (with-parallel?)
- (gen-define-parallel-operand-macro (insn-sfmt insn))
- "")
" sem_status status = SEM_STATUS_NORMAL;\n"
" @prefix@_scache* abuf = sem;\n"
- ; Unconditionally written operands are not recorded here.
- (if (or (with-profile?) (with-parallel-write?))
- " unsigned long long written = 0;\n"
- "")
+
; The address of this insn, needed by extraction and semantic code.
; Note that the address recorded in the cpu state struct is not used.
; For faster engines that copy will be out of date.
@@ -542,23 +772,12 @@ using namespace @cpu@;
"\n"
(gen-semantic-code insn)
"\n"
- ; Only update what's been written if some are conditionally written.
- ; Otherwise we know they're all written so there's no point in
- ; keeping track.
- (if (or (with-profile?) (with-parallel-write?))
- (if (-any-cond-written? (insn-sfmt insn))
- " abuf->written = written;\n"
- "")
- "")
(if cti?
" current_cpu->done_cti_insn (npc, status);\n"
" current_cpu->done_insn (npc, status);\n")
(if (with-parallel?)
""
" return status;\n")
- (if (with-parallel?)
- (gen-undef-parallel-operand-macro (insn-sfmt insn))
- "")
(gen-undef-field-macro (insn-sfmt insn))
"}\n\n"
))
@@ -576,13 +795,14 @@ using namespace @cpu@;
; Each instruction is implemented in its own function.
(define (cgen-semantics.cxx)
- (logit 1 "Generating " (gen-cpu-name) " semantics.cxx ...\n")
+ (logit 1 "Generating " (gen-cpu-name) " semantics.cxx ")
(assert-keep-one)
(sim-analyze-insns!)
; Turn parallel execution support on if cpu needs it.
(set-with-parallel?! (state-parallel-exec?))
+ (logit 1 (if (state-parallel-exec?) " (parallel) ...\n" "...\n"))
; Tell the rtx->c translator we are the simulator.
(rtl-c-config! #:rtl-cover-fns? #t)
@@ -590,6 +810,8 @@ using namespace @cpu@;
; Indicate we're currently not generating a pbb engine.
(set-current-pbb-engine?! #f)
+ (set! semantics-processed? #t)
+
(string-write
(gen-copyright "Simulator instruction semantics for @prefix@."
copyright-red-hat package-red-hat-simulators)
@@ -598,6 +820,7 @@ using namespace @cpu@;
#include \"@cpu@.h\"
using namespace @cpu@; // FIXME: namespace organization still wip
+using namespace @prefix@; // FIXME: namespace organization still wip
#define GET_ATTR(name) GET_ATTR_##name ()
@@ -655,9 +878,6 @@ using namespace @cpu@; // FIXME: namespa
(if (with-scache?)
(gen-define-field-macro (insn-sfmt insn))
"")
- (if parallel?
- (gen-define-parallel-operand-macro (insn-sfmt insn))
- "")
; Unconditionally written operands are not recorded here.
(if (or (with-profile?) (with-parallel-write?))
" unsigned long long written = 0;\n"
@@ -694,9 +914,6 @@ using namespace @cpu@; // FIXME: namespa
(string-append " pbb_br_npc = npc;\n"
" pbb_br_status = br_status;\n")
"")
- (if parallel?
- (gen-undef-parallel-operand-macro (insn-sfmt insn))
- "")
(if (with-scache?)
(gen-undef-field-macro (insn-sfmt insn))
"")
@@ -950,9 +1167,6 @@ struct @prefix@_pbb_label {
" vpc = vpc + 1;\n")
"")
(gen-define-field-macro (sfrag-sfmt frag))
- (if parallel?
- (gen-define-parallel-operand-macro (sfrag-sfmt frag))
- "")
; Unconditionally written operands are not recorded here.
(if (or (with-profile?) (with-parallel-write?))
" unsigned long long written = 0;\n"
@@ -992,9 +1206,6 @@ struct @prefix@_pbb_label {
(sfrag-trailer? frag))
(string-append " pbb_br_npc = npc;\n"
" pbb_br_status = br_status;\n")
- "")
- (if parallel?
- (gen-undef-parallel-operand-macro (sfrag-sfmt frag))
"")
(gen-undef-field-macro (sfrag-sfmt frag))
" }\n"
Index: rtl-c.scm
===================================================================
RCS file: /cvs/src/src/cgen/rtl-c.scm,v
retrieving revision 1.4
diff -u -p -r1.4 rtl-c.scm
--- rtl-c.scm 8 Sep 2000 22:18:37 -0000 1.4
+++ rtl-c.scm 9 Jan 2003 03:22:25 -0000
@@ -1304,7 +1304,23 @@
"bad arg to `operand'" object-or-name)))
)
-(define-fn xop (estate options mode object) object)
+(define-fn xop (estate options mode object)
+ (let ((delayed (assoc '#:delay (estate-modifiers estate))))
+ (if (and delayed
+ (equal? APPLICATION 'SID-SIMULATOR)
+ (operand? object))
+ ;; if we're looking at an operand inside a (delay ...) rtx, then we
+ ;; are talking about a _delayed_ operand, which is a different
+ ;; beast. rather than try to work out what context we were
+ ;; constructed within, we just clone the operand instance and set
+ ;; the new one to have a delayed value. the setters and getters
+ ;; will work it out.
+ (let ((obj (object-copy object))
+ (amount (cadr delayed)))
+ (op:set-delay! obj amount)
+ obj)
+ ;; else return the normal object
+ object)))
(define-fn local (estate options mode object-or-name)
(cond ((rtx-temp? object-or-name)
@@ -1363,9 +1379,38 @@
(cx:make VOID "; /*clobber*/\n")
)
-(define-fn delay (estate options mode n rtx)
- (s-sequence (estate-with-modifiers estate '((#:delay))) VOID '() rtx) ; wip!
-)
+
+(define-fn delay (estate options mode num-node rtx)
+ (case APPLICATION
+ ((SID-SIMULATOR)
+ (let* ((n (cadddr num-node))
+ (old-delay (let ((old (assoc '#:delay (estate-modifiers estate))))
+ (if old (cadr old) 0)))
+ (new-delay (+ n old-delay)))
+ (begin
+ ;; check for proper usage
+ (if (let* ((hw (case (car rtx)
+ ((operand) (op:type (rtx-operand-obj rtx)))
+ ((xop) (op:type (rtx-xop-obj rtx)))
+ (else #f))))
+ (not (and hw (or (pc? hw) (memory? hw) (register? hw)))))
+ (context-error
+ (estate-context estate)
+ (string-append
+ "(delay ...) rtx applied to wrong type of operand '" (car rtx) "'. should be pc, register or memory")))
+ ;; signal an error if we're delayed and not in a "parallel-insns" CPU
+ (if (not (with-parallel?))
+ (context-error
+ (estate-context estate)
+ "delayed operand in a non-parallel cpu"))
+ ;; update cpu-global pipeline bound
+ (cpu-set-max-delay! (current-cpu) (max (cpu-max-delay (current-cpu)) new-delay))
+ ;; pass along new delay to embedded rtx
+ (rtx-eval-with-estate rtx mode (estate-with-modifiers estate `((#:delay ,new-delay)))))))
+
+ ;; not in sid-land
+ (else (s-sequence (estate-with-modifiers estate '((#:delay))) VOID '() rtx))))
+
; Gets expanded as a macro.
;(define-fn annul (estate yes?)
Index: operand.scm
===================================================================
RCS file: /cvs/src/src/cgen/operand.scm,v
retrieving revision 1.5
diff -u -p -r1.5 operand.scm
--- operand.scm 20 Dec 2002 06:39:04 -0000 1.5
+++ operand.scm 9 Jan 2003 03:22:29 -0000
@@ -90,6 +90,9 @@
; referenced. #f means the operand is always referenced by
; the instruction.
(cond? . #f)
+
+ ; whether (and by how much) this instance of the operand is delayed.
+ (delayed . #f)
)
nil)
)
@@ -135,6 +138,8 @@
(define op:set-num! (elm-make-setter <operand> 'num))
(define op:cond? (elm-make-getter <operand> 'cond?))
(define op:set-cond?! (elm-make-setter <operand> 'cond?))
+(define op:delay (elm-make-getter <operand> 'delayed))
+(define op:set-delay! (elm-make-setter <operand> 'delayed))
; Compute the hardware type lazily.
; FIXME: op:type should be named op:hwtype or some such.
Index: mach.scm
===================================================================
RCS file: /cvs/src/src/cgen/mach.scm,v
retrieving revision 1.2
diff -u -p -r1.2 mach.scm
--- mach.scm 12 Jul 2001 02:32:25 -0000 1.2
+++ mach.scm 9 Jan 2003 03:22:31 -0000
@@ -755,8 +755,7 @@
(apply min (cons 65535
(map insn-length (find (lambda (insn)
(and (not (has-attr? insn 'ALIAS))
- (eq? (obj-attr-value insn 'ISA)
- (obj:name isa))))
+ (isa-supports? isa insn)))
(non-multi-insns (current-insn-list))))))
)
@@ -765,9 +764,8 @@
; [a language with infinite precision can't have max-reduce-iota-0 :-)]
(apply max (cons 0
(map insn-length (find (lambda (insn)
- (and (not (has-attr? insn 'ALIAS))
- (eq? (obj-attr-value insn 'ISA)
- (obj:name isa))))
+ (and (not (has-attr? insn 'ALIAS))
+ (isa-supports? isa insn)))
(non-multi-insns (current-insn-list))))))
)
@@ -1008,13 +1006,19 @@
; Allow a cpu family to override the isa parallel-insns spec.
; ??? Concession to the m32r port which can go away, in time.
parallel-insns
+
+ ; Computed: maximum number of insns which may pass before there
+ ; an insn writes back its output operands.
+ max-delay
+
)
nil)
)
; Accessors.
-(define-getters <cpu> cpu (word-bitsize insn-chunk-bitsize file-transform parallel-insns))
+(define-getters <cpu> cpu (word-bitsize insn-chunk-bitsize file-transform parallel-insns max-delay))
+(define-setters <cpu> cpu (max-delay))
; Return endianness of instructions.
@@ -1064,7 +1068,9 @@
word-bitsize
insn-chunk-bitsize
file-transform
- parallel-insns)
+ parallel-insns
+ 0 ; default max-delay. will compute correct value
+ )
(begin
(logit 2 "Ignoring " name ".\n")
#f))) ; cpu is not to be kept
@@ -1284,13 +1290,13 @@
; Assert only one cpu family has been selected.
(assert-keep-one)
- (let ((par-insns (map isa-parallel-insns (current-isa-list)))
+ (let ((false->zero (lambda (x) (if x x 0)))
+ (par-insns (map isa-parallel-insns (current-isa-list)))
(cpu-par-insns (cpu-parallel-insns (current-cpu))))
; ??? The m32r does have parallel execution, but to keep support for the
; base mach simpler, a cpu family is allowed to override the isa spec.
- (or cpu-par-insns
- ; FIXME: ensure all have same value.
- (car par-insns)))
+ (max (false->zero cpu-par-insns)
+ (apply max (map false->zero par-insns))))
)
; Return boolean indicating if parallel execution support is required.
Index: dev.scm
===================================================================
RCS file: /cvs/src/src/cgen/dev.scm,v
retrieving revision 1.5
diff -u -p -r1.5 dev.scm
--- dev.scm 21 Dec 2002 22:22:33 -0000 1.5
+++ dev.scm 9 Jan 2003 03:22:31 -0000
@@ -115,7 +115,7 @@
(load "sid-model")
(load "sid-decode")
(set! verbose-level 3)
- (set! APPLICATION 'SIMULATOR)
+ (set! APPLICATION 'SID-SIMULATOR)
)
(define (load-sim)
Index: doc/rtl.texi
===================================================================
RCS file: /cvs/src/src/cgen/doc/rtl.texi,v
retrieving revision 1.17
diff -u -p -r1.17 rtl.texi
--- doc/rtl.texi 22 Dec 2002 04:49:26 -0000 1.17
+++ doc/rtl.texi 9 Jan 2003 03:22:34 -0000
@@ -1833,7 +1833,7 @@ This is a character string consisting of
Fields are denoted by @code{$operand} or
@code{$@{operand@}}@footnote{Support for @code{$@{operand@}} is
work-in-progress.}. If a @samp{$} is required in the syntax, it is
-specified with @samp{\$}. At most one white-space character may be
+specified with @samp{$$}. At most one white-space character may be
present and it must be a blank separating the instruction mnemonic from
the operands. This doesn't restrict the user's assembler, this is
@c Is this reasonable?
@@ -2257,10 +2257,39 @@ first argument.
Indicate that @samp{object} is written in mode @samp{mode}, without
saying how. This could be useful in conjunction with the C escape hooks.
-@item (delay mode num expr)
-Indicate that there are @samp{num} delay slots in the processing of
-@samp{expr}. When using this rtx in instruction semantics, CGEN will
-infer that the instruction has the DELAY-SLOT attribute.
+@item (delay num expr)
+In older "sim" simulators, indicates that there are @samp{num} delay
+slots in the processing of @samp{expr}. When using this rtx in instruction
+semantics, CGEN will infer that the instruction has the DELAY-SLOT
+attribute.
+
+In newer "sid" simulators, evaluates to the writeback queue for hardware
+operand @samp{expr}, at @samp{num} instruction cycles in the
+future. @samp{expr} @emph{must} be a hardware operand in this case.
+
+For example, @code{(set (delay 3 pc) (+ pc 1))} will schedule write to
+the @samp{pc} register in the writeback phase of the 3rd instruction
+after the current. Alternatively, @code{(set gr1 (delay 3 gr2))} will
+immediately update the @samp{gr1} register with the @emph{latest write}
+to the @samp{gr2} register scheduled between the present and 3
+instructions in the future. @code{(delay 0 ...)} refers to the
+writeback phase of the current instruction.
+
+This effect is modeled with a circular buffer of "write stacks" for each
+hardware element (register banks get a single stack). The size of the
+circular buffer is calculated from the uses of @code{(delay ...)}
+rtxs. When a delayed write occurs, the simulator pushes the write onto
+the appropriate write stack in the "future" of the circular buffer for
+the written-to hardware element. At the end of each instruction cycle,
+the simulator executes all writes in all write stacks for the time slice
+just ending. When a delayed read (essentially a pipeline bypass) occurs,
+the simulator looks ahead in the circular buffer for any writes
+scheduled in the future write stack. If it doesn't find one, it
+progressively backs off towards the "current" instruction cycle's write
+stack, and if it still finds no scheduled writes then it returns the
+current state of the CPU. Thus while delayed writes are fast, delayed
+reads are potentially slower in a simulator with long pipelines and very
+large register banks.
@item (annul yes?)
@c FIXME: put annul into the glossary.