This is the mail archive of the cgen@sources.redhat.com mailing list for the CGEN project.
generalizing the delay rtx function
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Subject: generalizing the delay rtx function
- From: Doug Evans <dje at transmeta dot com>
- Date: Mon, 12 Mar 2001 20:33:20 -0800 (PST)
- Cc: cgen at sources dot redhat dot com
- References: <20010308160106.A28162@redhat.com>
Frank Ch. Eigler writes:
> Hi -
>
> As you may be aware, the delay rtx function is used, despite its
> work-in-progress designation,
Despite? How do you define "work-in-progress"?
> to model delayed branches in
> constructs like
>     (delay 1
>       (set pc (add pc 42)))
> The DELAY-SLOT insn attribute is inferred from this for use by
> simulator mainlines. That's the extent of the effect of the delay
> rtx.
>
> In order to model architectures with exposed pipelines (i.e., no
> or limited pipeline interlocks), and related effects like delayed
> loads, I'd like to take it beyond this, by coupling it to the
> parallel-write mechanism.
You mean take the work-in-progress and finish(*1) the work?
[(*1) or make closer to being finished ...]
> As you might be aware, ports that have VLIW features tend to use
> the "parallel-write" mechanism in their semantic blocks in order
> to queue updates to registers/memory? until after all concurrently
> executed instructions have been processed. This lets multiple
> reader instructions execute together with a writer instruction,
> without detailed worry about the evaluation sequence.
>
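A minimal C sketch of the write-queueing idea described above, under stated assumptions: every name here (pw_queue, pw_queue_write, pw_writeback, pw_swap_bundle) is hypothetical and is not CGEN's generated parexec code. It only illustrates why queueing lets concurrently executed insns read old values without worrying about evaluation order.

```c
#include <assert.h>

/* Hypothetical sizes and names for illustration; not CGEN's parexec layout. */
enum { PW_NUM_REGS = 4, PW_MAX_WRITES = 8 };

/* Pending register writes collected while a bundle executes. */
struct pw_queue {
    int count;
    int reg[PW_MAX_WRITES];
    int value[PW_MAX_WRITES];
};

/* Record a write instead of performing it immediately. */
static void pw_queue_write(struct pw_queue *q, int reg, int value)
{
    assert(q->count < PW_MAX_WRITES);
    assert(reg >= 0 && reg < PW_NUM_REGS);
    q->reg[q->count] = reg;
    q->value[q->count] = value;
    q->count++;
}

/* Apply all pending writes once every insn in the bundle has run. */
static void pw_writeback(struct pw_queue *q, int regs[PW_NUM_REGS])
{
    for (int i = 0; i < q->count; i++)
        regs[q->reg[i]] = q->value[i];
    q->count = 0;
}

/* Two "insns" executed together: r0 <- r1 and r1 <- r0.
   Both read the old register values because nothing is written
   until pw_writeback runs. */
static void pw_swap_bundle(int regs[PW_NUM_REGS])
{
    struct pw_queue q = { 0 };
    pw_queue_write(&q, 0, regs[1]);
    pw_queue_write(&q, 1, regs[0]);
    pw_writeback(&q, regs);
}
```

With immediate writes the second insn would see the value the first one just stored; with the queue, the bundle swaps the two registers regardless of the order in which the two insns are evaluated.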
> Anyway, how about a scheme such as this:
>
> - Provide a clear definition for the DELAY rtx:
>   The numeric argument is the number of instruction cycles
>   after the current one, at which the enclosed set expressions
>   take effect.
> - Restrict the use of the DELAY rtx to only contain SET expressions
>   to hardware/memory registers. Forbid other calculations.
Ok.
> - Possibly, force use of (DELAY 0 ....) to express VLIW concurrency,
>   at least in new ports.
Ick. Or rather, got an example?
> - Infer "parallel-write?" (or a new equivalent) from the presence of
>   DELAY rtxs.
Ditto.
> - Eliminate special treatment of PC by fitting delayed branches into
>   this model.
No current opinion.
> Then, the generated simulator code would be changed, so that:
>
> - Semantic functions, instead of taking a single parexec structure
>   pointer (for write queueing), take an array of them. Within
>   (DELAY <N> RTX*) blocks, define OPRND to point to the appropriate
>   elements in the parexec array.
Guess I'd have to see the implementation.
> - The insn evaluation loop would keep an array of parexec structs
>   as a rotating buffer, always running the writeback code on the first
>   one, then rotating the set, then passing it to the next insn. cgen
>   could compute the maximum index needed.
>
> This way, code like
>     (set reg1 1)
>     (delay 0 (set reg1 3))
>     (delay 1 (set reg2 5))
>     (delay 2 (set reg1 6))
> would each be well-defined and useful.
Yep.
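For readers following the thread, the rotating-buffer proposal can be sketched in C roughly as follows. This is an illustration under assumptions, not an implementation from CGEN: the names (struct cpu, delayed_set, end_cycle) and the fixed sizes are hypothetical, and it assumes that "delay N" means the write lands at the end of the Nth insn after the current one, with delay 0 landing at the end of the current insn.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names and sizes throughout; not CGEN's generated code.
   MAX_DELAY is the number of slots, so delays 0..MAX_DELAY-1 are allowed. */
enum { NUM_REGS = 4, MAX_DELAY = 3, MAX_WRITES = 8 };

struct write { int reg; int value; };

/* One slot plays the role of one parexec struct: writes due in the same cycle. */
struct slot { int count; struct write writes[MAX_WRITES]; };

struct cpu {
    int regs[NUM_REGS];
    struct slot queue[MAX_DELAY]; /* queue[(head + n) % MAX_DELAY]: due n insns from now */
    int head;                     /* index of the slot written back next */
};

static void cpu_init(struct cpu *c)
{
    memset(c, 0, sizeof *c);
}

/* Models (delay n (set reg value)): queue the write n slots ahead of head. */
static void delayed_set(struct cpu *c, int n, int reg, int value)
{
    assert(n >= 0 && n < MAX_DELAY);
    assert(reg >= 0 && reg < NUM_REGS);
    struct slot *s = &c->queue[(c->head + n) % MAX_DELAY];
    assert(s->count < MAX_WRITES);
    s->writes[s->count].reg = reg;
    s->writes[s->count].value = value;
    s->count++;
}

/* After each insn: run writeback on the first slot, clear it, rotate. */
static void end_cycle(struct cpu *c)
{
    struct slot *s = &c->queue[c->head];
    for (int i = 0; i < s->count; i++)
        c->regs[s->writes[i].reg] = s->writes[i].value;
    s->count = 0;
    c->head = (c->head + 1) % MAX_DELAY;
}
```

Under these assumptions, the four-line example above would behave like this: the plain set stores 1 in reg1 at once; the first end_cycle call changes reg1 to 3; the second sets reg2 to 5; the third sets reg1 to 6. The slot count of 3 is the "maximum index needed" plus one, which the proposal suggests cgen could compute from the largest delay in the port.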
> An alternate cgen syntax possibility is to introduce a
>     (delayed-set N lvalue rvalue)
> rtx.
>
> Any advice?
Eat healthy and exercise.