Scheduling x86 dispatch windows
Thu Jun 10 18:06:00 GMT 2010
Cross-posting Reza's call for feedback to the binutils list since it
is relevant -
see the last few paragraphs regarding how to "solve the alignment problem".
Original thread: http://gcc.gnu.org/ml/gcc/2010-06/threads.html#00402
Not sure if followups should occur on one list or both.
On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani <firstname.lastname@example.org> wrote:
> We are in the process of adding a feature to GCC to take advantage
> of a new hardware feature in the latest AMD micro processor. This
> feature requires a certain mix, ordering and alignments in
> instruction sequences to obtain the expected hardware performance.
> I am asking the community to review this high level implementation
> design and give me direction or advice.
> The new hardware issues two windows of the size N bytes of
> instructions in every cycle. It goes into accelerate mode if the
> windows have the right combination of instructions or alignments. Our
> goal is to maximize the IPC by proper instruction scheduling and
> Here is a summary of the most important requirements:
> a) Maximum of N instructions per window.
> b) An instruction may cross the first window.
> c) Each window can have maximum of x memory loads and y memory
> stores .
> d) The total number of immediate constants in the instructions
> of a window should not exceed k.
> e) The first window must be aligned on 16 byte boundary.
> f) A Window set terminates when a branch exists in a window.
> g) The number of allowed prefixes varies for instructions.
> h) A window set needs to be padded by prefixes in instructions
> or terminated by nops to ensure adherence to the rules.
> We have the following implementation plan for GCC:
> 1) Modify the Haifa scheduler to make the desired arrangement of
> instructions for the two dispatch windows. The scheduler is called
> once before and once after register allocation as usual. In both
> cases it performs dispatch scheduling along with its normal job of
> instruction scheduling.
> The advantage of doing it before register allocation is avoiding
> extra dependencies caused by register allocation which may become
> an obstacle to movement of instructions. The advantage of doing
> it after register allocation is a consideration for spilling code
> which may be generated by the register allocator.
> The algorithm we use is:
> a) Considering the current dispatch window set, choose the first
> instruction from ready queue that does not violate dispatch rules.
> b) When an instruction is selected and scheduled, inform the
> dispatcher code about the instruction. This step keeps track of the
> instruction content of windows for future evaluation. It also manages
> the window set by closing and opening new virtual dispatch windows.
> 2) Insertion of alignment code.
> In x86 alignment is done by inserting prefixes or by generating
> nops. As the object code is generated by the assembler in GCC, some
> information such as sizes of branches are unknown until assembly or
> link time. To do alignments related to dispatch correctly in GCC,
> we need to iteratively compute prefixes and branch sizes until
> its convergence. This pass currently does not exist in GCC, but it
> exists in the assembler.
> There are two possible approaches to solve alignment problem.
> a) Let the assembler performs the alignments and padding needed
> to adhere with the new machine dispatching rules and avoid an extra
> pass in GCC.
> b) Add a new pass to mimic what assembler does before generating
> the assembly listing in GCC and insert the required alignments.
> I appreciate your comments on the proposed implementation procedure
> and the choices a or b above.
> Reza Yazdani
More information about the Binutils