Scheduling x86 dispatch windows

Quentin Neill
Thu Jun 10 18:06:00 GMT 2010

Cross-posting Reza's call for feedback to the binutils list since it
is relevant -
see the last few paragraphs regarding how to "solve the alignment problem".

Original thread:

Not sure if followups should occur on one list or both.
Quentin Neill

On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani <> wrote:
> Hi,
> We are in the process of adding a feature to GCC to take advantage
> of a new hardware feature in the latest AMD micro processor. This
> feature requires a certain mix, ordering and alignments in
> instruction sequences to obtain the expected hardware performance.
> I am asking the community to review this high level implementation
> design and give me direction or advice.
> The new hardware issues two windows of the size N bytes of
> instructions in every cycle. It goes into accelerate mode if the
> windows have the right combination of instructions or alignments. Our
> goal is to maximize the IPC by proper instruction scheduling and
> alignments.
> Here is a summary of the most important requirements:
> a) Maximum of N instructions per window.
> b) An instruction may cross the first window.
> c) Each window can have maximum of x memory loads and y memory
>    stores .
> d) The total number of immediate constants in the instructions
>    of a window should not exceed k.
> e) The first window must be aligned on 16 byte boundary.
> f) A Window set terminates when a branch exists in a window.
> g) The number of allowed prefixes varies for instructions.
> h) A window set needs to be padded by prefixes in instructions
>    or terminated by nops to ensure adherence to the rules.
> We have the following implementation plan for GCC:
> 1) Modify the Haifa scheduler to make the desired arrangement of
>    instructions for the two dispatch windows. The scheduler is called
>    once before and once after register allocation as usual. In both
>    cases it performs dispatch scheduling along with its normal job of
>    instruction scheduling.
> The advantage of doing it before register allocation is avoiding
> extra dependencies caused by register allocation which may become
> an obstacle to movement of instructions.  The advantage of doing
> it after register allocation is a consideration for spilling code
> which may be generated by the register allocator.
> The algorithm we use is:
> a) Considering the current dispatch window set, choose the first
>    instruction from ready queue that does not violate dispatch rules.
> b) When an instruction is selected and scheduled, inform the
>    dispatcher code about the instruction. This step keeps track of the
>    instruction content of windows for future evaluation. It also manages
>    the window set by closing and opening new virtual dispatch windows.
> 2) Insertion of alignment code.
> In x86 alignment is done by inserting prefixes or by generating
> nops. As the object code is generated by the assembler in GCC, some
> information such as sizes of branches are unknown until assembly or
> link time. To do alignments related to dispatch correctly in GCC,
> we need to iteratively compute prefixes and branch sizes until
> its convergence. This pass currently does not exist in GCC, but it
> exists in the assembler.
> There are two possible approaches to solve alignment problem.
> a)  Let the assembler performs the alignments and padding needed
>     to adhere with the new machine dispatching rules and avoid an extra
>     pass in GCC.
> b)  Add a new pass to mimic what assembler does before generating
>     the assembly listing in GCC and insert the required alignments.
> I appreciate your comments on the proposed implementation procedure
> and the choices a or b above.
> Reza Yazdani

More information about the Binutils mailing list