Scheduling x86 dispatch windows

H.J. Lu
Thu Jun 10 19:52:00 GMT 2010

On Thu, Jun 10, 2010 at 11:05 AM, Quentin Neill
<> wrote:
> Cross-posting Reza's call for feedback to the binutils list since it
> is relevant -
> see the last few paragraphs regarding how to "solve the alignment problem".
> Original thread:
> Not sure if followups should occur on one list or both.
> --
> Quentin Neill
> On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani <> wrote:
>> Hi,
>> We are in the process of adding a feature to GCC to take advantage
>> of a new hardware feature in the latest AMD micro processor. This
>> feature requires a certain mix, ordering and alignments in
>> instruction sequences to obtain the expected hardware performance.
>> I am asking the community to review this high level implementation
>> design and give me direction or advice.
>> The new hardware issues two windows of the size N bytes of
>> instructions in every cycle. It goes into accelerate mode if the
>> windows have the right combination of instructions or alignments. Our
>> goal is to maximize the IPC by proper instruction scheduling and
>> alignments.
>> Here is a summary of the most important requirements:
>> a) Maximum of N instructions per window.
>> b) An instruction may cross the first window.
>> c) Each window can have maximum of x memory loads and y memory
>>    stores .
>> d) The total number of immediate constants in the instructions
>>    of a window should not exceed k.
>> e) The first window must be aligned on 16 byte boundary.
>> f) A Window set terminates when a branch exists in a window.
>> g) The number of allowed prefixes varies for instructions.
>> h) A window set needs to be padded by prefixes in instructions
>>    or terminated by nops to ensure adherence to the rules.
>> We have the following implementation plan for GCC:
>> 1) Modify the Haifa scheduler to make the desired arrangement of
>>    instructions for the two dispatch windows. The scheduler is called
>>    once before and once after register allocation as usual. In both
>>    cases it performs dispatch scheduling along with its normal job of
>>    instruction scheduling.
>> The advantage of doing it before register allocation is avoiding
>> extra dependencies caused by register allocation which may become
>> an obstacle to movement of instructions.  The advantage of doing
>> it after register allocation is a consideration for spilling code
>> which may be generated by the register allocator.
>> The algorithm we use is:
>> a) Considering the current dispatch window set, choose the first
>>    instruction from ready queue that does not violate dispatch rules.
>> b) When an instruction is selected and scheduled, inform the
>>    dispatcher code about the instruction. This step keeps track of the
>>    instruction content of windows for future evaluation. It also manages
>>    the window set by closing and opening new virtual dispatch windows.
>> 2) Insertion of alignment code.
>> In x86 alignment is done by inserting prefixes or by generating
>> nops. As the object code is generated by the assembler in GCC, some
>> information such as sizes of branches are unknown until assembly or
>> link time. To do alignments related to dispatch correctly in GCC,
>> we need to iteratively compute prefixes and branch sizes until
>> its convergence. This pass currently does not exist in GCC, but it
>> exists in the assembler.
>> There are two possible approaches to solve alignment problem.
>> a)  Let the assembler performs the alignments and padding needed
>>     to adhere with the new machine dispatching rules and avoid an extra
>>     pass in GCC.
>> b)  Add a new pass to mimic what assembler does before generating
>>     the assembly listing in GCC and insert the required alignments.
>> I appreciate your comments on the proposed implementation procedure
>> and the choices a or b above.

I don't this should be done in assembler. Assembler should just assemble
the assembly input.


More information about the Binutils mailing list