This is the mail archive of the crossgcc@sources.redhat.com mailing list for the crossgcc project.

See the CrossGCC FAQ for lots more information.



Re: Optimization question


> 
> I'm using an arm-elf cross compiler built from 2.95.2 for a device with
> a StrongARM CPU core.  I've noticed that the compiler tries to avoid
> multiply instructions by transforming a 32 bit multiply by a constant
> into a sequence of adds and subtracts with shifts.  This is probably
> desirable if the processor has a slow multiply instruction, but the
> StrongARM core I'm using has a fast multiply (1 clock issue, 1-3 clock
> result delay depending on early termination).  So I'd really prefer for
> the compiler to use the multiply instruction.  A quick glance through
> arm.c in the GCC sources indicates that when -mcpu=strongarm is used,
> then a flag (arm_fast_multiply) gets set.  Should this cause the use of
> the multiply instructions (or at least make them more favorable)?  Any
> hints on how to get the compiler to cooperate?
> 

Well, when multiplying by a constant, it is nearly always faster to build 
the operation up from shift instructions, even on a StrongARM.  Remember 
that to use the multiply instruction a constant first has to be loaded 
into a register; that takes at least one cycle and may take many more if 
the value has to be synthesised or fetched from an area of memory that 
might be outside the cache (though that can sometimes be moved outside of 
a loop at the expense of increasing register pressure).  It then takes at 
least two cycles to perform the multiply itself, so we have an absolute 
minimum of 3 cycles before it could be possible to save time by using the 
multiply instruction.  A very large number of constant multiplications in 
normal code can be synthesised in 3 or fewer shift+add insns (each taking 
one cycle), so there are only a small number of cases where it would be 
better to use the multiply instruction even on a StrongARM.
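
To make the shift+add argument concrete, here is a minimal sketch in C of 
two constant multiplications written the way the compiler might synthesise 
them.  The function names are made up for illustration, and the ARM 
instructions in the comments are only roughly what one would expect to see 
(each uses the barrel shifter to fold the shift into the add or subtract); 
they are not output taken from gcc 2.95.2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x * 10 in two shift+add steps.
 * x * 5 = x + (x << 2), then doubling gives x * 10. */
static uint32_t mul10_shift_add(uint32_t x)
{
    uint32_t x5 = x + (x << 2);   /* roughly: add r1, r0, r0, lsl #2 */
    return x5 << 1;               /* roughly: mov r0, r1, lsl #1     */
}

/* Hypothetical helper: x * 7 in one step, (x << 3) - x. */
static uint32_t mul7_shift_sub(uint32_t x)
{
    return (x << 3) - x;          /* roughly: rsb r0, r0, r0, lsl #3 */
}

/* Check the synthesised forms against a plain multiply.  Unsigned
 * arithmetic wraps modulo 2^32 on both sides, so the results agree
 * even when the product overflows 32 bits. */
void check_shift_add(void)
{
    static const uint32_t samples[] = { 0u, 1u, 3u, 12345u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        assert(mul10_shift_add(x) == x * 10u);
        assert(mul7_shift_sub(x) == x * 7u);
    }
}
```

Under the cycle counts given above, x * 10 costs two single-cycle insns and 
x * 7 costs one, against a minimum of three cycles for loading the constant 
and issuing the multiply.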

The costings in gcc are set up to take the above into account, so I'm not 
surprised that you are not seeing the use of the multiply insn.  Do you 
have a specific example where the compiler is definitely generating slower 
code?  If so, I'd be interested in taking a look at it.

Richard



------
Want more information?  See the CrossGCC FAQ, http://www.objsw.com/CrossGCC/
Want to unsubscribe? Send a note to crossgcc-unsubscribe@sourceware.cygnus.com

