Thu Aug 24 08:43:00 GMT 2000
Richard Earnshaw wrote:
> > I'm using an arm-elf cross compiler built from 2.95.2 for a device with
> > a StrongARM CPU core. I've noticed that the compiler tries to avoid
> > multiply instructions by transforming a 32 bit multiply by a constant
> > into a sequence of adds and subtracts with shifts. This is probably
> > desirable if the processor has a slow multiply instruction, but the
> > StrongARM core I'm using has a fast multiply (1 clock issue, 1-3 clock
> > result delay depending on early termination). So I'd really prefer for
> > the compiler to use the multiply instruction. A quick glance through
> > arm.c in the GCC sources indicates that when -mcpu=strongarm is used,
> > then a flag (arm_fast_multiply) gets set. Should this cause the use of
> > the multiply instructions (or at least make them more favorable)? Any
> > hints on how to get the compiler to cooperate?
> Well, when multiplying by a constant, it is nearly always faster to build
> the operation up from shift instructions, even on a StrongARM. Remember
> that to use the multiply instruction a constant first has to be loaded
> into a register; that takes at least one cycle and may take many more if
> the value has to be synthesised or fetched from an area of memory that
> might be outside the cache (though that can sometimes be moved outside of
> a loop at the expense of increasing register pressure). It then takes at
> least two cycles to perform the multiply itself, so we have an absolute
> minimum of 3 cycles before it could be possible to save time by using the
> multiply instruction. A very large number of constant multiplications in
> normal code can be synthesised in 3 or fewer shift+add insns (each taking
> one cycle), so there are only a small number of cases where it would be
> better to use the multiply instruction even on a StrongARM.
> The costings in gcc are set up to take the above into account, so I'm not
> surprised that you are not seeing the use of the multiply insn. Do you
> have a specific example where the compiler is definitely generating slower
> code? If so, I'd be interested in taking a look at it.
Your arguments are compelling.
I'll just let the compiler do its "thing".
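
For anyone reading this later, here is a rough C sketch of the shift+add
synthesis described in the quoted reply. The function names (mul7, mul10,
mul100) and the choice of constants are made up for illustration, and the
ARM mnemonics in the comments are only an assumption about what a compiler
could emit, not output captured from 2.95.2. Each C statement is meant to
correspond to one single-cycle data-processing instruction with a shifted
operand, so all three stay at or under the 3-cycle minimum quoted for the
load-constant-then-multiply path.

```c
#include <stdint.h>

/* Unsigned arithmetic is used so that overflow wraps modulo 2^32,
 * matching what a 32-bit multiply would produce. */

/* x * 7 = (x << 3) - x: one insn, e.g. rsb r0, r0, r0, lsl #3 */
static uint32_t mul7(uint32_t x)
{
    return (x << 3) - x;
}

/* x * 10 = (x * 5) * 2: two insns */
static uint32_t mul10(uint32_t x)
{
    uint32_t x5 = x + (x << 2);   /* add r1, r0, r0, lsl #2 */
    return x5 << 1;               /* mov r0, r1, lsl #1     */
}

/* x * 100 = ((x * 5) * 5) * 4: three insns */
static uint32_t mul100(uint32_t x)
{
    uint32_t x5  = x + (x << 2);    /* add r1, r0, r0, lsl #2 */
    uint32_t x25 = x5 + (x5 << 2);  /* add r1, r1, r1, lsl #2 */
    return x25 << 2;                /* mov r0, r1, lsl #2     */
}
```

Compiling something like this with -S and comparing against a plain
"x * 100" is one way to check what the compiler actually chooses for a
given constant.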