This is the mail archive of the crossgcc@sources.redhat.com mailing list for the crossgcc project.
See the CrossGCC FAQ for lots more information.
> I'm using an arm-elf cross compiler built from 2.95.2 for a device with
> a StrongARM CPU core. I've noticed that the compiler tries to avoid
> multiply instructions by transforming a 32 bit multiply by a constant
> into a sequence of adds and subtracts with shifts. This is probably
> desirable if the processor has a slow multiply instruction, but the
> StrongARM core I'm using has a fast multiply (1 clock issue, 1-3 clock
> result delay depending on early termination). So I'd really prefer for
> the compiler to use the multiply instruction. A quick glance through
> arm.c in the GCC sources indicates that when -mcpu=strongarm is used,
> a flag (arm_fast_multiply) gets set. Should this cause the use of
> the multiply instructions (or at least make them more favorable)? Any
> hints on how to get the compiler to cooperate?

Well, when multiplying by a constant, it is nearly always faster to build the operation up from shift-and-add instructions, even on a StrongARM.

Remember that to use the multiply instruction the constant first has to be loaded into a register. That takes at least one cycle, and may take many more if the value has to be synthesised or fetched from an area of memory that might be outside the cache (though the load can sometimes be hoisted out of a loop, at the expense of increased register pressure). The multiply itself then takes at least two cycles, so the multiply route costs an absolute minimum of 3 cycles; only a shift-and-add sequence longer than that could lose to it.

A very large number of constant multiplications in normal code can be synthesised in 3 or fewer shift+add insns (each taking one cycle), so there are only a small number of cases where it would be better to use the multiply instruction, even on a StrongARM.

The costings in gcc are set up to take the above into account, so I'm not surprised that you are not seeing the multiply insn being used. Do you have a specific example where the compiler is definitely generating slower code?
If so, I'd be interested in taking a look at it.

Richard

------
Want more information? See the CrossGCC FAQ, http://www.objsw.com/CrossGCC/
Want to unsubscribe? Send a note to crossgcc-unsubscribe@sourceware.cygnus.com
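To make the shift-and-add argument in the reply concrete, here is a minimal C sketch of how a few constant multiplications decompose. The function names are hypothetical, chosen for illustration only. The ARM instructions in the comments are sequences an ARM backend could use, because the barrel shifter lets a shifted register serve as the second operand of ADD/SUB/RSB. They are not a claim about what gcc 2.95.2 actually emits. Assuming one cycle per data-processing instruction, as the reply does, each of these fits well inside the 3-cycle minimum of the load-constant-then-multiply route.

```c
#include <assert.h>
#include <stdint.h>

/* x * 5 = x + (x << 2)
   e.g. ADD r0, r0, r0, LSL #2          -> 1 instruction */
uint32_t mul5(uint32_t x) { return x + (x << 2); }

/* x * 7 = (x << 3) - x
   e.g. RSB r0, r0, r0, LSL #3          -> 1 instruction */
uint32_t mul7(uint32_t x) { return (x << 3) - x; }

/* x * 10 = (x * 5) << 1
   e.g. ADD r0, r0, r0, LSL #2
        MOV r0, r0, LSL #1              -> 2 instructions */
uint32_t mul10(uint32_t x) { return mul5(x) << 1; }

/* x * 45 = (x * 5) * 9 = t + (t << 3), where t = x * 5
   e.g. ADD r0, r0, r0, LSL #2
        ADD r0, r0, r0, LSL #3          -> 2 instructions */
uint32_t mul45(uint32_t x)
{
    uint32_t t = mul5(x);
    return t + (t << 3);
}

/* Naive generic form: one shifted add per set bit of c.
   A real compiler also uses subtraction and factoring (as in
   mul7 and mul45), so this over-counts the work for many constants. */
uint32_t mul_shift_add(uint32_t x, uint32_t c)
{
    uint32_t acc = 0;
    for (unsigned bit = 0; bit < 32; bit++)
        if (c & (1u << bit))
            acc += x << bit;
    return acc;
}

/* Self-check: each synthesised form must agree with the plain multiply.
   Unsigned arithmetic wraps modulo 2^32 in both cases, so this holds
   for every x, including values that overflow. */
void verify_mul_synthesis(uint32_t x)
{
    assert(mul5(x) == x * 5u);
    assert(mul7(x) == x * 7u);
    assert(mul10(x) == x * 10u);
    assert(mul45(x) == x * 45u);
    assert(mul_shift_add(x, 45u) == x * 45u);
}
```

For example, `verify_mul_synthesis(12345u)` returns silently, and `mul45(2u)` yields 90. Constants whose cheapest decomposition needs more than three such instructions are the "small number of cases" where the multiply instruction might win.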