RFC: MIPS: Workaround the "daddi" and "daddiu" errata

Thu Mar 18 00:39:00 GMT 2004

On Sat, 6 Mar 2004, Richard Sandiford wrote:

> >  As you may know there are errata in the R4000 and the initial revision of 
> > the R4400 processor, that lead to an incorrect execution of the "daddi" 
> > and the "daddiu" instructions whan a two-complement overflow happens.  I 
> > believe the only way to work this problem around is to assume the affected 
> > processors don't support these instructions at all.  Fortunately, this can 
> > be quite easily done by pretending these instructions are actually 
> > macros and expanding them to a sequence of a "li" ("addiu" to $0) and a 
> > "dadd" or a "daddu" instruction with an aid of a temporary register, like 
> > this:
> >
> > daddiu $2,$3,0x1234 -> addiu $1,$0,0x1234; daddu $2,$3,$1
> > daddi  $2,$3,0x1234 -> addiu $1,$0,0x1234; dadd  $2,$3,$1
> >
> >  This works quite well when $1 is available and could be implemented with 
> > gas exclusively for that case.  Unfortunately, "daddiu" is quite commonly 
> > used in expansions of address references in macros.
> 
> I guess there are two main ways of approaching an errata like this:
> 
>   (a) Stop gas & gcc from ever using daddiu.
>   (b) Try to justify whether each use of daddiu is safe. Use an
>       alternative if not.
> 
> You've gone for (a) here, which I suppose you could argue is safer.
> Problem is, the patch ends up touching far more code than (b) would,
> and introduces far more special cases.  I think it's going to be very
> hard to maintain the patch in the long-term.  From that point of view,
> it could be argued that it's actually safer to restrict the workaround
> to the cases that really matter.  There's less chance of things silently
> breaking in future.

 I'm not so sure.  I wanted to make absolutely sure the output is
verifiable for correctness and this achieved.  You can run e.g. `objdump
-S <file> | grep 'daddi[u]*' on a binary and see a problem immediately.  
It can also be handled quite easily by the kernel or the dynamic linker
when mmap()ping executable pages -- simply by scanning for the bad
opcodes.  I'd say your proposal is more fragile -- you have to study
generated code thoroughly for every "daddiu" instruction to check if the
erratum can be triggered.  And run-time validation is essentially
impossible (we don't want to do static analysis in the kernel, do we?).

> A quick look at the gcc patch suggests that the changes fall into
> three categories:
> 
>   (1) daddiu used to add an integer constant whose value is known to gcc
> 
>         Related hunks:
>         * adddi3_internal_3
>         * changes to other patterns that I don't think are necessary
> 
>   (2) daddiu used to add %lo(x) to an incomplete address
> 
>         Hunks related to general addresses:
>         * mips_symbol_insns
>         * mips_secondary_reload_class
>         * all(?) 64-bit non-mips16 .md patterns that have a load or store
>           alternative
>         * various reload patterns
> 
>         Hunks related to branch handling:
>         * mips_conditional_register_usage
>         * mips_output_load_label
>         * mips_output_conditional_branch
> 
>   (3) daddiu used to allocate and deallocate stack space in gcc
> 
>         Related hunks:
>         * ASM_OUTPUT_REG_PUSH
>         * ASM_OUTPUT_REG_POP
> 
> And the errata we're working around is that:
> 
>       daddiu dst,src,xxxx
> 
> will give the incorrect result in the following cases:
> 
>   (+) src = 0x7fff_ffff_ffff_yyyy && xxxx + yyyy > 0x10000
> 
>       RESULT: dst will be 0x0000_0000_0000_zzzz
>               rather than 0x8000_0000_0000_zzzz
> 
>   (-) src = 0x8000_0000_0000_yyyy && -xxxx > yyyy
> 
>       RESULT: dst will be 0xffff_ffff_ffff_zzzz
>               rather than 0x7fff_ffff_ffff_zzzz
> 
> So looking at each of these in turn:
> 
>   (1) Definitely needs a workaround, no question about that.
> 
>   (2) Adding %lo(X) to the high part of X can only trigger case (-)
>       since yyyy (the low 16 bits of src) will be 0.  In this case,
>       X must be of the form 0x7fff_ffff_ffff_8???, which is an invalid
>       address, right?  So there's no way this bug could trigger if X
>       refers to a valid in-memory object.
> 
>       Of course, X could still be some kind of constant defined in a
>       linker script, but, well... don't do that if you care about
>       targets with a buggy daddiu.  It hardly seems to justify the
>       kind of code bloat it causes in the common case, nor the big
>       maintenance burden.
> 
>       The same argument applies when "src" contains the high part of
>       X plus an index.  If X refers to an in-memory object, no valid
>       index can cause the daddiu to overflow.
> 
>       Unfortunately, I don't know of any specific code in gcc to stop:
> 
>           &foo[y - x]
> 
>       from being expanded as:
> 
>           ...load high part of foo into dst...
>           daddu dst,dst,y
>           daddiu dst,dst,%lo(foo)
>           dsubu dst,dst,x
> 
>       so perhaps we could trip over the errata when x and y are both
>       humungous values.  But I don't think this case justifies treating
>       "daddiu ...,%lo(...)" as a complete no-no.  Again, it leads to
>       large code bloat and has a high maintenance burden.
> 
>       If you can demonstrate that &foo[y - x] is indeed a problem then
>       I'd prefer a patch that addresses it head-on.

 Note that even if the address is invalid, code should work correctly --
the address may be further offsetted later.  Or code may expect a bad
address and deal with it somehow.  Or there may be a real bug in code --
it should really get an exception then, instead of silently using a 
different address.

>   (3) In the case of ASM_OUTPUT_REG_PUSH, you're catering for case (-);
>       i.e. the case where the stack pointer goes from 0x8000... to
>       0x7fff....  Since this represents the stack pointer moving from
>       xkphys to an invalid address, it surely represents a program gone
>       wrong.  Even the original stack pointer value is a bit suspect ;)
> 
>       Perhaps you could argue that, since 0xffff... is a valid address,
>       the behaviour of the broken daddiu is less friendly for error-detection
>       and debugging purposes.  But I don't think that's enough to justify
>       the extra gcc maintenance burden, nor the cost of using two instructions
>       to allocate small chunks of stack.
> 
>       Similar arguments apply for the POP case.
> 
> Your patch also includes changes to patterns like ffsdi.  This seems
> a bit over-the-top; there's no way counting from 0 to 64 can trigger
> overflow.

 It's just the consequence of getting rid of the instructions universally, 
which was my goal.

 Note that a lot of these modifications is due to macros that are expanded 
by gas to code sequences including a "daddiu".  If we limit the workaround 
to the explicit reloc case (which can now be done thanks to your recent 
no-PIC support addition) a lot of changes can go away.  That would be a 
reasonable and acceptable for me approach.

 Then, if all the patterns that want a "daddiu" call something like 
emit_daddiu_insn() instead of emitting the instruction explicitly, then 
suddenly the changes required may become limited to really a few places.

 If you still worry about maintenance hassle of the change, then feel free
to offload it to me.  Though with an easy detectability of a breakage, as 
demonstrated above, I'll probably be first to notice it and provide an 
update.

> Anyway, concentrating on (1):
> 
> > @@ -1030,17 +1066,41 @@
> >  		 (match_dup 3)))]
> >    "")
> >  
> > -(define_insn "adddi3_internal_3"
> > +(define_expand "adddi3_internal_3"
> >    [(set (match_operand:DI 0 "register_operand" "=d,d")
> >  	(plus:DI (match_operand:DI 1 "reg_or_0_operand" "dJ,dJ")
> >  		 (match_operand:DI 2 "arith_operand" "d,Q")))]
> >    "TARGET_64BIT && !TARGET_MIPS16"
> > -  "@
> > -    daddu\t%0,%z1,%2
> > -    daddiu\t%0,%z1,%2"
> > +  "")
> > +
> > +(define_insn "adddi3_internal_3a"
> > +  [(set (match_operand:DI 0 "register_operand" "=d")
> > +	(plus:DI (match_operand:DI 1 "reg_or_0_operand" "dJ")
> > +		 (match_operand:DI 2 "register_operand" "d")))]
> > +  "TARGET_64BIT && !TARGET_MIPS16"
> > +  "daddu\t%0,%z1,%2"
> > +  [(set_attr "type"	"darith")
> > +   (set_attr "mode"	"DI")])
> > +
> > +(define_insn "adddi3_internal_3b"
> > +  [(set (match_operand:DI 0 "register_operand" "=d")
> > +	(plus:DI (match_operand:DI 1 "reg_or_0_operand" "dJ")
> > +		 (match_operand:DI 2 "const_arith_operand" "Q")))]
> > +  "TARGET_64BIT && !TARGET_MIPS16 && !TARGET_NO_DADDI"
> > +  "daddiu\t%0,%z1,%2"
> >    [(set_attr "type"	"darith")
> >     (set_attr "mode"	"DI")])
> >  
> > +(define_insn "adddi3_internal_3c"
> > +  [(set (match_operand:DI 0 "register_operand" "=d")
> > +	(plus:DI (match_operand:DI 1 "reg_or_0_operand" "dJ")
> > +		 (match_operand:DI 2 "const_arith_operand" "Q")))]
> > +  "TARGET_64BIT && !TARGET_MIPS16 && TARGET_NO_DADDI"
> > +  "%[addiu\\t%@,%.,%2\;daddu\t%0,%z1,%@%]"
> > +  [(set_attr "type"	"darith")
> > +   (set_attr "mode"	"DI")
> > +   (set_attr "length"	"8")])
> > +
> >  ;; For the mips16, we need to recognize stack pointer additions
> >  ;; explicitly, since we don't have a constraint for $sp.  These insns
> >  ;; will be generated by the save_restore_insns functions.
> 
> Please don't split the existing pattern.  One pattern with several
> alternatives gives reload more freedom than several patterns with
> single alternatives.

 I wasn't aware of this -- is it missing from documentation, or wasn't I
careful enough reading it?

> I suggest the TARGET_NO_DADDIU version look something like:
> 
> (define_insn "adddi3_internal_3_nodaddiu"
>   [(set (match_operand:DI 0 "register_operand" "=d,d")
> 	(plus:DI (match_operand:DI 1 "reg_or_0_operand" "dJ,dJ")
> 		 (match_operand:DI 2 "arith_operand" "d,Q")))]
>   "TARGET_64BIT && TARGET_NO_DADDIU"
> {
>   if (which_alternative == 0)
>     return "daddu\t%0,%z1,%2";
> 
>   if (operands[1] == stack_pointer_rtx)
>     return "daddiu\t%0,%z1,%2";
> 
>   return "%[addiu\t%@,%.,%2\;daddu\t%0,%z1,%@%]";
> }
>   [(set_attr "type"	"darith")
>    (set_attr "length"   "8,4")])
> 
> ...completely untested, might not even be syntactically valid.
> This overestimates the length of the stack pointer case,
> but that's not a problem.

 I'll investigate your proposal.  As I'm essentially done with my patches 
functionality-wise, I'll do a short upgrade cycle to get in sync with the 
trunk and then I'll get back with an update.

 I don't think there can be any reduction done on the gas patch, OTOH.  
With a suitable test case, code should be fairly easy to maintain, though.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +