In s390_vr_fill() in guest_s390_toIR.c, filling a vector with two copies
of a 64-bit value is realized with Iop_64HLtoV128, since there is no such
operator as Iop_Dup64x2. But the two args to Iop_64HLtoV128 use the same
expression, referenced twice. Although this hasn't been seen to cause
real trouble yet, it's problematic and potentially inefficient, so change
it: Assign to a temp and pass that twice instead.
In the instruction selector, if Iop_64HLtoV128 is found to be used for a
duplication as above, select "v-vdup" instead of "v-vinitfromgprs". This
mimicks the behavior we'd get if there actually was an operator
Iop_Dup64x2.