This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base


On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> > register so that vector index register will be zero-extended to 64 bits.
> >
> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> > segment override since address will be segment base + zero-extended to 64
> > bits of (base + index * scale + disp).  But GCC:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
>
> Neither above nor in the bug you explain what's wrong with the
> segment override plus address size override in x32 mode. Since you

X32 relies on 0x67 prefix to zero-extend address to 64 bits:

zero-extended (base + index * scale + disp)

With segment override, we got

segment base + zero-extended (base + index * scale + disp)

instead of

zero-extended (segment base + base + index * scale + disp)

When base + index * scale + disp is negative, we get the wrong
address.

VSIB address in vgatherdps is

base + sign-extend(index) * scale + disp

With segment override, we got

segment base + zero-extended (base + sign-extend(index) * scale + disp)

175.vpr in SPEC CPU 2000:

VPR FPGA Placement and Routing Program Version 4.00-spec
Source completed August 19, 1997.


General Options:
The circuit will be placed but not routed.

Placer Options:
User annealing schedule selected with:
Initial Temperature: 5
Exit (Final) Temperature: 0.005
Temperature Reduction factor (alpha_t): 0.9412
Number of moves in the inner loop is (num_blocks)^4/3 * 2
Placement cost type is linear congestion.
Placement will be performed once.
Placement channel width factor = 100.
Exponent used in placement cost: 1
Initial random seed: 1

Reading the FPGA architectural description from arch.in.
Successfully read arch.in.
Pins per clb: 6.  Pads per row/column: 2.
Subblocks per clb: 1.  Subblock LUT size: 4.
Fc value is fraction of tracks in a channel.
Fc_output: 1.  Fc_input: 1.  Fc_pad: 1.
Switch block type: Subset.
Distinct types of segments: 3.
Distinct types of user-specified switches: 3.

Reading the circuit netlist from net.in.
Warning:  logic block #368 (n_n13961) has only 1 pin.
Pin is an output -- may be a constant generator.  Non-fatal, but check this.
Successfully read net.in.
8527 blocks, 8445 nets, 1 global nets.
8383 clbs, 62 inputs, 82 outputs.
The circuit will be mapped into a 92 x 92 array of clbs.


Program received signal SIGSEGV, Segmentation fault.
0x004158fd in try_place.isra ()
(gdb) disass 0x004158fd,+32
Dump of assembler code from 0x4158fd to 0x41591d:
=> 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
   0x00415907 <try_place.isra.5+7527>: vandps %ymm12,%ymm7,%ymm0
   0x0041590c <try_place.isra.5+7532>: vpslld $0x2,%ymm1,%ymm10
   0x00415911 <try_place.isra.5+7537>: vmovdqa 0x1cbe7(%rip),%ymm13
    # 0x432500
   0x00415919 <try_place.isra.5+7545>: inc    %eax
   0x0041591b <try_place.isra.5+7547>: vpaddd %ymm5,%ymm10,%ymm14
End of assembler dump.
(gdb) p/x $ymm15
$1 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
    0x8000000000000000, 0x8000000000000000, 0x8000000000000000,
    0x8000000000000000}, v32_int8 = {0x10, 0x30, 0xfa, 0xf7, 0x24, 0x30, 0xfa,
    0xf7, 0x38, 0x30, 0xfa, 0xf7, 0x4c, 0x30, 0xfa, 0xf7, 0x60, 0x30, 0xfa,
    0xf7, 0x74, 0x30, 0xfa, 0xf7, 0x88, 0x30, 0xfa, 0xf7, 0x9c, 0x30, 0xfa,
    0xf7}, v16_int16 = {0x3010, 0xf7fa, 0x3024, 0xf7fa, 0x3038, 0xf7fa,
    0x304c, 0xf7fa, 0x3060, 0xf7fa, 0x3074, 0xf7fa, 0x3088, 0xf7fa, 0x309c,
    0xf7fa}, v8_int32 = {0xf7fa3010, 0xf7fa3024, 0xf7fa3038, 0xf7fa304c,
    0xf7fa3060, 0xf7fa3074, 0xf7fa3088, 0xf7fa309c}, v4_int64 = {
    0xf7fa3024f7fa3010, 0xf7fa304cf7fa3038, 0xf7fa3074f7fa3060,
    0xf7fa309cf7fa3088}, v2_int128 = {0xf7fa304cf7fa3038f7fa3024f7fa3010,
    0xf7fa309cf7fa3088f7fa3074f7fa3060}}
(gdb)

Here indexes are 0xf7fa3010, .... Before my fix, they are sign-extended to
0xfffffffff7fa3010 which leads to invalid address in x32.

> keep using the same wording with just slight alterations, it must be
> something very obvious to you, but entirely un-obvious to me. Is
> this related to the desire of using both negative and positive
> offsets into TLS, where (obviously I would say) there's not going
> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS

It won't wrap for x32.

> usage model is broken, but it's not the assembler that should

No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.

> prevent use of otherwise valid constructs. Whether full 64-bit

Assembly is correct for 64-bit mode.  Since it doesn't work for
x32 when offset is negative, we should at least give a warning.

> addresses (and hence full non-zero %fs/%gs bases with no
> wrapping at the 4Gb boundary) is intended is the programmer's
> choice, not something the assembler should enforce unconditionally.
> Optionally emitting a warning is acceptable, but then this shouldn't
> be tied to any other, more generically applicable warnings.

Binutils, including linker, does a few things special for x32 to deal
with address limitation.  This is just one of them.

> In any event, if this is to stay, then at least the code comment
> needs to be quite a bit more clear - "we can't have" is not enough
> without explicitly saying why that is.
>
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -8141,6 +8141,36 @@ output_insn (void)
> >         i.prefix[LOCK_PREFIX] = 0;
> >       }
> >
> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
> > +     {
> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> > +          without base register so that vector index register will
> > +          be zero-extended to 64 bits.  */
> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)
> > +         add_prefix (ADDR_PREFIX_OPCODE);
>
> Just to re-state: There needs to be a way to override this behavior.
> And this is already leaving aside that making this the default from
> now on has a fair risk of breaking currently working code. (Note
> that this is not to say that I can't see that the change will also
> help currently broken code.)

Please see above.  If VSIB index is below 2G, my fix doesn't change
anything.  If VSIB index is above 2G, the program crashes before my fix.

> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
> > +          segment override since final address will be segment
> > +          base + zero-extended (base + index * scale + disp).  */
> > +       if (operand_check != check_none
> > +           && i.prefix[ADDR_PREFIX]
> > +           && i.prefix[SEG_PREFIX])
> > +         {
> > +           const seg_entry *seg;
> > +           if (i.seg[0])
> > +             seg = i.seg[0];
> > +           else
> > +             seg = i.seg[1];
> > +           if (operand_check == check_error)
> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),

How about just

segment `%s%s' override with 32-bit address

> > +                     register_prefix, seg->seg_name);
> > +           else
> > +             as_warn (_("segment `%s%s' override with 32-bit address"),
> > +                      register_prefix, seg->seg_name);
> > +         }
> > +     }
> > +#endif
> > +
> >        /* Since the VEX/EVEX prefix contains the implicit prefix, we
> >        don't need the explicit prefix.  */
> >        if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
>
>
>


-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]