[RFC] SVP64 Cray-style Vectorisation of the OpenPOWER scalar ISA

lkcl luke.leighton@gmail.com
Fri Mar 19 11:45:45 GMT 2021


On Friday, March 19, 2021, Alan Modra <amodra@gmail.com> wrote:

>
> Have you thought about objdump -d?  I'm not worried about that for an
> initial binutils contribution, but you just know people are going to
> ask for disassembly that looks like their asm, don't you?


yes.  deep breath: we need to move incrementally.  rs6000.md is so large
and complex that alterations have to be done with extreme care.

i suspect that at some point it may be possible to add in a pair-spotter
phase that looks for the EXT01 svp64 pattern and substitutes sv.xxx/y/z.
this will ironically slow down compile time but give the expected match.

i anticipate/guess there being something similar added for OpenPOWER v3.1
64 bit prefixes, we should be able to follow that, after all we are
embedding SVP64 ReMap into the upper reserved space of EXT01.
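
a minimal sketch of the pair-spotter idea, in python rather than in binutils C. it assumes only that a 64-bit prefixed instruction is a 32-bit word whose primary opcode (top 6 bits) is 1 (EXT01), followed by a 32-bit suffix word. the names primary_opcode and spot_pairs are hypothetical, and the sketch makes no attempt to tell an SVP64 prefix apart from a v3.1 prefix.

```python
EXT01 = 1  # primary opcode (top 6 bits) shared by 64-bit prefixed encodings


def primary_opcode(word):
    """Return the 6-bit primary opcode of a 32-bit instruction word."""
    return (word >> 26) & 0x3F


def spot_pairs(words):
    """Group 32-bit words into (prefix, suffix) or (None, word) tuples.

    Illustrative only: a real disassembler would then decode the prefix
    bits and print the suffix in its augmented (sv.xxx) form.
    """
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if primary_opcode(word) == EXT01 and i + 1 < len(words):
            # EXT01 word followed by another word: treat as one pair
            out.append((word, words[i + 1]))
            i += 2
        else:
            # ordinary 32-bit instruction (or a dangling prefix at the end)
            out.append((None, word))
            i += 1
    return out
```

the extra opcode check on every word is where the slowdown mentioned above would come from.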

> Or just let everything go ahead to the point where gas is ready to
> write out the instruction, then twiddle anything necessary.


i'll have to re-examine the code again to see if it throws errors on
"unrecognised" register operand formats.  i may have misunderstood what you
mean.


>
> >
> > > ie. much
> > > the same as you already do using python.
> >
> >
> > yehyeh.  the principle is, we don't invent new opcode formats: it's simply
> > too much work at every level (for us and everyone else).
> >
> > use existing v3.0B ones, extra bits "augment", job done.
> >
> > > That should work so long as
> > > you are flexible with white-space
> >
> >
> > wait, whur? :) ohhh yes, macros.
> >
> > ahh do gnu-as users  _really_ use macros that create register names with
> > spaces? "f 0" rather than "f0" for example?
>
> Not that I know of, but someone might want to tack on ".v", say, and
> not know the trick to paste a macro parameter without white-space.


you can see where i'm going with that: these are register names.  if "f 0"
or "vr 5" is a syntax error then logically so is "r0 .v". worth testing
"addi r 0, r  1, 5"?
>
> > do you happen to know of a 2nd one?  Segher recommended avoiding "/"
> > because it clashes with macro "divide" operator.
>
> Segher was probably worried about expressions rather than macros, in
> particular, expressions in operands.  But from what I've understood so
> far, you're not augmenting operands so this isn't a consideration.


okaay, that makes sense.

one thing we will have to consider: there's enough space in some
circumstances to treat (extend) RA-as-src *differently* from RA-as-dest.
at the moment these are the LD/ST-with-update.

although the range is very limited we can provide 2 bits so that at least
the update of RA can, if desired, end up in a nearby register.

i really do not wish to alter the *number* of arguments to each opcode,
this gives the mistaken impression that we are actually altering v3.0B
OpenPOWER scalar ISA functionality, and is also a lot more intrusive into
the gas codebase.

also, given that RA-as-src and RA-as-dest *have* to share the same v3.0B RA
5 bit field between them, it makes no sense in my mind to even allow/create
a 4-operand variant of ldux etc.

with that in mind i'd advocate instead the following:

    ldux RT.v, RA.v?RA.v, RB.v

where the "?" can be searched for in the svp64-operand parsing phase, and
the two different RAs checked to see if they have the same lower 5 bits.
(if they don't, there's no way to fit them into the same underlying v3.0B
RA field)
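
a minimal sketch of that check, in python. everything here is an assumption made for illustration: the "rN" / "rN.v" register spelling, the helper names parse_reg and parse_split_ra, and returning the two register numbers as a tuple. it is not the gas implementation, only the "split on ?, compare lower 5 bits" rule described above.

```python
import re

# hypothetical operand spelling: "r" + decimal number, optional ".v" suffix
_REG = re.compile(r"r(\d+)(\.v)?")


def parse_reg(text):
    """Return the register number from e.g. 'r35' or 'r35.v'."""
    match = _REG.fullmatch(text)
    if match is None:
        raise ValueError(f"bad register operand: {text!r}")
    return int(match.group(1))


def parse_split_ra(operand):
    """Parse 'RA' or 'RA?RA' and return (first, second) register numbers.

    Both registers must agree in their lower 5 bits, because they share
    the single 5-bit RA field of the underlying v3.0B instruction.
    """
    first_text, sep, second_text = operand.partition("?")
    first = parse_reg(first_text)
    if not sep:
        # no "?": the same register is used for both roles
        return first, first
    second = parse_reg(second_text)
    if (first & 0x1F) != (second & 0x1F):
        raise ValueError(
            f"{operand!r}: registers differ in their lower 5 bits"
        )
    return first, second
```

under these assumptions parse_split_ra("r3.v?r35.v") is accepted, because 3 and 35 agree in their lower 5 bits, whereas "r3?r4" is rejected.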

background, here: we will be proposing some scalar bitmanip extensions
later, similar to and based partly on the RV bitmanip, taking into
consideration existing VSX/VMX operations that, because they are in
VSX/VMX, never got added to the scalar ISA, and also adding
AVX512-like ternary operator and general-purpose Galois Field.

these are so operand-hungry that we have to do overwrite.  like isel for
example, but overwriting RA by only allowing space for 3 operands.

(RV bitmanip deploys the exact same trick btw: 4-operand, 3 of which are
source, overwrite one as a dest)

that's a limitation in the scalar version that scalar (non-SVP64)
implementors just have to live with.  in SVP64 we can extend the (limited)
range and (partially) get a different dest from src.

the reason i mention that is because whilst *at the moment*
LD/ST-with-update is the only candidate for different src/dest this will
not be the case in the future.


>
> Incidentally, I expect you can ignore the fact that powerpc-solaris
> targets use '/' to start a comment.  Those targets are 32-bit only and
> likely not built by anyone nowadays.


they'd need to create an SVP-augmented powerpc-solaris processor first.

i suspect anyone creating an SVP64-augmented powerpc-solaris system today
would likely run into the OpenPOWER ISA EULA, which as it stands only
indemnifies implementors with IBM's patent pool if they demonstrate v3.0B
compliance [at one of the 4 subset levels].



> None of the three slashes above will make their way into assembly.


ok whew.  thx Alan.

l.

