Add an array_mode_supported_p target hook

Richard Sandiford richard.sandiford@linaro.org
Thu Mar 31 14:56:00 GMT 2011


Richard Guenther <richard.guenther@gmail.com> writes:
> On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds an array_mode_supported_p hook, which says whether
>> MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
>> It follows on from the discussion here:
>>
>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
>>
>> The intended use of the hook is to allow small arrays of vectors
>> to have a non-BLK mode, and hence to be stored in rtl registers.
>> These arrays are used both in the ARM arm_neon.h API and in the
>> optabs proposed in:
>>
>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
>>
>> The tail end of the thread was about the definition of TYPE_MODE:
>>
>> #define TYPE_MODE(NODE) \
>>  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
>>   ? vector_type_mode (NODE) : (NODE)->type.mode)
>>
>> with this outcome:
>>
>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html
>>
>> To summarise my take on it:
>>
>> - The current definition of TYPE_MODE isn't sufficient even for vector
>>  modes and vector_mode_supported_p, because non-vector types can have
>>  vector modes.
>>
>> - We should no longer treat types as having one mode everywhere.
>>  We should instead replace TYPE_MODE with a function that takes
>>  a context.  Tests of things like vector_mode_supported_p would
>>  move from layout_type to this new function.
>>
>> I think this patch fits within that scheme.  array_mode_supported_p
>> would be treated in the same way as vector_mode_supported_p.
>>
>> I realise the ideal would be to get rid of TYPE_MODE first.
>> But that's going to be a longer-term thing.  Now that there's
>> at least a plan, I'd like to press ahead with the array stuff
>> on the basis that
>>
>> (a) although the new hook won't work with the "target" attribute,
>>    our current mode handling doesn't work in just the same way.
>>
>> (b) the new hook doesn't interfere with the plan.
>>
>> (c) getting good code from the intrinsics (and support for these
>>    instructions in the vectoriser) is going to be much more important
>>    to most ARM users than the ability to turn Neon on and off for
>>    individual functions in a TU.
>>
>> To give an example of the difference, the Neon code posted here:
>>
>>    http://hilbert-space.de/?p=22
>>
>> produces this inner loop before the patch (but with
>> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):
>>
>> .L3:
>>        vld3.8  {d16-d18}, [r1]!
>>        vstmia  ip, {d16-d18}
>>        fldd    d19, [sp, #24]
>>        adr     r5, .L6
>>        ldmia   r5, {r4-r5}
>>        fldd    d16, [sp, #32]
>>        vmov    d18, r4, r5  @ v8qi
>>        vmull.u8        q9, d19, d18
>>        adr     r5, .L6+8
>>        ldmia   r5, {r4-r5}
>>        vmov    d17, r4, r5  @ v8qi
>>        vstmia  sp, {d18-d19}
>>        vmlal.u8        q9, d16, d17
>>        fldd    d16, [sp, #40]
>>        adr     r5, .L6+16
>>        ldmia   r5, {r4-r5}
>>        vmov    d17, r4, r5  @ v8qi
>>        vmlal.u8        q9, d16, d17
>>        add     r3, r3, #1
>>        vshrn.i16       d16, q9, #8
>>        cmp     r3, r2
>>        vst1.8  {d16}, [r0]!
>>        bne     .L3
>>
>> With both patches applied, the inner loop is:
>>
>> .L3:
>>        vld3.8  {d18-d20}, [r1]!
>>        vmull.u8        q8, d18, d21
>>        vmlal.u8        q8, d19, d22
>>        vmlal.u8        q8, d20, d23
>>        add     r3, r3, #1
>>        vshrn.i16       d16, q8, #8
>>        cmp     r3, r2
>>        vst1.8  {d16}, [r0]!
>>        bne     .L3
>>
>> Tested on arm-linux-gnueabi.  OK to install?
>
> It looks reasonable given the past discussion, but - can you move forward
> with the Neon stuff a bit to see if it really fits?  Or is this all
> that is needed
> for the load/store lane support as well (apart from vectorizer changes of
> course).

Yeah, I have a prototype that hacks up some C support for generating the
(otherwise internal-only) load/store built-in functions that the vectoriser
is suppsoed to generate.  This patch is all that seems to be needed for the
types and optabs generation to work in the natural way.

I'm happy to leave it until the vectoriser stuff is in a more
submittable state though.  Especially given:

> Can you check the code generated by for example
>
> float foo(char *p)
> {
>   float a[2];
>   int i;
>   ((char *)a)[0] = p[0];
>   ((char *)a)[1] = p[1];
>   ((char *)a)[2] = p[2];
>   ((char *)a)[3] = p[3];
>   ((char *)a)[4] = p[4];
>   ((char *)a)[5] = p[5];
>   ((char *)a)[6] = p[6];
>   ((char *)a)[7] = p[7];
>   return a[0] + a[1];
> }
>
> for an array a that would get such a larger mode?  Thus, check what
> happens with partial defs of different types (just to avoid ICEs like the
> ones Jakub was fixing yesterday).

OK, I tried:

#include "arm_neon.h"

uint32x2_t foo(char *p)
{
  uint32x2_t a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  ((char *)a)[8] = p[8];
  ((char *)a)[9] = p[9];
  ((char *)a)[10] = p[10];
  ((char *)a)[11] = p[11];
  ((char *)a)[12] = p[12];
  ((char *)a)[13] = p[13];
  ((char *)a)[14] = p[14];
  ((char *)a)[15] = p[15];
  return vadd_u32 (a[0], a[1]);
}

uint32x4_t bar(char *p, uint32x4_t *b)
{
  uint32x4_t a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  ((char *)a)[8] = p[8];
  ((char *)a)[9] = p[9];
  ((char *)a)[10] = p[10];
  ((char *)a)[11] = p[11];
  ((char *)a)[12] = p[12];
  ((char *)a)[13] = p[13];
  ((char *)a)[14] = p[14];
  ((char *)a)[15] = p[15];
  ((char *)a)[16 + 0] = p[16 + 0];
  ((char *)a)[16 + 1] = p[16 + 1];
  ((char *)a)[16 + 2] = p[16 + 2];
  ((char *)a)[16 + 3] = p[16 + 3];
  ((char *)a)[16 + 4] = p[16 + 4];
  ((char *)a)[16 + 5] = p[16 + 5];
  ((char *)a)[16 + 6] = p[16 + 6];
  ((char *)a)[16 + 7] = p[16 + 7];
  ((char *)a)[16 + 8] = p[16 + 8];
  ((char *)a)[16 + 9] = p[16 + 9];
  ((char *)a)[16 + 10] = p[16 + 10];
  ((char *)a)[16 + 11] = p[16 + 11];
  ((char *)a)[16 + 12] = p[16 + 12];
  ((char *)a)[16 + 13] = p[16 + 13];
  ((char *)a)[16 + 14] = p[16 + 14];
  ((char *)a)[16 + 15] = p[16 + 15];
  return vaddq_u32 (a[0], a[1]);
}

It seemed to avoid the problem Jakub was seeing, but the second function
hit the known const_int reload failure for these modes:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46329

Richard



More information about the Gcc-patches mailing list