[RFC][PATCH] MIPS ifunc for binutils

Wed Aug 21 08:08:00 GMT 2013

Jack Carter <jack.carter@imgtec.com> writes:
> On 08/20/2013 01:36 PM, Richard Sandiford wrote:
>> Jack Carter <Jack.Carter@imgtec.com> writes:
>>> I attempted to follow the ARM implementation, but the
>>> traditional MIPS GOT design forced me to dirverge a bit.
>>> All ifunc functions are represented in .iplt section stubs and
>>> in the .igot.plt section table. Each igot.plt entry has an
>>> R_MIPS_IRELOCATE relocation record against it with the initial
>>> igot entry having link time address of the ifunc routine and
>>> after the relocation action, the final runtime target routine
>>> address.
>>>
>>> The only change to the traditional MIPS GOT was to use the .iplt stub
>>> address for the defined ifunc function instead of the function address.
>>> This should allow seamless multigot support
>>
>> Like you said in your earlier message, this seems too big a penalty.
>> I think instead we should come up with some way of integrating ifuncs
>> into the ABI's GOT scheme.  E.g. we could add a new dynamic tag that sets
>> the index of the first local GOT entry, such as DT_MIPS_LOCAL_GOTNO_START.
>> When that tag is defined, ld.so would use it instead of the current hard-coded
>> 1 or 2 (depending on ELF_MIPS_GNU_GOT1_MASK).
>>
>> Then the beginning of the primary GOT could be relocated as normal and we'd
>> have control over the relocation order.  (Of course, we'd only want to use
>> that area where necessary, since the normal scheme is more efficient for
>> the cases that it can handle.)
>>
>> That's just one suggestion.  I'm sure there are other ways of doing it.
>> But I think we should make whichever change seems best now rather than
>> treat it as a future enhancement, otherwise we'll be running the risk
>> of compatibility problems.
>
> It's good to have someone to discuss this with.
>
> My current scheme uses the iplt stubs only with a.outs. If I am not 
> mistaken, I have to use them with non-shared no matter what and in the 
> case of shared a.outs the use defined ifunc will be very limited if at all.
>
> Does this warrant messing with the current got resolution order?

I'm not sure what you mean by "shared a.outs".  If you mean in the sense
of "PIE" (where the output executable is position-independent) or "-mshared"
(where the input ET_RELs are position-independent, but the output executable
isn't unless linked as a PIE), then I don't think those would really affect
how likely it is that the input code uses ifuncs.

But the main problem I see with the ABI GOT vs. ifuncs is that the
ABI GOT needs to be relocated first.  (Or at least, the global part does,
since that is used as a cache when applying the relocations.)  And any
global symbol used in a relocation must also be in the global part of the GOT.

So say we have a DSO with code like this:

--------------------------------------------------------------------
int
foo1 (void)
{
  return 1;
}

int
foo2 (void)
{
  return 2;
}

static int x = 1;
static int *p = &x;

int
(*resolve_foo (void)) (void)
{
  if (*p == 1)
    return foo1;
  else
    return foo2;
}

int foo (void) __attribute__ ((ifunc ("resolve_foo")));

int (*foo_ptr) (void) = foo;
--------------------------------------------------------------------

"foo" is a preemptible global and so doesn't bind locally.  This means
that the "foo_ptr" initialisation should have a relocation against the
global symbol "foo".  That in turns means that "foo" should have a global
GOT entry.

"p" should have a R_MIPS_REL32 relocation against it.

If "foo" isn't preempted (the usual case), the dynamic linker
will resolve "foo" to the local definition and call "resolve_foo".
But it will do this when iterating over the global GOT entries,
before any other relocations have been processed.  AFAICT without
trying it, this means that "resolve_foo" will be called before "p"
has been relocated, so is likely to crash or read spurious memory.

If we use a dynamic tag to stop part of the GOT from being relocated
in this way, the static linker will have full control over the order.
We should probably also say that a STT_GNU_IFUNC must _not_ be in the
ABI-defined global GOT area, even if there is a relocation against it.
(But the second requires the first, since we then need some way of putting
an ifunc into the GOT when explicit GOT relocations require it.)

>>> +/* Is this a non-shared link?  */
>>> +#define MIPS_STATIC_LINK(info)					\
>>> +    ((elf_hash_table(info)->dynamic_sections_created) == FALSE)
>>> +/* Is this a non-fPIC a.out?  */
>>> +#define MIPS_AOUT_LINK(info)					\
>>> +    (info->executable == TRUE)
>>
>> I think it'd be better not to have these and just check the fields directly.
>
> I am happy to change this, but while we are at it are my assumptions 
> correct? It was not clear to me how to make the distinctions. There is a 
> flag for static_link, but does not seem to be used.

Yeah, dynamic_sections_created is the right way to test for static vs.
dynamic output (as in, whether the output has a PT_DYNAMIC).  And:

  executable && !shared => normal executable (ET_EXEC)
  executable && shared  => PIE (ET_DYN)
  !executable && shared => DSO (ET_DYN)

static_link is an AIX-only feature.  It means that DSOs (which are actually
archives on AIX) get linked directly into the output rather than being
linked dynamically.

>>> +/* The format of non-dso IPLT entries.  */
>>> +static const bfd_vma mips_exec_iplt_entry[] =
>>> +{
>>> +  0x3c0f0000,   /* lui $15, %hi(.got.iplt entry)	*/
>>> +  0x01f90000,   /* l[wd] $25, %lo(.got.iplt entry)($15)	*/
>>> +  0x03200008,   /* jr $25				*/
>>> +  0x00000000	/* nop					*/
>>> +};
>>> +
>>> +/* The format of dso IPLT entries.  */
>>> +static const bfd_vma mips_dso_iplt_entry[] =
>>> +{
>>> +  0x03e07821,	/* addu t7, ra, r0				*/
>>> +  0x04110001,	/* bal 8					*/
>>> +  0x3c190000,	/* lui t9, %hi(<.got.plt slot offset>)		*/
>>> +  0x033fc821,	/* addu t9, t9, ra				*/
>>> +  0x8f390000,	/* l[wd] t9, %lo(<.got.plt slot offset>)(t9) 	*/
>>> +  0x03200008,	/* jr $t9					*/
>>> +  0x01e0f821,	/* addu ra, t7, r0				*/
>>> +  0x00000000	/* nop						*/
>>> +};
>>
>> Since we don't use IPLTs for DSOs, just executables, it looks like this
>> is really a PIE vs. non-PIE distinction.  I think the comments would be
>> clearer that way.  But why do we need IPLT entries for PIEs but not DSOs?
>> I'd assumed we didn't need them for DSOs because all calls must go via
>> the GOT, but that's true of PIEs as well.
>>
>> I personally don't care much -- in fact argued against it --
>> but the current PLT entries were chosen to be MIPS I compatible and
>> so avoided filling the load delay slot.  Whichever way we decide to go,
>> I think it'd be better for "normal" PLTs and IPLTs to be consistent.
>
> The DSO version is no longer needed. I just hated to rip it out after I 
> got it working and then figured out how to not need the iplt for dso 
> ifunc definitions.
>
> I'll rip it out.

Thanks.  The MIPS I comment applies to the first entry too though.
There would need to be a "nop" between the load and JR.  The trick
used in PLTs is to allow the "lui" of the following PLT entry to
fill the delay slot instead (but at the cost of fetching 5 instructions
instead of 4).

I still think it's silly to cater for MIPS I here, but like I say,
we should be consistent...

>>> @@ -8177,10 +8432,14 @@ _bfd_mips_elf_check_relocs (bfd *abfd, struct bfd_link_info *info,
>>>
>>>   	      /* We need a stub, not a plt entry for the undefined
>>>   		 function.  But we record it as if it needs plt.  See
>>> -		 _bfd_elf_adjust_dynamic_symbol.  */
>>> -	      h->needs_plt = 1;
>>> -	      h->type = STT_FUNC;
>>> -	    }
>>> +		 _bfd_elf_adjust_dynamic_symbol.  If it is an ifunc
>>> +		 symbol it will go into an iplt section and not plt.  */
>>> +	      if (h->type != STT_GNU_IFUNC)
>>> +		{
>>> +		  h->needs_plt = 1;
>>> +		  h->type = STT_FUNC;
>>> +		}
>>> +	      }
>>
>> I think only the "h->type = STT_FUNC" bit should be conditional.
>> If a DSO has a call to a preemptible STT_GNU_IFUNC, it needs to use
>> a normal lazy-binding stub.
>
> Not sure about this one. I have test cases that call from and to ifunc 
> symbols defined in dsos and visa versa. That said, I will look at this 
> again and either change it or come up with a decent defence.

OK, but the case I meant was when a DSO A defines an ifunc "foo",
and either the executable or a higher-priority DSO B overrides "foo"
with a different definition.  A must then call B's definition of "foo",
so it needs a stub to cater for that case.

Thanks,
Richard