[PATCH 1/2] [RFC] Add IFUNC support for MIPS (v4)

Fri Apr 29 14:19:00 GMT 2016

Hi Faraz,

> >> @@ -1598,6 +1702,8 @@ mips_elf_create_stub_symbol (struct bfd_link_info *info,
> >>    elfh->type = ELF_ST_INFO (STB_LOCAL, STT_FUNC);
> >>    elfh->size = size;
> >>    elfh->forced_local = 1;
> >> +  elfh->other = other;
> >> +
> > 
> >  Why do you need to set `other' while you have already set the ISA bit in 
> > `value'?  Our rule for handling is to set either rather than both at a 
> > time, why do you need an exception here?
> 
> I added `other' based on a suggestion from Richard that it was the preferred
> method. The ISA bit setting for MIPS16 was just carried forward from an earlier
> patch and can be removed.

 Would you please give me a reference?  All I came across is this:

> This isn't really a property of the hash table symbol but of the kind of
> stub being created by the calling code.  I think it would be better to
> pass an "st_other" parameter alongside the other stub properties.

-- which only comments on *how* to set `other' rather than whether to set 
it at all in the first place.

 Also please answer my original question, that is: what is the intent for 
this setting -- or IOW what changes if you remove it?

> >  Also you cannot fix IPLT entries based on global MIPS16/microMIPS object 
> > flags as you may have to create mixed IPLT entries in a single output 
> > binary if mixed regular/compressed code is present in input.  You may even 
> > have to create two IPLT entries for a single function symbol.  Otherwise 
> > you'll have issues with tail calls and short delay slots and even if you 
> > escape them you may cause unnecessary mode switches that are costly.  So 
> > you need to decide based on the referring reloc type.
> 
> If the type of the stub matches the IFUNC, then there shouldn't be any
> unnecessary mode switches.

 How do you know that the stub matches the IFUNC given that it is only 
resolved at the run time?  You could have some regular and some compressed 
IFUNC variants provided to choose from, perhaps deliberately.

> I don't quite see the pitfalls of tails calls
> that would require both compressed and regular IPLT for an IFUNC symbol.
> Could you point me to an approximate test-case?

 If you have a tail call in an non-PIC executable, it'll be made with a J 
(or sometimes B) instruction.  This instruction cannot be patched by LD to 
make a mode switch, because there is no suitable instruction in the ISA.  
Consequently the target of the jump (an IPLT stub in this case) and the 
jump itself have to match each other as far as the ISA mode is concerned.
So it's not even a performance issue, but it's a code correctness issue.

 There's no J instruction in the MIPS16 instruction set, so you won't 
encounter MIPS16 tail calls, but still avoiding a mode switch is 
preferable, to prevent a long pipeline stall.

 This is not an issue with an IPLT stub jumping to the IFUNC itself, 
because it's an indirect (register) jump, so the ISA mode can be switched 
if necessary.  Also, as I noted above, you don't know if a switch will be 
necessary up until the run time.  However by optimising to avoid a switch 
on jumping to an IPLT stub you're minimising the number of mode switches 
required to at most one.

> I have added mixed-mode handling for IPLT entries and related test-cases 
> for the next review. Do you see any advantage in having the regular and 
> compressed entries grouped within the IPLT section, versus just occurring
> intermixed as they are encountered? For ordinary PLTs we create all regular
> entries before all compressed entries, but I don't see any problem with
> mixing them up, as long as the offset and symbol type information is correct.

 There are two reasons for regular PLT to have entries sorted.

 First is cache line alignment -- regular PLT entries are 16-byte long 
making them align to cache lines of up to 16 bytes in size, while 
microMIPS entries are 12-byte long (unless `--insn32' has been used) 
guaranteeing alignment to cache lines of up to 4 bytes only.  By placing 
all regular entries first they have the guaranteed 16-byte alignment.

 Second is the handling of synthetic symbols which is much easier with 
such an arrangement -- see `_bfd_mips_elf_get_synthetic_symtab'.

 If neither were an issue with IPLT, then you could leave entries in a 
random order.  But the IPLT entries also have different lengths, even the 
regular ones, e.g. with n64, so there will be performance implications 
from different ordering.

 Can you see if you can do something about it then?  The best would be 
sorting entries in the decreasing alignment order and aligning the section 
according to the largest one.  So e.g. given the sizes of 32, 24, 20, 16, 
the order would be: 32, 16, 24, 20, and the section alignment -- 5.

> >  Fortunately all the necessary details will have already been collected in 
> > case ordinary PLT is to be produced.  So you can just reuse that stuff for 
> > your purpose as well.
> 
> I don't see much opportunity for reuse, since the conditions that mandate
> creation of IPLT entries are distinct from ordinary PLTs, so that code is not
> guaranteed to execute in all cases where IPLTs are needed. But I have
> restructured this condition code to look similar to PLT handling, including
> support for a micromips-insn32 mode. N64/mips16/micromips/insn32 will still be
> determined at the global level, but regular vs. compressed will be selected
> individually for each entry.

 The creation conditions are handled elsewhere, but the ISA modes of the 
callers are recorded as relocations are seen.  This information is then 
used to choose the correct kind of PLT entries in one place, and you can 
also use it to choose the correct kind of IPLT entries where appropriate 
in your code.  For why you need it -- please see my note above.

 Let me know if you have find anything unclear or have any other questions 
or comments.

  Maciej