[PATCH 1/2] [RFC] Add IFUNC support for MIPS (v4)

Mon Dec 5 14:44:00 GMT 2016

On Sat, 21 May 2016, Faraz Shahbazker wrote:

> >>>> @@ -1598,6 +1702,8 @@ mips_elf_create_stub_symbol (struct bfd_link_info *info,
> >>>>    elfh->type = ELF_ST_INFO (STB_LOCAL, STT_FUNC);
> >>>>    elfh->size = size;
> >>>>    elfh->forced_local = 1;
> >>>> +  elfh->other = other;
> >>>> +
> >>>
> >>>  Why do you need to set `other' while you have already set the ISA bit in 
> >>> `value'?  Our rule for handling is to set either rather than both at a 
> >>> time, why do you need an exception here?
> >>
> >> I added `other' based on a suggestion from Richard that it was the preferred
> >> method. The ISA bit setting for MIPS16 was just carried forward from an earlier
> >> patch and can be removed.
> > 
> >  Would you please give me a reference?  All I came across is this:
> > 
> >> This isn't really a property of the hash table symbol but of the kind of
> >> stub being created by the calling code.  I think it would be better to
> >> pass an "st_other" parameter alongside the other stub properties.
> > 
> > -- which only comments on *how* to set `other' rather than whether to set 
> > it at all in the first place.
> >  Also please answer my original question, that is: what is the intent for 
> > this setting -- or IOW what changes if you remove it?
> 
> Nothing. FWIW, even if both are set here, _bfd_mips_elf_link_output_symbol_hook
> takes care of resetting the ISA bit. A similar thing happens with 
> mips_elf_set_plt_sym_value, for example.
> 
> Before this patch, mips_elf_create_stub_symbol was only being used for
> la25 stubs for MIPS or micromips and the pre-existing usage prefers to set 
> the ISA bit. On one hand, I am inclined to stick with that and remove the symbol
> annotation. On the other hand, the behaviour of output_symbol_hook() suggests 
> that annotation is the preferred way of indicating ISA, since it retains that
> setting over the ISA bit, when both are set. What would you recommend?

 I take back my original concern.  You need the ISA annotation for correct 
final symbol table contents as shown with commit a848a2271b9b ("MIPS/BFD: 
Add microMIPS annotation to LA25 stub symbols").  And you need the ISA bit 
set as well for the relocation checks in `mips_elf_calculate_relocation' 
added with commit 9d862524f6ae ("MIPS: Verify the ISA mode and alignment 
of branch and jump targets").  The ISA bit will be stripped from the final 
symbol table produced with the ISA annotation remaining only, by code in 
`_bfd_mips_elf_link_output_symbol_hook'.  This does not affect any dynsym 
table produced where the ISA bit does remain.

> >>>  Also you cannot fix IPLT entries based on global MIPS16/microMIPS object 
> >>> flags as you may have to create mixed IPLT entries in a single output 
> >>> binary if mixed regular/compressed code is present in input.  You may even 
> >>> have to create two IPLT entries for a single function symbol.  Otherwise 
> >>> you'll have issues with tail calls and short delay slots and even if you 
> >>> escape them you may cause unnecessary mode switches that are costly.  So 
> >>> you need to decide based on the referring reloc type.
> 
> Finally got your point! Using the referring reloc to choose between regular
> & compressed stub makes sense for locally bound IFUNCs. For an exported IFUNC
> with external references and non-call relocs, the exported symbol has to 
> be bound to a specific IPLT stub. Okay to pick the stub from the ISA type
> of the resolver in that case?

 Yes, that sounds reasonable to me.

> This change takes the number of IFUNC-related fields in the struct
> mips_elf_link_hash_entry to 7 (4 bit-flags and 3 offsets). Would it be better
> to pack them all in to a struct which we allocate only for IFUNCs? Should
> definitely save memory with large links, especially on 64-bit hosts.

 Why would it save more memory on 64-bit hosts than on 32-bit ones?  I 
would have thought the reverse is the case, owing to the host pointer size 
difference only, as the size of all the offsets is determined by the 
target ABI.  Also the bit flags are free, being merged with the existing 
ones in one 32-bit integer (and we're far from overrunning it) and I can 
only see 2 offsets added in your change, `iplt_offset' and `igot_offset' 
(why is the former of the `bfd_vma' type while the latter is plain `int' 
BTW?).

 So it looks to me like with both offsets using 32-bit types there'd be no 
saving at all on a 64-bit host and a saving of only 4 bytes per symbol on 
a 32-bit host.  I'm not convinced it's worth the extra complication.

> >  There are two reasons for regular PLT to have entries sorted.
> >  First is cache line alignment -- regular PLT entries are 16-byte long 
> > making them align to cache lines of up to 16 bytes in size, while 
> > microMIPS entries are 12-byte long (unless `--insn32' has been used) 
> > guaranteeing alignment to cache lines of up to 4 bytes only.  By placing 
> > all regular entries first they have the guaranteed 16-byte alignment.
> > 
> >  Second is the handling of synthetic symbols which is much easier with 
> > such an arrangement -- see `_bfd_mips_elf_get_synthetic_symtab'.
> > 
> >  If neither were an issue with IPLT, then you could leave entries in a 
> > random order.  But the IPLT entries also have different lengths, even the 
> > regular ones, e.g. with n64, so there will be performance implications 
> > from different ordering.
> >
> >  Can you see if you can do something about it then?  The best would be 
> > sorting entries in the decreasing alignment order and aligning the section 
> > according to the largest one.  So e.g. given the sizes of 32, 24, 20, 16, 
> > the order would be: 32, 16, 24, 20, and the section alignment -- 5.
> 
> I get the general idea. But given that we're selecting 32 vs. 64/
> micromips vs. mips16 and those combinations cannot be inter-linked, I
> don't see how we can have more than 2 stub sizes in a given link.

 I just gave a single general illustration so as not to multiply examples, 
regardless of how many sizes at a time we'll end up with.

> Besides
> after performing the delay slot optimization that you suggested elsewhere,
> all variations of mips32 will be 12 bytes. All variations of mips64 will
> be 28-bytes, except mips16/64 which could be 16-bytes.

 Or 20 bytes in the 48-bit n64 case.  Also we don't support n64 MIPS16 
with dynamic loading, so no MIPS16 n64 mixing is needed.

> Also, note that although we have 32/48-bit variants for 64-bit, the IPLT stub
> size is still the same as the regular entry. Only the # of instructions that
> need to be executed is fewer. The selection is made based on the actual
> address bits needed to get to the GOT/IGOT entry of that symbol. I should
> probably be filling the tail part of these stubs with NOPs.

 Is that to simplify processing?  We should be able to handle all exact 
sizes rather than just 2 of them at a time in a given link.

> The delay slot optimization itself requires all regular entries to be
> grouped together. Beyond that, stub sizes would only matter for mips16/64
> combination. Okay then, to just sort on regular vs. compressed and keep
> all compressed stubs before all regular ones? Compressed stubs generally
> don't have the lagging delay-slot problem, except for the micromips_insn32
> variant.

 Fine with me as long as the alignment constraints I outlined above are 
met and you can handle it all with `_bfd_mips_elf_get_synthetic_symtab' 
too (if applicable; I haven't checked).

> Related point: Should I go ahead and add all mips64 variants? Right now we
> only support regular mips64 - no compressed stubs.

 By MIPS64 do you mean both n64 and n32?

 We have no pure-microMIPS PLT support for the n32/n64 ABI implemented, 
although you should be able to create pure-microMIPS SVR4 n32/n64 
executables.  It also looks to me like o32 microMIPS PLT entries should 
work for n32 executables unchanged -- it's just that we don't have them 
wired for such use right now.  For n64 executables new PLT entries would 
have to be encoded to use LD rather than LW, and ADDIUPC can be retained 
for GOT references because n64 PLT only supports `-msym32' executables.

 Based on these observations I think it will only be reasonable to have 
n64/n32 compressed stub support implemented if the effort to do so is low.  
So use your judgement in deciding whether you want to do that or not.

 Please let me know if you have further questions or comments, and 
apologies for the long RTT.

  Maciej