[Mips] Using DT tags for handling local ifuncs

Richard Sandiford rdsandiford@googlemail.com
Thu Dec 12 09:47:00 GMT 2013


"Maciej W. Rozycki" <macro@codesourcery.com> writes:
>> Unless this last argument below can convince you or at least give you pause
>> to consider "implicit, explicit, default" ordering, I will start on
>> "explicit, implicit, default".
>> 
>> I don't think it is the "right" thing to do, but what the heck, what
>> is right and wrong anyway? What you are proposing should be workable.
>> I just need to get it correctly written up before shipping.
>
>  Regrettably I still haven't had the time to absorb all the details of 
> this stuff, but I think I've ingested enough to ask one question: given 
> that explicit dynamic relocs will be used anyway, does this new chunk of 
> run-time relocatable data have to be a part of the GOT as defined by the 
> traditional SVR4 MIPS psABI in the first place?  How about we leave the 
> current definitions of the DT_PLTGOT and DT_MIPS_LOCAL_GOTNO dynamic tags 
> and the .got special section intact?
>
>  I gather all that is needed is that ifunc pointers are reachable with 
> gp-relative addressing (so that the same standard calling sequence can be 
> used, either the SVR4 PIC or the non-PIC PLT type, regardless of whether 
> calling an ifunc or an ordinary function), so grouping them in a section 
> called .igot.plt and then either prepending or appending to .got should 
> do; with a linker script even.  Of course the static linker will have to 
> ensure that all the pointers in the combined sections are in range from 
> $gp (and the same with secondary $gp values in the multi-GOT case).
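A minimal sketch of the range condition in the paragraph above. It assumes the usual SVR4 MIPS convention that $gp sits 0x7ff0 bytes past the start of the GOT and that each entry is loaded with a signed 16-bit offset; the helper names are hypothetical, not binutils functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* $gp conventionally points 0x7ff0 bytes past the start of the GOT. */
#define MIPS_GP_BIAS 0x7ff0

/* Offset from $gp of entry INDEX in a GOT of ENTRY_SIZE-byte entries,
   i.e. the immediate a gp-relative lw/ld would need. */
static int64_t gp_offset_of_entry(uint64_t index, uint64_t entry_size)
{
    return (int64_t)(index * entry_size) - MIPS_GP_BIAS;
}

/* True if OFFSET fits the signed 16-bit immediate of lw/ld. */
static bool fits_simm16(int64_t offset)
{
    return offset >= -0x8000 && offset <= 0x7fff;
}

/* Hypothetical check: true if a combined .got + .igot.plt area of
   N_ENTRIES entries, laid out upwards from the GOT start, is entirely
   reachable from a single $gp. */
static bool combined_got_reachable(uint64_t n_entries, uint64_t entry_size)
{
    assert(entry_size == 4 || entry_size == 8);
    if (n_entries == 0)
        return true;
    /* Offsets grow monotonically, so checking the first and last
       entries is enough. */
    return fits_simm16(gp_offset_of_entry(0, entry_size))
           && fits_simm16(gp_offset_of_entry(n_entries - 1, entry_size));
}
```

In the multi-GOT case the same check would apply to each GOT relative to its own $gp value. Note that with $gp fixed at the .got start plus 0x7ff0, only 0x10 bytes below the start of .got are reachable.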

I don't follow the comment about calling convention, sorry.  The problem
here is what to do with:

	lw	$4,%got_disp(foo)($28)

in cases where foo is an ifunc that binds locally.  We need some way
of putting it in the GOT and having an IRELATIVE relocation against it.
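To make the requirement concrete, here is a minimal sketch of what an IRELATIVE-style relocation does at load time, modelled on how such relocations behave on other ELF targets: rather than storing a symbol value, the loader calls the ifunc resolver and stores its result in the GOT slot that the %got_disp load reads. The GOT array, resolver, feature flag and `apply_irelative` are all hypothetical stand-ins, not binutils or glibc code.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*impl_fn)(int);
/* An ifunc resolver returns the address of the implementation to use. */
typedef impl_fn (*resolver_fn)(void);

static int foo_generic(int x) { return x + 1; }
static int foo_fast(int x) { return x + 1; /* stands in for a CPU-specific variant */ }

/* Hypothetical CPU feature flag consulted by the resolver. */
static int cpu_has_fast_path = 1;

/* The ifunc resolver for "foo". */
static impl_fn foo_resolver(void)
{
    return cpu_has_fast_path ? foo_fast : foo_generic;
}

/* Simulated GOT: slot 0 is the entry that %got_disp(foo) refers to. */
static impl_fn got[1];

/* Hypothetical loader step for an IRELATIVE relocation against SLOT:
   call the resolver and store the address it returns. */
static void apply_irelative(impl_fn *slot, resolver_fn resolver)
{
    assert(slot != NULL && resolver != NULL);
    *slot = resolver();
}
```

After `apply_irelative(&got[0], foo_resolver)`, code loading `got[0]` calls whichever implementation the resolver chose, exactly as if foo were an ordinary function.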

I think you're suggesting that we allow the ABI-defined GOT to start at
something other than $gp - 0x7ff0, so that explicitly-relocated data
could go first.  I think that would be more disruptive in some ways,
since the 0x7ff0 offset is hard-coded into glibc.  The resolver for
lazy-binding stubs subtracts 0x7ff0 from the incoming $gp to get the
start of the ABI-defined GOT and then gets the link map from entry 1
(assuming that the GNU extension is in use).
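The glibc behaviour described above reduces to simple arithmetic. This is a simplified model, not glibc's code: it handles 32-bit addresses held in integers and assumes that, under the GNU extension, GOT entry 1 marks the link-map pointer by setting its most significant bit (which I believe is what glibc does). `OFFSET_GP_GOT`, `GNU_GOT1_MASK`, `struct sim_got` and `link_map_from_gp` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Bias between the start of the ABI-defined GOT and $gp. */
#define OFFSET_GP_GOT 0x7ff0u
/* Assumed flag marking GOT[1] as holding a link-map pointer
   (GNU extension); 32-bit ABIs only in this sketch. */
#define GNU_GOT1_MASK 0x80000000u

/* Simulated GOT contents, starting at address BASE. */
struct sim_got {
    uint32_t base;        /* address of GOT[0] */
    uint32_t entries[4];
};

/* Simulated aligned word load from the GOT. */
static uint32_t read_word(const struct sim_got *g, uint32_t addr)
{
    uint32_t index = (addr - g->base) / 4;
    assert(addr >= g->base && index < 4 && (addr - g->base) % 4 == 0);
    return g->entries[index];
}

/* Recover the GOT start from the incoming $gp and return the link map
   recorded in GOT[1], or 0 if the GNU extension is not in use. */
static uint32_t link_map_from_gp(const struct sim_got *g, uint32_t gp)
{
    uint32_t got_start = gp - OFFSET_GP_GOT;
    uint32_t got1 = read_word(g, got_start + 4);
    return (got1 & GNU_GOT1_MASK) ? (got1 & ~GNU_GOT1_MASK) : 0;
}
```

If the resolver were entered with a $gp that did not sit 0x7ff0 above the ABI-defined GOT, `got_start` would point at the wrong place and GOT[1] would be read from the wrong address.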

I suppose it'd be possible to adjust $gp in the stub so that $gp - 0x7ff0
is right on entry to the resolver.  But that would be difficult to do
cleanly on n32 and n64, where $gp is call-saved.  The resolver would
probably have to return to the stub, which in turn would mean that the
stub would need call-frame information.

>  BTW, for loading 64-bit addresses I suggest using two temporaries (we've 
> got plenty of them) for a sequence that is faster on superscalar 
> processors, i.e. rather than:
>
> static const bfd_vma mips64_exec_iplt_entry[] =
> {
>   0x3c0f0000,	/* lui $15, %highest(.got.iplt entry)        */
>   0x65ef0000,	/* daddiu $15, $15, %higher(.got.iplt entry) */
>   0x000f7c38,	/* dsll $15,$15, 16                          */
>   0x65ef0000,	/* daddiu $15, $15, %hi(.got.iplt entry)     */
>   0x000f7c38,	/* dsll $15,$15, 16                          */
>   0x01f90000,	/* l[wd] $25, %lo(.got.iplt entry)($15)      */
>   0x03200008,	/* jr $25                                    */
>   0x00000000,	/* nop                                       */
> };
>
> use:
>
> static const bfd_vma mips64_exec_iplt_entry[] =
> {
>   0x3c0f0000,	/* lui $15, %highest(.got.iplt entry)        */
>   0x3c0e0000,	/* lui $14, %hi(.got.iplt entry)             */
>   0x25ef0000,	/* addiu $15, $15, %higher(.got.iplt entry)  */
>   0x000f783c,	/* dsll32 $15, $15, 0x0                      */
>   0x01ee782d,	/* daddu $15, $15, $14                       */
>   0xddf90000,	/* ld $25, %lo(.got.iplt entry)($15)         */
>   0x03200008,	/* jr $25                                    */
>   0x00000000,	/* nop                                       */
> };
>
> (this also avoids a DADDIU erratum early R4000/R4400 chips had).
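Both sequences can be checked by emulating their arithmetic on 64-bit integers. The sketch below assumes the standard definitions of %hi, %higher and %highest, each of which adds a rounding constant so that the later sign-extended additions carry correctly. It does not decode the opcode words above, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 or 32 bits of V to 64 bits. */
static uint64_t sext16(uint64_t v) { v &= 0xffff; return (v ^ 0x8000) - 0x8000; }
static uint64_t sext32(uint64_t v) { v &= 0xffffffffu; return (v ^ 0x80000000u) - 0x80000000u; }

/* Relocation operators, including the carry adjustments. */
static uint64_t lo(uint64_t x)      { return x & 0xffff; }
static uint64_t hi(uint64_t x)      { return ((x + 0x8000) >> 16) & 0xffff; }
static uint64_t higher(uint64_t x)  { return ((x + 0x80008000u) >> 32) & 0xffff; }
static uint64_t highest(uint64_t x) { return ((x + 0x800080008000ull) >> 48) & 0xffff; }

/* Arithmetic performed by each instruction on a 64-bit CPU. */
static uint64_t op_lui(uint64_t imm)                 { return sext32(imm << 16); }
static uint64_t op_addiu(uint64_t rs, uint64_t imm)  { return sext32(rs + sext16(imm)); }
static uint64_t op_daddiu(uint64_t rs, uint64_t imm) { return rs + sext16(imm); }
static uint64_t op_dsll(uint64_t rt, unsigned sa)    { return rt << sa; }
static uint64_t op_dsll32(uint64_t rt, unsigned sa)  { return rt << (32 + sa); }
static uint64_t op_daddu(uint64_t a, uint64_t b)     { return a + b; }

/* Address loaded from by the original serial sequence. */
static uint64_t serial_sequence(uint64_t x)
{
    uint64_t t15 = op_lui(highest(x));
    t15 = op_daddiu(t15, higher(x));
    t15 = op_dsll(t15, 16);
    t15 = op_daddiu(t15, hi(x));
    t15 = op_dsll(t15, 16);
    return t15 + sext16(lo(x));        /* l[wd] $25, %lo(...)($15) */
}

/* Address loaded from by the suggested two-temporary sequence. */
static uint64_t parallel_sequence(uint64_t x)
{
    uint64_t t15 = op_lui(highest(x));
    uint64_t t14 = op_lui(hi(x));      /* independent of $15 */
    t15 = op_addiu(t15, higher(x));    /* input is sign-extended, as addiu requires */
    t15 = op_dsll32(t15, 0);
    t15 = op_daddu(t15, t14);
    return t15 + sext16(lo(x));        /* ld $25, %lo(...)($15) */
}
```

The two-temporary form replaces the serial dsll/daddiu chain with two independent lui instructions, which is where the superscalar gain comes from.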

Yeah, I wondered about this when I first saw it too, but Jack optimized
the sequence based on the address, so that we would only have the full
thing if %highest really was needed.  Since the usual base address is
0x120000000, I think the full sequence will in effect never be used.
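A rough illustration of why the full sequence rarely matters, assuming "%highest is needed" means %highest(x) is nonzero. The shortened sequence below is a hypothetical example of dropping that step, not necessarily the exact sequence Jack's code emits; the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t sext16(uint64_t v) { v &= 0xffff; return (v ^ 0x8000) - 0x8000; }
static uint64_t sext32(uint64_t v) { v &= 0xffffffffu; return (v ^ 0x80000000u) - 0x80000000u; }

static uint64_t lo(uint64_t x)      { return x & 0xffff; }
static uint64_t hi(uint64_t x)      { return ((x + 0x8000) >> 16) & 0xffff; }
static uint64_t higher(uint64_t x)  { return ((x + 0x80008000u) >> 32) & 0xffff; }
static uint64_t highest(uint64_t x) { return ((x + 0x800080008000ull) >> 48) & 0xffff; }

/* Hypothetical test: the %highest step contributes nothing when
   %highest(x) is zero. */
static bool needs_highest(uint64_t x) { return highest(x) != 0; }

/* Hypothetical shortened sequence, valid only when !needs_highest(x):
     lui    $15, %higher(x)
     daddiu $15, $15, %hi(x)
     dsll   $15, $15, 16
     l[wd]  $25, %lo(x)($15)
   Returns the address it loads from. */
static uint64_t short_sequence(uint64_t x)
{
    uint64_t t15 = sext32(higher(x) << 16);  /* lui */
    t15 += sext16(hi(x));                    /* daddiu */
    t15 <<= 16;                              /* dsll */
    return t15 + sext16(lo(x));
}
```

Because of the carry adjustment, %highest can be nonzero even for some addresses below 2^48 (0x7fff7fff8000, for instance), but addresses near the usual 0x120000000 base are nowhere near that.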

I'm not opposed to having two n64 sequences, one for when %highest
is needed and one for when it isn't.  It just doesn't seem like a
priority.

Thanks,
Richard


