[PATCH][x86_64] Convert indirect call via GOT to direct when possible

Sriraman Tallam tmsriram@google.com
Tue May 31 18:03:00 GMT 2016


On Sat, May 28, 2016 at 10:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>    GCC has the option -fno-plt, which converts all extern calls to indirect
>>>> calls via the GOT to prevent the linker from generating any PLT stubs.
>>>> However, if the function ends up defined in the executable, this patch
>>>> converts those indirect calls/jumps to direct ones.  Since the indirect
>>>> call is one byte longer than the direct call, an extra nop is needed at
>>>> the beginning.
>>>>
>>>> Here is a simple example:
>>>>
>>>> main.c
>>>> ---------
>>>> extern int foo();
>>>> int main() {
>>>>   return foo();
>>>> }
>>>>
>>>> deffoo.c
>>>> -----------
>>>> int foo() {
>>>>   return 0;
>>>> }
>>>>
>>>> $ gcc -fno-plt main.c deffoo.c
>>>> $ objdump -d a.out
>>>>
>>>> 0000000000400626 <main>:
>>>>   ...
>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        # 401a58 <_DYNAMIC+0x1d8>
>>>>
>>>> The call is indirect even though foo is defined in the executable.
>>>>
>>>> With this patch,
>>>> 0000000000400606 <main>:
>>>>   ...
>>>>   40060a:       90                      nop
>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>
>>>> The call is now direct with an extra nop.
>>>>
>>>>
>>>
>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>> Also for
>>>
>>> jmp *foo@GOTPCREL(%rip)
>>>
>>> ld converts it to
>>>
>>> jmp foo
>>> nop
>>
>> I have modified the patch to keep it consistent with what ld produces.
>>
>> Please take another look.
>>
>> * x86_64.cc (can_convert_callq_to_direct): New function.
>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>> GOT can be converted to direct.
>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>> via GOT that can be converted.
>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>> * testsuite/Makefile.in: Regenerate.
>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>
>
> Do you need to check R_X86_64_REX_GOTPCRELX for branch?

OK, I changed the patch so that it no longer checks for
R_X86_64_REX_GOTPCRELX on branches, and refactored it a bit.

Thanks
Sri

>
> --
> H.J.
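
The byte-level rewrite discussed in this thread can be sketched outside of
gold as follows.  This is only an illustration under stated assumptions:
relax_got_branch and the Relaxed enum are hypothetical names (not gold's API),
the caller is assumed to have already decided that the symbol may be bound
directly, and the displacement is assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Relaxed { None, Call, Jump };

// Hypothetical helper, not part of gold.  `off` is the offset in `code` of
// the 4-byte field that R_X86_64_GOTPCRELX applies to, `addr` is the
// run-time address of that field, and `target` is the address of the
// function.  The caller must ensure the 4-byte field lies inside `code`.
inline Relaxed relax_got_branch(std::vector<std::uint8_t>& code,
                                std::size_t off, std::uint64_t addr,
                                std::uint64_t target) {
  assert(off + 4 <= code.size());
  if (off < 2 || code[off - 2] != 0xff)
    return Relaxed::None;

  const std::uint8_t modrm = code[off - 1];
  std::size_t field;  // offset of the rel32 field after rewriting
  if (modrm == 0x15) {
    // callq *foo@GOTPCREL(%rip)  ->  addr32 callq foo
    code[off - 2] = 0x67;  // addr32 prefix
    code[off - 1] = 0xe8;  // direct callq
    field = off;
  } else if (modrm == 0x25) {
    // jmpq *foo@GOTPCREL(%rip)  ->  jmpq foo; nop
    code[off - 2] = 0xe9;  // direct jmpq
    code[off + 3] = 0x90;  // nop fills the freed last byte
    field = off - 1;       // rel32 now starts one byte earlier
  } else {
    return Relaxed::None;
  }

  // rel32 is relative to the end of the rel32 field (the next instruction).
  const std::uint64_t field_addr = addr - (off - field);
  const std::uint32_t rel =
      static_cast<std::uint32_t>(target - (field_addr + 4));
  for (int i = 0; i < 4; ++i)
    code[field + i] = static_cast<std::uint8_t>(rel >> (8 * i));
  return modrm == 0x15 ? Relaxed::Call : Relaxed::Jump;
}
```

With illustrative addresses, relax_got_branch(code, 2, 0x40060c, 0x400613)
applied to the bytes ff 15 28 14 00 00 yields 67 e8 03 00 00 00.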
-------------- next part --------------
	* x86_64.cc (can_convert_callq_to_direct): New function.
	Target_x86_64<size>::Scan::global: Check if an indirect call via
	GOT can be converted to direct.
	Target_x86_64<size>::Relocate::relocate: Change any indirect call
	via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..d54d024 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..d774d5b 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,22 @@ class Target_x86_64 : public Sized_target<size, false>
 	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
   }
 
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  static bool
+  can_convert_callq_to_direct(const Symbol* gsym)
+  {
+    gold_assert(gsym != NULL);
+    return (gsym->type() == elfcpp::STT_FUNC
+	    && !gsym->is_undefined()
+	    && !gsym->is_from_dynobj()
+	    && !gsym->is_preemptible());
+  }
+
   // Adjust TLS relocation type based on the options and whether this
   // is a local symbol.
   static tls::Tls_optimization
@@ -2931,17 +2947,34 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCRELX(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
-	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
-	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+	bool do_convert_mov_to_lea
+	    = ((r_type == elfcpp::R_X86_64_GOTPCREL
+	        || r_type == elfcpp::R_X86_64_GOTPCRELX
+	        || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_mov_to_lea(gsym));
+	bool do_convert_callq_to_direct
+	    = (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_callq_to_direct(gsym));
+	if (do_convert_mov_to_lea || do_convert_callq_to_direct)
 	  {
 	    section_size_type stype;
 	    const unsigned char* view = object->section_contents(data_shndx,
 								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
+	    if (do_convert_mov_to_lea
+		&& view[reloc.get_r_offset() - 2] == 0x8b)
+	      break;
+
+	    if (do_convert_callq_to_direct
+	        && view[reloc.get_r_offset() - 2] == 0xff
+	        && (view[reloc.get_r_offset() - 1] == 0x15
+	    	    || view[reloc.get_r_offset() - 1] == 0x25))
 	      break;
 	  }
 
@@ -3634,6 +3667,45 @@ Target_x86_64<size>::Relocate::relocate(
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && rela.get_r_offset() >= 2
+	       && view[-2] == 0xff
+	       && (view[-1] == 0x15 || view[-1] == 0x25)
+	       && (gsym != NULL
+		   && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // addr32 is the 0x67 prefix; the opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  The direct
+	      // jmpq is one byte shorter, so its 32-bit displacement starts
+	      // one byte earlier (at view[-1]) and the instruction following
+	      // it is now the nop; hence relocate at address - 1.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)
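
The address - 1 adjustment in the jmpq case of the patch can be checked with a
little arithmetic.  The following is only a sketch: pcrel32 and branch_target
are hypothetical helpers (not gold functions), the addend of -4 is what the
assembler is assumed to emit for these instructions, and the addresses are
illustrative values loosely based on the objdump output in the thread.

```cpp
#include <cstdint>

// Hypothetical helper: the value a 32-bit PC-relative relocation stores,
// S + A - P, truncated to 32 bits.
constexpr std::int32_t pcrel32(std::uint64_t s, std::int64_t a,
                               std::uint64_t p) {
  return static_cast<std::int32_t>(
      static_cast<std::uint32_t>(s + static_cast<std::uint64_t>(a) - p));
}

// Hypothetical helper: where a direct call/jmp lands, given the address of
// its rel32 field.  The displacement is relative to the end of that field.
constexpr std::uint64_t branch_target(std::uint64_t field_addr,
                                      std::int32_t rel32) {
  return field_addr + 4
         + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel32));
}

constexpr std::uint64_t kFoo = 0x400613;    // illustrative address of foo
constexpr std::uint64_t kReloc = 0x40060c;  // illustrative address of the
                                            // GOTPCRELX field (P)
constexpr std::int64_t kAddend = -4;        // assumed addend (A)

// callq: the rel32 field stays where the GOT displacement was.
static_assert(branch_target(kReloc, pcrel32(kFoo, kAddend, kReloc)) == kFoo,
              "addr32 callq reaches foo");
// jmpq: the rel32 field starts one byte earlier, so relocate at P - 1.
static_assert(branch_target(kReloc - 1,
                            pcrel32(kFoo, kAddend, kReloc - 1)) == kFoo,
              "jmpq reaches foo");
```

Relocating the jmpq at P instead of P - 1 would leave the stored displacement
off by one, since the value would be computed for a field one byte further on
than the place it is actually written.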

