[PATCH][x86_64] Convert indirect call via GOT to direct when possible
Sriraman Tallam
tmsriram@google.com
Tue May 31 18:03:00 GMT 2016
On Sat, May 28, 2016 at 10:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>> GCC has option -fno-plt which converts all extern calls to indirect
>>>> calls via GOT to prevent the linker from generating any PLT stubs.
>>>> However, if the function ends up defined in the executable this patch
>>>> will convert those indirect calls/jumps to direct. Since the indirect
>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>
>>>> Here is a simple example:
>>>>
>>>> main.c
>>>> ---------
>>>> extern int foo();
>>>> int main() {
>>>> return foo();
>>>> }
>>>>
>>>> deffoo.c
>>>> -----------
>>>> int foo() {
>>>> return 0;
>>>> }
>>>>
>>>> $ gcc -fno-plt main.c deffoo.c
>>>> $ objdump -d a.out
>>>>
>>>> 0000000000400626 <main>:
>>>> ...
>>>> 40062a: ff 15 28 14 00 00 callq *0x1428(%rip) # 401a58 <_DYNAMIC+0x1d8>
>>>>
>>>> The call is indirect even though foo is defined in the executable.
>>>>
>>>> With this patch,
>>>> 0000000000400606 <main>:
>>>> ....
>>>> 40060a: 90 nop
>>>> 40060b: e8 03 00 00 00 callq 400613 <foo>
>>>>
>>>> The call is now direct with an extra nop.
>>>>
>>>>
>>>
>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>> Also for
>>>
>>> jmp *foo@GOTPCREL(%rip)
>>>
>>> ld converts it to
>>>
>>> jmp foo
>>> nop
>>
>> I have modified the patch to keep it consistent with what ld produces.
>>
>> Please take another look.
>>
>> * x86_64.cc (can_convert_callq_to_direct): New function.
>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>> GOT can be converted to direct.
>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>> via GOT that can be converted.
>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>> * testsuite/Makefile.in: Regenerate.
>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>
>
> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
OK, I changed the patch to not check for this and refactored it a bit.
Thanks
Sri
>
> --
> H.J.
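For readers following the byte-level rewrite discussed above, here is a minimal standalone sketch of the two conversions (call: ff 15 disp32 to 67 e8 rel32; jmp: ff 25 disp32 to e9 rel32 90). It is not the gold implementation. The helper name relax_indirect_branch and its parameters are hypothetical, and it assumes the caller has already decided the symbol is eligible (defined in the output, not preemptible).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of gold. Rewrites the 6-byte instruction
// "call/jmp *foo@GOTPCRELX(%rip)" starting at code[pos] into its direct form.
//   insn_addr: virtual address of code[pos]
//   target:    virtual address of foo
// Returns false (leaving the bytes untouched) if the instruction is not
// ff 15 / ff 25 or if the target is out of rel32 range.
bool relax_indirect_branch(std::vector<std::uint8_t>& code, std::size_t pos,
                           std::uint64_t insn_addr, std::uint64_t target)
{
  if (pos + 6 > code.size() || code[pos] != 0xff)
    return false;
  const bool is_call = code[pos + 1] == 0x15;
  const bool is_jmp = code[pos + 1] == 0x25;
  if (!is_call && !is_jmp)
    return false;

  // The direct call stays 6 bytes long (67 e8 rel32). The direct jmp is
  // only 5 bytes (e9 rel32); the sixth byte becomes a trailing nop.
  const std::uint64_t next_insn = insn_addr + (is_call ? 6 : 5);
  const std::int64_t delta = static_cast<std::int64_t>(target - next_insn);
  if (delta < INT32_MIN || delta > INT32_MAX)
    return false;

  std::size_t disp_pos;
  if (is_call)
    {
      code[pos] = 0x67;      // addr32 prefix, used as one byte of padding
      code[pos + 1] = 0xe8;  // direct call
      disp_pos = pos + 2;
    }
  else
    {
      code[pos] = 0xe9;      // direct jmp
      code[pos + 5] = 0x90;  // nop
      disp_pos = pos + 1;
    }

  // Store rel32 in little-endian byte order, independent of the host.
  const std::uint32_t rel = static_cast<std::uint32_t>(delta);
  for (int i = 0; i < 4; ++i)
    code[disp_pos + i] = static_cast<std::uint8_t>(rel >> (8 * i));
  return true;
}
```

The jmp case is why the patch passes &view[-1] and address - 1 to pcrela32: the displacement field moves one byte earlier, and the instruction that follows the jmp is now the nop rather than whatever came after the original 6-byte instruction.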
-------------- next part --------------
* x86_64.cc (can_convert_callq_to_direct): New function.
Target_x86_64<size>::Scan::global: Check if an indirect call via
GOT can be converted to direct.
Target_x86_64<size>::Relocate::relocate: Change any indirect call
via GOT that can be converted.
* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/x86_64_indirect_call_to_direct1.s: New file.
* testsuite/x86_64_indirect_jump_to_direct1.s: New file.
diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
$(TEST_OBJDUMP) -dw $< > $@
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+ x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+ x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+ $(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+ gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+ $(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+ $(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+ gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+ $(TEST_OBJDUMP) -dw $< > $@
+
check_SCRIPTS += x86_64_overflow_pc32.sh
check_DATA += x86_64_overflow_pc32.err
MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..d54d024 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+ .text
+ .globl foo
+ .type foo, @function
+foo:
+ ret
+ .size foo, .-foo
+ .globl main
+ .type main, @function
+main:
+ call *foo@GOTPCREL(%rip)
+ ret
+ .size main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+ .text
+ .globl foo
+ .type foo, @function
+foo:
+ ret
+ .size foo, .-foo
+ .globl main
+ .type main, @function
+main:
+ jmp *foo@GOTPCREL(%rip)
+ .size main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..d774d5b 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,22 @@ class Target_x86_64 : public Sized_target<size, false>
&& strcmp(gsym->name(), "_DYNAMIC") != 0);
}
+ // Convert
+ // callq *foo@GOTPCRELX(%rip) to
+ // addr32 callq foo
+ // and jmpq *foo@GOTPCRELX(%rip) to
+ // jmpq foo
+ // nop
+ static bool
+ can_convert_callq_to_direct(const Symbol* gsym)
+ {
+ gold_assert(gsym != NULL);
+ return (gsym->type() == elfcpp::STT_FUNC
+ && !gsym->is_undefined()
+ && !gsym->is_from_dynobj()
+ && !gsym->is_preemptible());
+ }
+
// Adjust TLS relocation type based on the options and whether this
// is a local symbol.
static tls::Tls_optimization
@@ -2931,17 +2947,34 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
// If we convert this from
// mov foo@GOTPCREL(%rip), %reg
// to lea foo(%rip), %reg.
+ // OR
+ // if we convert
+ // (callq|jmpq) *foo@GOTPCRELX(%rip) to
+ // (callq|jmpq) foo
// in Relocate::relocate, then there is nothing to do here.
- if ((r_type == elfcpp::R_X86_64_GOTPCREL
- || r_type == elfcpp::R_X86_64_GOTPCRELX
- || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
- && reloc.get_r_offset() >= 2
- && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+ bool do_convert_mov_to_lea
+ = ((r_type == elfcpp::R_X86_64_GOTPCREL
+ || r_type == elfcpp::R_X86_64_GOTPCRELX
+ || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
+ && reloc.get_r_offset() >= 2
+ && Target_x86_64<size>::can_convert_mov_to_lea(gsym));
+ bool do_convert_callq_to_direct
+ = (r_type == elfcpp::R_X86_64_GOTPCRELX
+ && reloc.get_r_offset() >= 2
+ && Target_x86_64<size>::can_convert_callq_to_direct(gsym));
+ if (do_convert_mov_to_lea || do_convert_callq_to_direct)
{
section_size_type stype;
const unsigned char* view = object->section_contents(data_shndx,
&stype, true);
- if (view[reloc.get_r_offset() - 2] == 0x8b)
+ if (do_convert_mov_to_lea
+ && view[reloc.get_r_offset() - 2] == 0x8b)
+ break;
+
+ if (do_convert_callq_to_direct
+ && view[reloc.get_r_offset() - 2] == 0xff
+ && (view[reloc.get_r_offset() - 1] == 0x15
+ || view[reloc.get_r_offset() - 1] == 0x25))
break;
}
@@ -3634,6 +3667,45 @@ Target_x86_64<size>::Relocate::relocate(
view[-2] = 0x8d;
Reloc_funcs::pcrela32(view, object, psymval, addend, address);
}
+ // Convert
+ // callq *foo@GOTPCRELX(%rip) to
+ // addr32 callq foo
+ // and jmpq *foo@GOTPCRELX(%rip) to
+ // jmpq foo
+ // nop
+ else if (r_type == elfcpp::R_X86_64_GOTPCRELX
+ && rela.get_r_offset() >= 2
+ && view[-2] == 0xff
+ && (view[-1] == 0x15 || view[-1] == 0x25)
+ && (gsym != NULL
+ && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+ {
+ if (view[-1] == 0x15)
+ {
+ // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+ // The addr32 prefix is 0x67 and the opcode of direct callq is 0xe8.
+ view[-2] = 0x67;
+ view[-1] = 0xe8;
+ // Convert GOTPCRELX to 32-bit pc relative reloc.
+ Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+ }
+ else
+ {
+ // Convert jmpq *foo@GOTPCRELX(%rip) to
+ // jmpq foo
+ // nop
+ // The opcode of direct jmpq is 0xe9.
+ view[-2] = 0xe9;
+ // The opcode of nop is 0x90.
+ view[3] = 0x90;
+ // Convert GOTPCRELX to 32-bit pc relative reloc. jmpq is rip
+ // relative and since the instruction following the jmpq is now
+ // the nop, offset the address by 1 byte. The start of the
+ // relocation also moves ahead by 1 byte.
+ Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+ address - 1);
+ }
+ }
else
{
if (gsym != NULL)