Summary: | AArch64: Symbol address inconsistency across compilation units | ||
---|---|---|---|
Product: | binutils | Reporter: | Nick Clifton <nickc> |
Component: | ld | Assignee: | Jiong Wang <jiwang> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | fweimer, jiwang |
Priority: | P2 | ||
Version: | 2.29 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Demonstration of the bug |
Here is a reduced, C based, test case: % cat main.c extern int int_datum_1; extern int int_datum_2; extern int printf (const char *, ...); extern void call_lib (void); int * ptr1 = & int_datum_1; int main (void) { printf ("main: int data = %p %p\n", & int_datum_1, & int_datum_2); call_lib (); return 0; } % cat lib.c int int_datum_1 = 1; int int_datum_2 = 1; extern int printf (const char *, ...); void call_lib (void) { printf ("lib : int data = %p %p\n", & int_datum_1, & int_datum_2); } % gcc -fPIC -c main.c lib.c % gcc -fPIC -Wl,-Bsymbolic,-z,defs,-soname,libfred.so -shared -o libl.so lib.o % gcc -fPIC -o test main.o libl.so % LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH ./test main: int data = 0x420038 0xffff93f40018 lib : int data = 0xffff93f4001c 0xffff93f40018 Note how the address of int_datum_2 is consistent, but the address of int_datum_1 differs. The difference is that the address of int_datum_1 is used to initialise ptr1 in main.c. Hi Nick,
Looks like AArch64 could avoid emiting COPY relocation if it's a relocation against writable section. We could adopt similar code from elf_x86_64_adjust_dynamic_symbol or ppc64_elf_adjust_dynamic_symbol.
> It looks like the linker creates a copy of IEEEsingle in the executable but
> then the GOT entry for IEEEsingle is not updated in libLLVM.so, causing any
> accesses to IEEEsingle in libLLVM.so to refer to the original definition
Does this means this is a glibc bug as well? dynamic linker needs to redirect the reference in the .so to the copy as well?
Hi Jiong, Thanks for taking a look at this. > Looks like AArch64 could avoid emiting COPY relocation if it's a relocation > against writable section. We could adopt similar code from > elf_x86_64_adjust_dynamic_symbol or ppc64_elf_adjust_dynamic_symbol. Ah - maybe not. The original testcase (in the BZ) shows the bug happening when constant data is used rather than writeable data. You can also extend the second testcase I posted to use constant data and the same problem happens. >> It looks like the linker creates a copy of IEEEsingle in the executable but >> then the GOT entry for IEEEsingle is not updated in libLLVM.so, causing any >> accesses to IEEEsingle in libLLVM.so to refer to the original definition > > Does this means this is a glibc bug as well? dynamic linker needs to redirect > the reference in the .so to the copy as well? This is a possibility, although I am not an expert on the dynamic linker. I think however that we need to verify that the static linker is producing the correct dynamic relocations first. Cheers Nick (In reply to Nick Clifton from comment #3) > Hi Jiong, > > Thanks for taking a look at this. > > > Looks like AArch64 could avoid emiting COPY relocation if it's a relocation > > against writable section. We could adopt similar code from > > elf_x86_64_adjust_dynamic_symbol or ppc64_elf_adjust_dynamic_symbol. > > Ah - maybe not. The original testcase (in the BZ) shows the bug happening > when > constant data is used rather than writeable data. You can also extend the > second > testcase I posted to use constant data and the same problem happens. > > > >> It looks like the linker creates a copy of IEEEsingle in the executable but > >> then the GOT entry for IEEEsingle is not updated in libLLVM.so, causing any > >> accesses to IEEEsingle in libLLVM.so to refer to the original definition > > > > Does this means this is a glibc bug as well? dynamic linker needs to redirect > > the reference in the .so to the copy as well? > > This is a possibility, although I am not an expert on the dynamic linker. I > think > however that we need to verify that the static linker is producing the > correct > dynamic relocations first. Thanks, I will prepare a patch to fix the BFD static linker. (In reply to Nick Clifton from comment #3) > Hi Jiong, > > Thanks for taking a look at this. > > > Looks like AArch64 could avoid emiting COPY relocation if it's a relocation > > against writable section. We could adopt similar code from > > elf_x86_64_adjust_dynamic_symbol or ppc64_elf_adjust_dynamic_symbol. > > Ah - maybe not. The original testcase (in the BZ) shows the bug happening > when > constant data is used rather than writeable data. You can also extend the > second > testcase I posted to use constant data and the same problem happens. Hi Nick, It looks like those cases are relro sections, that the elf attribute is still writable, so COPY relocation still can be elimiated I have posted the patch at https://sourceware.org/ml/binutils/2017-06/msg00070.html, both testcases on this PR passed. The master branch has been updated by Jiong Wang <jiwang@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bc327528fd2ccdc6c29ab6ae608085dddbad5cc8 commit bc327528fd2ccdc6c29ab6ae608085dddbad5cc8 Author: Jiong Wang <jiong.wang@arm.com> Date: Wed Jun 7 12:05:39 2017 +0100 [AArch64] Allow COPY relocation elimination As discussed at the PR, this patch tries to avoid COPY relocation generation and propagate the original relocation into runtime if it was relocating on writable section. The ELIMINATE_COPY_RELOCS has been set to true and it's underlying infrastructure has been improved so that the COPY reloc elimination at least working on absoluate relocations (ABS64) after this patch. bfd/ PR ld/21532 * elfnn-aarch64.c (ELIMINATE_COPY_RELOCS): Set to 1. (elfNN_aarch64_final_link_relocate): Also propagate relocations to runtime for copy relocation elimination cases. (alias_readonly_dynrelocs): New function. (elfNN_aarch64_adjust_dynamic_symbol): Keep the dynamic relocs instead of generating copy relocation if it is not against read-only sections. (elfNN_aarch64_check_relocs): Likewise. ld/ * testsuite/ld-aarch64/copy-reloc-eliminate.d: New test. * testsuite/ld-aarch64/copy-reloc-exe-eliminate.s: New test source file. * testsuite/ld-aarch64/aarch64-elf.exp: Run new testcase. Mark as fixed. Temporarily reopen this bug as the fix was reverted because it will leak some PC-relative relocations in while PC-relative support on copy relocation elimination is going on as explained at the email. So this fix itself is incomplete. I will commit the fix later together PC-relative support after which is done and reviewed. Sorry for the trouble. (In reply to Jiong Wang from comment #8) > Temporarily reopen this bug as the fix was reverted because it will leak > some PC-relative relocations in while PC-relative support on copy relocation > elimination is going on as explained at the email. So this fix itself is > incomplete. The PC-relative regression is looks like the following: unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `_ZTIi@@CXXABI_1.3' The failure was because the symbol "_ZTIi@@CXXABI_1.3" is referenced by both absoluate relocation types R_AARCH64_ABS_64 and pc-relative relocation types R_AARCH64_ADR_PREL_PG_HI21. While The current elfNN_aarch64_check_relocs implementation only allocate dynrelocs for absoluate relocation types R_AARCH64_NN, this caused elfNN_aarch64_adjust_dynamic_symbol missed the reference of "_ZTIi@@CXXABI_1.3" by R_AARCH64_ADR_PREL_PG_HI21 which is a text relocation that is read-only in which case we should keep copy relocation. My understanding of the copy relocation elimination framwork in BFD linker is we should always allocate dynrelocs for all those relocation types that are possible to introduce copy relocations, and they are actually relocation types that will be used to form address of external object in non-PIC executable. So a complete fix should also fix the current elfNN_aarch64_check_relocs implementation to: * Also allocate dynrelocs for a few pc-relative relocation types which will be used to form object address under tiny, small memory model. As AArch64 dynamic linker does not support pc-relative relocations, we need to alway keep copy relocation for them later in adjust_dynamic_symbol if pc_count is not zero to avoid introducing runtime relocation for them. * For large memory model, AArch64 also use some MOVW_G* relocations to form object address under non-PIC mode. We need to allocate dynrelocs for their reference on symbol as well. I am testing a fix and will post it after finishing a full test on binutils/gcc/g++ check plus some big project linking. The master branch has been updated by Jiong Wang <jiwang@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6353d82b8fa825c2143f41e84b0d5d4446c6e99a commit 6353d82b8fa825c2143f41e84b0d5d4446c6e99a Author: Jiong Wang <jiong.wang@arm.com> Date: Thu Jun 15 16:51:01 2017 +0100 [AArch64] Allow COPY relocation elimination As discussed at the PR, this patch tries to avoid COPY relocation generation and propagate the original relocation into runtime if it was relocating on writable section. The ELIMINATE_COPY_RELOCS has been set to true and it's underlying infrastructure has been improved so that the COPY reloc elimination at least working on absoluate relocations (ABS64) on AArch64. BFD linker copy relocation elimination framwork requires the backend to always allocate dynrelocs for all those relocation types that are possible to introduce copy relocations. This is for adjust_dynamic_symbol hook to be able to get all symbol reference information. Should one symbol is referenced by more than one relocations, if there is any of them needs copy relocation then linker should generate it. bfd/ PR ld/21532 * elfnn-aarch64.c (ELIMINATE_COPY_RELOCS): Set to 1. (elfNN_aarch64_final_link_relocate): Also propagate relocations to runtime for if there needs copy relocation elimination. (need_copy_relocation_p): New function. Return true for symbol with pc-relative references and if it's against read-only sections. (elfNN_aarch64_adjust_dynamic_symbol): Use need_copy_relocation_p. (elfNN_aarch64_check_relocs): Allocate dynrelocs for relocation types that are related with accessing external objects. (elfNN_aarch64_gc_sweep_hook): Sync the relocation types with the change in elfNN_aarch64_check_relocs. ld/ * testsuite/ld-aarch64/copy-reloc-exe-2.s: New test source file. * testsuite/ld-aarch64/copy-reloc-2.d: New test. * testsuite/ld-aarch64/copy-reloc-exe-eliminate.s: New test source file. * testsuite/ld-aarch64/copy-reloc-eliminate.d: New test. * testsuite/ld-aarch64/copy-reloc-so.s: Define new global objects. * testsuite/ld-aarch64/aarch64-elf.exp: Run new tests. Mark as fixed. |
Created attachment 10074 [details] Demonstration of the bug The unit test fails on AArch64 and passes on x86-64, s390x and ppc64. A static const struct (APFloat::IEEEsingle) is defined in the shared library (libLLVM.so) and accessed both from within the library as well as the executable that links against the shared library. On AArch64, taking the address of IEEEsingle yields different results for code in the executable and code in the shared library. It looks like the linker creates a copy of IEEEsingle in the executable but then the GOT entry for IEEEsingle is not updated in libLLVM.so, causing any accesses to IEEEsingle in libLLVM.so to refer to the original definition (which in itself is not a big problem as all the data is constant). However, this breaks the LLVM APFloat class because it uses the addresses of the various static const structs (IEEEsingle, IEEEdouble, IEEEquad, x87DoubleExtended, etc. which define the properties of the different floating point types) to dispatch on the floating point type in the member functions of APFloat. Thus it is crucial that any of those constants always have the same address no matter where they are referenced from. APFloat is an arbitrary precision floating point implementation that is used in various places across the whole compiler, e.g. there are various codepaths involving APFloat that will lead to crashes on AArch64. Running the attached test case should yield the following output on AArch64: $ ./build.sh Address of IEEEsingle (APFloatTest.cpp): 0x420208 Address of IEEEsingle (APFloat.cpp) : 0x3ff78cf0a58 And the following output on x86-64: $ ./build.sh Address of IEEEsingle (APFloatTest.cpp): 0x7fbedc3cc9a0 Address of IEEEsingle (APFloat.cpp) : 0x7fbedc3cc9a0 As you can see the addresses are not consistent on AArch64.