A little question about function address with no-pie on RISCV
ywgrit
wangxin03@loongson.cn
Tue Dec 3 03:36:12 GMT 2024
It seems ghc relies on a hack here: it assumes that the type of every
symbol is "OBJECT", which I don't think is reasonable.
On 2024/11/28 at 3:39 PM, ywgrit wrote:
>
> On 2024/11/28 at 3:21 PM, Xi Ruoyao wrote:
>> On Thu, 2024-11-28 at 15:15 +0800, ywgrit wrote:
>>> On 2024/11/28 at 3:00 PM, Xi Ruoyao wrote:
>>>> On Thu, 2024-11-28 at 14:38 +0800, ywgrit wrote:
>>>>> On 2024/11/28 at 11:13 AM, Xi Ruoyao wrote:
>>>>>> On Thu, 2024-11-28 at 10:18 +0800, ywgrit wrote:
>>>>>>> Thank you very much.
>>>>>>>
>>>>>>> On 2024/11/27 at 7:21 PM, Xi Ruoyao wrote:
>>>>>>>> Fangrui has left Google.
>>>>>>>>
>>>>>>>> On Wed, 2024-11-27 at 18:00 +0800, ywgrit wrote:
>>>>>>>>> 1) The program tested on both riscv and x86_64.
>>>>>>>>>
>>>>>>>>> // foo.c
>>>>>>>>>
>>>>>>>>> void foo() {}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> // main.c
>>>>>>>>>
>>>>>>>>> #include <stdio.h>
>>>>>>>>> void foo();
>>>>>>>>>
>>>>>>>>> int main() {
>>>>>>>>> foo();
>>>>>>>>> printf("%p\n", foo);
>>>>>>>>> return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 2) The commands executed.
>>>>>>>>>
>>>>>>>>> gcc -fpic -shared -g -o libfoo.so foo.c
>>>>>>>>>
>>>>>>>>> gcc -c main.c -o main.o
>>>>>>>>>
>>>>>>>>> gcc main.o -o main -lfoo -L. -no-pie
>>>>>>>>>
>>>>>>>>> 3) The result.
>>>>>>>>>
>>>>>>>>> // riscv
>>>>>>>>>
>>>>>>>>> ./main
>>>>>>>>>
>>>>>>>>> 0x120000530
>>>>>>>>>
>>>>>>>>> // x86_64
>>>>>>>>>
>>>>>>>>> ./main
>>>>>>>>>
>>>>>>>>> 0x7ffff3ea05a0
>>>>>>>>>
>>>>>>>>> 4) The question
>>>>>>>>>
>>>>>>>>> With -no-pie, riscv prints the address of foo@plt, which is in
>>>>>>>>> main, while x86_64 prints the address of foo, which is in
>>>>>>>>> libfoo.so.
>>>>>>>> On x86_64 if you use
>>>>>>>>
>>>>>>>> gcc main.o -o main -lfoo -L. -no-pie -fno-pie
>>>>>>>>
>>>>>>>> you get the address of the PLT entry too. -fno-pie controls GCC
>>>>>>>> code generation, while -no-pie is directly passed to the linker
>>>>>>>> and mostly ignored by GCC.
>>>>>>>>
>>>>>>>> Traditionally, in a PDE the relocation for addressing *external*
>>>>>>>> symbols was resolved at link time. For example, when t.so
>>>>>>>> provides data symbol dat and function symbol func, and pde.c is:
>>>>>>>>
>>>>>>>> int printf(const char *, ...);
>>>>>>>> extern int dat;
>>>>>>>> extern int func();
>>>>>>>>
>>>>>>>> int main() {
>>>>>>>> printf("%p\n", &dat);
>>>>>>>> printf("%p\n", &func);
>>>>>>>> }
>>>>>>>>
>>>>>>>> On x86_64 "gcc pde.c -fno-pie -no-pie t.so -O2" gives:
>>>>>>>>
>>>>>>>> 0000000000401060 <main>:
>>>>>>>> 401060: 48 83 ec 08 sub $0x8,%rsp
>>>>>>>> 401064: be 20 40 40 00 mov $0x404020,%esi
>>>>>>>> 401069: bf 04 20 40 00 mov $0x402004,%edi
>>>>>>>> 40106e: 31 c0 xor %eax,%eax
>>>>>>>> 401070: e8 bb ff ff ff call 401030 <printf@plt>
>>>>>>>> 401075: be 40 10 40 00 mov $0x401040,%esi
>>>>>>>> 40107a: bf 04 20 40 00 mov $0x402004,%edi
>>>>>>>> 40107f: 31 c0 xor %eax,%eax
>>>>>>>> 401081: e8 aa ff ff ff call 401030 <printf@plt>
>>>>>>>> 401086: 31 c0 xor %eax,%eax
>>>>>>>> 401088: 48 83 c4 08 add $0x8,%rsp
>>>>>>>> 40108c: c3 ret
>>>>>>>> 40108d: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
>>>>>>>> 401094: 00 00 00
>>>>>>>> 401097: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
>>>>>>>> 40109e: 00 00
>>>>>>>>
>>>>>>>> So the address of dat is "fixed" at 0x404020, the address of func
>>>>>>>> is "fixed" at 0x401040, etc. But dat and func are actually in
>>>>>>>> t.so which is loaded elsewhere, so for the data symbol dat,
>>>>>>>> there's a "copy relocation" in the PDE:
>>>>>>>>
>>>>>>>> 000000404020 000400000005 R_X86_64_COPY 0000000000404020 dat + 0
>>>>>>>>
>>>>>>>> which instructs ld.so to copy dat to 0x404020 before running the
>>>>>>>> code in the PDE.
>>>>>>>>
>>>>>>>> For the function symbol func, the solution should be obvious:
>>>>>>>> the "fixed" address 0x401040 is just the address of the PLT
>>>>>>>> entry for func.
>>>>>>> Is it reasonable that taking the address of a function yields
>>>>>>> the address of the function's PLT entry?
>>>>>> As long as the function pointer compares equal whether you take it
>>>>>> from the shared library or from the PDE, it's fine. C only requires
>>>>>> a consistent result for function pointer [in]equality tests (== and
>>>>>> !=), but the results of casting the pointer to an integer (like
>>>>>> what printf does for %p) or of other comparisons (<, <=, >, >=) are
>>>>>> all implementation-defined.
>>>>> If the correctness requirement is that the function pointers
>>>>> obtained from the shared library and the PDE are the same, then
>>>>> x86_64, riscv64, and loongarch64 all satisfy it, and the function
>>>>> pointers obtained all point to the PLT entry.
>>>> That is the entire correctness requirement for C. For other
>>>> languages the situation may be different, and the compiler must
>>>> generate code satisfying the language spec. When writing assembly by
>>>> hand, the programmer has to get all of this right.
>>> Thanks, I understand now.
>>>
>>>> Also, I don't know whether we are encountering the same bug, even
>>>> though we've both found some bug in this area...
>>>>
>>> The second half of my last email describes in detail the bug I
>>> encountered, and I'm actually not sure if we ran into the same bug.
>> Just tried and the bug I found affects RISC-V too. If your bug does not
>> occur on RISC-V it might be a different one.
>>
> It did, as follows.
>
> In the scenario where ghc fails to run on LoongArch64, both ghc (the
> PDE) and the shared library take the address of the function
> 'stg_upd_frame_info' and get the address of stg_upd_frame_info@plt
> (which passes the equality test stated above?), but ghc wanted the
> address of stg_upd_frame_info itself, hence the program error.
>
> Based on the behavior of riscv64 in the example above, I'm guessing
> that compiling and running ghc on riscv64 (if using llvm as the ghc
> backend) would fail just as on LoongArch64.
>
> I read the code of GNU ld and the dynamic linker. Because the st_value
> of the stg_upd_frame_info symbol in the PDE is the address of the PLT
> entry, the dynamic linker finds stg_upd_frame_info@plt in the PDE and
> returns it, whereas if st_value == 0, the dynamic linker won't find
> stg_upd_frame_info@plt in the PDE and will instead return the address
> of stg_upd_frame_info found in the shared library. The relevant code
> for this part is in the dynamic linker's check_match function.
> The example in my original email was a simplified version of this
> problem with ghc, and in that example riscv64 behaved just like
> LoongArch64. So if there is a bug on LoongArch, then riscv probably
> has the bug too?
> Also, if I add -fno-pie to ghc's compilation options, won't ghc then
> fail at run time on x86_64 as well? I'm going to try that.
>
>