A little question about function address with no-pie on RISCV

ywgrit wangxin03@loongson.cn
Tue Dec 3 03:36:12 GMT 2024


It seems there is some hack in ghc: it assumes that all symbols have type
"OBJECT", which I don't think is reasonable.

On 2024/11/28 3:39 PM, ywgrit wrote:
>
> On 2024/11/28 3:21 PM, Xi Ruoyao wrote:
>> On Thu, 2024-11-28 at 15:15 +0800, ywgrit wrote:
>>> On 2024/11/28 3:00 PM, Xi Ruoyao wrote:
>>>> On Thu, 2024-11-28 at 14:38 +0800, ywgrit wrote:
>>>>> On 2024/11/28 11:13 AM, Xi Ruoyao wrote:
>>>>>> On Thu, 2024-11-28 at 10:18 +0800, ywgrit wrote:
>>>>>>> Thank you very much.
>>>>>>>
>>>>>>> On 2024/11/27 7:21 PM, Xi Ruoyao wrote:
>>>>>>>> Fangrui has left Google.
>>>>>>>>
>>>>>>>> On Wed, 2024-11-27 at 18:00 +0800, ywgrit wrote:
>>>>>>>>> 1) The program tested on both riscv and x86_64.
>>>>>>>>>
>>>>>>>>> // foo.c
>>>>>>>>>
>>>>>>>>> void foo() {}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> // main.c
>>>>>>>>>
>>>>>>>>> #include <stdio.h>
>>>>>>>>> void foo();
>>>>>>>>>
>>>>>>>>> int main() {
>>>>>>>>>              foo();
>>>>>>>>>              printf("%p\n", foo);
>>>>>>>>>              return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 2) The commands executed.
>>>>>>>>>
>>>>>>>>> gcc -fpic -shared -g -o libfoo.so foo.c
>>>>>>>>>
>>>>>>>>> gcc -c main.c -o main.o
>>>>>>>>>
>>>>>>>>> gcc main.o -o main -lfoo -L. -no-pie
>>>>>>>>>
>>>>>>>>> 3) The result.
>>>>>>>>>
>>>>>>>>> // riscv
>>>>>>>>>
>>>>>>>>> ./main
>>>>>>>>>
>>>>>>>>> 0x120000530
>>>>>>>>>
>>>>>>>>> // x86_64
>>>>>>>>>
>>>>>>>>> ./main
>>>>>>>>>
>>>>>>>>> 0x7ffff3ea05a0
>>>>>>>>>
>>>>>>>>> 4) The question
>>>>>>>>>
>>>>>>>>> With -no-pie, riscv prints the address of foo@plt, which is in
>>>>>>>>> main, while x86_64 prints the address of foo, which is in libfoo.so.
>>>>>>>> On x86_64 if you use
>>>>>>>>
>>>>>>>> gcc main.o -o main -lfoo -L. -no-pie -fno-pie
>>>>>>>>
>>>>>>>> you get the address of the PLT entry too.  -fno-pie controls GCC
>>>>>>>> code generation, while -no-pie is directly passed to the linker and
>>>>>>>> mostly ignored by GCC.
>>>>>>>>
>>>>>>>> Traditionally, in a PDE (position-dependent executable) the
>>>>>>>> relocation for addressing *external* symbols was resolved at link
>>>>>>>> time.  For example, when t.so provides data symbol dat and function
>>>>>>>> symbol func, and pde.c is:
>>>>>>>>
>>>>>>>> int printf(const char *, ...);
>>>>>>>> extern int dat;
>>>>>>>> extern int func();
>>>>>>>>
>>>>>>>> int main() {
>>>>>>>>     printf("%p\n", &dat);
>>>>>>>>     printf("%p\n", &func);
>>>>>>>> }
>>>>>>>>
>>>>>>>> On x86_64, "gcc pde.c -fno-pie -no-pie t.so -O2" gives:
>>>>>>>>
>>>>>>>> 0000000000401060 <main>:
>>>>>>>>       401060:    48 83 ec 08              sub $0x8,%rsp
>>>>>>>>       401064:    be 20 40 40 00           mov $0x404020,%esi
>>>>>>>>       401069:    bf 04 20 40 00           mov $0x402004,%edi
>>>>>>>>       40106e:    31 c0                    xor %eax,%eax
>>>>>>>>       401070:    e8 bb ff ff ff           call 401030 <printf@plt>
>>>>>>>>       401075:    be 40 10 40 00           mov $0x401040,%esi
>>>>>>>>       40107a:    bf 04 20 40 00           mov $0x402004,%edi
>>>>>>>>       40107f:    31 c0                    xor %eax,%eax
>>>>>>>>       401081:    e8 aa ff ff ff           call 401030 <printf@plt>
>>>>>>>>       401086:    31 c0                    xor %eax,%eax
>>>>>>>>       401088:    48 83 c4 08              add $0x8,%rsp
>>>>>>>>       40108c:    c3                       ret
>>>>>>>>       40108d:    66 2e 0f 1f 84 00 00     cs nopw 0x0(%rax,%rax,1)
>>>>>>>>       401094:    00 00 00
>>>>>>>>       401097:    66 0f 1f 84 00 00 00     nopw 0x0(%rax,%rax,1)
>>>>>>>>       40109e:    00 00
>>>>>>>>
>>>>>>>> So the address of dat is "fixed" at 0x404020, the address of func
>>>>>>>> is "fixed" at 0x401040, etc.  But dat and func are actually in t.so,
>>>>>>>> which is loaded elsewhere, so for the data symbol dat there's a
>>>>>>>> "copy relocation" in the PDE:
>>>>>>>>
>>>>>>>> 000000404020  000400000005 R_X86_64_COPY 0000000000404020 dat + 0
>>>>>>>>
>>>>>>>> which instructs ld.so to copy dat to 0x404020 before running the
>>>>>>>> code in the PDE.
>>>>>>>>
>>>>>>>> For the function symbol func, the solution should be obvious: the
>>>>>>>> "fixed" address 0x401040 is just the address of the PLT entry for
>>>>>>>> func.
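The copy relocation for dat described above can be modeled in a few lines of C. This is a hypothetical, simplified sketch, not glibc's code: struct copy_reloc and apply_copy_relocs are made-up names, and the real ld.so first looks up the symbol in the loaded libraries (skipping the executable itself) and copies st_size bytes of its definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record for one copy relocation. */
struct copy_reloc {
    void *dest;       /* fixed slot in the PDE, e.g. 0x404020 for dat */
    const void *src;  /* the shared library's definition of the data symbol */
    size_t size;      /* st_size of the symbol */
};

/* Copy each symbol's initial contents into the PDE's fixed slot,
   as ld.so does before the PDE's code starts running. */
static void apply_copy_relocs(const struct copy_reloc *relocs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(relocs[i].dest != NULL && relocs[i].src != NULL);
        memcpy(relocs[i].dest, relocs[i].src, relocs[i].size);
    }
}
```

After the copy, t.so's own references to dat are also bound to the PDE's copy, so the library and the executable agree on a single address for dat.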
>>>>>>> Is it reasonable that taking the address of a function yields the
>>>>>>> address of the function's PLT entry?
>>>>>> As long as the result is the same whether you take the function
>>>>>> pointer in the shared library or in the PDE, it's fine.  C only
>>>>>> requires a consistent result for function pointer [in]equality tests
>>>>>> (== and !=); the result of casting the pointer to an integer (like
>>>>>> what printf does for %p) or of other comparisons (<, <=, >, >=) is
>>>>>> implementation-defined.
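To make the C guarantee above concrete, here is a minimal single-file sketch. It assumes foo stands in for the library function; foo_address_from_library, same_function and print_function_address are hypothetical helpers, not part of any real API. Everything lives in one translation unit, so it only shows what the language promises, not the PLT machinery that keeps that promise across the PDE and the shared library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for foo from libfoo.so (here it is in the same file). */
void foo(void) {}

/* Hypothetical helper playing the role of "take &foo inside the library". */
void (*foo_address_from_library(void))(void)
{
    return foo;
}

/* Guaranteed by C: == on two pointers to the same function is true. */
int same_function(void (*a)(void), void (*b)(void))
{
    return a == b;
}

/* Implementation-defined: which address gets printed (PLT entry or the
   function's code). Converting a function pointer to void * is allowed
   by POSIX, not by ISO C alone. */
void print_function_address(void (*f)(void))
{
    assert(f != NULL);
    printf("%p\n", (void *)f);
}
```

Whether the printed value is a PLT entry in main or the code in libfoo.so is exactly the part the thread shows differing between targets and options; the equality test is the part that must stay consistent.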
>>>>> If the correctness requirement means that the function pointers
>>>>> obtained in the shared library and in the PDE must be the same, then
>>>>> x86_64, riscv64, and loongarch64 all satisfy this requirement, and the
>>>>> function pointers obtained all point to the PLT entry.
>>>> That is the whole correctness requirement for C.  For other languages
>>>> the situation may be different, and the compiler must generate code
>>>> satisfying the language spec.  When writing assembly by hand, the
>>>> programmer has to get all of this right.
>>> Thanks, I understand now.
>>>
>>>> Also, I don't know whether we are hitting the same bug, even though
>>>> we've both found bugs in this area...
>>>>
>>> The second half of my last email describes in detail the bug I
>>> encountered, and I'm actually not sure if we ran into the same bug.
>> I just tried, and the bug I found affects RISC-V too.  If your bug does
>> not occur on RISC-V, it might be a different one.
>>
> It does. Details follow.
>
> In the scenario where ghc fails to run on LoongArch64, both ghc (the PDE)
> and the shared library take the address of the function
> 'stg_upd_frame_info' and get the address of stg_upd_frame_info@plt
> (which passes the equality test stated above?), but ghc wants the address
> of stg_upd_frame_info itself, hence the program error.
>
> Based on the behavior of riscv64 in the example above, I'm guessing that
> compiling and running ghc on riscv64 (if using LLVM as the ghc backend)
> would fail just as it does on LoongArch64.
>
> I read the code of GNU ld and the dynamic linker, and found that because
> the st_value of the stg_upd_frame_info symbol in the PDE is the address of
> the PLT entry, the dynamic linker finds stg_upd_frame_info@plt in the PDE
> and returns it, whereas if st_value == 0, the dynamic linker does not
> match stg_upd_frame_info@plt in the PDE and instead returns the address of
> stg_upd_frame_info found in the shared library.  The relevant code is in
> the dynamic linker's check_match function.
> The example in my original email was a simplified version of this ghc
> problem, and in that example riscv64 behaved just like LoongArch64.  So if
> there is a bug on LoongArch, riscv probably has one too?
> Also, if I add -fno-pie to ghc's compilation options, won't ghc fail on
> x86_64 too?  I'm going to try that.
>
>
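The lookup behavior described above can be modeled with a toy version of the relevant checks. This is a hypothetical simplification, not glibc's code: struct toy_sym, struct toy_obj, toy_check_match and toy_lookup are made-up names, load biases are ignored, and the real check_match also examines symbol type, versioning and further conditions (for example, it does not skip st_value == 0 for SHN_ABS or TLS symbols). It encodes two rules: an entry with st_value == 0 is skipped, and PLT-class relocations (those filling a PLT slot) skip undefined entries, while other relocations, such as the one used to take a function's address, can bind to an undefined PDE entry whose st_value holds its PLT address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified symbol-table entry (not glibc's Elf64_Sym). */
struct toy_sym {
    const char *name;
    uint64_t st_value;  /* undefined symbol in a PDE: 0, or its PLT address */
    int undefined;      /* models st_shndx == SHN_UNDEF */
};

/* One loaded object in the global lookup scope, in search order. */
struct toy_obj {
    const struct toy_sym *syms;
    size_t nsyms;
};

static int toy_check_match(const struct toy_sym *s, const char *name,
                           int plt_class)
{
    if (strcmp(s->name, name) != 0)
        return 0;
    if (s->st_value == 0)           /* "no value": keep searching */
        return 0;
    if (plt_class && s->undefined)  /* PLT-slot relocs skip PLT stubs */
        return 0;
    return 1;
}

/* Return the value from the first object that matches (0 if none). */
static uint64_t toy_lookup(const struct toy_obj *scope, size_t nobj,
                           const char *name, int plt_class)
{
    assert(name != NULL);
    for (size_t i = 0; i < nobj; i++)
        for (size_t j = 0; j < scope[i].nsyms; j++)
            if (toy_check_match(&scope[i].syms[j], name, plt_class))
                return scope[i].syms[j].st_value;
    return 0;
}
```

Using the addresses printed in the first message purely as placeholders: if the PDE's undefined foo carries st_value 0x120000530 (its PLT entry), an address-taking lookup returns 0x120000530 while the PLT-slot lookup returns the library's foo; if the PDE's entry has st_value 0, both lookups return the library's foo, which is the behavior ghc expects.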



More information about the Binutils mailing list