Bug 29655 - s390x gas generates PC32DBL instead of PLT32DBL for function call
Summary: s390x gas generates PC32DBL instead of PLT32DBL for function call
Status: UNCONFIRMED
Alias: None
Product: binutils
Classification: Unclassified
Component: gas (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-06 13:03 UTC by Rui Ueyama
Modified: 2022-10-13 02:23 UTC (History)
6 users (show)

See Also:
Host:
Target: s390x-*-linux-gnu
Build:
Last reconfirmed:


Attachments
Experimental Fix (441 bytes, patch)
2022-10-10 06:25 UTC, Andreas Krebbel
Details | Diff
gcc patch (2.50 KB, patch)
2022-10-13 02:18 UTC, Ilya Leoshkevich
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rui Ueyama 2022-10-06 13:03:38 UTC
gas 2.38 assembles the following assembly

  larl  %r1,printf
  brasl %r1,printf

into the following object file: 

  Disassembly of section .text:

  0000000000000000 <.text>:
     0:   c0 10 00 00 00 00       larl    %r1,0x0
                          2: R_390_PC32DBL        printf+0x2
     6:   c0 15 00 00 00 00       brasl   %r1,0x6
                          8: R_390_PC32DBL        printf+0x2

. As you can see, gas generates PC32DBL for both larl and brasl. PC32DBL seems to be correct for larl because the instruction is to obtain the address of the specified symbol. However, I think it's not correct for brasl because it's not an address-taking instruction but a function call instruction.

For other targets, we generally distinguish address-taking instructions from function-call instructions. For the former, gas generates PC<SIZE> relocation while it generates PLT<SIZE> relocation for the latter. Examples are R_X86_64_PC32 and R_X86_64_PLT32. gas won't create PC32 relocation for the x86-64 CALL instruction.

I noticed this issue when working on porting the mold linker to s390x. In mold, we distinguish address-taking relocations (PC<SIZE> relocs) from non-address-taking relocations (PLT<SIZE> relocs). We use the information to decide whether we should create a canonical PLT or not.

It's a subtle difference, so most applications work fine even if the assembler doesn't distinguish the two types of relocations, but there are a few applications that depend on the difference. Qt5 is a notable example. So I believe Qt5 apps won't work correctly on S390. Maybe it doesn't matter in practice as people probably aren't running GUI apps on mainframes. But Qt5 is just one example; it could cause other subtle portability issues for other apps.

So, could make a change to gas so that it distinguishes R_390_PC32DBL and R_390_PLT32DBL?
Comment 1 Ulrich Weigand 2022-10-06 17:08:57 UTC
You get R_390_PLT32DBL when using 
  brasl %r1,printf@PLT
which is what compilers usually do.  (Except when compiling non-PIC code, in which case R_390_PC32DBL works fine.)
Comment 2 Rui Ueyama 2022-10-07 02:22:58 UTC
Then it's a GCC's bug that doesn't generate R_390_PC32PLT for `brasl` for -fno-PIC. Again, this is subtle, but can cause a real issue. I don't think you can build working Qt5 apps with the current gcc/gas output.
Comment 3 Martin Liska 2022-10-07 07:09:47 UTC
Feel free to create a GCC bug for it, thanks.
Comment 4 Andreas Krebbel 2022-10-07 07:17:33 UTC
Since GCC 12 we always emit @PLT at brasl symbols already:

commit 0990d93dd8a4268bff5bbe48aa26748cf63201c7
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Mon Jun 7 13:44:15 2021 +0200

    IBM Z: Use @PLT symbols for local functions in 64-bit mode
    
    This helps with generating code for kernel hotpatches, which contain
    individual functions and are loaded more than 2G away from vmlinux.
    This should not create performance regressions for the normal use
    cases, because for local functions ld replaces @PLT calls with direct
    calls.


That previous GCCs omit this isn't a problem since ld makes sure to jump through PLT also for PC32DBL.
Comment 5 Rui Ueyama 2022-10-07 07:28:12 UTC
Ooh, that's why I did see this only on SuSE's builder. I'm glad that that's already been handled.

mold does not create PLT for R_390_PC32DBL; it does only for R_390_PLT* relocs. We can change that so that mold would create a PLT for a PC32 reloc just like GNU ld does, but that would break Qt. So I'll keep the mold's current behavior as-is.
Comment 6 Andreas Krebbel 2022-10-07 07:41:15 UTC
(In reply to Rui Ueyama from comment #5)
> Ooh, that's why I did see this only on SuSE's builder. I'm glad that that's
> already been handled.

I would just like to mention that adding the @PLT isn't really a bugfix. We did this to make life easier with hotpatching. Omitting the @PLT for a non-PIC build is also correct I think. At that point it isn't clear whether the jump needs to go through PLT or not. It could very well be resolved locally. So I think it is anyway up to ld to decide whether a PLT stub is needed or not.

> mold does not create PLT for R_390_PC32DBL; it does only for R_390_PLT*
> relocs. We can change that so that mold would create a PLT for a PC32 reloc
> just like GNU ld does, but that would break Qt. So I'll keep the mold's
> current behavior as-is.

Could you please elaborate what is so special about Qt here. If a function symbol with a PC32DBL reloc could not be resolved locally adding a PLT stub should be the right thing.
Comment 7 Rui Ueyama 2022-10-07 08:43:54 UTC
It's about pointer equality. C/C++ guarantees that two function pointers are equal if and only if they are for the same function. For example, an expression `&printf` in your main program code should yields the same address as an expression `&printf` in libc.so.

That doesn't seem like a tricky requirement, but it actually is when dynamic linking is involved. When doing dynamic linking, there are more than one entry point for the same exported function. Your main program will call `printf` via the executable's PLT entry for `printf`; on the other hand, libc may call its own `printf` via its own PLT entry. libc.so can also call `printf` directly by jumping to the entry point of the function without PLT.

`&printf` can be evaluated to any of these addresses, but the result must be consistent across all ELF modules for the same process throughout its lifetime. Otherwise, the pointer equality will not be guaranteed.

For PIC, the linker creates a GOT entry for `printf`, and the code will read the entry value to obtain the address of `printf`. If all ELF modules are PIC (shared objects are always position-independent, so that means the main executable code was compiled with -fPIC), it's just as simple as the dynamic linker fills all GOT entries for `printf` with the same address, which is the real address of the `printf` function in memory.

However, if the main executable wasn't compiled with -fPIC, the main executable assumes that all variables and functions exist at fixed locations in memory. That's a wrong assumption for imported functions such as `printf` though. So, in that case, the static linker makes the dynamic linker to use a main executable's PLT entry as an address of an imported function. The dynamic linker sets addresses of the main executable's PLT entries to shared libraries' GOT entries for pointer equality.

A main executable's PLT entry that is used as a function address is often called a "canonical PLT".

If the main executable isn't compiled as PIC, and if it takes a pointer of imported function `foo`, the linker has to make the main program's `foo`'s PLT entry canonical. If it's not canonical, the main executable would use its own PLT entry as `foo`'s address, while other shared object files in memory would use `foo`'s real address of its address, which breaks the pointer equality guarantee.

Now, we can ask this question: is it safe to always make main executable's PLT entries canonical? I originally thought that the answer would be yes. Even if the main program is compiled with -fPIC, it seems that we can still make `foo`'s PLT entry canonical, so that the PLT entry's address will be used as `foo`'s address throughout the program execution.

The above logic wasn't wrong, but in reality, there are libraries that assume the linker doesn't make PLT entries canonical unless necessary. Qt5 is one of such libraries. It's `connect` function takes a pointer to a callback function. Internally, it compares a given pointer value with a list of pointers. When doing so, it looks like Qt compares a given pointer with addresses of symbol aliases. It works as long as both pointers directly point to the address of a callback function. But it won't when a given pointer points to a PLT entry of the main executable. That caused a mysterious runtime issue. So I made a change to mold so that it makes PLT entries canonical only when needed (i.e. only when a function address is taken, as opposed to function is just called.)

So, long story short, the linker has to distinguish address-taking relocations from non-address-taking relocations to properly make/not make PLT entries canonical. We assume that R_*_PLT* relocations are not address-taking (we assume such relocations are used for instructions such as x86's CALL, and despite its name, R_*_PLT* relocations don't request the linker to always generate PLT entries), and other relocations are address-taking. I want s390x to follow that convention.
Comment 8 Ulrich Weigand 2022-10-07 09:22:36 UTC
(In reply to Rui Ueyama from comment #7)
> A main executable's PLT entry that is used as a function address is often
> called a "canonical PLT".
> 
> If the main executable isn't compiled as PIC, and if it takes a pointer of
> imported function `foo`, the linker has to make the main program's `foo`'s
> PLT entry canonical. If it's not canonical, the main executable would use
> its own PLT entry as `foo`'s address, while other shared object files in
> memory would use `foo`'s real address of its address, which breaks the
> pointer equality guarantee.
> 
> Now, we can ask this question: is it safe to always make main executable's
> PLT entries canonical? I originally thought that the answer would be yes.
> Even if the main program is compiled with -fPIC, it seems that we can still
> make `foo`'s PLT entry canonical, so that the PLT entry's address will be
> used as `foo`'s address throughout the program execution.

My understanding is that the canonical function address of `foo` is the
address of the PLT for `foo` in the main executable, *if such a PLT
exists*, and the address of `foo` otherwise.  That PLT exists if the
main executable references `foo`, but `foo` is not defined in the
main executable.

This is the logic we've been using all along, and it seems to have been
working correctly everywhere ...
 
> The above logic wasn't wrong, but in reality, there are libraries that
> assume the linker doesn't make PLT entries canonical unless necessary. Qt5
> is one of such libraries. It's `connect` function takes a pointer to a
> callback function. Internally, it compares a given pointer value with a list
> of pointers. When doing so, it looks like Qt compares a given pointer with
> addresses of symbol aliases. It works as long as both pointers directly
> point to the address of a callback function. But it won't when a given
> pointer points to a PLT entry of the main executable. That caused a
> mysterious runtime issue. So I made a change to mold so that it makes PLT
> entries canonical only when needed (i.e. only when a function address is
> taken, as opposed to function is just called.)

I don't quite understand the case.  These comparisons would all be
between function pointers, so both sides of the comparison would be
the result of "address taking" operations as you call it.  Whatever
is done for "just-calling" operations does not matter either way
for the result of such comparisons ...
Comment 9 Rui Ueyama 2022-10-07 09:41:55 UTC
> My understanding is that the canonical function address of `foo` is the
> address of the PLT for `foo` in the main executable, *if such a PLT
> exists*, and the address of `foo` otherwise.  That PLT exists if the
> main executable references `foo`, but `foo` is not defined in the
> main executable.

No it's not the case with any of GNU ld, gold, lld or mold. You can see it in action:

  $ cat test1.c
  void bar();
  int main() { bar(); }

  $ cat test2.c
  #include <stdio.h>
  void foo() {}
  void bar() { printf("foo=%p bar=%p\n", foo, bar); }

  $ cc -shared -fPIC -o test2.so test2.c
  $ cc -fuse-ld=bfd -no-pie -fno-PIC -o test1 test1.c ./test2.so

`test` has a PLT entry for `bar`, as it calls `bar` in a shared library. It's address is 0x401020 in this particular test case

  $ readelf -S test | grep -F ' .plt'
  [13] .plt              PROGBITS        0000000000401020 001020 000020 10  AX  0   0 16

However, the PLT address is not used as its function address as you can see below.

  $ ./test1
  foo=0x7f60482d3110 bar=0x7f60482d3120

Both `foo` and `bar` are resolved to the .so's addresses.

But if you take an address of `foo` as below:

  $ cat test3.c
  void bar();
  void *get_bar_addr() { return bar; }
  int main() { bar(); }

then `bar` suddenly started to be resolved to the PLT entry as below:

  $ cc -fuse-ld=bfd -no-pie -fno-PIC -o test3 test3.c ./test2.so && ./test3
  foo=0x7fa538226110 bar=0x401030

> I don't quite understand the case.  These comparisons would all be
> between function pointers, so both sides of the comparison would be
> the result of "address taking" operations as you call it.  Whatever
> is done for "just-calling" operations does not matter either way
> for the result of such comparisons ...

You can create aliases to symbols using linker script or linker's `--defsym=foo=bar` option. If you define foo as an alias to bar, you may assume that they refer the same address, but if one symbol gets a canonical PLT and the other don't, they'll refer different addresses. This is arguably a bug in an application, but there are libraries such as Qt5 that assumes all programs are compiled as PIC and therefore there's no canonical PLTs at all.
Comment 10 Ulrich Weigand 2022-10-07 10:00:11 UTC
(In reply to Rui Ueyama from comment #9)

> However, the PLT address is not used as its function address as you can see
> below.
> 
>   $ ./test1
>   foo=0x7f60482d3110 bar=0x7f60482d3120

Well, when I run your example (exact same commands on s390x), I get:

./test1 
foo=0x3ff89800620 bar=0x10004d8

(That's the default toolchain on Ubuntu 20.04)
Comment 11 Rui Ueyama 2022-10-07 10:05:32 UTC
> Well, when I run your example (exact same commands on s390x), I get:
> 
> ./test1 
> foo=0x3ff89800620 bar=0x10004d8
> 
> (That's the default toolchain on Ubuntu 20.04)

Yes, that's the compatibility issue we are talking about. On other targets, you'll get what I said.
Comment 12 Andreas Krebbel 2022-10-07 10:35:04 UTC
So do I understand it correctly that for the Qt case the example would look more like this:

test2.c:

#include <stdio.h>

void alias () __attribute__ ((weak, alias ("bar")));
void bar() { printf("bar=%p alias=%p\n", bar, alias); }

test1.c:

void bar();
// void *get_bar_addr() { return bar; }
int main() { bar(); }


which gives the expected result on x86 (with line 2 commented out):
bar=0x7f1e4cdb4109 alias=0x7f1e4cdb4109

but not on s390x because we always resolve to the PLT slot:
bar=0x10004b8 alias=0x3fffdf00640

On the other hand the behavior Qt is relying on depends on whether the address of bar is taken in main. So by uncommenting line 2 the problem occurs also on x86:

bar=0x401030 alias=0x7f63ae84a109

Neither target appears to guarantee that aliases behave the same wrt pointer equality. On non-s390x targets it only works because Qt happens to trigger a case where it accidentally matches.
Comment 13 Ulrich Weigand 2022-10-07 12:09:39 UTC
In any case, I still don't quite get where there is any incompatibility.

If `bar` resolves to its canonical PLT address, and you use `--defsym=foo=bar`, then `foo` will *also* resolve to the canonical PLT address of `bar`, so they still compare equal.

If I change your test2.c to use "extern void foo();" instead, and add `-Wl,--defsym=foo=bar` to the command line, I get as expected:

./test1 
foo=0x10004f8 bar=0x10004f8

Do you have a self-contained test case that actually shows an incompatibility (i.e. two pointers comparing unequal when they should compare equal)?
Comment 14 Rui Ueyama 2022-10-07 12:31:00 UTC
The following bash script should demonstrate the issue better:

-----
cat <<EOF > libfoo.h
#ifndef __PIC__
#error "this file must be compiled with -fPIC"
#endif

typedef void Fn();
void fn1_public(void);
void call_callback(Fn *callback);
EOF

cat <<EOF > libfoo.c
#include "libfoo.h"
#include <stdio.h>

__attribute__((visibility("hidden")))
void fn1(void) { printf("fn1\n"); }

extern void fn1_public() __attribute__((alias("fn1")));

void call_callback(Fn *callback) {
  if (callback == fn1)
    printf("do something special here ...\n");
  callback();
}
EOF

cat <<EOF > main.c
#include "libfoo.h"

int main() {
  fn1_public();
  call_callback(fn1_public);
}
EOF

cc -fPIC -shared -o libfoo.so libfoo.c
cc -fPIC -c -o main.o main.c
cc -no-pie -o main main.o ./libfoo.so
./main
-----

On non-s390x machines, the above script print out "fn1 do something special here ... fn1" while on s390x it prints out "fn1 fn1".

In this script, libfoo assumes that executables depending on the library don't have canonical PLTs, and it "enforces" the restriction by "#ifdef __PIC__" in the library header. I'm not arguing that that assumption is correct, but there are code out there that does something like this. I also don't understand why people would want to do this, but again, the above code compiles and "just works" if it's not on s390x.
Comment 15 Ulrich Weigand 2022-10-07 13:00:49 UTC
Thanks for the test case!  A few comments:

1. In this scenario, it is indeed not guaranteed that fn1 compares equal to fn1_public everywhere.  This is because fn1 is declared hidden in libfoo.c, so all references from that file must go to the definition within the file.  However, fn1_public *can* be overridden according the usual ELF rules, e.g. if the main executable (or some other library loaded earlier) had its own definition of `fn1_public`, the ELF standard requires that all references to `fn1_public` point to that definition instead of the aliased `fn1`.   A decision to use a canonical PLT definition for `fn1_public` in the main executable counts as "overriding" in that sense as well.  So the observed difference in behavior comes indeed down to a difference in whether canonical PLTs are used or not.

2. The difference can be observed only when compiling main with -fPIC.  Without -fPIC, both x86_64 and and s390x will use canonical PLT.  However, with -fPIC, only s390x uses a canonical PLT while x86_64 does not.

3. Looking at the generated assembler in the -fPIC case, the two platforms are very similar:
        call    fn1_public@PLT
        movq    fn1_public@GOTPCREL(%rip), %rax
        movq    %rax, %rdi
        call    call_callback@PLT
vs.
        brasl   %r14,fn1_public@PLT
        lgrl    %r1,fn1_public@GOTENT
        lgr     %r2,%r1
        brasl   %r14,call_callback@PLT
And in fact, there *shouldn't* be any necessity for a canonical PLT on either platform - neither the call (using @PLT) nor the GOT-based function pointer would require it.

So it seems to me there may indeed be a problem on s390x, but it doesn't appear to have anything to with gas - rather, the question is why the linker chooses to use a canonical PLT for fn1_public if it is only used via @PLT and @GOTENT relocations ...
Comment 16 Andreas Krebbel 2022-10-10 06:25:51 UTC
Created attachment 14386 [details]
Experimental Fix

This patch fixes the testcase from comment 14. No testsuite regressions in the Binutils testsuite.
Comment 17 Andreas Krebbel 2022-10-10 06:39:39 UTC
I have attached a patch for the testcase in Comment 14. Turns out that we also have to zero out the symbol value in order to avoid function pointer references in the main binary to be wired up to the main binary PLT slot.

The patch also helps with my non-pic testcase from Comment 12. However, here it relies on the @PLT marker on the symbol in the function call. I think with that we are circling back to Rui's original question from Comment 2.

How should the linker recognize whether a symbol is used only as part of direct function calls or whether its address is taken? All the linker can look at are the relocations. Mold currently relies on symbols in direct function calls to use a PLT32DBL reloc while function pointer references use PC32DBL or R_390_64. With the attached patch the same will apply to ld.

Clang always adds the @PLT marker and GCC starts doing this with version 12. So I think we are ok. I would rather not want to backport the GCC patch to older versions.
Comment 18 Rui Ueyama 2022-10-10 08:23:45 UTC
Was there any reason to limit it to R_390_64 and R_390_PC32DBL?
Comment 19 Andreas Krebbel 2022-10-10 08:38:14 UTC
(In reply to Rui Ueyama from comment #18)
> Was there any reason to limit it to R_390_64 and R_390_PC32DBL?

These were the only relocs which made sense to me as 64 bit pointers.
Comment 20 Rui Ueyama 2022-10-12 08:35:35 UTC
GCC 12 seems to always append `@PLT` to a function symbol even if we are not calling that function. Here is a test case.

```
$ echo 'void foo(); void *bar() { return foo; }' | gcc-12 -S -o- -xc -
        .file   "<stdin>"
        .machinemode zarch
        .machine "z196"
.text
        .align  8
.globl bar
        .type   bar, @function
bar:
.LFB0:
        .cfi_startproc
        ldgr    %f0,%r11
        .cfi_register 11, 16
        lgr     %r11,%r15
        .cfi_def_cfa_register 11
        larl    %r1,foo@PLT
        lgr     %r2,%r1
        lgdr    %r11,%f0
        .cfi_restore 11
        .cfi_def_cfa_register 15
        br      %r14
        .cfi_endproc
.LFE0:
        .size   bar, .-bar
        .ident  "GCC: (SUSE Linux) 12.1.1 20220812 [revision 6b7d570a5001bb79e34c0d1626a8c7f55386dac7]"
        .section        .note.GNU-stack,"",@progbits
```

I think `larl %r1,foo@PLT` should be `larl %r1,foo`.
Comment 21 Andreas Krebbel 2022-10-12 11:37:46 UTC
(In reply to Rui Ueyama from comment #20)
> GCC 12 seems to always append `@PLT` to a function symbol even if we are not
> calling that function. Here is a test case.
...
> I think `larl %r1,foo@PLT` should be `larl %r1,foo`.

I agree. Ilya is having a look. Thanks!
Comment 22 Ilya Leoshkevich 2022-10-13 02:18:27 UTC
Created attachment 14395 [details]
gcc patch
Comment 23 Ilya Leoshkevich 2022-10-13 02:23:06 UTC
I'm currently regtesting the attached patch.

This discussion made me realize that our hotpatching support is currently broken in a fashion similar to that of Qt5 (details in the patch's commit message), so I'm shamelessly breaking it even further in order to fix this issue.  I will ask Sumanth to double-check my findings, but we will probably have to rework this area.