Bug 27279 - x86_64 _dl_runtime_resolve should preserve r10/r11
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2021-01-29 19:35 UTC by James Y Knight
Modified: 2021-03-02 05:06 UTC

Last reconfirmed: 2021-02-01 00:00:00
fweimer: security-


Description James Y Knight 2021-01-29 19:35:24 UTC
This is arguably _not_ actually a bug. Yet, I still think it should probably be fixed.

The x86-64 ABI does not specify which registers must be preserved across lazy PLT stub resolution (unlike, say, the AArch64 psABI, which specifies that all registers except r16 and r17 must be preserved). Thus, it's arguably unacceptable to use _any_ non-standard calling convention when calling through a PLT that might invoke lazy binding.

However, users do this, and expect it to work, and are upset when it doesn't work.

Because of that, the current state of x86_64's _dl_runtime_resolve is that it _does_ preserve nearly every register, even those which are not required by any specification. This changed most recently in 2017, via bug 21265, after some debate -- seemingly resulting in grudging agreement that supporting other calling conventions was a reasonable thing to do after all (grumble grumble).

After that change, _almost_ all registers -- vector, float, and GPR -- are now preserved, either explicitly in the assembly code or implicitly by being callee-saved in the C function that the assembly calls.

But unfortunately, there are two GPRs which still get clobbered: r10 and r11. And, there's a calling convention which expects all GPRs except r11 to be preserved: <https://clang.llvm.org/docs/AttributeReference.html#preserve-most>. This has caused a bug in a piece of software, where the developer didn't realize that the "preserve_most" calling convention was incompatible with calls that might go through a PLT stub.
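
To make the failure mode concrete, here is a minimal sketch of the pattern described above. The names `record_slow_path` and `hot_path` are hypothetical and not taken from the affected software. In the real scenario the helper would be defined in another shared object, so the call would go through a lazily bound PLT stub; it is defined locally here only so that the sketch builds on its own. The attribute is a clang extension, so the macro falls back to the default convention on compilers that lack it.

```c
#include <assert.h>

/* preserve_most is a clang extension; use the default calling
   convention where the attribute is unavailable. */
#if defined(__has_attribute)
# if __has_attribute(preserve_most)
#  define PRESERVE_MOST __attribute__((preserve_most))
# endif
#endif
#ifndef PRESERVE_MOST
# define PRESERVE_MOST
#endif

/* Counts how often the rarely taken path ran. */
static int slow_path_calls;

/* Hypothetical cold-path helper. With preserve_most, the caller assumes
   that almost no GPRs are clobbered by the call, so it may keep live
   values in registers such as r10 across it. */
PRESERVE_MOST void record_slow_path(int code)
{
    assert(code >= 0);
    slow_path_calls += 1;
}

/* Hypothetical hot function: the call below is cheap for the caller
   because it does not need to spill caller-saved registers around it. */
int hot_path(int x)
{
    if (x < 0) {
        record_slow_path(1);
        return 0;
    }
    return 2 * x;
}
```

If `record_slow_path` lived in a different shared object and were bound lazily, the first call would run `_dl_runtime_resolve` between the call site and the callee, and any value the caller had kept in r10 could be lost.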

So -- since lazy PLT resolution is already _so close_ to saving literally everything, and the cost of additionally saving r10/r11 is so low compared to everything else it's doing, I'd propose that _dl_runtime_resolve should be modified to save those final 2 still-clobbered GPRs.

And thus, finally, be transparent to ANY calling convention anyone might want to use.
Comment 1 Florian Weimer 2021-02-01 08:45:11 UTC
For CET support, it is rather convenient to keep the r11 clobber. Avoiding the r10 clobber should however be easy. r11 is also clobbered by the large code model calling convention, so I hope it's not much of a problem.

Note that some calling conventions remain unsupported. We still assume that %rsp points to a stack to which we can save lots of data, for example.
Comment 2 James Y Knight 2021-03-02 05:06:54 UTC
As clang's "preserve_most" CC does allow r11 to be clobbered, only r10 needs to be saved to fix the actual issue I observed.

I had initially added r11 to the request only in an attempt to stave off any *future* request that might arise to preserve it. But, I expect that the use of r11 in the large code model sequences is exactly why it remains clobbered in preserve_most, and consequently why it's fairly unlikely that there would be a future request to preserve it.

So, I agree that saving r10 but leaving r11 clobbered would be reasonable.
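
The reasoning in this thread can be written down as a small set computation. This is only a sketch: the register sets below are taken from the comments above (the resolver clobbers r10 and r11; preserve_most allows r11 to be clobbered), not from the glibc sources, and the names `resolver_clobbers`, `preserve_most_scratch`, and `must_save` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The sixteen x86-64 general-purpose registers, one bit each. */
enum gpr {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8, R9, R10, R11, R12, R13, R14, R15,
    GPR_COUNT
};

#define BIT(r) (UINT32_C(1) << (r))

/* Assumption from this report: the GPRs that lazy resolution
   still clobbers today. */
static const uint32_t resolver_clobbers = BIT(R10) | BIT(R11);

/* Assumption from this report: the only GPR that a preserve_most
   caller expects may be clobbered by the call. */
static const uint32_t preserve_most_scratch = BIT(R11);

/* Registers the resolver would have to start saving so that it is
   transparent to a caller with the given scratch set. */
uint32_t must_save(uint32_t clobbered_by_resolver, uint32_t caller_scratch)
{
    return clobbered_by_resolver & ~caller_scratch;
}
```

Under these assumptions, `must_save(resolver_clobbers, preserve_most_scratch)` yields only the r10 bit, which matches the conclusion above: saving r10 is enough for preserve_most, and r11 can stay clobbered.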