[PATCH] x86-64: Restore LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]

Florian Weimer fweimer@redhat.com
Mon Aug 8 13:29:08 GMT 2022


* H. J. Lu:

> On Tue, Aug 2, 2022 at 1:00 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Crossing 2GB boundaries with indirect calls and jumps can use more
>> > branch prediction resources on several Intel CPUs.  There is visible
>> > performance improvement on workloads with many PLT calls when executable
>> > and shared libraries are mmapped below 2GB.  Add the Prefer_MAP_32BIT_EXEC
>> > bit so that mmap will try to map executable or denywrite pages with
>> > MAP_32BIT first.
>> >
>> > NB: Prefer_MAP_32BIT_EXEC reduces the bits available for address space
>> > layout randomization (ASLR).  It is always disabled for SUID programs
>> > and can only be enabled by setting the environment variable
>> > LD_PREFER_MAP_32BIT_EXEC.
>>
>> If the performance benefits are significant, this should be handled at
>> the kernel level.  Only the kernel can put the main program, ld.so and
>> the vDSO into the same 2GB window (presumably with the main program at
>> the top, so that the heap can grow almost indefinitely).
>
> ld.so and vDSO aren't performance sensitive.  But we need to handle PIE.

I don't think this is necessarily true.  It depends on the execution
profile.

clock_gettime in the vDSO could certainly matter to some workloads.

>> For mapping shared objects, we can give the kernel a hint that they will
>> eventually contain an executable mapping.  If the kernel could reuse
>> MAP_DENYWRITE for that, no glibc changes would be needed after all.
>>
>> Doing this in glibc is only a very partial solution, and so I'd
>> appreciate it if it could be fixed properly in the kernel.
>>
>
> There is no easy way for the kernel to selectively mmap PIE with MAP_32BIT.
> Can ld.so re-exec PIE with "ld.so PIE" so that ld.so can mmap PIE with
> MAP_32BIT?

In theory, yes, but that still leaves the vDSO issue.  The kernel could
cover that as well.
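
For reference, the "try MAP_32BIT first" behavior from the quoted patch
description could be sketched roughly as below.  This is only an
illustration, not the actual glibc code: mmap_prefer_32bit is a
hypothetical helper name, and the sketch assumes the preference applies
only to mappings without an address hint that are executable or
MAP_DENYWRITE.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not a glibc function): try to place an
   executable or MAP_DENYWRITE mapping in the low 2 GiB first, then
   fall back to an ordinary mmap if that fails or does not apply.  */
static void *
mmap_prefer_32bit (size_t len, int prot, int flags, int fd, off_t offset)
{
#if defined MAP_32BIT && defined MAP_DENYWRITE
  /* Assumption: only these mappings benefit from the low placement.  */
  if ((prot & PROT_EXEC) != 0 || (flags & MAP_DENYWRITE) != 0)
    {
      void *p = mmap (NULL, len, prot, flags | MAP_32BIT, fd, offset);
      if (p != MAP_FAILED)
        return p;
      /* The low 2 GiB may be exhausted; retry without the flag.  */
    }
#endif
  return mmap (NULL, len, prot, flags, fd, offset);
}
```

Such a wrapper can only affect mappings that ld.so creates itself.  The
main program (unless it is re-executed through ld.so) and the vDSO are
mapped by the kernel before ld.so runs.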

Regarding the performance issue, does everything have to be in the first
2 GiB or 4 GiB, or is it sufficient if everything is in the same
+/- 2 GiB window?
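
To make the two alternatives concrete, here is a small sketch.  The
helper names are made up for illustration, and the code says nothing
about which condition the affected CPUs actually require.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB ((uint64_t) 1 << 30)

/* Condition A: the address lies in the first 2 GiB of the address
   space (what MAP_32BIT placement gives).  */
static bool
below_2gib (uint64_t addr)
{
  return addr < 2 * GIB;
}

/* Condition B: two addresses are less than 2 GiB apart, no matter
   where in the address space they are.  */
static bool
within_2gib (uint64_t a, uint64_t b)
{
  uint64_t distance = a > b ? a - b : b - a;
  return distance < 2 * GIB;
}
```

For example, 0x7f0000000000 and 0x7f0040000000 satisfy condition B
(they are 1 GiB apart), but neither satisfies condition A.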

Thanks,
Florian


