This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Compile elf/rtld.c with -fno-tree-loop-distribute-patterns.
On 25/11/2019 05:08, Florian Weimer wrote:
> * Sandra Loosemore:
>
>>> Is it possible to do this via a pragma? If I understand things
>>> correctly, this is not necessary for PI_STATIC_AND_HIDDEN targets
>>> (where initialization of the dynamic loader is simpler).
>>
>> I see that the definitions of memset and memmove use
>> "inhibit_loop_to_libcall" (which expands into an optimize attribute) to
>> prevent recursion, but I didn't think the header where that is defined
>> (include/libc-symbols.h) is supposed to be included in the dynamic
>> linker?
>
> elf/rtld.os is built with -include ../include/libc-symbols.h, so the
> declaration should be in scope. I don't know why it is not effective.
> It probably only applies to the implementations of memset and memmove
> themselves (if the generic ones written in C are used).
>
>> Also, already in elf/Makefile there is another instance where
>> it adds -fno-tree-loop-distribute-patterns to the CFLAGS, so I just
>> copied that. I don't work with glibc internals enough to have a good
>> feel for what the preferred solution is but I'll test a different
>> solution if this one isn't good enough.
>
> I had hoped we could write something like this at the start of
> elf/rtld.c:
>
> #ifndef PI_STATIC_AND_HIDDEN
> # pragma GCC optimize ("no-tree-loop-distribute-patterns")
> #endif
>
> Then the optimization would still be applied on the targets where it
> is safe to do so.
>
> But I don't have a strong opinion about this and would appreciate
> feedback from others.
>
We already have similar code to handle a similar issue:

  /* Partly clean the `bootstrap_map' structure up.  Don't use
     `memset' since it might not be built in or inlined and we cannot
     make function calls at this point.  Use '__builtin_memset' if we
     know it is available.  We do not have to clear the memory if we
     do not have to use the temporary bootstrap_map.  Global variables
     are initialized to zero by default.  */
#ifndef DONT_USE_BOOTSTRAP_MAP
# ifdef HAVE_BUILTIN_MEMSET
  __builtin_memset (bootstrap_map.l_info, '\0', sizeof (bootstrap_map.l_info));
# else
  for (size_t cnt = 0;
       cnt < sizeof (bootstrap_map.l_info) / sizeof (bootstrap_map.l_info[0]);
       ++cnt)
    bootstrap_map.l_info[cnt] = 0;
# endif
#endif

The HAVE_BUILTIN_MEMSET check in configure.ac tests whether
__builtin_memset itself ends up calling memset. The clearing code is
guarded by DONT_USE_BOOTSTRAP_MAP mainly because bootstrap_map is
stack allocated for !PI_STATIC_AND_HIDDEN.
(As a side note, I really think this kind of micro-optimization just
over-complicates things for minimal gain.)
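
For illustration only, here is a minimal standalone sketch of the two
variants selected by HAVE_BUILTIN_MEMSET in the excerpt above. The
struct demo_map type, its array size, and the function names are
hypothetical stand-ins and are not taken from glibc; the point is just
that both paths leave l_info zeroed.

```c
#include <stddef.h>

/* Hypothetical stand-in for the part of the bootstrap map that gets
   cleared; the real structure in glibc is much larger.  */
struct demo_map
{
  void *l_info[16];
};

/* Variant 1: rely on the compiler expanding the builtin inline.
   Whether it really is expanded inline depends on the compiler and
   target, which is what the configure check is meant to determine.  */
void
clear_with_builtin (struct demo_map *m)
{
  __builtin_memset (m->l_info, '\0', sizeof (m->l_info));
}

/* Variant 2: explicit loop, as in the fallback branch.  */
void
clear_with_loop (struct demo_map *m)
{
  for (size_t cnt = 0;
       cnt < sizeof (m->l_info) / sizeof (m->l_info[0]);
       ++cnt)
    m->l_info[cnt] = 0;
}
```

Both functions have the same observable effect; they differ only in
what code the compiler may emit for them.
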
However, I am not sure there is really a restriction regarding
PI_STATIC_AND_HIDDEN and internal function calls. Just after this memset
call it has:

  if (bootstrap_map.l_addr || ! bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)])
    {
      /* Relocate ourselves so we can do normal function calls and
         data access using the global offset table.  */

      ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0);
    }
  bootstrap_map.l_relocated = 1;

So it should be safe to call memcpy/memset in _dl_start after this
point (as some ports with !PI_STATIC_AND_HIDDEN already do with memcpy).
Is gcc creating mem* calls before ld.so relocates itself? If so,
where exactly?
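
As a rough sketch of the mechanism under discussion: a plain zeroing
loop is the kind of code that GCC's loop-distribute-patterns pass may
replace with a call to memset at higher optimization levels, and a
per-function optimize attribute is one way to suppress that. The
macro name NO_LOOP_TO_LIBCALL and the function zero_words are
hypothetical and defined only for this example; the macro mirrors the
idea behind inhibit_loop_to_libcall mentioned earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical macro for this sketch.  The optimize attribute is
   GCC-specific, so it expands to nothing on other compilers.  */
#if defined __GNUC__ && !defined __clang__
# define NO_LOOP_TO_LIBCALL \
  __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* Zero N words.  Without the attribute, GCC may recognize this loop
   and emit a call to memset instead of the stores.  */
NO_LOOP_TO_LIBCALL void
zero_words (unsigned long *p, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    p[i] = 0;
}
```

Compiling such a file with "gcc -O2 -S", with and without the
attribute, and looking for a memset call in the generated assembly is
one way to see whether the transformation happened for a given
compiler version and target.
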