Counting static __cxa_atexit calls

Michael Matz
Wed Aug 24 15:25:51 GMT 2022


On Wed, 24 Aug 2022, Florian Weimer wrote:

> > On Wed, 24 Aug 2022, Florian Weimer wrote:
> >
> >> > Isn't this merely moving the failure point from exception-at-ctor to 
> >> > dlopen-fails?
> >> 
> >> Yes, and that is a soft error that can be handled (likewise for
> >> pthread_create).
> >
> > Makes sense.  Though that actually hints at a design problem with ELF 
> > static ctors/dtors: they should be able to soft-fail (leading to dlopen or 
> > pthread_create error returns).  So, maybe the _best_ way to deal with this 
> > is to extend the definition of the various object-initialization means 
> > in ELF to allow propagating failure.
> We could enable unwinding through the dynamic linker perhaps.  But as I
> said, those Itanium ABI functions tend to be noexcept, so there's work
> on that front as well.

Yeah, my idea would have been slightly less ambitious: redefine the ABI of 
.init_array functions to be able to return an int.  The loader would abort 
loading if any of them return non-zero.  Now change GCC code emission of 
those helper functions placed in .init_array to catch all exceptions and 
(in case an exception happened) return non-zero.  Or, even easier, don't 
deal with exceptions, but rather just check if __cxa_atexit worked, and if 
not return non-zero right away.  That way all the exception propagation 
(or cxa_atexit error handling) stays purely within the GCC generated code 
and the dynamic loader only needs to deal with return values, not 
exceptions and unwinding.
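For illustration, here is a minimal sketch of what such a failable helper could look like. It is written as ordinary C++ rather than as compiler output, and everything in it is hypothetical. register_dtor stands in for __cxa_atexit and has a deliberately tiny table so that the failure path is reachable. static_init_mayfail is a made-up name for the helper that would live in the new section. The sketch combines both variants described above: a thrown exception and a failed destructor registration each become a non-zero return.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for __cxa_atexit: returns 0 on success and
// non-zero when its (deliberately tiny) table is full.
using dtor_fn = void (*)(void *);
struct Registration { dtor_fn fn; void *obj; };

constexpr std::size_t kCapacity = 2;
Registration g_registry[kCapacity];
std::size_t g_registered = 0;

int register_dtor(dtor_fn fn, void *obj) {
    assert(fn != nullptr);
    if (g_registered == kCapacity) return -1;  // soft failure, no throw
    g_registry[g_registered++] = {fn, obj};
    return 0;
}

// Three "global objects"; in general construct() could throw.
struct Resource {
    bool live = false;
    void construct() { live = true; }
};
Resource g_objects[3];

void destroy_resource(void *p) { static_cast<Resource *>(p)->live = false; }

// Hypothetical failable init helper: under the proposed ABI it returns int,
// and the loader aborts loading when the result is non-zero.
int static_init_mayfail() {
    try {
        for (Resource &r : g_objects) {
            r.construct();
            if (register_dtor(destroy_resource, &r) != 0) {
                r.live = false;  // destroy the object we could not register
                return 1;        // variant 2: __cxa_atexit-style failure
            }
        }
    } catch (...) {
        return 1;                // variant 1: any exception becomes failure
    }
    return 0;
}
```

With three objects and room for two registrations, static_init_mayfail() returns 1 and the third object ends up not live. What should happen to the two objects that were already registered when loading is aborted is not addressed by this sketch.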

For backward compat we can't just change the ABI of .init_array, but we 
can devise an alternative: .init_array_mayfail and the associated DT tags.
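On the loader side the change could be small. The following is a sketch under two assumptions: the new DT tags simply describe the start address and element count of .init_array_mayfail (the struct fields below stand in for them, since no such tags exist today), and the entries run in order.

```cpp
#include <cassert>
#include <cstddef>

// Entry type of the proposed .init_array_mayfail section: unlike plain
// .init_array entries, each function returns int (0 means success).
using init_mayfail_fn = int (*)();

// Hypothetical view of a loaded object; the two fields stand in for what
// the new DT tags would describe (start address and element count).
struct LoadedObject {
    const init_mayfail_fn *init_array_mayfail;
    std::size_t init_array_mayfail_count;
    bool initialized = false;
};

// Loader side: run entries in order and stop at the first non-zero
// result, so that dlopen can return an error instead of aborting.
// Returns 0 on success, otherwise the failing entry's result.
int run_init_mayfail(LoadedObject &obj) {
    assert(obj.init_array_mayfail != nullptr || obj.init_array_mayfail_count == 0);
    for (std::size_t i = 0; i < obj.init_array_mayfail_count; ++i) {
        int rc = obj.init_array_mayfail[i]();
        if (rc != 0)
            return rc;  // remaining entries never run
    }
    obj.initialized = true;
    return 0;
}

// Two example entries for trying it out.
int g_calls = 0;
int ok_init() { ++g_calls; return 0; }
int failing_init() { ++g_calls; return 1; }
```

The loader never sees an exception, only an int, which is the point of keeping the error handling inside the compiler-generated code.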

> For thread-local storage, it's even more difficult because any first
> access can throw even if the constructor is noexcept.

That's extending the scope somewhat; pre-counting cxa_atexit calls wouldn't 
solve this problem either, right?

> >> I think we need some level of link editor support to avoid drastically
> >> over-counting multiple static calls that get merged into one
> >> implementation as the result of vague linkage.  Not sure how to express
> >> that at the ELF level?
> >
> > Hmm.  The __cxa_atexit calls are coming from the per-file local static 
> > initialization_and_destruction routine which doesn't have vague linkage, 
> > so its contribution to the overall number of cxa_atexit calls doesn't 
> > change from .o to final-exe.  Can you show an example of what you're 
> > worried about?
> Sorry if I didn't use the correct terminology.
> I was thinking about this:
> #include <vector>
> template <int i>
> struct S {
>   static std::vector<int *> vec;
> };
> template <int i> std::vector<int *> S<i>::vec(i);
> std::vector<int *> &
> f()
> {
>   return S<1009>::vec;
> }
> The initialization is deduplicated with the help of a guard variable,
> and that also bounds the number of __cxa_atexit invocations to at most
> one per type.
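To make the deduplication concrete, here is a simplified, single-threaded model of what each object file's initialization routine does for S<1009>::vec. The names are hypothetical. Real code uses __cxa_guard_acquire and __cxa_guard_release on an Itanium ABI guard variable rather than a plain bool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, single-threaded model of one vague-linkage static member
// (S<1009>::vec from the example above) and its guard variable.
std::vector<int *> g_vec;        // stands in for S<1009>::vec
bool g_vec_guard = false;        // stands in for its guard variable
std::size_t g_atexit_calls = 0;  // counts simulated __cxa_atexit calls

// Every object file that uses S<1009>::vec carries this code in its
// static-initialization routine; the guard makes only the first caller
// construct the object and register its destructor.
void tu_init_S1009_vec() {
    if (g_vec_guard)
        return;                        // another file already did it
    g_vec = std::vector<int *>(1009);  // S<i>::vec(i) with i == 1009
    ++g_atexit_calls;                  // simulated __cxa_atexit call
    g_vec_guard = true;
    assert(g_atexit_calls == 1);       // at most one per guard
}
```

Calling tu_init_S1009_vec() once per translation unit, in any order, leaves g_atexit_calls at 1. The guard check is what bounds the count.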

Ah, right, thanks.  The guard variable for class-local statics; I was 
thinking of file-scope globals.  Double-hmm.  I don't readily see a nice way 
to correctly precalculate the number of cxa_atexit calls here.  A simple 
problem is the following: assume a couple of files each defining such class 
templates, which ultimately define and initialize the static members A<1>::a 
and B<1>::b (assume vague linkage).  Assume we have four files:

a:  defines A::a
b:  defines B::b
ab: defines A::a and B::b
ba: defines B::b and A::a

Now link order influences which file gets to actually initialize the 
members and which ones skip it due to guard variables.  But the object 
files themselves don't have enough context to know which will be which.  Not 
even the link editor knows that, because the non-taken cxa_atexit calls 
aren't in linkonce/group sections; they are all there in 
object.o:.text:_Z41__static_initialization_and_destruction_0ii .
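The link-order dependence can be modelled in a few lines. As a simplification, this sketch assumes that the per-file initialization routines run in link order and that each member's guard is set by the first routine that reaches it. The file and member names follow the example above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Each object file's static-init routine is modelled as the ordered list
// of vague-linkage members it initializes.
struct ObjectFile {
    std::string name;
    std::vector<std::string> members;
};

// Runs the routines in (assumed) link order. A member's guard is set by
// the first routine that reaches it; later routines skip the member and
// its __cxa_atexit call. Returns member -> file that initialized it.
std::map<std::string, std::string>
run_in_link_order(const std::vector<ObjectFile> &files) {
    std::map<std::string, std::string> initialized_by;  // doubles as guards
    for (const ObjectFile &f : files)
        for (const std::string &m : f.members)
            initialized_by.emplace(m, f.name);  // no-op if already guarded
    return initialized_by;
}
```

With link order a, b, ab, ba, files a and b do the initialization; with ab, ba, a, b, file ab does both. Every object file contains the same code in either case, which is why none of them can tell on its own whether its calls will be taken.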

So, what would need to be emitted is for instance a list of cxa_atexit 
calls plus guard variable; the link editor could then count all unguarded 
cxa_atexit calls plus all guarded ones, but the latter only once per 
guard.  The key would be the identity of the guard variable.
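A minimal sketch of that counting step, assuming a hypothetical per-call record that carries the guard variable's symbol when the call is guarded. Real guard symbols would be Itanium-mangled _ZGV names; plain strings are used here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical record the compiler could emit for each __cxa_atexit call
// in a static-initialization routine: the guard symbol if the call is
// guarded, nothing otherwise.
struct AtexitRecord {
    std::optional<std::string> guard;
};

// What the link editor would compute over all input object files: every
// unguarded call counts, and guarded calls count once per distinct guard
// (the guard's identity is the key).
std::size_t
count_atexit_calls(const std::vector<std::vector<AtexitRecord>> &objects) {
    std::size_t unguarded = 0;
    std::set<std::string> guards;
    for (const auto &records : objects)
        for (const AtexitRecord &r : records) {
            if (r.guard)
                guards.insert(*r.guard);
            else
                ++unguarded;
        }
    return unguarded + guards.size();
}
```

For the four files a, b, ab and ba, the six guarded records collapse to two regardless of link order, which matches the number of calls that actually execute.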

That seems like an awful lot of complexity at the wrong level for a very 
specific use case, when we could instead make .init_array failable, which 
might even have more use cases.

> > A completely different way would be to not use cxa_atexit at all: 
> > allocate memory statically for the object and dtor addresses in 
> > .rodata (instead of in .text right now), and iterate over those at 
> > static_destruction time.  (For the thread-local ones it would need to 
> > store arguments to __tls_get_addr).
> That only works if the compiler and linker can figure out the
> construction order.  In general, that is not possible, and that case
> seems quite common with C++.  If the construction order is not
> known ahead of time, it is necessary to record it somewhere, so that
> destruction can happen in reverse.  So I think storing things in .rodata
> is out.
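A sketch of the fixed-table idea quoted above, which also makes the objection concrete. The table layout and all names are hypothetical, and the destruction walk is only correct under the stated assumption that objects were constructed in table order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The fixed-table idea: the compiler emits a constant table of
// (object, destructor) pairs instead of calling __cxa_atexit, so nothing
// is registered, and nothing can fail, at construction time.
using dtor_fn = void (*)(void *);
struct DtorEntry { void *obj; dtor_fn fn; };

std::vector<std::string> g_log;  // records destruction order for the demo

struct Named { const char *name; };
Named g_first{"first"}, g_second{"second"};

void destroy_named(void *p) {
    assert(p != nullptr);
    g_log.push_back(static_cast<Named *>(p)->name);
}

// Would be read-only data in the proposal; here it is a const array.
const DtorEntry kDtorTable[] = {{&g_first, destroy_named},
                                {&g_second, destroy_named}};

// At static-destruction time, walk the table backwards. This is only
// correct if the objects were constructed in table order.
void run_static_destructors() {
    constexpr std::size_t n = sizeof(kDtorTable) / sizeof(kDtorTable[0]);
    for (std::size_t i = n; i-- > 0;)
        kDtorTable[i].fn(kDtorTable[i].obj);
}
```

If g_second had in fact been constructed before g_first (for example because its initialization was triggered earlier through a guard), this table would still destroy g_second first, which is the wrong order.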

Hmm, right.  The basic idea could be salvaged by also pre-allocating a 
linked-list field in .data (or .tdata), plus a per-object-file entry into 
such a list.  But failable .init_array looks nicer to me right now.
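The salvaged variant can be sketched as an intrusive list. Each object gets a statically allocated node, and construction pushes that node, so registration cannot fail and destruction follows the actual construction order in reverse. This is a sketch under assumptions: the node layout is made up, and a single global list head stands in for the per-object-file entries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each object with a non-trivial destructor gets a statically allocated
// node (in .data, or .tdata for thread-locals), so "registering" it is a
// pointer push that cannot fail. Single-threaded sketch.
using dtor_fn = void (*)(void *);
struct DtorNode {
    void *obj;
    dtor_fn fn;
    DtorNode *next;
};

// Hypothetical single list head, standing in for per-object-file entries.
DtorNode *g_dtor_list = nullptr;

// Called right after an object's constructor completes.
void push_dtor(DtorNode &node) {
    node.next = g_dtor_list;
    g_dtor_list = &node;
}

// At static-destruction time, pop nodes: most recently constructed first,
// i.e. the reverse of the order in which construction actually happened.
void run_static_destructors() {
    while (DtorNode *n = g_dtor_list) {
        g_dtor_list = n->next;
        n->fn(n->obj);
    }
}

// Demo object type whose "destructor" logs its name.
std::vector<std::string> g_log;
struct Named { const char *name; };
void destroy_named(void *p) {
    assert(p != nullptr);
    g_log.push_back(static_cast<Named *>(p)->name);
}
```

Pushing nodes in construction order y, z, x makes run_static_destructors() destroy x, then z, then y, whatever order the objects happen to occupy in memory.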


More information about the Binutils mailing list