I will try to use your instructions to run on docker to see what exactly
is happening in your environment.
That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
The ld-so-breakage project is basically a recreation of the original "docker" scenario, written from scratch. I try to explain in the README what is going on. But if there are questions, hit me up (maybe as an issue?):
https://github.com/mulle-nat/ld-so-breakage
Thanks, it is way more useful. Now I understand what is happening, and
IMHO this behaviour is required because in glibc we define that atexit/on_exit
handlers are run when a library is deregistered (as with dlclose).
Using the example in your testcase:
---
USE_A=YES ./build/main_adbc
-- install atexit_b
-- install atexit_a
-- run atexit_a
-- run atexit_b
---
The reason the atexit handlers are called in the wrong order is that they are
registered with '__cxa_atexit', which in turn sets their internal type
to 'ef_cxa'. Since _dl_fini is registered last (after all shared libraries
are loaded and their constructors called), it is called first at exit, and it
in turn calls '__cxa_finalize' (through __do_global_dtors_aux generated by the
compiler). '__cxa_finalize' then calls all 'ef_cxa' functions for the module
passed by __do_global_dtors_aux and marks each function as 'ef_free'. This
prevents '__run_exit_handlers' from running the handlers more than once.
So the question you might ask is: why not just use 'ef_at' for atexit
handlers, so that they do not run on __cxa_finalize and your example
runs as you expect? The issue is that glibc does not know whether your library
will be dlopened or not.
If a constructor sets an atexit handler that references a function
inside the shared library, and you do *not* arrange for it *not* to be run
later, then a sequence of dlopen -> constructor -> dlclose -> exit will try to
execute an invalid mapping. This is exactly what dlfcn/bug-atexit{1,2}.c test.
So the question is why exactly glibc defined that atexit handlers should be
called by dlclose. I understand that __cxa_finalize / destructors make sense to
let the shared library free allocated resources, but I
can't really see why there is a need to extend this to 'atexit' as well.