Incorrect order of static dtors in DLL CRT?

Dave Korn dave.korn@artimi.com
Sun Aug 3 22:59:00 GMT 2008


    Evening all,

  I've learnt everything there is to know about pretty much everything that
can possibly go wrong in the world of Dwarf-2 EH over the past little while,
but for the purposes of this discussion only a few facts are germane:

[ and if you don't want to know, page down a couple of times to the obvious
break. ]



-  Dwarf-2 EH tables are linked into the runtime exception handling
mechanism at startup by using static ctors; the main exe and all the shared
libs have one static .ctors entry each that points to a thunk that calls
__register_frame_info with a pointer to that module's exception data.

-  Similarly, the main exe and all the libs each have a .dtors entry that
points to a thunk that loads a pointer to their exception data and calls
__deregister_frame_info.

-  When you throw an exception, the first bit of the stack it's going to
need to be able to unwind is its own stack in _Unwind_RaiseException,
because it's got to work from there back up to the user code.
_Unwind_RaiseException is part of shared libgcc, and so the information
necessary for it to be able to unwind its way back to the user's code is
part of the shared libgcc's EH tables.

-  So: if you throw an exception during the final stages of cleanup, after
the shared libgcc's dtors have run and deregistered the table with all the
EH frame info for the shared libgcc dll, it isn't possible to unwind the
stack and throwing fails; the application aborts.

-  This shouldn't be a problem, since anything which might possibly throw at
shutdown time - the exe, and other C++-using dlls (e.g.: shared libstdc++) -
all depend on shared libgcc, so it'll be the last thing to get unloaded.

-  Libstdc++ does, indeed, throw exceptions during shutdown.
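
  To make the registration mechanism in the list above concrete, here is a
minimal sketch.  It is a model only: the registry and the mock_* functions
are hypothetical stand-ins for libgcc's internal list and for
__register_frame_info / __deregister_frame_info, and the ModuleEhRegistration
object plays the role of one module's .ctors/.dtors thunk pair.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Mock of the unwinder's list of registered EH tables (assumption: the real
// list lives inside libgcc and holds pointers to frame data, not names).
static std::vector<std::string> registered_tables;

// Hypothetical stand-in for __register_frame_info.
void mock_register_frame_info(const std::string& module) {
    registered_tables.push_back(module);
}

// Hypothetical stand-in for __deregister_frame_info.
void mock_deregister_frame_info(const std::string& module) {
    auto it = std::find(registered_tables.begin(), registered_tables.end(),
                        module);
    if (it != registered_tables.end())
        registered_tables.erase(it);
}

// True while the module's EH table is known to the (mock) unwinder.
bool is_registered(const std::string& module) {
    return std::find(registered_tables.begin(), registered_tables.end(),
                     module) != registered_tables.end();
}

// One instance per module: the constructor models the .ctors thunk and the
// destructor models the .dtors thunk.
struct ModuleEhRegistration {
    std::string module;
    explicit ModuleEhRegistration(std::string m) : module(std::move(m)) {
        mock_register_frame_info(module);
    }
    ~ModuleEhRegistration() { mock_deregister_frame_info(module); }
};
```

  Constructing a ModuleEhRegistration for a module makes is_registered()
true for it; once the object is destroyed, the unwinder can no longer see
that module's frames.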


  That's how it ought to be, but that's not quite what happens.  I ran a
testcase (27_io/objects/char/6.cc) under gdb, with breakpoints on
__register_frame_info and __deregister_frame_info; each time one hit, I
checked the backtrace to see which module's static ctors or dtors were being
called.  What I saw was this:

----------------------------<snip>----------------------------
Breakpoint 11, 0x63546af3 in cyggcc_s!__register_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#0  0x63546af3 in cyggcc_s!__register_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#1  0x63541041 in ?? ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll

Continuing.

Breakpoint 11, 0x63546af3 in cyggcc_s!__register_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#0  0x63546af3 in cyggcc_s!__register_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#1  0x6c481041 in ?? ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cygstdc++-6.dll

Continuing.

Breakpoint 11, 0x63546af3 in cyggcc_s!__register_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#0  0x63546af3 in cyggcc_s!__register_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#1  0x00401091 in __gcc_register_frame ()
----------------------------<snip>----------------------------

  So, that's the static ctors for the shared libgcc, then libstdc++, then
the main exe, all being called in the correct order of dependency.  Lots of
detail snipped (full log available if wanted), but the main thing it would
show you is those backtraces originating in per_module::run_ctors for the
two dlls, and in do_global_ctors for the main exe.

  After main exits, though, we start to see the shutdown sequence:

----------------------------<snip>----------------------------
Breakpoint 13, 0x63547c63 in cyggcc_s!__deregister_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#0  0x63547c63 in cyggcc_s!__deregister_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#1  0x63541099 in ?? ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll

Continuing.

Breakpoint 13, 0x63547c63 in cyggcc_s!__deregister_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#0  0x63547c63 in cyggcc_s!__deregister_frame_info ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
#1  0x6c481099 in ?? ()
   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cygstdc++-6.dll

Breakpoint 6, __static_initialization_and_destruction_0 (__initialize_p=0, 
    __priority=65535)
    at /gnu/gcc/release/gcc4-4.3.0-1/src/gcc-4.3.0/libstdc++-v3/testsuite/27_io/objects/char/6.cc:60
60	}
----------------------------<snip>----------------------------

  Ah, whoops.  That last breakpoint isn't like the others.  That's the
static dtors for the main exe, sure enough, but it's not got as far as
calling __deregister_frame_info for the main exe's EH tables, and it's not
going to, because before it does that it's going to try and throw an
exception.  And we saw libgcc deregistering the EH data for
_Unwind_RaiseException just earlier, so it's going to blow up.

  The actual cause of the exception, as it happens, is libstdc++: it
instantiates a static object in the program's data space, which is used to
throw an exception as part of the iostream cleanup.  I don't understand this
mechanism or why it works the way it does, but that's OK; it's trying to
throw, is all that I need to understand, and even if libstdc++ wasn't doing
it, there's no reason in principle why the main exe might not be doing it
anyway; it's supposed to work, even at this late stage of the proceedings.

  So, the thing that went wrong there was that all the dtors got called in
the completely wrong order.  They were called in the same sequence as the
ctors - libgcc, then libstdc++, then main.exe.  That's the wrong order, of
course; it should have been the other way round, and then everything would
have worked fine - main's dtors would have been called first, destroyed the
static libstdc++ iostream object, thrown and caught the exception, then
unregistered main's exception tables and exited; then libstdc++ would have
unregistered its own EH tables, and finally libgcc would deregister the
critical EH table containing _Unwind_RaiseException and everyone could have
gone home happily.
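
  The difference between the two orderings can be shown with a toy model.
This is only a sketch: the module names are labels, and the single rule it
encodes (a throw from main.exe's dtors can be unwound only while libgcc's
table is still registered) is my simplification of the behaviour described
above, not code from libgcc or Cygwin.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns true if the exception thrown from main.exe's static dtors could
// be unwound, i.e. libgcc's EH table was still registered at that point.
bool throw_at_shutdown_succeeds(const std::vector<std::string>& dtor_order) {
    // All three tables are registered once startup has finished.
    std::set<std::string> registered = {"libgcc", "libstdc++", "main.exe"};
    bool unwound = false;
    for (const std::string& module : dtor_order) {
        if (module == "main.exe") {
            // main.exe's dtors throw before deregistering its own table;
            // the unwinder needs the table covering _Unwind_RaiseException.
            unwound = registered.count("libgcc") != 0;
        }
        // The module's last dtor deregisters its EH table.
        registered.erase(module);
    }
    return unwound;
}
```

  Fed the order seen in the log (libgcc, libstdc++, main.exe) the function
returns false; fed the reverse order it returns true.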










[  END OF LONG EXPOSITION   -   START OF PART TWO  -  IF YOU WANT TO GO GET
A CUP OF TEA NOW MIGHT BE A GOOD TIME!  ]








  I've identified two reasons why this happens.  The first is because of
this snippet from dcrt0.cc:

  extern "C" void
  cygwin_exit (int n)
  {
    dll_global_dtors ();
    if (atexit_lock)
      atexit_lock.acquire ();
    exit (n);
  }

  It calls dll_global_dtors, which as the name suggests invokes all the
global dtors for the application's dlls - but this is too soon; it's before
the application's dtors have been called.  They'll be called shortly, when
this function hands off to exit() from newlib, which in turn runs the
atexit() list, which in turn calls the main static dtors for the exe.

  As far as I know the app should always be destroyed before the libs are
destroyed and unloaded, since it's completely reasonable for the app to
still, for example, be using those libs and any resources they allocated in
the dtors of static objects it defines.  So calling dll_global_dtors before
exit AFAICT is just never going to be correct.

  That's ok!  Because it turns out you can just delete that line, and you
still get saved by this snippet:

  void __stdcall
  do_exit (int status)
  {
    syscall_printf ("do_exit (%d), exit_state %d", status, exit_state);

  #ifdef NEWVFORK
    [ ... elided ... ]
  #endif

    lock_process until_exit (true);

    if (exit_state < ES_GLOBAL_DTORS)
      {
        exit_state = ES_GLOBAL_DTORS;
        dll_global_dtors ();
      }

which gets called during proper shutdown, *after* the main.exe's dtors have
run to completion.  It's also worth observing that dll_global_dtors is
idempotent: it uses a runonce guard, so it doesn't matter if it gets
over-called - but it /does/ matter if it gets called too early.
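
  Here is a small model of the two call sites and the run-once guard.  It is
a sketch under assumptions: ExitModel and its member names are invented for
illustration, the log strings are labels, and exit_path() compresses
"atexit list runs the exe's dtors, then do_exit calls dll_global_dtors" into
two lines.

```cpp
#include <string>
#include <vector>

struct ExitModel {
    std::vector<std::string> log;    // order in which dtor groups ran
    bool dll_dtors_pending = true;   // plays the role of the run-once guard

    // Models dll_global_dtors(): only the first call does any work.
    void dll_global_dtors() {
        if (!dll_dtors_pending)
            return;                  // repeat calls are harmless no-ops
        dll_dtors_pending = false;
        log.push_back("dll dtors");
    }

    // Models exit(): the atexit list runs main.exe's static dtors, and the
    // later do_exit step calls dll_global_dtors().
    void exit_path() {
        log.push_back("exe dtors");
        dll_global_dtors();
    }

    // Models cygwin_exit(), with or without the early call under discussion.
    void cygwin_exit(bool early_call) {
        if (early_call)
            dll_global_dtors();
        exit_path();
    }
};
```

  With early_call set, the log reads "dll dtors" then "exe dtors", and the
second dll_global_dtors() call does nothing; with it cleared, the log reads
"exe dtors" then "dll dtors", which is the order we want.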

  There's still a problem though: the DLLs themselves are still dtor'd in
the same order they were c'tor'd.  It works out OK here, because neither
libgcc nor the libstdc++ DLL want to throw any exceptions from static dtors,
or rather, libstdc++ does, but its static dtors are part of the main.exe, so
they've already run.  But it's still wrong in theory and if there was a
third C++ library in the mix, say a user-written one that depended on
libstdc++, it might still want to throw at static dtor time and it would
fail.

  The reason the DLLs' dtors are run in the wrong order is also simple
enough: that's what the code says to do, here, in dll_init.cc:

  static bool dll_global_dtors_recorded;

  /* Run destructors for all DLLs on exit. */
  void
  dll_global_dtors ()
  {
    int recorded = dll_global_dtors_recorded;
    dll_global_dtors_recorded = false;
    if (recorded)
      for (dll *d = dlls.istart (DLL_ANY); d; d = dlls.inext ())
        d->p.run_dtors ();
  }

  That's just walking the chain of dlls from start to finish, invoking the
dtors.  The chain is in dependency order; it's walked the same way at
startup:

  /* Initialization for all linked DLLs, called by dll_crt0_1. */
  void
  dll_list::init ()
  {
    dll_global_dtors_recorded = true;

    /* Walk the dll chain, initializing each dll */
    dll *d = &start;
    while ((d = d->next))
      d->init ();
  }

  Note that /within/ each individual dll's static objects, dtors are
correctly walked in the inverse order to ctors (see per_module::run_ctors
and per_module::run_dtors for details); it's only between modules that the
ordering is wrong.






[  EVEN THE HIPPOS WANT TO GO GET A CUP OF TEA NOW  ]


  Ummmm, anyway.  I think I've explained everything, but I'm not sure I've
necessarily understood everything.

  In particular I'm not sure I've identified every code path that can be
followed during startup and shutdown that leads to c/dtors being invoked,
when considered against forked and non-forked processes and cygwin processes
launched from other cygwin processes or from non-cygwin processes, so I'm
asking here prior to writing up, testing and submitting a formal patch.

  I /think/ the right things to do are to remove the dll_global_dtors
invocation from within cygwin_exit, and to reverse the order of iteration
along the list of dlls within dll_global_dtors itself.
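
  For the second half of that, a rough sketch of what "reverse the order of
iteration" could look like.  Everything here is hypothetical: Module is a
stand-in for a list entry with only a next pointer, and I collect the chain
into a vector so the sketch doesn't assume the real list has a usable
back-link.  It is not a patch against dll_init.cc.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry in the chain of loaded dlls.
struct Module {
    std::string name;
    Module* next;
};

// Order in which a forward walk visits the chain (the ctor pass).
std::vector<std::string> ctor_order(Module* first) {
    std::vector<std::string> order;
    for (Module* m = first; m; m = m->next)
        order.push_back(m->name);
    return order;
}

// Order for the dtor pass: the same chain, visited last-to-first.
std::vector<std::string> dtor_order(Module* first) {
    std::vector<Module*> chain;
    for (Module* m = first; m; m = m->next)
        chain.push_back(m);
    std::vector<std::string> order;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it)
        order.push_back((*it)->name);
    return order;
}
```

  For a chain libgcc -> libstdc++ -> user.dll, ctor_order gives that
sequence and dtor_order gives user.dll, libstdc++, libgcc, so a dll that
depends on the others is torn down before the things it depends on.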

  But I'm wondering what it is I haven't thought of.  I saw some comments in
the changelogs about not relying on atexit that make me believe there's got
to be a reason why things are done this way and in this order, so I wonder
if anyone knows what I'm about to break?


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....


