[RFC] libgfortran dll i/o redirection lossage caused by order-of-termination issue

Dave Korn dave.korn.cygwin@googlemail.com
Wed Jul 8 21:36:00 GMT 2009


  As you may remember, there is a problem in current gfortran-4.3.2-2, when
linking against the libgfortran DLL, and redirecting stdio.  Trivial testcase:

> $ cat hw.F
>       program main
>       WRITE(*,*)     'SUCCESS'
>       end
> $ gfortran-4 hw.F -o hello
> $ ./hello.exe
>  SUCCESS
> $ ./hello.exe | cat
> $

  I've looked at this in some detail.  The fortran runtime has its own fairly
involved i/o stream library, quite akin to the FILE*-based f-stdio functions
in libc.  Also like libc, these streams ("units" in fortran terminology) can
be buffered or unbuffered.  In the case where you're not redirecting stdout,
the bytes of the string are emitted immediately using calls to libc write(),
but when you redirect on the command line, the runtime notices that stdout is
a non-tty and switches to buffered i/o mode.
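
  To illustrate the tty check, here is a minimal sketch in C.  The names
buf_mode and choose_mode are made up for this example, and I'm assuming the
decision boils down to an isatty() test; libgfortran's real logic may well
look at more than that.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical buffering modes; not libgfortran's real names. */
enum buf_mode { UNBUFFERED, BUFFERED };

/* Pick a mode for a descriptor: terminals get immediate writes,
   anything else (pipe, file) gets buffered output. */
static enum buf_mode choose_mode(int fd)
{
    assert(fd >= 0);
    return isatty(fd) ? UNBUFFERED : BUFFERED;
}
```

  With "./hello.exe | cat", fd 1 is a pipe, so a check like this would pick
BUFFERED.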

  In buffered i/o mode, the runtime library allows the streams to accumulate a
buffer's-worth of data at a time before it does an actual write() call.  This
means that there can still be buffered data when the user's program exits.
That's OK; libgfortran includes a .dtors entry that calls a shutdown function
that closes and flushes all the buffered data.  So the user's program exits,
the global dtors run, the buffered data is flushed and everything works just fine.
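
  The flush-at-exit pattern described above can be sketched roughly like this.
It is a toy model, not libgfortran code: struct unit, unit_write, unit_flush
and units_shutdown are all hypothetical names, and I'm using GCC's destructor
attribute to stand in for the .dtors entry.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a buffered Fortran unit. */
struct unit {
    int fd;          /* underlying file descriptor */
    char buf[64];    /* bytes accepted but not yet written */
    size_t used;     /* number of pending bytes in buf */
};

/* Write out everything pending; 0 on success, -1 on error. */
static int unit_flush(struct unit *u)
{
    size_t off = 0;
    assert(u != NULL);
    while (off < u->used) {
        ssize_t n = write(u->fd, u->buf + off, u->used - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    u->used = 0;
    return 0;
}

/* Append to the buffer; only touch the fd when the buffer is full. */
static int unit_write(struct unit *u, const char *s, size_t len)
{
    while (len > 0) {
        size_t room = sizeof u->buf - u->used;
        if (room == 0) {
            if (unit_flush(u) < 0)
                return -1;
            room = sizeof u->buf;
        }
        size_t n = len < room ? len : room;
        memcpy(u->buf + u->used, s, n);
        u->used += n;
        s += n;
        len -= n;
    }
    return 0;
}

static struct unit stdout_unit = { 1, {0}, 0 };

/* Runs after main() returns or exit() is called: last chance to flush. */
__attribute__((destructor))
static void units_shutdown(void)
{
    unit_flush(&stdout_unit);
}
```

  A short string passed to unit_write() sits in buf until either the buffer
fills or units_shutdown() runs, which is why the order in which shutdown
hooks run matters so much here.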

  That's how it goes with static linking, anyway.  But not when you use a DLL.
There's a problem in the shutdown ordering, and the final buffer of data gets
lost.  At the very end of dll_crt0_1, we have ...

>   if (user_data->main)
>     cygwin_exit (user_data->main (__argc, __argv, *user_data->envptr));

... and cygwin_exit passes control to exit() ...

> extern "C" void
> cygwin_exit (int n)
> {
>   if (atexit_lock)
>     atexit_lock.acquire ();
>   exit (n);
> }

... which is where I think all the exit paths (exit(), abort(), return from
main) come back together:

> void 
> _DEFUN (exit, (code),
> 	int code)
> {
>   __call_exitprocs (code, NULL);
>   if (_GLOBAL_REENT->__cleanup)
>     (*_GLOBAL_REENT->__cleanup) (_GLOBAL_REENT);
>   _exit (code);
> }

... and which passes control to _exit() ...

> extern "C" void
> _exit (int n)
> {
>   do_exit (((DWORD) n & 0xff) << 8);
> }

... which calls do_exit(); and that's where process shutdown gets properly
underway:

> void __stdcall
> do_exit (int status)
> {
>   syscall_printf ("do_exit (%d), exit_state %d", status, exit_state);
> #ifdef NEWVFORK
>   vfork_save *vf = vfork_storage.val ();
>   if (vf != NULL && vf->pid < 0)
>     {
>       exit_state = ES_NOT_EXITING;
>       vf->restore_exit (status);
>     }
> #endif
>   lock_process until_exit (true);
>   if (exit_state < ES_GLOBAL_DTORS)
>     {
>       exit_state = ES_GLOBAL_DTORS;
>       dll_global_dtors ();
>     }
>   if (exit_state < ES_EVENTS_TERMINATE)
>     {
>       exit_state = ES_EVENTS_TERMINATE;
>       events_terminate ();
>     }


  Now, in this sequence of events, first we call the atexit hooks, which
include the main application's static dtors:

> void
> _DEFUN (exit, (code),
> 	int code)
> {
>   __call_exitprocs (code, NULL);

... then we call newlib cleanup ...

>   if (_GLOBAL_REENT->__cleanup)
>     (*_GLOBAL_REENT->__cleanup) (_GLOBAL_REENT);

... then we terminate the loaded DLLs:

>       dll_global_dtors ();

  That's bad.  The call to newlib's __cleanup() hook shuts down stdio
facilities, and so when libgfortran DLL's dtors are finally called, they
attempt to flush the buffer down already-closed stdio channels, and it gets
silently dropped on the floor.
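
  As a simplified model of that failure, assume the late flush amounts to a
write() on a descriptor that has already been closed.  The function name
below is made up, and the real failure path inside Cygwin may look different;
this only shows why the data cannot arrive anywhere.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns the errno seen by a flush that runs after the descriptor
   has been closed, or 0 if the write unexpectedly succeeded. */
static int late_flush_errno(void)
{
    static const char pending[] = "SUCCESS\n";  /* still buffered at exit */
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    (void)rc;

    close(p[1]);   /* stands in for libc cleanup running first */

    /* Stands in for the DLL's dtor trying to flush afterwards. */
    errno = 0;
    ssize_t n = write(p[1], pending, sizeof pending - 1);
    int err = (n < 0) ? errno : 0;

    close(p[0]);
    return err;
}
```

  The write fails with EBADF, and unless the dtor checks and reports the
error, the buffered bytes simply vanish.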

  I think we shouldn't terminate newlib until after all the DLLs have been
finalized, as well as the main application.  In terms of dependency order,
libc is the lowest, so it should be the last to shut down.

  Now, we obviously can't go hacking a call to dll_global_dtors into the
middle of newlib's exit() function.

  However, passing NULL as the second argument to __call_exitprocs is newlib's
way of signalling that this is a global shutdown:

> /*
>  * Call registered exit handlers.  If D is null then all handlers are called,
>  * otherwise only the handlers from that DSO are called.
>  */
> void 
> _DEFUN (__call_exitprocs, (code, d),
> 	int code _AND _PTR d)

so on the face of it, I'm wondering if we should hook dll finalisation into
newlib's atexit mechanism somehow/where, rather than running it later on.  We
already do this for do_global_dtors, which gets invoked from
__call_exitprocs().  Another feasible alternative would be to provide our own
definition of exit(), overriding newlib's.
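
  To show why hooking into the atexit mechanism would give the right order,
here is a small sketch that relies only on the standard guarantee that atexit
handlers run in reverse order of registration.  All the names are
hypothetical.  In particular, libc_cleanup is registered as an ordinary
handler purely to stand in for newlib's __cleanup, which in the real exit()
runs after __call_exitprocs returns rather than from the atexit list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static int log_fd = -1;   /* pipe end the handlers report to */

static void libc_cleanup(void) { (void)write(log_fd, "C", 1); }
static void dll_dtors(void)    { (void)write(log_fd, "D", 1); }
static void app_dtors(void)    { (void)write(log_fd, "A", 1); }

/* Fork a child that registers the handlers and exits; record the order
   in which they ran in out (NUL-terminated).  Returns 0 on success. */
static int observe_exit_order(char out[4])
{
    int p[2];
    assert(out != NULL);
    if (pipe(p) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(p[0]);
        log_fd = p[1];
        atexit(libc_cleanup);  /* registered first, so it runs last */
        atexit(dll_dtors);     /* DLL finalisation hooked in via atexit */
        atexit(app_dtors);     /* registered last, so it runs first */
        exit(0);
    }
    close(p[1]);
    memset(out, 0, 4);
    size_t got = 0;
    ssize_t n;
    while (got < 3 && (n = read(p[0], out + got, 3 - got)) > 0)
        got += (size_t)n;
    close(p[0]);
    int status;
    waitpid(pid, &status, 0);
    return got == 3 ? 0 : -1;
}
```

  The child reports application dtors first, then DLL dtors, then the libc
stand-in, which is the dependency order argued for above: libc goes last.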

  I've got a bunch of gcc testruns going now, so it's going to be a while
before I can think about replacing the DLL with a patched version to try any
of this out; in the meantime, has anyone got any thoughts about how best to
make this work?

[ Vaguely related:  "Incorrect order of static dtors in DLL CRT?",
  http://www.cygwin.com/ml/cygwin-developers/2008-08/threads.html#00000 ]
