questions about bug 4737 (fork is not async-signal-safe)

Norbert van Bolhuis nvbolhuis@aimvalley.nl
Tue May 31 10:04:00 GMT 2011


I think I found the corrupted _IO_list_all problem.

It has nothing to do with the discussion on libc-hacker mentioned
earlier
(http://www.sourceware.org/ml/libc-hacker/2007-02/msg00009.html).
That discussion is about an application SW design problem which can
cause deadlocks.

Nor does it have anything to do with calling fork from within a
signal handler.

The problem is dprintf/vdprintf.
If a multi-threaded application calls fork and dprintf from different
threads at about the same time, the child process can crash
in fresetlockfiles.

dprintf adds a temporary struct _IO_FILE_plus (tmpfil), whose
_lock member is NULL, to the global _IO_list_all.
Here's the code I'm talking about:

  int
  _IO_vdprintf (d, format, arg)
       int d;
       const char *format;
       _IO_va_list arg;
  {
    struct _IO_FILE_plus tmpfil;
    struct _IO_wide_data wd;
    int done;

  #ifdef _IO_MTSAFE_IO
    tmpfil.file._lock = NULL;
  #endif
    _IO_no_init (&tmpfil.file, _IO_USER_LOCK, 0, &wd, &_IO_wfile_jumps);
    _IO_JUMPS (&tmpfil) = &_IO_file_jumps;
    INTUSE(_IO_file_init) (&tmpfil);
  #if  !_IO_UNIFIED_JUMPTABLES
    tmpfil.vtable = NULL;
  #endif
    if (INTUSE(_IO_file_attach) (&tmpfil.file, d) == NULL)
      {
        INTUSE(_IO_un_link) (&tmpfil);
        return EOF;
      }
    tmpfil.file._IO_file_flags =
      (_IO_mask_flags (&tmpfil.file, _IO_NO_READS,
                       _IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING)
       | _IO_DELETE_DONT_CLOSE);

    done = INTUSE(_IO_vfprintf) (&tmpfil.file, format, arg);

    _IO_FINISH (&tmpfil.file);

    return done;
  }
"glibc-2.7/libio/iovdprintf.c"

Once _IO_file_init returns, tmpfil has been added to _IO_list_all
and list_all_lock has been released.
If another thread calls fork at this point (before tmpfil
has been removed from _IO_list_all), the child process
crashes in fresetlockfiles.

It crashes because fresetlockfiles re-initializes the file locks
by writing through the _lock member (storing the default
"_IO_lock_initializer" value). But the _lock member of the
"struct _IO_FILE_plus" coming from dprintf is NULL, so that
write dereferences a null pointer.
Here's the code I'm talking about:

  static void
  fresetlockfiles (void)
  {
    _IO_ITER i;

    for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
  }
"glibc-2.7/nptl/sysdeps/unix/sysv/linux/fork.c"
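
The failure mode can be modelled outside glibc. The sketch below is a
toy, not glibc code: toy_lock, toy_file, toy_list_all, toy_link and
toy_reset_locks are hypothetical names standing in for _IO_lock_t,
struct _IO_FILE, _IO_list_all, the linking step and fresetlockfiles.
The NULL check is only one conceivable guard, shown to make the
crashing case visible; it is not presented as the glibc fix.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for _IO_lock_t (hypothetical, not the glibc layout). */
typedef struct { int owner; int count; } toy_lock;

/* Toy stand-in for struct _IO_FILE: only the two members that matter. */
typedef struct toy_file {
  toy_lock *lock;           /* NULL for the dprintf-style temporary */
  struct toy_file *chain;   /* next entry on the global list */
} toy_file;

static toy_file *toy_list_all;   /* plays the role of _IO_list_all */

/* Put a file at the head of the global list. */
static void toy_link (toy_file *fp)
{
  assert (fp != NULL);
  fp->chain = toy_list_all;
  toy_list_all = fp;
}

/* Re-initialize every lock on the list, as the child does after fork.
   Without the NULL check, the entry whose lock is NULL would be
   written through a null pointer.  Returns the number of locks reset. */
static int toy_reset_locks (void)
{
  int n = 0;
  for (toy_file *fp = toy_list_all; fp != NULL; fp = fp->chain)
    {
      if (fp->lock == NULL)
        continue;                       /* skip the temporary file */
      *fp->lock = (toy_lock) { 0, 0 };  /* default initializer */
      n++;
    }
  return n;
}
```

With one ordinary file and one temporary file (lock == NULL) on the
list, toy_reset_locks resets exactly one lock and skips the other.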


The chance of hitting this problem is very small.
The fork has to happen while another thread is inside dprintf or
vdprintf (_IO_vdprintf), after tmpfil has been added to _IO_list_all
and before it has been removed. This is a very tiny window.
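
A stress test along the following lines could be used to look for the
race. It is only a sketch under stated assumptions: the function name
count_crashed_children, the use of /dev/null and the iteration count
are all made up for illustration. Whether any child actually crashes
depends on the glibc version and on timing, so a result of zero proves
nothing. Build with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_int stop_writer;

/* Writer thread: call dprintf in a tight loop so that its temporary
   FILE is on the global list as often as possible. */
static void *writer (void *arg)
{
  int fd = *(int *) arg;
  while (!atomic_load (&stop_writer))
    dprintf (fd, "x=%d\n", 42);
  return NULL;
}

/* Fork `iterations` times while the writer thread runs.  Returns the
   number of children that were killed by a signal, or -1 if the
   setup (open or pthread_create) failed. */
static int count_crashed_children (int iterations)
{
  assert (iterations >= 0);

  int fd = open ("/dev/null", O_WRONLY);
  if (fd < 0)
    return -1;

  atomic_store (&stop_writer, 0);
  pthread_t t;
  if (pthread_create (&t, NULL, writer, &fd) != 0)
    {
      close (fd);
      return -1;
    }

  int crashed = 0;
  for (int i = 0; i < iterations; i++)
    {
      pid_t pid = fork ();
      if (pid == 0)
        _exit (0);          /* child: a crash would happen inside fork */
      if (pid < 0)
        continue;           /* fork failed, try the next round */
      int status;
      if (waitpid (pid, &status, 0) == pid && WIFSIGNALED (status))
        crashed++;
    }

  atomic_store (&stop_writer, 1);
  pthread_join (t, NULL);
  close (fd);
  return crashed;
}
```

Calling count_crashed_children with a large iteration count and
checking for a non-zero result is the intended use; the child does
nothing but _exit, so any signal it dies from was raised during fork.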

I checked the source code of the latest glibc and the problem
still seems to be there.

In any case, I could simply work around our problem by avoiding
dprintf (we now use sprintf + write(2)).
So now we can happily continue using glibc-2.7 on our
32-bit PowerPC platform :-)
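
A minimal sketch of such a workaround follows. It is not the code we
use: fd_printf and the 1024-byte buffer are hypothetical, and it uses
vsnprintf instead of sprintf so that the buffer cannot overflow
(longer output is truncated).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical dprintf replacement: format into a stack buffer, then
   hand the bytes to write(2).  Returns the number of bytes written,
   or -1 on error.  Output longer than the buffer is truncated. */
static int fd_printf (int fd, const char *format, ...)
{
  char buf[1024];
  va_list ap;

  assert (format != NULL);

  va_start (ap, format);
  int len = vsnprintf (buf, sizeof buf, format, ap);
  va_end (ap);

  if (len < 0)
    return -1;
  if ((size_t) len >= sizeof buf)
    len = (int) sizeof buf - 1;   /* vsnprintf truncated the output */

  int done = 0;
  while (done < len)
    {
      ssize_t n = write (fd, buf + done, (size_t) (len - done));
      if (n < 0)
        {
          if (errno == EINTR)
            continue;             /* interrupted, retry the write */
          return -1;
        }
      done += (int) n;            /* handle short writes */
    }
  return done;
}
```

A call such as fd_printf (fd, "n=%d %s", 42, "ok") then takes the
place of the corresponding dprintf call.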

---
Norbert van Bolhuis.
AimValley B.V.



More information about the Libc-help mailing list