This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: questions about bug 4737 (fork is not async-signal-safe)
I think I found the cause of the corrupted _IO_list_all problem.
It has nothing to do with the earlier mentioned discussion on
libc-hacker
(http://www.sourceware.org/ml/libc-hacker/2007-02/msg00009.html).
That discussion is about an application SW design problem which can
cause deadlocks.
Nor does it have anything to do with the use of fork from within a
signal handler.
The problem is dprintf/vdprintf.
If a multi-threaded application uses fork and dprintf (by different
threads at about the same time) the child process can crash
in fresetlockfiles.
dprintf adds to the global _IO_list_all a temporary
struct _IO_FILE_plus (tmpfil) for which member _lock is NULL.
Here's the code I'm talking about:
int
_IO_vdprintf (d, format, arg)
     int d;
     const char *format;
     _IO_va_list arg;
{
  struct _IO_FILE_plus tmpfil;
  struct _IO_wide_data wd;
  int done;

#ifdef _IO_MTSAFE_IO
  tmpfil.file._lock = NULL;
#endif
  _IO_no_init (&tmpfil.file, _IO_USER_LOCK, 0, &wd, &_IO_wfile_jumps);
  _IO_JUMPS (&tmpfil) = &_IO_file_jumps;
  INTUSE(_IO_file_init) (&tmpfil);
#if !_IO_UNIFIED_JUMPTABLES
  tmpfil.vtable = NULL;
#endif
  if (INTUSE(_IO_file_attach) (&tmpfil.file, d) == NULL)
    {
      INTUSE(_IO_un_link) (&tmpfil);
      return EOF;
    }
  tmpfil.file._IO_file_flags =
    (_IO_mask_flags (&tmpfil.file, _IO_NO_READS,
                     _IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING)
     | _IO_DELETE_DONT_CLOSE);

  done = INTUSE(_IO_vfprintf) (&tmpfil.file, format, arg);

  _IO_FINISH (&tmpfil.file);

  return done;
}
"glibc-2.7/libio/iovdprintf.c"
Once _IO_file_init returns, tmpfil has been added to _IO_list_all
and the list_all_lock has been released.
If another thread calls fork at this point (before tmpfil
has been removed from _IO_list_all), the child process
crashes in fresetlockfiles.
It crashes because fresetlockfiles re-initializes the file locks
by writing through the _lock pointer (setting each lock to a
default "_IO_lock_initializer" value). But the _lock member of
the "struct _IO_FILE_plus" coming from dprintf is NULL, so the
write dereferences a NULL pointer.
Here's the code I'm talking about:
static void
fresetlockfiles (void)
{
  _IO_ITER i;

  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
    _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
}
"glibc-2.7/nptl/sysdeps/unix/sysv/linux/fork.c"
The chance of this problem occurring is very small.
The fork has to happen after dprintf or vdprintf (_IO_vdprintf) has
added tmpfil and before it has removed it. This is a very tiny window.
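
To give an idea of how one might try to hit this window, here is a rough stress-test sketch. It is not the test I ran and is not taken from glibc; the function name count_crashed_children, the use of /dev/null and the number of forks are all assumptions made for illustration. One thread calls dprintf in a loop while the caller forks repeatedly and counts children that were killed by a signal. Because the window is so small, a result of 0 does not prove the problem is absent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_int stop_printing;   /* set to 1 to end the printer thread */
static int null_fd = -1;           /* descriptor the printer thread writes to */

/* Printer thread: each dprintf call links a temporary FILE into
   _IO_list_all for a short moment (per the analysis in this mail). */
static void *printer(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_printing))
        dprintf(null_fd, "tick %d\n", 1);
    return NULL;
}

/* Hypothetical helper: forks `forks` times while the printer thread runs.
   Returns the number of children killed by a signal, or -1 on setup failure. */
int count_crashed_children(int forks)
{
    pthread_t tid;
    int crashed = 0;

    null_fd = open("/dev/null", O_WRONLY);
    if (null_fd < 0)
        return -1;
    atomic_store(&stop_printing, 0);
    if (pthread_create(&tid, NULL, printer, NULL) != 0) {
        close(null_fd);
        return -1;
    }

    for (int i = 0; i < forks; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);   /* a crash, if any, happens inside fork() before this */
        if (pid > 0) {
            int status;
            if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status))
                crashed++;
        }
    }

    atomic_store(&stop_printing, 1);
    pthread_join(tid, NULL);
    close(null_fd);
    return crashed;
}
```

Calling count_crashed_children with a large count on an affected system may eventually report a nonzero value; the count needed is unknown and likely depends on the scheduler and the number of CPUs.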
I checked the source code of glibc-latest and it seems
the problem is still there.
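
To make the failure mode concrete, here is a small self-contained model of the lock-reset loop. The types toy_lock_t and struct toy_file and the function toy_reset_locks are hypothetical stand-ins, not glibc code. The NULL check is only my assumption about one conceivable way to avoid the crash; it is not a statement about how glibc does or should fix it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for _IO_lock_t. */
typedef struct { int owner; int count; } toy_lock_t;

/* Hypothetical stand-in for an entry on _IO_list_all. */
struct toy_file {
    toy_lock_t *lock;          /* NULL models dprintf's temporary tmpfil */
    struct toy_file *next;
};

/* Re-initializes every lock on the list and returns how many were reset.
   Entries without a lock are skipped; writing through a NULL lock pointer
   here is what would crash, as in the unguarded loop. */
int toy_reset_locks(struct toy_file *head)
{
    int reset = 0;
    for (struct toy_file *f = head; f != NULL; f = f->next) {
        if (f->lock == NULL)
            continue;
        *f->lock = (toy_lock_t){0, 0};
        reset++;
    }
    return reset;
}
```

With a two-entry list where one entry has a lock and the other has lock set to NULL, toy_reset_locks resets one lock and skips the other instead of faulting.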
I could simply work around our problem by avoiding
dprintf (we now use sprintf + write(2)).
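
A minimal sketch of that kind of workaround follows. It is not our actual code: the name fd_printf and the 512-byte buffer are assumptions, and it uses vsnprintf rather than plain sprintf so the buffer cannot overflow. The point is that the text is formatted into a stack buffer and written with write(2), so no temporary FILE is ever linked into _IO_list_all.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical dprintf replacement: formats into a local buffer and
   writes it with write(2). Output longer than the buffer is truncated.
   Returns the number of bytes written, or -1 on error. */
int fd_printf(int fd, const char *fmt, ...)
{
    char buf[512];               /* assumed upper bound on message size */
    va_list ap;
    int len;
    size_t off = 0;

    assert(fmt != NULL);

    va_start(ap, fmt);
    len = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof buf)
        len = (int)sizeof buf - 1;   /* vsnprintf truncated the output */

    /* write(2) may write less than requested or be interrupted. */
    while (off < (size_t)len) {
        ssize_t n = write(fd, buf + off, (size_t)len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += (size_t)n;
    }
    return len;
}
```

A call such as fd_printf(fd, "x=%d\n", 42) then takes the place of dprintf(fd, "x=%d\n", 42).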
So now we can happily continue using glibc-2.7 on our
powerpc 32bit platform :-)
---
Norbert van Bolhuis.
AimValley B.V.