questions about bug 4737 (fork is not async-signal-safe)

Fri May 13 11:32:00 GMT 2011

On 05/13/11 09:03, Jonathan Nieder wrote:
> Norbert van Bolhuis wrote:
>
>> The problem is that it is a lot of work to move to a new glibc.
>
> 	git clone git://sourceware.org/git/glibc.git
> 	mkdir glibc-build
> 	cd glibc-build
> 	../glibc/configure --prefix=$HOME/opt/glibc
> 	make check
> 	make install
> 	LD_LIBRARY_PATH=$HOME/opt/glibc:/usr/local/lib:/usr/lib:/lib
> 	export LD_LIBRARY_PATH
> 	elf/ld.so /path/to/app arguments
>

Well, we're on an embedded 32-bit PowerPC platform, and the project
is in a phase where we (read: managers) don't want any new or different glibc.

And even if we did replace glibc, isn't there a dependency on gcc?
(I'm saying this because building our final cross gcc required
  glibc-2.7 (which was itself compiled with a "bootstrap" gcc).)
In other words, wouldn't we be forced to update our entire cross development
environment (including not just gcc, but also target libraries (e.g. libpcap)
and target tools (e.g. gdb, gdbserver))?

Hmm, now that I think about it, I guess we can just cross-compile
the latest glibc with our current cross gcc (gcc-4.2.4-glibc-2.7, configured for
target: powerpc-e300c3-linux) and install that, right?
If so, is the latest glibc guaranteed to be fully (ABI) backwards compatible
with glibc-2.7?

But no matter whether this is solved in glibc-latest or not, we have
other applications using glibc-2.7 and I would still like to fully understand
what's wrong.

>> I'm just trying to fully understand the problem and then judge whether
>> our app really needs this bug fixed.
>
> I've enjoyed looking over the various stories you pointed to, so no
> harm done.  But you haven't fully explained the problem you ran into
> (for example by sending a testcase) so there's not much others can do
> to help.
>

Thanks for digging into this!
You're right, let me better explain the problem.
It all started with:
http://www.cygwin.com/ml/libc-help/2010-11/msg00023.html

Last month I finally had more time to dig into this. The 1st
message of this thread is the result.

Let me restate that our app does not call fork(2) or system(3) from
a signal handler. It's just a multi-threaded application that uses
glibc-2.7, runs on linux-2.6.28 and calls system(3) quite often.

I believe the discussion on libc-hacker that I mentioned earlier
describes the same problem:
http://www.sourceware.org/ml/libc-hacker/2007-02/msg00009.html
Look at the example program there (about 100 lines). It triggers the problem
perfectly. What's remarkable is that the problem also occurs on
some of our i386 and x86-64 systems (Fedora 8 with glibc-2.7 and
Fedora 14 with glibc-2.13).

> The report about fork deadlocks when called from a signal handler
> that you pointed to is about a situation in which the standard is
> unclear.  OpenSolaris copes by making fork act like _Fork (meaning
> skipping atfork handlers) when called from a signal handler.  Of
> course, glibc does not even provide _Fork yet; I suspect providing it
> would be a nice, uncontroversial improvement if someone wants to help
> in this area.
>

Ok, didn't know about _Fork.

> The report from Wayne Badger about fork corrupting list_all_lock state
> when the syscall fails looked to him like a silicon bug.  It was
> highly CPU-specific.  It seems like a silicon or code generation bug
> to me, too, since the source code does not ask to do what it was
> observed to do.
>
> The report about plash is not about glibc code.
>
> Of course I'd be thrilled if you can make progress on any of the
> above.
>

You're right about plash. I guess I was too eager to find similar cases :-)
I don't know about the others. It doesn't really matter; we've got one of
our own :-)
If time permits I'm going to dig into this further, and I'll certainly
keep you posted.

Thanks and Regards,
Norbert.