Sourceware Bugzilla – Bug 13845
Infrequent random stop in futex_wait using printf inside alarm signal handler
Last modified: 2012-03-15 15:01:35 UTC
Excuse the background, but I feel it may be useful.
Recently i've been developing an application. A single (non threaded) process receives UDP broadcast packets from multiple sources, mixes them into a single audio stream and plays them via ALSA or PULSE audio.
I build 3 versions of the rx process, one driving ALSA, one driving PULSE and one "dummy" version that discards the audio but is in every other respect identical (conditional compiles from the same source). I build these 3 versions for two architectures, intel and ARM.
The "dummy" version on both intel and ARM machines run forever....
The pulse version also runs forever (as best as I can tell)
The alsa version randomly freezes.
I not unreasonably assumed it to be an ALSA bug.
Once I had found the time to track the lockups down they seem to be caused by printf stalling in a handler for a once a second alarm signal.
I checked the documentation for "alarm" and several other places but cant see any warning that suggest that printf can not be used in that context.
I fixed my application by adding a simple flag.
The freezing behaviour I have seen on all linux/gclic versions i've tried so far.
The faster the machine the less frequently the stop happens. On my 3.8Ghz AMD 64 it would typically occur only every few days of runtime, on a 200Mhz ARM the event would occur typically within 3 hours
Attached a stack backtrace of my process running on a 200mhz ARM board.
The process is making heavy use of recvfrom(UDP) and lighter but frequenct use of ALSA snd_pcm_writen.
Process stall after 2hours, running under gdb:
ARM udp-many-way-audio-rx: My IP Address [10.10.10.111] Runtime:[ 0D 2H 5M 57S]
Decoding my own audio packets
Playback ALSA lib:1.0.25 writen underruns: 0 overruns (handled):0
Main loops per second = 51 EAGAIN per second = 52
Total packets received 2546534 , errors 0, Per Second 341
Last error ()
1 [ 10.10.10.6 ] PLS 00000 PC 0000064638 PPS 43 BWD 2 OLP 0
2 [ 10.10.10.111 ] PLS 00000 PC 0000325200 PPS 42 BWD 2 OLP 2
3 [ 10.10.10.66 ] PLS 00000 PC 0000318855 PPS 43 BWD 2 OLP 1
4 [ 10.10.10.65 ] PLS 00000 PC 0000318855 PPS 43 BWD 2 OLP 1
5 [ 10.10.10.64 ] PLS 00000 PC 0000318855 PPS 43 BWD 2 OLP 1
6 [ 10.10.10.63 ] PLS 00000 PC 0000318855 PPS 43 BWD 2 OLP 1
7 [ 10.10.10.62 ] PLS 00000 PC 0000318855 PPS 42 BWD 2 OLP 1
8 [ 10.10.10.61 ] PLS 00000 PC 0000318855 PPS 42 BWD 2 OLP 0
PLS=Packet Last Seen PC=Packet Counter PPS=Packets per second
BWD=Buffers with data OLP=oldest packet buffer
Program received signal SIGINT, Interrupt.
0x000ddbc0 in __lll_lock_wait_private ()
#0 0x000ddbc0 in __lll_lock_wait_private ()
#1 0x000b7a4c in vfprintf ()
#2 0x000bb9f8 in printf ()
#3 0x0000a930 in once_a_second_alarm_handler () at udp-many-way-audio-rx.c:359
#4 <signal handler called>
#5 0x000c24a0 in fflush ()
#6 0x0000ac20 in slot_buffers_firstfree (slot=3) at udp-many-way-audio-rx.c:433
#7 0x0000c0e4 in main (argc=1, argv=0xbefffe14) at udp-many-way-audio-rx.c:692
An strace of the process stalling on intel is here:
printf and fflush are not async-signal-safe.
(In reply to comment #1)
> printf and fflush are not async-signal-safe.
Thats nice !
Would it not have been an idea to tell someone, in the documentation for alarm for example. That way time would saved I would be swearing less.
(In reply to comment #3)
Ok, I found the reference to the issue in an up to date man page for "signal" on a Debian box. My older development machine does not mention it, nor does the man page for alarm I referenced.
"unspecified" is also a pretty vague term.
Lots of nice unsafe examples via google - thanks web ....
"Interactions between alarm() and setitimer() are unspecified"
No "Only thread safe functions should be used in the alarm signal handler"
Its documented with the assumption that somebody has read it in the correct order, the documentation isn't thread safe itself :-(
(In reply to comment #4)
> No "Only thread safe functions should be used in the alarm signal handler"
That statement would be incorrect.
The correct statement reads: "only async-signal-safe functions can be used inside asynchronous signal handler".
This fact is the "ABC" of signal handling, surely *everybody* knows that?