Bug 13845 - Infrequent random stop in futex_wait using printf inside alarm signal handler
Summary: Infrequent random stop in futex_wait using printf inside alarm signal handler
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-14 13:08 UTC by Jonathan Andrews
Modified: 2014-06-26 13:59 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Andrews 2012-03-14 13:08:03 UTC
Excuse the background, but I feel it may be useful.

Recently i've been developing an application. A single (non threaded) process receives UDP broadcast packets from multiple sources, mixes them into a single audio stream and plays them via ALSA or PULSE audio.

I build 3 versions of the rx process, one driving ALSA, one driving PULSE and one "dummy" version that discards the audio but is in every other respect identical (conditional compiles from the same source).  I build these 3 versions for two architectures, intel and ARM.

The "dummy" version on both intel and ARM machines run forever....
The pulse version also runs forever (as best as I can tell)
The alsa version randomly freezes.

I not unreasonably assumed it to be an ALSA bug.

Once I had found the time to track the lockups down they seem to be caused by printf stalling in a handler for a once a second alarm signal.

I checked the documentation for "alarm" and several other places but cant see any warning that suggest that printf can not be used in that context.

I fixed my application by adding a simple flag.
The freezing behaviour I have seen on all linux/gclic versions i've tried so far.

The faster the machine the less frequently the stop happens.  On my 3.8Ghz AMD 64 it would typically occur only every few days of runtime, on a 200Mhz ARM the event would occur typically within 3 hours

Attached a stack backtrace of my process running on a 200mhz ARM board.

The process is making heavy use of recvfrom(UDP) and lighter but frequenct use of ALSA snd_pcm_writen.

Process stall after 2hours, running under gdb:
ARM udp-many-way-audio-rx:  My IP Address [10.10.10.111]  Runtime:[   0D  2H  5M 57S]
Decoding my own audio packets
Playback ALSA lib:1.0.25 writen underruns: 0  overruns (handled):0
Main loops per second = 51      EAGAIN per second = 52
Total packets received 2546534 , errors 0, Per Second 341
Last error ()

 1  [      10.10.10.6  ]  PLS 00000  PC 0000064638 PPS 43  BWD  2  OLP 0
 2  [    10.10.10.111  ]  PLS 00000  PC 0000325200 PPS 42  BWD  2  OLP 2
 3  [     10.10.10.66  ]  PLS 00000  PC 0000318855 PPS 43  BWD  2  OLP 1
 4  [     10.10.10.65  ]  PLS 00000  PC 0000318855 PPS 43  BWD  2  OLP 1
 5  [     10.10.10.64  ]  PLS 00000  PC 0000318855 PPS 43  BWD  2  OLP 1
 6  [     10.10.10.63  ]  PLS 00000  PC 0000318855 PPS 43  BWD  2  OLP 1
 7  [     10.10.10.62  ]  PLS 00000  PC 0000318855 PPS 42  BWD  2  OLP 1
 8  [     10.10.10.61  ]  PLS 00000  PC 0000318855 PPS 42  BWD  2  OLP 0

PLS=Packet Last Seen    PC=Packet Counter       PPS=Packets per second
BWD=Buffers with data  OLP=oldest packet buffer
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^C
Program received signal SIGINT, Interrupt.
0x000ddbc0 in __lll_lock_wait_private ()
(gdb) backtrace
#0  0x000ddbc0 in __lll_lock_wait_private ()
#1  0x000b7a4c in vfprintf ()
#2  0x000bb9f8 in printf ()
#3  0x0000a930 in once_a_second_alarm_handler () at udp-many-way-audio-rx.c:359
#4  <signal handler called>
#5  0x000c24a0 in fflush ()
#6  0x0000ac20 in slot_buffers_firstfree (slot=3) at udp-many-way-audio-rx.c:433
#7  0x0000c0e4 in main (argc=1, argv=0xbefffe14) at udp-many-way-audio-rx.c:692
(gdb)

An strace of the process stalling on intel is here:
http://www.jonshouse.co.uk/download/a_stop.txt.gz

Thanks,
Jon
Comment 1 Andreas Schwab 2012-03-14 13:30:46 UTC
printf and fflush are not async-signal-safe.
Comment 2 Jonathan Andrews 2012-03-14 13:38:18 UTC
(In reply to comment #1)
> printf and fflush are not async-signal-safe.

Thats nice !

Would it not have been an idea to tell someone, in the documentation for alarm for example.  That way time would saved I would be swearing less.
Comment 4 Jonathan Andrews 2012-03-14 14:55:11 UTC
(In reply to comment #3)
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04

Ok, I found the reference to the issue in an up to date man page for "signal" on a Debian box. My older development machine does not mention it, nor does the man page for alarm I referenced.

"unspecified" is also a pretty vague term.

Lots of nice unsafe examples via google - thanks web ....
http://www.ccplusplus.com/2011/10/alarm-function-example-in-c.html

"Interactions between alarm() and setitimer() are unspecified"
No "Only thread safe functions should be used in the alarm signal handler"

Its documented with the assumption that somebody has read it in the correct order, the documentation isn't thread safe itself :-(

Thanks,
Jon
Comment 5 Paul Pluzhnikov 2012-03-15 15:01:35 UTC
(In reply to comment #4)

> No "Only thread safe functions should be used in the alarm signal handler"

That statement would be incorrect.

The correct statement reads: "only async-signal-safe functions can be used inside asynchronous signal handler". 

http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html

This fact is the "ABC" of signal handling, surely *everybody* knows that?