We call getaddrinfo inside our Xorg module. Unfortunately, Xorg uses setitimer to generate recurring SIGALRM signals at 20 ms intervals, and this can cause various libc calls to hang.
This is a typical stack trace:
#0 0xffffe402 in __kernel_vsyscall ()
#1 0x00960e6d in poll () from /lib/tls/i686/nosegneg/libc.so.6
#2 0x0098e431 in clntudp_call () from /lib/tls/i686/nosegneg/libc.so.6
#3 0x00b49f84 in do_ypcall () from /lib/libnsl.so.1
#4 0x00b4a6c0 in yp_match () from /lib/libnsl.so.1
#5 0xf77aa351 in internal_gethostbyname2_r () from /lib/libnss_nis.so.2
#6 0x009809bb in gethostbyname2_r@@GLIBC_2.1.2 () from
#7 0x0094f4ea in gaih_inet () from /lib/tls/i686/nosegneg/libc.so.6
#8 0x00952c2d in getaddrinfo () from /lib/tls/i686/nosegneg/libc.so.6
The issue is that clntudp_call retries poll() on every EINTR, but does not adjust the timeout.
See sunrpc/clnt_udp.c:L403. poll() is called repeatedly with the same timeout. We want clntudp_call() to return within utimeout seconds, but when setitimer fires at an interval shorter than that timeout, poll() is always interrupted before it can time out, so clntudp_call loops forever.
The fix is to recompute the timeout passed to poll() each time we loop. (This is the normal way to handle EINTR with timeouts.)
Created attachment 7871
patch to fix the problem
The problem is not limited to NIS or UDP RPC; there are various parts of glibc that retry poll() without recomputing the timeout. I'm testing this patch to fix them all.