Created attachment 8126
proposed patch for the bug
A large multi-threaded (Java) server process was found to be hanging on calls to gethostbyname_r. It was further determined that it only hung when /etc/host.conf contained "reorder on".
Inspecting the source for _res_hconf_reorder_addrs, it is straightforward to see the bug. Assume there are 3 threads executing the function at the same time. All see that num_ifs is -1 at line 407 and attempt to get the lock on line 422. One thread gets the lock at line 422, initializes the static data structure, and unlocks the lock. The next thread gets the lock and double-checks the value of num_ifs at line 425. Seeing that it is now >0, it skips the initialization, but it never unlocks the lock. The last thread hangs on the lock forever.
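For readers following along, the sequence described above can be modeled in a few lines. This is a simplified sketch, not the glibc source: the names lock, buggy_slow_path, lock_is_leaked and simulate_three_callers are hypothetical, a plain pthread mutex stands in for the libc-internal lock, and the interface scan is replaced by a placeholder assignment. The three callers are simulated sequentially in one thread, so the leaked lock can be observed with pthread_mutex_trylock instead of an actual hang.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the static state in the real function. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int num_ifs = -1;            /* -1 means "not yet initialized" */

/* Slow path as described in the report: every caller that saw
   num_ifs == -1 before taking the lock ends up here. */
static void buggy_slow_path(void)
{
    pthread_mutex_lock(&lock);
    if (num_ifs <= 0) {
        num_ifs = 2;                 /* placeholder for the interface scan */
        pthread_mutex_unlock(&lock); /* the only unlock site */
    }
    /* A caller that skipped the branch returns still holding the lock. */
}

/* Returns 1 if the lock is currently held, 0 if it was free. */
static int lock_is_leaked(void)
{
    int rc = pthread_mutex_trylock(&lock);
    if (rc == 0) {
        pthread_mutex_unlock(&lock);
        return 0;
    }
    assert(rc == EBUSY);
    return 1;
}

/* Replays the three callers one after another. */
static int simulate_three_callers(void)
{
    buggy_slow_path();       /* caller 1: initializes and unlocks */
    buggy_slow_path();       /* caller 2: skips init, never unlocks */
    return lock_is_leaked(); /* caller 3 would block on this lock */
}
```

Calling simulate_three_callers() returns 1 in this model: the second caller leaves the mutex held, which is the point where a real third thread would block.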
Funny. The bug was introduced by this change:
	[BZ #5375]
	* resolv/res_hconf.c (_res_hconf_reorder_addrs): Fix locking when initializing interface list.
The proposed patch in bug 5375 got it right.
Eric, did this cause service availability issues in your setup? Do you think it would be possible to trigger this deliberately?
The error did cause service availability failures. Note:
* it requires the "reorder on" configuration
* it requires at least 3 threads attempting to run the _res_hconf_reorder_addrs function concurrently to lock up a single thread
* these concurrent accesses must be on the first run of the function
* future requests of _res_hconf_reorder_addrs will work just fine
In this case, a component of a large distributed database was asked by hundreds of other services to perform work at the moment it came online. These requests locked other resources before performing a name lookup. It would be hard, but not impossible, to do this deliberately to a service. It was not deliberately caused when it was found. Just (un)lucky.
(In reply to Eric Newton from comment #3)
> Note:
> * it requires the "reorder on" configuration
> * it requires at least 3 threads attempting to run the
>   _res_hconf_reorder_addrs function concurrently to lock up a single thread
> * these concurrent accesses must be on the first run of the function
> * future requests of _res_hconf_reorder_addrs will work just fine
> It would be hard, but not impossible to do this deliberately to a service.
Based on your third note, it seems to me that this can only happen once per process, right? Wouldn't this make it difficult to trigger *deliberately* in a typical long-running multi-threaded service? (I'm trying to figure out if we have to treat this as a security issue or a mere bug.)
You are correct: it only happens once per process.
Is there any way this patch can be applied? It seems like a small, trivial fix for a potentially big, hard-to-debug problem. It would be really useful to get this patch in so that it can be applied downstream in https://bugzilla.redhat.com/show_bug.cgi?id=1192621
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".
The branch, master has been updated
       via  b57525f1a376149840f740a31535681c07152ba4 (commit)
       via  47852c972d1ad80d8b38d9e94507b27df0ede421 (commit)
      from  554edb23ffc7a953ca86309cc5f02dbd1a63abe0 (commit)
Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b57525f1a376149840f740a31535681c07152ba4
commit b57525f1a376149840f740a31535681c07152ba4
Author: Dmitry V. Levin <ldv@altlinux.org>
Date:   Thu Jun 18 21:40:46 2015 +0000
    Fix potential hanging of gethostbyaddr_r/gethostbyname_r
    When "reorder" resolver option is enabled, threads of a multi-threaded process could hang in gethostbyaddr_r, gethostbyname_r, or gethostbyname2_r. Due to a trivial bug in _res_hconf_reorder_addrs, simultaneous invocations of this function in a multi-threaded process could result to _res_hconf_reorder_addrs returning without releasing the lock it holds, causing other threads to block indefinitely while waiting for the lock that is not going to be released.
    [BZ #17977]
    * resolv/res_hconf.c (_res_hconf_reorder_addrs): Fix unlocking when initializing interface list, based on the bug analysis and the patch proposed by Eric Newton.
    * resolv/tst-res_hconf_reorder.c: New test.
    * resolv/Makefile [$(have-thread-library) = yes] (tests): Add tst-res_hconf_reorder.
    ($(objpfx)tst-res_hconf_reorder): Depend on $(libdl) and $(shared-thread-library).
    (tst-res_hconf_reorder-ENV): New variable.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=47852c972d1ad80d8b38d9e94507b27df0ede421
commit 47852c972d1ad80d8b38d9e94507b27df0ede421
Author: Dmitry V. Levin <ldv@altlinux.org>
Date:   Mon Jun 22 09:57:14 2015 +0000
    _res_hconf_reorder_addrs: fix typo in comment
    * resolv/res_hconf.c (_res_hconf_reorder_addrs): Fix typo in comment.
-----------------------------------------------------------------------
Summary of changes:
 ChangeLog                      |   16 ++++++
 NEWS                           |   20 ++++----
 resolv/Makefile                |    4 ++
 resolv/res_hconf.c             |    6 +-
 resolv/tst-res_hconf_reorder.c |  112 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 13 deletions(-)
 create mode 100644 resolv/tst-res_hconf_reorder.c
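To illustrate the shape of the fix described in the commit message ("Fix unlocking when initializing interface list"), here is a minimal sketch of double-checked initialization where the unlock is reached on every path that took the lock. It is not the committed patch and not the new test: init_lock, init_runs, finished, ensure_initialized, worker and run_demo are hypothetical names, a pthread mutex stands in for the libc-internal lock, C11 atomics are an assumption of this sketch, and the interface scan is a placeholder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64

/* Hypothetical stand-ins for the static state in the real function. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int num_ifs = -1;   /* -1 means "not yet initialized" */
static atomic_int init_runs = 0;  /* how many times the init body ran */
static atomic_int finished = 0;   /* how many workers got past the lock */

static void ensure_initialized(void)
{
    if (atomic_load(&num_ifs) > 0)
        return;                        /* fast path: already initialized */

    pthread_mutex_lock(&init_lock);
    /* Recheck: another thread may have done the work in the meantime. */
    if (atomic_load(&num_ifs) <= 0) {
        atomic_fetch_add(&init_runs, 1);
        atomic_store(&num_ifs, 2);     /* placeholder for the interface scan */
    }
    /* Unlock outside the branch, so callers that skip the
       initialization release the lock as well. */
    pthread_mutex_unlock(&init_lock);
}

static void *worker(void *arg)
{
    (void) arg;
    ensure_initialized();
    atomic_fetch_add(&finished, 1);
    return NULL;
}

/* Starts n threads that all race into ensure_initialized, joins them,
   and returns how many of them completed. */
static int run_demo(int n)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    assert(n > 0 && n <= MAX_THREADS);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&finished);
}
```

With the unlock placed after the recheck branch, run_demo(8) joins all eight workers and the initialization body runs once, regardless of how many threads reach the slow path together.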
Fixed in master.
Test case improvement: https://sourceware.org/git/?p=glibc.git;a=commit;h=731a713b72e1281d58b3304738f04efb7bfca8b7
Please disregard the previous comment.