I have seen some reports where glibc crashes if nscd is not running, especially with chromium but also with ktorrent. This is not always reproducible, so there is some kind of race or random memory corruption.

valgrind reports:

==16750== Process terminating with default action of signal 11 (SIGSEGV)
==16750==  Access not within mapped region at address 0x17
==16750==    at 0xC63DC1E: __nscd_get_mapping (in /lib64/libc-2.15.so)
==16750==    by 0xC63DDE3: __nscd_get_map_ref (in /lib64/libc-2.15.so)
==16750==    by 0xC63B2E1: nscd_gethst_r (in /lib64/libc-2.15.so)
==16750==    by 0xC63BB96: __nscd_gethostbyname2_r (in /lib64/libc-2.15.so)
==16750==    by 0xC6213F0: gethostbyname2_r@@GLIBC_2.2.5 (in /lib64/libc-2.15.so)
==16750==    by 0xC5F5D8F: gaih_inet (in /lib64/libc-2.15.so)
==16750==    by 0xC5F6ED2: getaddrinfo (in /lib64/libc-2.15.so)
==16750==    by 0x140E7BD: net::SystemHostResolverProc(std::string const&, net::AddressFamily, int, net::AddressList*, int*) (in /usr/lib64/chromium/chromium)
==16750==    by 0x1407C57: net::HostResolverImpl::Job::DoLookup(base::TimeTicks const&, unsigned int) (in /usr/lib64/chromium/chromium)
==16750==    by 0x2B5B2FA: base::(anonymous namespace)::WorkerThread::ThreadMain() (in /usr/lib64/chromium/chromium)
==16750==    by 0x11753C1: base::(anonymous namespace)::ThreadFunc(void*) (in /usr/lib64/chromium/chromium)
==16750==    by 0x919FF65: start_thread (in /lib64/libpthread-2.15.so)

gdb gives a similar backtrace.

After reverting commit 3a2c02424d9824f5cdea4ebd32ff929b2b1f49c6, the problem no longer appears. I can't reproduce it on my system yet, so I have no further information.

Reports:
https://bugzilla.novell.com/show_bug.cgi?id=741021
https://bbs.archlinux.org/viewtopic.php?id=133021
I confirm this for Chromium on archlinux.

Specs (just in case):
- kernel 3.3-rc2 (vanilla)
- glibc 2.15

I can only add 2 things:
- this happens only once in a session, i.e. when Chromium is launched for the first time after boot; after that Chromium starts normally.
- I couldn't yet reproduce this bug *within* gdb - it always starts without a problem. I will continue trying though.
I am just starting to test an update from "2.14.90" to "2.15" for mandriva, and I get this randomly, between 1 and 4 times on every restart of chromium-browser.

$ rpm -qf /usr/bin/chromium-browser
chromium-browser-unstable-17.0.963.26-1-mdv2012.0.x86_64
$ LD_LIBRARY_PATH=/usr/lib64/chromium-browser gdb /usr/lib64/chromium-browser/chrome
...
0x00007ffff18f5e7e in __nscd_get_mapping (type=<optimized out>, key=
    0x7ffff19436b4 "hosts", mappedp=0x7ffff1b82548) at nscd_helper.c:417
417       if (oldval != NULL && atomic_decrement_val (&oldval->counter) == 0)
(gdb) p oldval
$1 = (struct mapped_database *) 0xffffffffffffffff
(gdb) bt
#0  0x00007ffff18f5e7e in __nscd_get_mapping (type=<optimized out>, key=
    0x7ffff19436b4 "hosts", mappedp=0x7ffff1b82548) at nscd_helper.c:417
#1  0x00007ffff18f4098 in __nscd_get_nl_timestamp () at nscd_gethst_r.c:113
#2  0x00007ffff18e2be8 in __check_pf (seen_ipv4=0x7ffff7ed071e, seen_ipv6=
    0x7ffff7ed071f, in6ai=0x7ffff7ed06e0, in6ailen=0x7ffff7ed06f0)
    at ../sysdeps/unix/sysv/linux/check_pf.c:324
#3  0x00007ffff18aa015 in __GI_getaddrinfo (name=
    0x555559ba3a68 "www.statcounter.com", service=<optimized out>, hints=
    0x7ffff7ed0a60, pai=0x7ffff7ed0a98) at ../sysdeps/posix/getaddrinfo.c:2305
#4  0x00005555566a0c9c in ?? ()
#5  0x000055555669b3e8 in ?? ()
#6  0x0000555557d85b95 in ?? ()
#7  0x0000555556437fb2 in ?? ()
#8  0x00007ffff4239bd0 in start_thread (arg=0x7ffff7ed1700)
    at pthread_create.c:309
#9  0x00007ffff18bd93d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

I believe this should correct it, but it is not the proper solution:

--- glibc-2.15-a316c1f/nscd/nscd_helper.c.orig 2012-02-11 20:25:37.804514879 -0200
+++ glibc-2.15-a316c1f/nscd/nscd_helper.c 2012-02-11 20:26:07.428588082 -0200
@@ -414,7 +414,8 @@ __nscd_get_mapping (request_type type, c
   struct mapped_database *oldval = *mappedp;
   *mappedp = result;
 
-  if (oldval != NULL && atomic_decrement_val (&oldval->counter) == 0)
+  if (oldval != NULL && oldval != NO_MAPPING
+      && atomic_decrement_val (&oldval->counter) == 0)
     __nscd_unmap (oldval);
 
   return result;

Hopefully this is also useful:

(gdb) frame 0
#0  0x00007ffff18f5e7e in __nscd_get_mapping (type=<optimized out>, key=
    0x7ffff19436b4 "hosts", mappedp=0x7ffff1b82548) at nscd_helper.c:417
417       if (oldval != NULL && atomic_decrement_val (&oldval->counter) == 0)
(gdb) p keylen
$12 = 6
(gdb) p mapsize
$13 = 0
(gdb) p iov
$14 = {{iov_base = 0x7ffff7ed0330, iov_len = 6}, {iov_base = 0x7ffff7ed04a0, iov_len = 8}}
(gdb) p cmsg
$15 = <optimized out>
(gdb) p (cmsg)->__cmsg_data
value has been optimized out
(gdb) p ip
$16 = <optimized out>
(gdb) p mapfd
$17 = <optimized out>
(gdb) p st
No symbol "st" in current context.
(gdb) p mapping
$18 = <optimized out>
(gdb) p size
No symbol "size" in current context.
(gdb) p oldval
$19 = (struct mapped_database *) 0xffffffffffffffff
(gdb) p result
$20 = (struct mapped_database *) 0xffffffffffffffff
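For readers following along, here is a minimal, single-threaded sketch of what the one-line patch above guards against. It is not the glibc code: the struct layout, `unmap_stub`, `swap_mapping` and `unmapped_calls` are hypothetical names made up for illustration, the decrement is a plain (non-atomic) one, and `NO_MAPPING` is assumed to be the all-ones pointer, which matches the 0xffffffffffffffff that gdb prints for `oldval`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's internal type (assumption). */
struct mapped_database {
    int counter;                     /* reference count */
};

/* Assumed sentinel: "there is no mapping", not a real pointer. */
#define NO_MAPPING ((struct mapped_database *) -1)

int unmapped_calls;                  /* how often the stub below ran */

/* Stub standing in for __nscd_unmap; it only counts calls. */
void unmap_stub(struct mapped_database *db)
{
    (void) db;
    unmapped_calls++;
}

/* Install `result` in *mappedp and drop one reference on the previous
   value, but only when the previous value is a real pointer. Without
   the NO_MAPPING test, the decrement would dereference the sentinel. */
struct mapped_database *
swap_mapping(struct mapped_database **mappedp, struct mapped_database *result)
{
    assert(mappedp != NULL);

    struct mapped_database *oldval = *mappedp;
    *mappedp = result;

    if (oldval != NULL && oldval != NO_MAPPING && --oldval->counter == 0)
        unmap_stub(oldval);

    return result;
}
```

With the extra comparison, swapping out a slot that currently holds `NO_MAPPING` is a no-op for the reference count instead of a wild pointer access. As noted above, this hides the symptom; it does not explain how the sentinel got there.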
Forgot to add some extra information:

$ ls /usr/lib64/chromium-browser
chrome*            libppGoogleNaClPluginChrome.so*  resources.pak
chrome.pak         locales/                         themes/
chrome-sandbox*    nacl_helper*                     xdg-mime*
chromium-wrapper*  nacl_helper_bootstrap*           xdg-settings*
default_apps/      nacl_irt_x86_64.nexe
libffmpegsumo.so*  resources/

If I remove libppGoogleNaClPluginChrome.so from that directory, or bypass the wrapper and start chrome without setting LD_LIBRARY_PATH, I cannot get it to crash, nor did I notice any other problems.
It's already fixed in archlinux, though it was one helluva big commit: http://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/glibc&id=8e950112da65c96ad17cbd650ac9db3050343a3f I'm not even sure where to look...
That is called reverting the commit that caused the issue... Not fixing it.
(In reply to comment #5)
> That is called reverting the commit that caused the issue... Not fixing it.

Okay. It's already reverted in archlinux. Hope this stays reverted for good and finds its way upstream (if this wasn't an archlinux-only problem, since it was the *package* version that changed...)
Based on just reading the code, I wonder if one thread is mucking up hst_map_handle.mapped behind the back of nscd_get_mapping.

nscd_get_nl_timestamp doesn't bother to grab the hst_map_handle lock and calls into nscd_get_mapping, which could potentially change hst_map_handle.mapped to NO_MAPPING. If this occurs after another thread has passed the NO_MAPPING check in nscd_get_map_ref, but before that thread has hit the atomic_decrement_val in nscd_get_mapping, then it could cause the failure mode described in this report (and several others across various distros, upstream kde and possibly elsewhere).

That would also explain why the patch in c#2 works, as well as the lack of reproducibility.

Vladimir/Paulo: I don't have a way to reproduce the problem here, but I could pass along a patch to y'all if you're interested in testing my theory.
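To make the suspected interleaving concrete, here is a small deterministic replay of it, run on a single thread. This is only a sketch of the theory above, not glibc code: the types are simplified stand-ins, and `thread_a_check`, `thread_b_unlocked_store`, `thread_a_pointer_still_valid` and `replay_suspected_race` are hypothetical names. `NO_MAPPING` is again assumed to be the all-ones pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for glibc's internal types (assumptions). */
struct mapped_database { int counter; };
#define NO_MAPPING ((struct mapped_database *) -1)

struct locked_map_ptr {
    int lock;                        /* never taken here: the theory is
                                        that one code path skips it */
    struct mapped_database *mapped;
};

/* Thread A, step 1: the NO_MAPPING check attributed to nscd_get_map_ref. */
bool thread_a_check(const struct locked_map_ptr *h)
{
    assert(h != NULL);
    return h->mapped != NO_MAPPING;
}

/* Thread B: a writer that stores NO_MAPPING without holding the lock. */
void thread_b_unlocked_store(struct locked_map_ptr *h)
{
    assert(h != NULL);
    h->mapped = NO_MAPPING;
}

/* Thread A, step 2: is the pointer it is about to decrement still real? */
bool thread_a_pointer_still_valid(const struct locked_map_ptr *h)
{
    assert(h != NULL);
    return h->mapped != NULL && h->mapped != NO_MAPPING;
}

/* Replays the suspected ordering. Returns true when thread A passed its
   check but would then touch the sentinel. */
bool replay_suspected_race(void)
{
    struct mapped_database db = { 1 };
    struct locked_map_ptr handle = { 0, &db };

    bool passed = thread_a_check(&handle);   /* A: check succeeds      */
    thread_b_unlocked_store(&handle);        /* B: store slips in      */
    return passed && !thread_a_pointer_still_valid(&handle);
}
```

In this replay the check and the later use see different values of `mapped`, which is the classic check-then-act pattern. If the theory holds, making both paths take the same lock would close that window.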
(In reply to comment #7)
[...]
> Vladimir/Paulo: I don't have a way to reproduce the problem here, but I could
> pass along a patch to y'all if you're interested in testing my theory.

I can test it tonight at home; that is the only place I managed to reproduce the problem. I am almost sure it is some race condition, as when running chromium under gdb it always has a lot of threads running, and when I played with the libraries and LD_LIBRARY_PATH I probably just changed some timing.
Jeff, please add the patch here and we all can test it. Thanks for looking into this!
Created attachment 6307
Potential fix
Just a note from a tester within Red Hat. He was reporting ktorrent crashes at startup which appeared to be related to, or possibly the same as, this problem. After installing the patch already attached to this BZ, the ktorrent crashes have ceased.
I've gotten confirmation from a few Ubuntu users that Jeff's fix is working for them, FWIW.
We want this fix in 2.16, setting milestone.
This is fixed now with:

commit 509072a0f7f8a37bedf61a78c0cdd7783368c65a
Author: Andreas Jaeger <aj@suse.de>
Date:   Tue May 15 20:35:53 2012 +0200

    Avoid race in nscd

    2012-05-15  Jeff Law  <law@redhat.com>
                Andreas Jaeger  <aj@suse.de>

        [BZ #13594]
        * nscd/nscd-client.h (__nscd_acquire_maplock): New function, split
        out from...
        * nscd/nscd_helper.c (__nscd_get_map_ref): ... here.
        * nscd/nscd-client.h: Add __nscd_acquire_maplock.
        * nscd/nscd_gethst_r.c (__nscd_get_nl_timestamp): Add locking to
        code changing __hst_map_handle.map.
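For anyone landing here later, the shape of a bounded try-lock helper like the one the commit message above names can be sketched with C11 atomics. This is an illustration under assumptions, not the committed code: `struct map_handle`, `try_acquire_maplock`, `release_maplock` and the `MAX_SPINS` value are all hypothetical, and glibc uses its own internal atomic macros rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handle: a lock word guarding a mapping pointer. */
struct map_handle {
    atomic_int lock;                 /* 0 = free, 1 = held */
    void *mapped;
};

#define MAX_SPINS 5                  /* arbitrary retry bound (assumption) */

/* Try to take the lock, giving up after a bounded number of attempts so
   a caller can fall back to another path instead of blocking forever. */
bool try_acquire_maplock(struct map_handle *h)
{
    assert(h != NULL);

    for (int attempt = 0; attempt <= MAX_SPINS; attempt++) {
        int expected = 0;
        if (atomic_compare_exchange_strong_explicit(
                &h->lock, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}

/* Release the lock; the caller must currently hold it. */
void release_maplock(struct map_handle *h)
{
    assert(h != NULL);
    assert(atomic_load(&h->lock) == 1);
    atomic_store_explicit(&h->lock, 0, memory_order_release);
}
```

The point of routing every reader and writer of the mapping pointer through the same acquire/release pair is that the check and the later use of the pointer happen under one lock, so another thread cannot change it in between.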
Thanks for taking care of this Andreas. I've just updated Fedora Rawhide to use your version of this fix.
*** Bug 260998 has been marked as a duplicate of this bug. ***