Bug 13594 - Crash if nscd is not running in __nscd_get_mapping
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: nscd
Version: 2.15
Importance: P2 normal
Target Milestone: 2.16
Assignee: Andreas Jaeger
Reported: 2012-01-13 16:38 UTC by Andreas Jaeger
Modified: 2014-06-27 11:12 UTC

fweimer: security-


Attachments
Potential fix (687 bytes, patch)
2012-03-28 16:51 UTC, law

Description Andreas Jaeger 2012-01-13 16:38:09 UTC
I have seen some reports where glibc crashes if nscd is not running, especially with chromium but also with ktorrent. This is not always reproducible, so there's some kind of race or random memory corruption.

valgrind reports:
==16750== Process terminating with default action of signal 11 (SIGSEGV)
==16750==  Access not within mapped region at address 0x17
==16750==    at 0xC63DC1E: __nscd_get_mapping (in /lib64/libc-2.15.so)
==16750==    by 0xC63DDE3: __nscd_get_map_ref (in /lib64/libc-2.15.so)
==16750==    by 0xC63B2E1: nscd_gethst_r (in /lib64/libc-2.15.so)
==16750==    by 0xC63BB96: __nscd_gethostbyname2_r (in /lib64/libc-2.15.so)
==16750==    by 0xC6213F0: gethostbyname2_r@@GLIBC_2.2.5 (in /lib64/libc-2.15.so)
==16750==    by 0xC5F5D8F: gaih_inet (in /lib64/libc-2.15.so)
==16750==    by 0xC5F6ED2: getaddrinfo (in /lib64/libc-2.15.so)
==16750==    by 0x140E7BD: net::SystemHostResolverProc(std::string const&, net::AddressFamily, int, net::AddressList*, int*) (in /usr/lib64/chromium/chromium)
==16750==    by 0x1407C57: net::HostResolverImpl::Job::DoLookup(base::TimeTicks const&, unsigned int) (in /usr/lib64/chromium/chromium)
==16750==    by 0x2B5B2FA: base::(anonymous namespace)::WorkerThread::ThreadMain() (in /usr/lib64/chromium/chromium)
==16750==    by 0x11753C1: base::(anonymous namespace)::ThreadFunc(void*) (in /usr/lib64/chromium/chromium)
==16750==    by 0x919FF65: start_thread (in /lib64/libpthread-2.15.so)

gdb gives a similar backtrace.

After reverting commit 3a2c02424d9824f5cdea4ebd32ff929b2b1f49c6, the problem does not appear anymore.

I can't reproduce it yet on my system - so no further information.

Reports:
https://bugzilla.novell.com/show_bug.cgi?id=741021
https://bbs.archlinux.org/viewtopic.php?id=133021
Comment 1 Vladimir Shorikov 2012-02-07 16:44:47 UTC
I confirm this for Chromium on archlinux.

Specs (just in case):
- kernel 3.3-rc2 (vanilla)
- glibc 2.15

I can only add 2 things:
- this happens only once in a session, i.e. when Chromium is launched for the first time after boot; after that Chromium starts normally.
- I couldn't yet reproduce this bug *within* gdb - it always starts without a problem. I will continue trying though.
Comment 2 Paulo César Pereira de Andrade 2012-02-11 22:52:36 UTC
I have just started testing an update from "2.14.90" to "2.15" for
mandriva, and I get this crash randomly, from 1 to 4 times on every
restart of chromium-browser.

$ rpm -qf /usr/bin/chromium-browser 
chromium-browser-unstable-17.0.963.26-1-mdv2012.0.x86_64

$ LD_LIBRARY_PATH=/usr/lib64/chromium-browser gdb /usr/lib64/chromium-browser/chrome
...
0x00007ffff18f5e7e in __nscd_get_mapping (type=<optimized out>, key=
    0x7ffff19436b4 "hosts", mappedp=0x7ffff1b82548) at nscd_helper.c:417
417       if (oldval != NULL && atomic_decrement_val (&oldval->counter) == 0)
(gdb) p oldval
$1 = (struct mapped_database *) 0xffffffffffffffff
(gdb) bt
#0  0x00007ffff18f5e7e in __nscd_get_mapping (type=<optimized out>, key=
    0x7ffff19436b4 "hosts", mappedp=0x7ffff1b82548) at nscd_helper.c:417
#1  0x00007ffff18f4098 in __nscd_get_nl_timestamp () at nscd_gethst_r.c:113
#2  0x00007ffff18e2be8 in __check_pf (seen_ipv4=0x7ffff7ed071e, seen_ipv6=
    0x7ffff7ed071f, in6ai=0x7ffff7ed06e0, in6ailen=0x7ffff7ed06f0)
    at ../sysdeps/unix/sysv/linux/check_pf.c:324
#3  0x00007ffff18aa015 in __GI_getaddrinfo (name=
    0x555559ba3a68 "www.statcounter.com", service=<optimized out>, hints=
    0x7ffff7ed0a60, pai=0x7ffff7ed0a98) at ../sysdeps/posix/getaddrinfo.c:2305
#4  0x00005555566a0c9c in ?? ()
#5  0x000055555669b3e8 in ?? ()
#6  0x0000555557d85b95 in ?? ()
#7  0x0000555556437fb2 in ?? ()
#8  0x00007ffff4239bd0 in start_thread (arg=0x7ffff7ed1700)
    at pthread_create.c:309
#9  0x00007ffff18bd93d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

I believe this should correct it, but it is not the proper solution:

--- glibc-2.15-a316c1f/nscd/nscd_helper.c.orig	2012-02-11 20:25:37.804514879 -0200
+++ glibc-2.15-a316c1f/nscd/nscd_helper.c	2012-02-11 20:26:07.428588082 -0200
@@ -414,7 +414,8 @@ __nscd_get_mapping (request_type type, c
   struct mapped_database *oldval = *mappedp;
   *mappedp = result;
 
-  if (oldval != NULL && atomic_decrement_val (&oldval->counter) == 0)
+  if (oldval != NULL && oldval != NO_MAPPING
+      && atomic_decrement_val (&oldval->counter) == 0)
     __nscd_unmap (oldval);
 
   return result;
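
The extra comparison in the patch above matters because NO_MAPPING is a sentinel pointer, not a real object. The following is only a minimal sketch: it assumes the sentinel is an all-ones pointer (consistent with the 0xffffffffffffffff that gdb prints for oldval) and uses a stand-in struct whose field layout is an assumption for illustration, not copied from glibc. The helper names is_real_mapping and sentinel_counter_address are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for glibc's struct mapped_database.  The field layout is an
   assumption made for this illustration.  */
struct mapped_database
{
  const void *head;
  const char *data;
  size_t mapsize;
  int counter;
  size_t datasize;
};

/* Sentinel meaning "no usable nscd mapping"; assumed to be all-ones.  */
#define NO_MAPPING ((struct mapped_database *) (intptr_t) -1)

/* Hypothetical helper: true only if OLDVAL points at a real mapping whose
   reference counter may safely be touched.  */
bool
is_real_mapping (const struct mapped_database *oldval)
{
  return oldval != NULL && oldval != NO_MAPPING;
}

/* Hypothetical helper: the address that &oldval->counter would name if the
   sentinel were treated as a real pointer.  Uses integer arithmetic only,
   so nothing is dereferenced.  */
uintptr_t
sentinel_counter_address (void)
{
  return (uintptr_t) NO_MAPPING + offsetof (struct mapped_database, counter);
}
```

With this assumed layout on x86_64, counter sits at offset 0x18, so the computed address wraps around to 0x17, which would match the faulting address in the valgrind report.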


hopefully also useful:

(gdb) frame 0
#0  0x00007ffff18f5e7e in __nscd_get_mapping (type=<optimized out>, key=
    0x7ffff19436b4 "hosts", mappedp=0x7ffff1b82548) at nscd_helper.c:417
417       if (oldval != NULL && atomic_decrement_val (&oldval->counter) == 0)
(gdb) p keylen
$12 = 6
(gdb) p mapsize
$13 = 0
(gdb) p iov
$14 = {{iov_base = 0x7ffff7ed0330, iov_len = 6}, {iov_base = 0x7ffff7ed04a0, 
    iov_len = 8}}
(gdb) p cmsg
$15 = <optimized out>
(gdb) p (cmsg)->__cmsg_data
value has been optimized out
(gdb) p ip
$16 = <optimized out>
(gdb) p mapfd
$17 = <optimized out>
(gdb) p st
No symbol "st" in current context.
(gdb) p mapping
$18 = <optimized out>
(gdb) p size
No symbol "size" in current context.
(gdb) p oldval
$19 = (struct mapped_database *) 0xffffffffffffffff
(gdb) p result
$20 = (struct mapped_database *) 0xffffffffffffffff
Comment 3 Paulo César Pereira de Andrade 2012-02-11 23:01:24 UTC
Forgot to add some extra information:

$ ls /usr/lib64/chromium-browser
chrome*            libppGoogleNaClPluginChrome.so*  resources.pak
chrome.pak         locales/                         themes/
chrome-sandbox*    nacl_helper*                     xdg-mime*
chromium-wrapper*  nacl_helper_bootstrap*           xdg-settings*
default_apps/      nacl_irt_x86_64.nexe
libffmpegsumo.so*  resources/

If I removed libppGoogleNaClPluginChrome.so from that directory,
or overrode the wrapper and started chrome without setting
LD_LIBRARY_PATH, I could not get it to crash, nor did I notice
any other problems.
Comment 4 Vladimir Shorikov 2012-02-12 04:25:56 UTC
It's already fixed in archlinux, though it was one helluva big commit:
http://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/glibc&id=8e950112da65c96ad17cbd650ac9db3050343a3f
I'm not even sure where to look...
Comment 5 Allan McRae 2012-02-12 04:33:38 UTC
That is called reverting the commit that caused the issue...  Not fixing it.
Comment 6 Vladimir Shorikov 2012-02-12 04:39:26 UTC
(In reply to comment #5)
> That is called reverting the commit that caused the issue...  Not fixing it.

Okay.
It's already reverted in archlinux. Hope this stays reverted for good and finds its way into the upstream (if this wasn't an archlinux-only problem since it was *package* version that changed...)
Comment 7 law 2012-03-27 18:19:34 UTC
Based on just reading the code, I wonder if one thread is mucking up hst_map_handle.mapped behind the back of nscd_get_mapping.

nscd_get_nl_timestamp doesn't bother to grab the hst_map_handle lock and calls into nscd_get_mapping which could potentially change hst_map_handle.mapped to NO_MAPPING.

If this occurs after another thread had passed the NO_MAPPING check in nscd_get_map_ref, but hasn't yet hit the atomic_decrement_val in nscd_get_mapping then it could cause the failure mode reported in this report (and several others across various distros, upstream kde and possibly elsewhere).

That would also explain why the patch in c#2 works, as well as the lack of reproducibility.
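
The suspected interleaving can be replayed step by step in a single thread. This is only an illustrative sketch, not a reproducer and not glibc code: the struct definitions are minimal stand-ins, NO_MAPPING is assumed to be an all-ones pointer, and every function name below is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins; only the names are borrowed from the discussion.  */
struct mapped_database { int counter; };
struct locked_map_ptr { int lock; struct mapped_database *mapped; };

/* Assumed sentinel value for "no usable mapping".  */
#define NO_MAPPING ((struct mapped_database *) (intptr_t) -1)

/* Step 1 (reader thread): the NO_MAPPING check made before calling into
   the mapping code.  */
bool
reader_passes_check (const struct locked_map_ptr *h)
{
  return h->mapped != NO_MAPPING;
}

/* Step 2 (other thread): a writer that does not hold the lock stores the
   sentinel, as happens when nscd is not running.  */
void
unlocked_writer_marks_unavailable (struct locked_map_ptr *h)
{
  h->mapped = NO_MAPPING;
}

/* Step 3 (reader thread): reads the old value.  Returns true if the
   unguarded counter decrement would go through the sentinel.  */
bool
reader_would_deref_sentinel (const struct locked_map_ptr *h)
{
  const struct mapped_database *oldval = h->mapped;
  return oldval == NO_MAPPING;
}

/* Replays the three steps in order; the writer runs inside the window
   between the check and the use only if WRITER_RUNS_IN_WINDOW is set.  */
bool
replay_interleaving (bool writer_runs_in_window)
{
  struct mapped_database db = { .counter = 1 };
  struct locked_map_ptr handle = { .lock = 0, .mapped = &db };

  if (!reader_passes_check (&handle))
    return false;
  if (writer_runs_in_window)
    unlocked_writer_marks_unavailable (&handle);
  return reader_would_deref_sentinel (&handle);
}
```

In this replay the bad dereference only happens when the unlocked writer runs between the check and the use, which would fit a crash that is timing dependent.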

Vladimir/Paulo: I don't have a way to reproduce the problem here, but I could pass along a patch to y'all if you're interested in testing my theory.
Comment 8 Paulo César Pereira de Andrade 2012-03-27 18:46:35 UTC
(In reply to comment #7)
[...]
> Vladimir/Paulo: I don't have a way to reproduce the problem here, but I could
> pass along a patch to y'all if you're interested in testing my theory.

  I can test it tonight at home; that is the only place I managed to
reproduce the problem. I am almost sure it is some race condition,
as when running chromium under gdb it would always have a lot of
threads running, and when playing with the libraries and LD_LIBRARY_PATH
I must have just changed some timing.
Comment 9 Andreas Jaeger 2012-03-28 08:15:31 UTC
Jeff, please add the patch here and we all can test it. Thanks for looking into this!
Comment 10 law 2012-03-28 16:51:17 UTC
Created attachment 6307 [details]
Potential fix
Comment 11 law 2012-03-29 16:21:29 UTC
Just a note from a tester within Red Hat.  He was reporting ktorrent crashes at startup which appeared to be related to, or possibly the same as, this problem.
After installing the patch already attached to this BZ, the ktorrent crashes have ceased.
Comment 12 Adam Conrad 2012-04-16 15:04:53 UTC
I've gotten confirmation from a few Ubuntu users that Jeff's fix is working for them, FWIW.
Comment 13 Carlos O'Donell 2012-05-07 20:27:09 UTC
We want this fix in 2.16, setting milestone.
Comment 14 Andreas Jaeger 2012-05-15 18:37:56 UTC
This is fixed now with:

commit 509072a0f7f8a37bedf61a78c0cdd7783368c65a
Author: Andreas Jaeger <aj@suse.de>
Date:   Tue May 15 20:35:53 2012 +0200

    Avoid race in nscd
    
    2012-05-15  Jeff Law  <law@redhat.com>
                Andreas Jaeger  <aj@suse.de>
    
            [BZ #13594]
            * nscd/nscd-client.h (__nscd_acquire_maplock): New function, split
            out from...
            * nscd/nscd_helper.c (__nscd_get_map_ref): ... here.
            * nscd/nscd-client.h: Add __nscd_acquire_maplock.
            * nscd/nscd_gethst_r.c (__nscd_get_nl_timestamp): Add locking to
            code changing __hst_map_handle.map.
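
The ChangeLog describes splitting lock acquisition into its own function and taking the lock before the handle is changed. A rough sketch of that pattern using C11 atomics follows. It is not the glibc implementation: the struct is a stand-in, the retry bound of 5 is an arbitrary assumption, and acquire_maplock, release_maplock and update_mapping_locked are hypothetical names chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the locked handle; the field types are assumptions.  */
struct locked_map_ptr
{
  atomic_int lock;
  void *mapped;
};

/* Try to take the lock with a bounded number of attempts.  Returns false
   if the lock stayed busy, so callers can fall back instead of blocking.  */
bool
acquire_maplock (struct locked_map_ptr *mapptr)
{
  for (int cnt = 0; cnt <= 5; ++cnt)
    {
      int expected = 0;
      if (atomic_compare_exchange_strong_explicit (&mapptr->lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed))
        return true;
    }
  return false;
}

/* Release the lock taken by acquire_maplock.  */
void
release_maplock (struct locked_map_ptr *mapptr)
{
  atomic_store_explicit (&mapptr->lock, 0, memory_order_release);
}

/* Change the handle only while holding the lock.  Returns false and
   leaves the handle untouched if the lock could not be taken.  */
bool
update_mapping_locked (struct locked_map_ptr *mapptr, void *new_mapping)
{
  if (!acquire_maplock (mapptr))
    return false;
  mapptr->mapped = new_mapping;
  release_maplock (mapptr);
  return true;
}
```

The bounded retry means a caller that cannot get the lock gives up rather than spinning forever; what the caller should do in that case is left open in this sketch.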
Comment 15 law 2012-05-16 04:23:10 UTC
Thanks for taking care of this, Andreas.  I've just updated Fedora Rawhide to use your version of this fix.