This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Fix incorrect double-checked locking in __nss_database_lookup. [BZ #20483]
- From: Torvald Riegel <triegel at redhat dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Tue, 13 Sep 2016 13:05:29 +0200
- Subject: Re: [PATCH] Fix incorrect double-checked locking in __nss_database_lookup. [BZ #20483]
- Authentication-results: sourceware.org; auth=none
- References: <1471553358.14544.108.camel@localhost.localdomain> <e7ea618c-1f26-b863-22d7-5ae6b1e4ccaf@redhat.com> <1473756228.30192.64.camel@localhost.localdomain> <a0c3b91a-28bc-0cea-99fb-891999b1d790@redhat.com>
On Tue, 2016-09-13 at 11:17 +0200, Florian Weimer wrote:
> On 09/13/2016 10:43 AM, Torvald Riegel wrote:
> > On Mon, 2016-09-12 at 20:55 +0200, Florian Weimer wrote:
> >> On 08/18/2016 10:49 PM, Torvald Riegel wrote:
> >>> DB_LOOKUP_FCT (service_user **ni, const char *fct_name, const char *fct2_name,
> >>> void **fctp)
> >>> {
> >>> - if (DATABASE_NAME_SYMBOL == NULL
> >>> - && __nss_database_lookup (DATABASE_NAME_STRING, ALTERNATE_NAME_STRING,
> >>> - DEFAULT_CONFIG, &DATABASE_NAME_SYMBOL) < 0)
> >>> + /* Relaxed MO is fine because this is just about whether we have to perform
> >>> + the lookup; we will do another acquire-MO load next before assuming that
> >>> + the lookup has happened. */
> >>> + if ((atomic_load_relaxed (&(DATABASE_NAME_SYMBOL)) == NULL)
> >>> + && (__nss_database_lookup (DATABASE_NAME_STRING, ALTERNATE_NAME_STRING,
> >>> + DEFAULT_CONFIG, &DATABASE_NAME_SYMBOL) < 0))
> >>> return -1;
> >>>
> >>> - *ni = DATABASE_NAME_SYMBOL;
> >>> + /* Acquire MO as required by __nss_database_lookup. */
> >>> + *ni = atomic_load_acquire (&(DATABASE_NAME_SYMBOL));
> >>>
> >>> return __nss_lookup (ni, fct_name, fct2_name, fctp);
> >>> }
> >>
> >> I'm not sure if this is the double-checked locking pattern we want to
> >> use for new code. GCC currently cannot merge the two loads (and maybe
> >> it never will do so because it is beneficial to treat the separate loads
> >> as an optimization hint).
> >
> > I don't think the (re)load matters performance-wise, given that
> > DATABASE_NAME_SYMBOL is initialized once, and so the access will most
> > likely hit in the cache. What may matter were if we'd often returned -1
> > and would have an unnecessary acquire MO in this path, though OTOH
> > __nss_database_lookup will acquire the lock anyway.
>
> __nss_database_lookup fails only if there is a low-level failure (like a
> failed memory allocation). As far as I can see, syntax errors in the
> configuration file are ignored. You always get a positive result out of
> it, so it's only called once.
>
> > Loading it into a temporary first, then doing the check, and then
> > issuing an acquire-MO fence would be another way to avoid the extra
> > load.
>
> Couldn't we do an acquire load unconditionally? The == NULL case is
> really not worth optimizing for, it only happens during startup.
We could, but because __nss_database_lookup doesn't return the pointer,
we'd still have to do an atomic_load_relaxed (...) at the caller. So,
not much of a gain it seems.
> >> I still think we would be better off if we centralize this particular
> >> code in a single function with an interface similar to
> >> __nss_database_lookup. The manual inlining seems unnecessary.
> >
> > Doing all of that in __nss_database_lookup might be cleaner, I agree.
> > But my focus for this patch was just a bug fix, not refactoring; I'd
> > like to leave it to others to decide how to clean-up this part of glibc.
> > I also wasn't quite sure whether __nss_database_lookup is part of the
> > ABI or not.
>
> I looked at the current implementation and I already see a *ni != NULL
> in __nss_database_lookup, so I think we can just move the double-checked
> locking into __nss_database_lookup and simplify the callers.
Same name, but that's a different ni :) The ni in __nss_database_lookup
is the global variable; the one in the callers is the private data the
callers use.
> This would
> not result in an observable behavior change, so it is okay although
> __nss_database_lookup is part of the ABI.
You can make this cleaner if __nss_database_lookup would return the
value from the exactly-once-initialization, and this is what I initially
did when preparing this patch. But then I noticed that it looked to be
part of the ABI, and had to revert it. (This was before PTO, so I'm
just paging this back in right now... ;) )
> And the refactoring would mostly be about deleting a bunch of code
> around calls of __nss_database_lookup. The core changes in
> __nss_database_lookup (to move the release-MO store to the end, after
> all initialization) are needed either way.
If we can assume that the callers are always in sync with the
__nss_database_lookup, then I'd agree -- but then this wouldn't be
really an ABI issue. If the __nss_database_lookup implementation can be
older than the callers' changes, we need to be more careful. I think I
looked into this and concluded that we couldn't improve the code much if
having to consider compatibility, but I don't remember the details of
why I thought that, sorry.
My suggestion would be to first apply this patch (if it's OK otherwise),
and then try to refactor. We could put a comment into the first patch
highlighting that refactoring might be beneficial.