This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


symbol name space issues with NSS modules


[I've changed the subject line to reflect what this thread is actually
talking about now, and trimmed the CC of people I'm sure are on the list.]

> gdb has a timeval_add symbol, and none of the direct dependencies 
> of gdb define a conflicting symbol.  However, gdb calls gethostbyname, 
> and that, due to some internal implementation detail of glibc, 
> dynamically loads a module which does have a global symbol of the 
> same name, which ended up unintentionally overridden by gdb's symbol.

I see.  That was not clear to me from the bug, but perhaps I did not
read it closely enough.  It would certainly be clearer if the bug
included a small reproducer program rather than just talking about a
case with gdb.

I'd call this just a simple bug in the NSS module in question.  We can
fix those without addressing the broader issue about NSS modules in
the abstract.  This is just a "normal" case of a name space violation,
which is something that we identify and fix in libc as a matter of
course.

For name space issues in the core libraries, we catch them with our
"linknamespace" tests.  These are exhaustive in the sense of covering
all possible name space violations in all the library code examined.
But they only test static linking cases.  For most of our code, this
is a sufficient proxy to catch problems that would arise in shared
library cases just because the static-library and shared-library code
are close enough to being compiled from the same sources.  It's easy
to test this way for static linking precisely because it's static
linking, so all the dependency arcs are travelled at link time.

For DSOs in general, it's hard to find these issues without false
positives.  Consider libc.so itself: it defines the nonstandard symbol
strfry, but that is OK since nothing (in the transitive closure of
dependencies from functions in standardized name spaces) refers to
strfry.  Your application's strfry will take precedence over libc's,
but that's OK because nobody will be making calls to libc's strfry.
The naive approach of just looking for nonstandard symbols will see
strfry in the DSO and report a false-positive error.  In the static
linking linknamespace tests, that problem doesn't arise because the
linker is implicitly doing the fine-grained call-graph analysis as a
by-product of linking and so it will see nothing leading to strfry.
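To make the strfry point concrete, here is a minimal sketch comparing the naive check with a reachability-based one. The call graph edges and the tiny set of "standard" names are invented for illustration and are not taken from the glibc sources; only the idea (walk the graph from the standardized entry points, as the static linker implicitly does) comes from the discussion above.

```python
from collections import deque

# Hypothetical, deliberately tiny set of names treated as standardized.
STANDARD_NAMES = {"printf", "vfprintf", "strlen"}


def is_allowed(name):
    """Standard names and implementation-name-space names (_*) are fine."""
    return name in STANDARD_NAMES or name.startswith("_")


def naive_violations(graph):
    """Flag every defined symbol that is not in an allowed name space."""
    return sorted(sym for sym in graph if not is_allowed(sym))


def reachable_violations(graph, entry_points):
    """Flag only disallowed symbols reachable from the given entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        sym = queue.popleft()
        for callee in graph.get(sym, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(sym for sym in seen if not is_allowed(sym))


# Toy call graph (symbol -> symbols it references); the edges are made up.
CALL_GRAPH = {
    "printf": ["vfprintf"],
    "vfprintf": ["__overflow", "strlen"],
    "__overflow": [],
    "strlen": [],
    "strfry": ["strlen"],
}

print(naive_violations(CALL_GRAPH))                # ['strfry']
print(reachable_violations(CALL_GRAPH, ["printf"]))  # []
```

The naive check reports strfry merely because it is defined; the reachability check reports nothing because no standardized entry point leads to it.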

We could theoretically implement some sufficient testing for this when
building the DSOs.  That is, look at each object file going into the
DSO and construct the static call graph to verify that no arc crosses
from a more-restricted name space into a less-restricted one.  That
doesn't seem like a huge amount of work, but I'm not sure it would
catch anything in practice that we don't already catch with our static
linking linknamespace tests.
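A rough sketch of what such an arc check might look like, under stated assumptions: three name space levels (ISO C, POSIX, GNU) ordered from most to least restricted, and internal helpers (names starting with an underscore) inheriting the most restricted level of anything that calls them. The helper names and the edges are hypothetical; a real tool would have to extract the graph from the object files.

```python
# Lower number = more restricted name space.
LEVEL = {"iso_c": 0, "posix": 1, "gnu": 2}


def is_internal(sym):
    """Implementation name space (_*): always legal to reference."""
    return sym.startswith("_")


def crossing_arcs(graph, name_space):
    """Return arcs that lead from a more-restricted context to a public
    symbol in a less-restricted name space.

    graph: symbol -> list of referenced symbols (every callee is also a key).
    name_space: public symbol -> "iso_c" | "posix" | "gnu".
    """
    # Public symbols constrain their callees to their own level; internal
    # helpers start unconstrained and are tightened by their callers.
    effective = {}
    for sym in graph:
        if is_internal(sym):
            effective[sym] = max(LEVEL.values())
        else:
            effective[sym] = LEVEL[name_space[sym]]

    changed = True
    while changed:  # propagate restrictions to a fixed point
        changed = False
        for caller, callees in graph.items():
            for callee in callees:
                if is_internal(callee) and effective[callee] > effective[caller]:
                    effective[callee] = effective[caller]
                    changed = True

    return sorted(
        (caller, callee)
        for caller, callees in graph.items()
        for callee in callees
        if not is_internal(callee)
        and LEVEL[name_space[callee]] > effective[caller]
    )


# Invented edges; only the name space of each public name is real.
GRAPH = {
    "fopen": ["__open_helper"],
    "__open_helper": ["open"],
    "open": [],
    "getline": ["__read_helper"],
    "__read_helper": ["read"],
    "read": [],
    "strfry": ["random"],
    "random": [],
}
NAME_SPACE = {
    "fopen": "iso_c", "open": "posix", "getline": "posix",
    "read": "posix", "strfry": "gnu", "random": "posix",
}

print(crossing_arcs(GRAPH, NAME_SPACE))  # [('__open_helper', 'open')]
```

In this toy graph the only flagged arc is the one where a helper reachable from an ISO C function references the POSIX name open directly.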

For modules that are programmatically loaded at runtime (that is, NSS
modules, iconv modules, libcidn, and libgcc_s), the problem is a bit
different.  In the general case, it's difficult or impossible to
know what library code (and hence what name space constraint
environments) enters what loaded module code (and hence what parts of
the DSO's static call graph are relevant).  But for the modules we
actually use, it's easy enough to define some conservative rules.

For NSS modules, I think all the library entry points that can lead to
entering NSS module code are POSIX.1 symbols or GNU/misc extensions.
Being conservative, we can ignore the latitude available when the
entry point in question is a GNU extension and just act as if every
entry into an NSS module is from a POSIX.1 function.  If that's true,
then we can just make the blanket statement that NSS modules can refer
only to the POSIX.1, ISO C, and implementation name spaces.  This is
something we can test statically in the built module itself, though if
it has other DT_NEEDED dependencies then testing those correctly might
be difficult (to do without false positives).
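As a sketch of that static test, the following parses the text that GNU nm prints for `nm -D --undefined-only` and reports undefined symbols outside an allowed set. The allowlist here is a hypothetical stand-in (a real check would need the complete POSIX.1 and ISO C identifier lists), and the sample text is hand-written for illustration, not actual output from any built module.

```python
import subprocess

# Hypothetical, far-from-complete allowlist of POSIX.1 / ISO C names.
ALLOWED_NAMES = {"strcmp", "fgets", "fopen", "fclose", "malloc", "free"}


def undefined_symbols(nm_output):
    """Extract symbol names from `nm -D --undefined-only` style text."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Expect "<type> <name>", where type is U (undefined) or w/v (weak).
        if len(fields) < 2 or fields[-2] not in ("U", "w", "v"):
            continue
        # Drop any symbol version suffix such as "@GLIBC_2.2.5".
        names.append(fields[-1].split("@", 1)[0])
    return names


def name_space_violations(symbols, allowed=ALLOWED_NAMES):
    """Symbols that are neither allowlisted nor in the _* name space."""
    return sorted(
        s for s in set(symbols) if s not in allowed and not s.startswith("_")
    )


def check_module(path):
    """Run nm on a built module; assumes GNU binutils nm is on PATH."""
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", path],
        capture_output=True, text=True, check=True,
    )
    return name_space_violations(undefined_symbols(result.stdout))


# Hand-written sample in nm's output format (illustrative only).
SAMPLE = """\
                 U __errno_location@GLIBC_2.2.5
                 U fgets@GLIBC_2.2.5
                 U inet_network@GLIBC_2.2.5
                 U strcmp@GLIBC_2.2.5
                 w __cxa_finalize@GLIBC_2.2.5
                 U xdecrypt@GLIBC_2.2.5
"""

if __name__ == "__main__":
    print(name_space_violations(undefined_symbols(SAMPLE)))
```

This only looks at the module's own undefined symbols; it does not follow DT_NEEDED dependencies, which is exactly the part that is hard to do without false positives.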

I just eyeballed the symbol table of libnss_files.so, which is the NSS
module that is simplest and so seems the least likely to have
problematic symbol dependencies.  But even that has one real bug and
one false positive.  The real bug is calling inet_network, which is a
nonstandard symbol reachable via the POSIX.1 getnet* functions.  The
false positive is xdecrypt, which every NSS module seems to use in its
implementation of the getsecretkey backend--but getsecretkey is a
nonstandard function in the same name space category as xdecrypt, so
that's actually fine but looks bad to the naive test I just suggested
above.  (Arguably we ought to change the NSS protocol for getsecretkey
so that the module returns the "raw" value and libc calls xdecrypt.
Then at least it would become possible to write a fully-functioning
NSS module that passes the naive name space test.)

For less trivial modules that use other DSOs to do their work, this
can be a very hard problem.  In the general case, it could require
name space discipline for all of the reachable code.  When the DSOs in
question are general-purpose things (which offhand seems like the
only reason there would be dependent DSOs), meeting that discipline
would mean they export only symbols in the implementation name space
(_*), which is directly contrary to general advice.

In many common circumstances, dynamic linking rules and symbol
versions will mitigate the problem substantially.  That is, an
application will not have its miscellaneous global symbols in its
dynamic symbol table, only the ones that were found at link time to
override the same-named symbol defined in some directly-linked DSO
(such as libc).  For example, even if an application had a function called
yp_match this would not be a problem if the application didn't itself
also link against -lnsl (or another library using that symbol
name)--so its yp_match won't override the one that libnss_nis calls.
It might even be the case that it wouldn't override yp_match if the
application was linked against a different library providing yp_match,
that library uses symbol versioning, and its SONAME and/or symbol
version for yp_match do not match libnsl.so.1's.  (That is, the
application's dynamic symbol table does contain yp_match, but its
symbol version binding is not for libnsl.so.1's version.)  But I'm
frankly unsure how symbol version binding affects all these cases.
I'm pretty sure that there are cases where some unintended overriding
could happen (maybe if the other DSO in question doesn't use symbol
versioning?).  I'm quite sure that if an application directly links
against -lnsl and intentionally overrides its yp_match symbol, this
will indeed intercept calls from libnss_nis.so (and whether that
counts as intended or not is a complex question).
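The unversioned part of the yp_match example can be written down as a toy model. This is only a sketch of the rules as described above: it ignores symbol versioning entirely (the part admitted to be unclear), ignores options such as --export-dynamic, and uses made-up symbol sets for the libraries.

```python
def app_dynamic_symbols(app_globals, linked_dsos):
    """Application globals that land in its dynamic symbol table: only
    those that override a symbol defined by a directly-linked DSO."""
    dso_defined = set()
    for symbols in linked_dsos.values():
        dso_defined |= set(symbols)
    return set(app_globals) & dso_defined


def resolve_from_module(symbol, app_globals, linked_dsos, module_deps):
    """Name the object that a reference from a loaded module binds to.

    Lookup order in this model: the application, then its directly-linked
    DSOs, then the module's own dependencies.
    """
    if symbol in app_dynamic_symbols(app_globals, linked_dsos):
        return "application"
    for scope in (linked_dsos, module_deps):
        for name, symbols in scope.items():
            if symbol in symbols:
                return name
    return None


# Made-up symbol sets, just enough for the example.
LIBC = {"gethostbyname", "strlen"}
LIBNSL = {"yp_match", "yp_bind"}
APP_GLOBALS = {"main", "yp_match"}
MODULE_DEPS = {"libnsl.so.1": LIBNSL}  # what libnss_nis depends on

# Application not linked against -lnsl: its yp_match is not exported.
print(resolve_from_module("yp_match", APP_GLOBALS,
                          {"libc.so.6": LIBC}, MODULE_DEPS))
# prints: libnsl.so.1

# Application linked against -lnsl: its yp_match overrides libnsl's.
print(resolve_from_module("yp_match", APP_GLOBALS,
                          {"libc.so.6": LIBC, "libnsl.so.1": LIBNSL},
                          MODULE_DEPS))
# prints: application
```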

This morass might lead one to want to use dynamic linking name spaces
(dlmopen) for NSS modules.  But that is its own whole can of worms
that might well wind up being worse in totally different ways.

> AFAICS, gdb has not renamed the symbol, and I'm not the one
> who tripped on it in the first place, so I'm not sure
> whether this particular instance of the bug was fixed some
> other way.

The symbol timeval_add does not appear anywhere in the libc sources,
including ChangeLog files.  So I tend to doubt that this symbol name
was used by any libc code in the past either.  That suggests that the
case you hit was in a third-party NSS module.  Obviously we can't
ourselves do anything directly about the quality of third-party
modules.  But we could potentially provide a tool to vet third-party
modules for our name space rules.


Thanks,
Roland

