Bug 19341 - Fix link namespace coordination issues e.g. static vs. dynamic vs. dlmopen.
Status: SUSPENDED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.21
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Duplicates: 20468
Depends on:
Blocks:
 
Reported: 2015-12-07 18:41 UTC by Ian Lance Taylor
Modified: 2017-07-26 19:27 UTC (History)
CC List: 5 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Ian Lance Taylor 2015-12-07 18:41:53 UTC
This glibc bug report is extracted from https://golang.org/issue/13470.

If you statically link the following C program and run it on Ubuntu Wily, which uses glibc 2.21, it will crash.

I believe that the problem is that the ctype code relies on TLS variables initialized by __ctype_init.  The getpwuid_r function in a statically linked program relies on opening a supporting shared library.  The statically linked executable has no dynamic symbol table, so the supporting shared library cannot see the executable's ctype information and instead uses its own copy.  That copy is correctly initialized by a call to __ctype_init, but only on the thread that makes the call: if there are any existing threads, their shared library copies of the TLS ctype information are never initialized.  So, if the program manages to call getpwuid_r on a thread that existed when the shared library was opened, it crashes.

Test case:

#include <stdio.h>
#include <ctype.h>
#include <sys/types.h>
#include <pwd.h>
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

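/* Runs on a thread that already exists when main() first calls getpwuid_r. */
static void *thread(void *arg) {
	struct passwd pwd;
	char buf[1024];
	struct passwd *result;
	(void)arg;
	/* Block until main() has finished its own getpwuid_r call. */
	pthread_mutex_lock(&mutex);
	getpwuid_r(0, &pwd, buf, sizeof buf, &result);
	pthread_mutex_unlock(&mutex);
	return NULL;
}

int main(void) {
	pthread_t tid;
	struct passwd pwd;
	char buf[1024];
	struct passwd *result;
	void *retval;
	/* Hold the mutex so the new thread waits while the shared library is opened. */
	pthread_mutex_lock(&mutex);
	pthread_create(&tid, NULL, thread, NULL);
	/* First call: opens the supporting shared library on the main thread. */
	getpwuid_r(0, &pwd, buf, sizeof buf, &result);
	pthread_mutex_unlock(&mutex);
	pthread_join(tid, &retval);
	return 0;
}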

Contents of /etc/nsswitch.conf on the failing system:

# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         compat
group:          compat
shadow:         compat
gshadow:        files

hosts:          files dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis
Comment 1 Carlos O'Donell 2015-12-07 19:26:17 UTC
(In reply to Ian Lance Taylor from comment #0)

We first noticed this at Red Hat with the docker self tests in September. Incarnations of this bug have been around for a long time and they have been closed by the previous community as unsupported. The current community is committed to supporting some kind of static linking, but with caveats and good documentation about the limits of static linking.

Your analysis is correct (I did the same debugging myself), and I've had a fix for this internally at Red Hat for a while. You just switch the ctype initialization to an init-at-first-use pattern, but that doesn't solve all the problems. You get even more breakage further on from other global state variables which are not shared. The first one is __libc_multiple_threads, which in the dynamic shared "namespace" is zero because no threads were created in that namespace. It causes the dynamic namespace to avoid doing any locking because it doesn't think there are any threads active (but there are). This leads to race conditions in malloc and lots of problems.

We've been looking at this internally at Red Hat on and off for the last couple of months. There is no easy solution. You have the same problem with errno, and all other global state variables that need to be shared between the static "namespace" and the dynamic "namespace." In fact I've been pondering a similar problem for the extensions to dlmopen that I'm working on which would allow alternate namespaces to at least operate sensibly.

Either way there is going to be quite a bit of work required to fix this and it won't get fixed soon. We'll need some kind of collected shared global state we can pass to the new namespace which allows the implementation to behave sensibly.

The conclusion at Red Hat was to continue to investigate the issue upstream, but that Docker would move to a non-static linkage, and that any statically linked applications would be very very small and not likely to use any of the features which trigger this problem.

I don't know if that's the answer you wanted to hear, but there it is. I'm moving this bug to SUSPENDED until someone steps up to work on the issue. I won't get to this for a while, but I'm happy to review patches or give technical guidance on proposed solutions.
Comment 2 Florian Weimer 2016-08-15 19:49:54 UTC
*** Bug 20468 has been marked as a duplicate of this bug. ***