Bug 12416 - Dynamic linker incorrectly reduces the stack size when making it executable
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.11
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Duplicates: 12225
Depends on:
Blocks:
 
Reported: 2011-01-19 17:07 UTC by Rodrigo Kumpera
Modified: 2014-06-27 14:03 UTC
CC: 3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
Flags: fweimer: security-


Description Rodrigo Kumpera 2011-01-19 17:07:19 UTC
If one dlopens a DSO that is not annotated with a non-executable stack, the dynamic linker has to make all thread stacks executable, and when doing so for the main thread it sometimes unmaps a few pages from the bottom of the stack.

This is on amd64 Linux with kernel 2.6.34.7 and glibc 2.11.2.

The following program makes this visible:

#define _GNU_SOURCE	/* for getline */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>

/* Return the end address of the [stack] mapping in /proc/self/maps. */
size_t proc_self (void) {
	FILE *proc = fopen ("/proc/self/maps", "r");
	char *l = NULL;
	size_t size = 0, end = 0;
	if (!proc)
		return 0;
	while (getline (&l, &size, proc) != -1) {
		size_t start, stop;
		if (sscanf (l, "%zx-%zx", &start, &stop) != 2)
			continue;
		if (strstr (l, "[stack]")) {
			printf ("found stack: %s", l);
			end = stop;
			break;
		}
	}
	free (l);
	fclose (proc);
	return end;
}

int main (int argc, char *argv[]) {
	size_t r = proc_self ();
	void *handle = dlopen (argv[1], RTLD_LAZY);
	printf ("handle is %p\n", handle);
	size_t r2 = proc_self ();

	if (r != r2)
		printf ("KILLED %zx bytes\n", r - r2);
	return 0;
}



To see this behavior, the program must be passed a DSO that has the execstack flag set.
The output will be something like:

found stack: 7fff831a9000-7fff831ca000 rw-p 00000000 00:00 0                          [stack]
found stack: 7fff831a9000-7fff831c9000 rwxp 00000000 00:00 0                          [stack]
KILLED 1000 bytes


As you can see, one page has been unmapped from the bottom of the stack, and this does affect programs that expect the stack bounds to be stable. It makes the pthread_getattr_np return value unreliable in the face of dynamic loading.
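
For reference, the same bounds can be queried through the threading API instead of parsing /proc; a minimal sketch (glibc-specific, build with -lpthread; the helper name is made up for this example):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

/* Print the calling thread's stack bounds as reported by
   pthread_getattr_np; calling this before and after the dlopen
   shows whether the reported bounds change. */
void print_stack_bounds (const char *tag) {
	pthread_attr_t attr;
	void *addr;
	size_t size;
	if (pthread_getattr_np (pthread_self (), &attr) != 0)
		return;
	if (pthread_attr_getstack (&attr, &addr, &size) == 0)
		printf ("%s: stack %p-%p (%zu bytes)\n",
			tag, addr, (char *) addr + size, size);
	pthread_attr_destroy (&attr);
}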
Comment 1 Ulrich Drepper 2011-01-19 18:29:08 UTC
You're wrong.  The code works correctly.  Only the parts of the stack which have to be executable are changed.  If the kernel cannot deal with that in the /proc output, this is a kernel bug.
Comment 2 Roland McGrath 2011-01-19 18:53:23 UTC
Changing the page protections splits the mapping into two mappings.  There is no other way it could be in the kernel.  The mapping is the granularity at which page protections are managed.  The /proc/pid/maps output is arguably wrong because after the split, only the lower mapping displays as [stack].  

But pthread_getattr_np doesn't pay attention to the [stack] magic name anyway, it just looks at addresses.  So pthread_getattr_np needs to look at multiple contiguous mappings after the one containing __libc_stack_end if the intent is that its results are consistent after a protection change.
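
A minimal sketch of that approach (an illustration, not glibc's actual implementation): find the mapping containing a known in-stack address, then merge any contiguous mappings above it.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Report [lo, hi) for the stack by coalescing contiguous mappings,
   so that a protection-change split does not shrink the reported
   range. stack_addr is any address known to be on the stack. */
void stack_bounds (size_t stack_addr, size_t *lo, size_t *hi) {
	FILE *proc = fopen ("/proc/self/maps", "r");
	char *l = NULL;
	size_t size = 0;
	*lo = *hi = 0;
	while (proc && getline (&l, &size, proc) != -1) {
		size_t start, end;
		if (sscanf (l, "%zx-%zx", &start, &end) != 2)
			continue;
		if (*hi != 0 && start == *hi)
			*hi = end;		/* contiguous: extend upward */
		else if (start <= stack_addr && stack_addr < end) {
			*lo = start;		/* mapping holding stack_addr */
			*hi = end;
		}
	}
	free (l);
	if (proc)
		fclose (proc);
}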
Comment 3 Rodrigo Kumpera 2011-01-19 19:25:21 UTC
I believe this is really not the case: if we look at the next mapping once the stack has been made executable, a hole has been introduced:


found stack:         7fff9c51a000-7fff9c53b000 rw-p 00000000 00:00 0                          [stack]
entry after stack: 7fff9c5ff000-7fff9c600000 r-xp 00000000 00:00 0                          [vdso]

found stack:         7fff9c51a000-7fff9c53a000 rwxp 00000000 00:00 0                          [stack]
entry after stack: 7fff9c53b000-7fff9c53b000 rw-p 00000000 00:00 0 

strace gives:

mprotect(0x7fff9c539000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN) = 0

So mprotect is ignoring the last page and producing one unmapped page.


But how about this one:

found stack:         7fffe0ebf000-7fffe0ee0000 rw-p 00000000 00:00 0                          [stack]
entry after stack: 7fffe0fbc000-7fffe0fbd000 r-xp 00000000 00:00 0                          [vdso]

found stack:         7fffe0ebf000-7fffe0ede000 rwxp 00000000 00:00 0                          [stack]
entry after stack: 7fffe0edf000-7fffe0ee0000 rw-p 00000000 00:00 0 

mprotect(0x7fffe0edd000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN) = 0

Here mprotect is ignoring the top 2 pages of the stack and producing one unmapped page.

Both of the above results were obtained with the exact same binary, so how much of the stack glibc will slash is hard to predict.

It's very hard to defend that the unpredictable value passed to mprotect is a kernel bug.
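
For what it's worth, the unpredictable address is consistent with the stack-protection code choosing the page that contains __libc_stack_end. A rough sketch of what I understand _dl_make_stack_executable to do (a paraphrase under that assumption, not the verbatim glibc source):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdint.h>

extern void *__libc_stack_end;

/* Make the page holding __libc_stack_end executable and let
   PROT_GROWSDOWN propagate the change to the pages below it.
   Pages above __libc_stack_end keep their old protection, which
   is what splits the vma. */
int make_stack_executable (size_t pagesize) {
	uintptr_t page = (uintptr_t) __libc_stack_end & ~(pagesize - 1);
	return mprotect ((void *) page, pagesize,
			 PROT_READ | PROT_WRITE | PROT_EXEC | PROT_GROWSDOWN);
}

Since __libc_stack_end lies below the argv/environment/auxv block, whose size varies from run to run, the number of pages left above the protected page is unpredictable - which would account for the one-page and two-page gaps seen above.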
Comment 4 Siddhesh Poyarekar 2012-05-26 04:31:54 UTC
Fixed by making the pthread_getattr_np output consistent before and after an execstack DSO is loaded. The split vma (or even a non-executable portion after the first frame) does not really make any operational difference, and we'd like to keep as little of the stack executable as we can get away with, in the interest of security paranoia.
Comment 5 Rodrigo Kumpera 2012-05-26 15:30:52 UTC
Thanks for improving the situation, but this is a real problem for programs that do conservative stack scanning, such as those that depend on boehm-gc or mono.

Even if pthread_getattr_np no longer lies about the stack boundaries, querying it all the time is neither optimal nor safe, as there is no guarantee that it can be called from signal context - and both boehm and mono do stack scanning in signal context.
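
The kind of workaround this forces on such runtimes looks roughly like this (an illustration with hypothetical names, not mono's or boehm-gc's actual code):

#define _GNU_SOURCE
#include <pthread.h>

/* Cache the bounds once per thread, outside signal context (since
   pthread_getattr_np is not async-signal-safe), and let the signal
   handler read the cached values. This is only sound if the
   reported bounds never shrink later. */
static __thread char *stack_lo, *stack_hi;

void cache_stack_bounds (void) {	/* call at thread start */
	pthread_attr_t attr;
	void *addr;
	size_t size;
	if (pthread_getattr_np (pthread_self (), &attr) != 0)
		return;
	if (pthread_attr_getstack (&attr, &addr, &size) == 0) {
		stack_lo = addr;
		stack_hi = (char *) addr + size;
	}
	pthread_attr_destroy (&attr);
}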

It still has not been answered why shrinking the stack is needed.
Comment 6 Siddhesh Poyarekar 2012-05-28 05:26:59 UTC
(In reply to comment #5)
> Thanks for improving the situation, but this is a real problem for programs
> that do conservative stack scanning such as those that depends on boehm-gc or
> mono.

How so? pthread_getattr_np will return the same stack boundaries and you should not have to query it all the time if you're only interested in the underflow end. If you're interested in the overflow end then you have no choice other than calling pthread_getattr_np all the time because the overflow end may change as extra vmas get mapped in below it.

> It still has not been answered why shrinking the stack is needed.

It is more the case that adding extra code to make sure the entire vma is covered has not been demonstrated to be useful at all. If you can provide an unambiguous reason why that is required, we could consider going through that extra effort.

I'd suggest you start a discussion on this on the developer mailing list (libc-alpha at sourceware.org) so that everyone can pitch in on this.
Comment 7 Siddhesh Poyarekar 2012-10-03 18:17:49 UTC
*** Bug 12225 has been marked as a duplicate of this bug. ***