When a thread created with pthread_create calls malloc for the first time, glibc creates a new heap region for it. The size of this heap depends on compile-time configuration; in my case (x86-64) HEAP_MAX_SIZE is 64MB. In malloc/arena.c:new_heap there is a condition under which the heap gets aligned to HEAP_MAX_SIZE. This means the address of the created heap is aligned to HEAP_MAX_SIZE, which is 64MB in my case.

Now we can compute the probability that an attacker guesses the heap address. The user-mode task is 2^48 - 4096 bytes. 64MB is 2^26, so there are 48 - 26 = 22 bits, i.e. 2^22 possible 64MB-aligned heap slots in the address space. On many Linux distributions mmap_rnd_bits is 28 (tunable via /proc/sys/vm/mmap_rnd_bits). For such systems the top 48 - (mmap_rnd_bits + PAGE_SHIFT) = 8 bits are fixed, i.e. we know the 8 high bits of mmap_base - the address starts with 0x7f. What does this mean? It means that if an application creates a thread and calls malloc from that thread, there are only 2^14 possible addresses for such a heap (mmap_rnd_bits + PAGE_SHIFT - 26 = 14 bits of effective entropy).

Proof of concept: first, a program that prints the malloc address:

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>

void *first(void *x)
{
    void *ptr;

    (void) x;
    ptr = malloc(8);
    if (ptr == NULL) {
        printf("Failed to alloc %d\n", errno);
        return (void *) -1;
    }
    printf("%p\n", ptr);
    return NULL;
}

int main(void)
{
    int res;
    pthread_t one;

    res = pthread_create(&one, NULL, &first, 0);
    if (res) {
        printf("Failed to create thread: %d\n", res);
        return -1;
    }
    void *val;
    pthread_join(one, &val);
    return 0;
}

Now execute it many times and build a histogram with Python (Python 2):

import subprocess

d = {}
i = 0
while i < 1000000:
    output = subprocess.check_output('./thread_stack_heap_hysto')
    key = int(output, 16)
    if key in d:
        d[key] += 1
    else:
        d[key] = 1
    i += 1

print 'total: ', len(d)
for key in d:
    print hex(key), d[key]

And the result:

$ python get_hysto.py
total:  16385
…

16385 is 0x4001; the extra 1 is because the maximum user-space address is one page below 2^48.

Summary: the heap should not be aligned to 64 megabytes; this behaviour allows an attacker to brute force the heap of a pthread-created thread.
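For completeness, the arithmetic above can be cross-checked with a few lines of C. The constants are the x86-64 defaults quoted in this report (48-bit user address space, 4 KiB pages, mmap_rnd_bits = 28, HEAP_MAX_SIZE = 64MB); they are hard-coded for illustration, not read from the running system:

#include <stdio.h>

int main(void)
{
    const unsigned va_bits = 48;        /* user-mode virtual address bits */
    const unsigned page_shift = 12;     /* 4 KiB pages */
    const unsigned mmap_rnd_bits = 28;  /* /proc/sys/vm/mmap_rnd_bits */
    const unsigned heap_shift = 26;     /* HEAP_MAX_SIZE = 64MB = 2^26 */

    /* High bits of mmap_base that are never randomized.  */
    unsigned fixed_hi_bits = va_bits - (mmap_rnd_bits + page_shift);

    /* Randomized bits that survive the HEAP_MAX_SIZE alignment, i.e. the
       number of distinct 64MB-aligned heap bases an attacker must guess.  */
    unsigned surviving_bits = mmap_rnd_bits + page_shift - heap_shift;

    printf("fixed high bits:     %u\n", fixed_hi_bits);        /* 8 */
    printf("possible heap bases: 2^%u = %lu\n",
           surviving_bits, 1UL << surviving_bits);             /* 2^14 = 16384 */
    return 0;
}

The 16384 reported here matches the 16385 distinct addresses observed above, modulo the one extra value caused by the top page of the address space.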
Flagging as security- because this is just hardening.
Hello,

Can you please explain to me what exactly this hardening is? If this is security hardening, it should be a security bug; if you think otherwise, please explain.

From my point of view this bug is purely about security, because it leads to an ASLR bypass.
(In reply to Ilya Smith from comment #2)
> Can you please explain to me what exactly this hardening is? If this is
> security hardening, it should be a security bug; if you think otherwise,
> please explain.
>
> From my point of view this bug is purely about security, because it leads
> to an ASLR bypass.

An ASLR bypass by itself is not a vulnerability. It may simplify exploitation of another, unrelated vulnerability.
My understanding of the reporter's intention is to point out that the malloc heap alignment restriction lowers the total entropy provided by the kernel, which is an implementation detail of glibc. But I agree with Florian's reasoning that this is a hardening feature.
This patch addresses the weakness discovered by Ilya Smith. The problem is significant on x86_64, but we think it is SEVERE on other systems (4 bits of ASLR is effectively no ASLR).

Ptmalloc aligns arenas to a very large value. This large alignment greatly reduces the ASLR entropy of all data dynamically allocated from thread code. Allocations carried out from main() are not affected, unless brk cannot expand the heap.

The impact of the alignment depends heavily on the architecture. The expected entropy should be at least that of mmap() on each system, but the actual entropy is shown in the following table:

+----------+--------+-----------------+
| System   |  Mmap  |  Thread malloc  |
+----------+--------+-----------------+
| x86_64   |   28   |       14        |
| i386     |    8   |        0        |
| x32      |    8   |        0        |
| ARM      |    8   |        0        |
| AARCH    |   18   |        4        |
| PPC64    |   14   |        4        |
| s390(64) |   11   |        4        |
| s390     |   11   |        3        |
+----------+--------+-----------------+

As can be seen, all systems except x86_64 are severely affected by this weakness.

This patch removes the need to align arenas to HEAP_MAX_SIZE by changing the heap_for_ptr(ptr) macro. Arenas are randomized with the same entropy as the rest of the mmapped objects. The entropy used to randomize the thread arenas is obtained from the ASLR value of the libraries, by using the address of __arena_rnd itself:

__arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
              ~(pagesize-1);

This way, if the user disables ASLR (via randomize_va_space, setarch -R, or when using gdb), the randomization of the arena is also automatically disabled.

Summary of the patch features:

*- Does not change the allocation policy.
*- Does not add new data structures (only one long variable).
*- The run-time overhead is almost undetectable (just 2 more CPU
   instructions per free: one "add" and one "or"). malloc is not
   affected.
*- It is fully backward compatible.
*- It completely restores the ASLR entropy of thread heaps.

With the patch, the entropy on x86_64 is 28 bits (15 more than the current one) and on i386 it is 8 bits (8 more), which is the same entropy as mallocs made from the main thread.

Checked on x86_64-linux-gnu and i386-linux-gnu.

Authors:
Ismael Ripoll-Ripoll <iripoll@disca.upv.es>
Hector Marco-Gisbert <hmarco@hmarco.org>

---
 malloc/arena.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index cecdb7f4c4..8dd7b9f028 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -48,7 +48,7 @@

 /* A heap is a single contiguous memory region holding (coalesceable)
    malloc_chunks.  It is allocated with mmap() and always starts at an
-   address aligned to HEAP_MAX_SIZE.  */
+   address aligned to arena_rnd.  */

 typedef struct _heap_info
 {
@@ -122,10 +122,15 @@ int __malloc_initialized = -1;
       ptr = arena_get2 ((size), NULL); \
   } while (0)

-/* find the heap and corresponding arena for a given ptr */
-
+/* find the heap and corresponding arena for a given ptr.  Note that
+   heap_info is not HEAP_MAX_SIZE aligned any more.  But a random
+   offset from the expected alignment, known by the process.  This way
+   it is fast to get the head of the area whereas it is ASLR compatible.
+*/
+static unsigned long __arena_rnd;
 #define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
+  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
+                  | __arena_rnd))
 #define arena_for_chunk(ptr) \
   (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)

@@ -293,6 +298,11 @@ ptmalloc_init (void)

   __malloc_initialized = 0;

+  size_t pagesize = GLRO (dl_pagesize);
+  /* Get the entropy from the already existing ASLR.  */
+  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
+    ~(pagesize-1);
+
 #ifdef SHARED
   /* In case this libc copy is in a non-default namespace, never use brk.
      Likewise if dlopened from statically linked program.  */
@@ -439,7 +449,7 @@ dump_heap (heap_info *heap)
 /* If consecutive mmap (0, HEAP_MAX_SIZE << 1, ...) calls return decreasing
    addresses as opposed to increasing, new_heap would badly fragment the
    address space.  In that case remember the second HEAP_MAX_SIZE part
-   aligned to HEAP_MAX_SIZE from last mmap (0, HEAP_MAX_SIZE << 1, ...)
+   aligned to arena_rnd from last mmap (0, HEAP_MAX_SIZE << 1, ...)
    call (if it is already aligned) and try to reuse it next time.  We need
    no locking for it, as kernel ensures the atomicity for us - worst case
    we'll call mmap (addr, HEAP_MAX_SIZE, ...) for some value of addr in
@@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
     {
       p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                      & ~(HEAP_MAX_SIZE - 1));
+      /* The heap_info is at a random offset from the alignment to
+         HEAP_MAX_SIZE.  */
+      p2 = (char *) ((unsigned long) p2 | __arena_rnd);
+      if (p1 + HEAP_MAX_SIZE <= p2)
+        p2 -= HEAP_MAX_SIZE;
       ul = p2 - p1;
       if (ul)
         __munmap (p1, ul);
@@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
   else
     {
       /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-         is already aligned. */
+         is already aligned to __arena_rnd. */
       p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
       if (p2 == MAP_FAILED)
         return 0;

-      if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+      if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
         {
           __munmap (p2, HEAP_MAX_SIZE);
           return 0;
--
2.20.1
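To illustrate the round trip the patched heap_for_ptr() performs, here is a small standalone sketch. It is not the glibc code itself; HEAP_MAX_SIZE, the offset and the addresses are made-up values for x86-64. With the patch, a heap starts at a 64MB boundary plus the process-wide __arena_rnd offset, so subtracting the offset, masking, and OR-ing it back recovers the heap_info base in O(1):

#include <assert.h>
#include <stdio.h>

#define HEAP_MAX_SIZE (64UL * 1024 * 1024)   /* 64MB, as on x86-64 */

static unsigned long arena_rnd;              /* stands in for __arena_rnd */

static unsigned long
heap_for_ptr (unsigned long ptr)
{
  /* Same arithmetic as the patched macro.  */
  return ((ptr - arena_rnd) & ~(HEAP_MAX_SIZE - 1)) | arena_rnd;
}

int
main (void)
{
  /* Pretend ASLR picked this page-aligned offset inside a 64MB slot.  */
  arena_rnd = 0x2a5000;

  /* A hypothetical heap base and a chunk somewhere inside that heap.  */
  unsigned long heap_base = 0x7f1240000000UL | arena_rnd;
  unsigned long chunk = heap_base + 0x1234;

  assert (heap_for_ptr (chunk) == heap_base);
  printf ("heap base recovered: %#lx\n", heap_for_ptr (chunk));
  return 0;
}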
(In reply to Ismael Ripoll from comment #5)
> [patch quoted in full above; snipped]
Thanks for the work. However, patch discussion is done through the mailing list [1]; could you re-send it to libc-alpha?

It seems the alignment restriction was originally added to simplify the heap_for_ptr macro, and I think this should be stated in the patch description (noting that there is no other restriction on arena alignment).

Also, would it be possible to add a testcase? I am not sure which approach would be best (a mailing-list discussion might give some ideas). Maybe parametrize the mmap entropy information you compiled into arch-specific files (similar to stackinfo.h), create N threads that run M allocations each, record the output, and compare the histogram with the expected entropy within a certain margin.

Finally, please also add the results of bench-malloc-{simple,thread} with and without the patch.

[1] https://sourceware.org/mailman/listinfo/libc-alpha
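For what it is worth, a very rough sketch of the kind of testcase suggested above could look like the following. It is not an actual glibc test: the child binary is the reporter's PoC from comment #0 used as a stand-in, and RUNS/MIN_DISTINCT are arbitrary illustration values. The idea is that an unpatched glibc yields exactly one page-aligned offset within the 64MB slot across runs, while the patched one should yield many:

#include <stdio.h>
#include <stdlib.h>

#define HEAP_MAX_SIZE (64UL * 1024 * 1024)   /* 64MB, as on x86-64 */
#define PAGE_SIZE 4096UL
#define RUNS 2048
#define MIN_DISTINCT 64   /* acceptance margin; would need per-arch tuning */

int
main (void)
{
  /* One flag per page-aligned offset inside a HEAP_MAX_SIZE slot.  */
  static unsigned char seen[HEAP_MAX_SIZE / PAGE_SIZE];
  unsigned long distinct = 0;

  for (int i = 0; i < RUNS; i++)
    {
      /* Re-run the PoC; each run gets a fresh ASLR layout.  */
      FILE *f = popen ("./thread_stack_heap_hysto", "r");
      if (f == NULL)
        return 1;

      unsigned long addr = 0;
      int ok = fscanf (f, "%lx", &addr) == 1;
      pclose (f);
      if (!ok)
        return 1;

      /* Offset of the returned pointer within its 64MB slot.  */
      unsigned long off = (addr & (HEAP_MAX_SIZE - 1)) / PAGE_SIZE;
      if (!seen[off])
        {
          seen[off] = 1;
          distinct++;
        }
    }

  printf ("distinct in-slot page offsets: %lu of %d runs\n", distinct, RUNS);
  return distinct >= MIN_DISTINCT ? 0 : 1;
}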