| Summary: | Heap address of pthread_create thread is aligned. | | |
|---|---|---|---|
| Product: | glibc | Reporter: | Ilya Smith <blackzert> |
| Component: | nptl | Assignee: | Not yet assigned to anyone <unassigned> |
| Status: | UNCONFIRMED | | |
| Severity: | normal | CC: | adhemerval.zanella, blackzert, carnil, drepper.fsp, fweimer, iripoll, yann |
| Priority: | P2 | Flags: | fweimer: security- |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Host: | | Target: | |
| Build: | | Last reconfirmed: | |
Description
Ilya Smith  2018-02-16 12:09:45 UTC
---

Flagging as security- because this is just hardening.

---

Hello,

Can you please explain to me what exactly this hardening is? If this is security hardening, it should be a security bug; if you think otherwise, please explain. From my point of view this bug is purely about security, because it leads to an ASLR bypass.

---

(In reply to Ilya Smith from comment #2)
> Can you please explain to me what exactly this hardening is? If this
> is security hardening, it should be a security bug; if you think
> otherwise, please explain.
>
> From my point of view this bug is purely about security, because it
> leads to an ASLR bypass.

An ASLR bypass is not a vulnerability by itself. It may simplify the exploitation of another, unrelated vulnerability.

---

My understanding of the reporter's intention is to point out that the malloc heap alignment restriction lowers the total entropy made available by the kernel, which is an implementation detail of glibc. But I agree with Florian's reasoning that this is a hardening feature.

---

This patch solves the weakness discovered by Ilya Smith. The problem is important on x86_64, but we think it is SEVERE on other systems (4 ASLR bits is like no ASLR).

Ptmalloc aligns the arenas to a very large value. This large alignment greatly reduces the ASLR entropy of all data dynamically allocated from thread code. Allocations carried out from main() are not affected, unless brk cannot expand the heap.

The impact of the alignment greatly depends on the architecture. The expected entropy should be at least that of mmap() on each system, but the actual entropy is shown in the following table:

+----------+--------+-----------------+
| System   |  Mmap  |  Thread malloc  |
+----------+--------+-----------------+
| x86_64   |   28   |       14        |
| i386     |    8   |        0        |
| x32      |    8   |        0        |
| ARM      |    8   |        0        |
| AARCH    |   18   |        4        |
| PPC64    |   14   |        4        |
| s390(64) |   11   |        4        |
| s390     |   11   |        3        |
+----------+--------+-----------------+

As can be seen, all systems but x86_64 are severely affected by this weakness.
This patch removes the need to align arenas to HEAP_MAX_SIZE by changing the macro heap_for_ptr(ptr). Arenas are randomized with the same entropy as the rest of the mmapped objects. The entropy used to randomize the thread arenas is obtained from the ASLR already applied to the libraries, by using the address of __arena_rnd itself:

    __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
                  ~(pagesize-1);

This way, if the user disables ASLR (via randomize_va_space, setarch -R, or when using gdb), the randomization of the arena is also automatically disabled.

Summary of the patch features:

*- Does not change the allocation policy.
*- Does not add new data structures (only one long variable).
*- The temporal overhead is almost undetectable (just 2 more CPU instructions per free: one "add" and one "or"); malloc is not affected.
*- It is fully backward compatible.
*- It completely restores the ASLR entropy of the thread heaps.

With the patch, the entropy on x86_64 is 28 bits (14 more than the current one), and on i386 it is 8 bits (8 more), which is the same entropy as mallocs made from the main thread.

Checked on x86_64-linux-gnu and i386-linux-gnu

Authors:
  Ismael Ripoll-Ripoll <iripoll@disca.upv.es>
  Hector Marco-Gisbert <hmarco@hmarco.org>

---
 malloc/arena.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index cecdb7f4c4..8dd7b9f028 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -48,7 +48,7 @@
 
 /* A heap is a single contiguous memory region holding (coalesceable)
    malloc_chunks.  It is allocated with mmap() and always starts at an
-   address aligned to HEAP_MAX_SIZE.  */
+   address aligned to arena_rnd.  */
 
 typedef struct _heap_info
 {
@@ -122,10 +122,15 @@ int __malloc_initialized = -1;
       ptr = arena_get2 ((size), NULL); \
   } while (0)
 
-/* find the heap and corresponding arena for a given ptr */
-
+/* find the heap and corresponding arena for a given ptr.  Note that
+   heap_info is not HEAP_MAX_SIZE aligned any more, but at a random
+   offset from the expected alignment, known by the process.  This way
+   it is fast to get the head of the area while remaining ASLR
+   compatible.  */
+static unsigned long __arena_rnd;
 #define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
+  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
+   | __arena_rnd))
 #define arena_for_chunk(ptr) \
   (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
 
@@ -293,6 +298,11 @@ ptmalloc_init (void)
 
   __malloc_initialized = 0;
 
+  size_t pagesize = GLRO (dl_pagesize);
+  /* Get the entropy from the already existing ASLR.  */
+  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
+    ~(pagesize-1);
+
 #ifdef SHARED
   /* In case this libc copy is in a non-default namespace, never use brk.
      Likewise if dlopened from statically linked program.  */
@@ -439,7 +449,7 @@ dump_heap (heap_info *heap)
 /* If consecutive mmap (0, HEAP_MAX_SIZE << 1, ...) calls return decreasing
    addresses as opposed to increasing, new_heap would badly fragment the
    address space.  In that case remember the second HEAP_MAX_SIZE part
-   aligned to HEAP_MAX_SIZE from last mmap (0, HEAP_MAX_SIZE << 1, ...)
+   aligned to arena_rnd from last mmap (0, HEAP_MAX_SIZE << 1, ...)
    call (if it is already aligned) and try to reuse it next time.  We need
    no locking for it, as kernel ensures the atomicity for us - worst case
    we'll call mmap (addr, HEAP_MAX_SIZE, ...) for some value of addr in
@@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
     {
       p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                      & ~(HEAP_MAX_SIZE - 1));
+      /* The heap_info is at a random offset from the alignment to
+         HEAP_MAX_SIZE.  */
+      p2 = (char *) ((unsigned long) p2 | __arena_rnd);
+      if (p1 + HEAP_MAX_SIZE <= p2)
+        p2 -= HEAP_MAX_SIZE;
       ul = p2 - p1;
       if (ul)
         __munmap (p1, ul);
@@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
   else
     {
       /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-         is already aligned. */
+         is already aligned to __arena_rnd. */
       p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
       if (p2 == MAP_FAILED)
         return 0;
 
-      if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+      if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
         {
           __munmap (p2, HEAP_MAX_SIZE);
           return 0;
-- 
2.20.1

---

(In reply to Ismael Ripoll from comment #5)
> This patch solves the weakness discovered by Ilya Smith. The problem
> is important on x86_64, but we think it is SEVERE on other systems
> (4 ASLR bits is like no ASLR).
> [...]

Thanks for the work. However, patch discussion is done through the libc-alpha mailing list [1]. Could you re-send it to libc-alpha?

It seems the alignment restriction was originally added to simplify the heap_for_ptr macro; I think this should be stated in the patch description (noting that there is no other restriction on arena alignment).

Also, would it be possible to add a testcase? I am not sure which approach would be best (a mailing-list discussion might give some ideas). Maybe parametrize the mmap entropy information you compiled in arch-specific files (similar to stackinfo.h), create N threads that run M allocations, record the output, and compare the histogram against the expected entropy within a certain margin.

Finally, please also include the outcome of bench-malloc-{simple,thread} with and without the patch.

[1] https://sourceware.org/mailman/listinfo/libc-alpha