[PATCH] malloc tcache: Debugger now sees the address of the corrupted chunk.

Sun Dec 13 02:16:36 GMT 2020

On Sat, Dec 12, 2020 at 7:32 AM DJ Delorie <dj@redhat.com> wrote:
>
> Adder <adder.thief@gmail.com> writes:
> >
> >   malloc.c:2961:5: error: statement with no effect [-Werror=unused-value]
> >        * (volatile char **) tcache->entries[tc_idx];
>
> Sorry, add (void) in front of it:
>
>         (void) * (volatile char **) tcache->entries[tc_idx];

This eliminates the "no effect" warning (=> successful build).
But the desired crash is not produced.
No machine code is generated.

On the bright side, I have just learned to use (void) to avoid a warning.

> The volatile might be in the wrong spot.  Try:
>
>         (void) * (char * volatile *) tcache->entries[tc_idx];

Hurray !

This eliminates the "no effect" warning (=> successful build).

And shuffling the qualifier eliminates the need for casting to void.
I.e. the following also produces no compiler warning:

                 * (char * volatile *) tcache->entries[tc_idx];

The desired crash is produced (=> we have the address of the corrupted chunk).

The generated machine code is lean -- just three extra instructions:

    ...
    cmp rdi, rsi  ; This tests whether r9 is zero. Trust me.
    je @@Passed
    mov rsi, [r9] ; Crash here as desired. r9 is e: '(gdb) print e'.
    @@Passed:
    ...

We can also use the following:

  ((volatile tcache_entry *) tcache->entries[tc_idx])->next;

It produces the exact same machine code, but it is more resilient
to changes in the definition of our tcache_entry struct
(example: to a field being added before our first field, i.e. before "next").

But we may still want to change this to:

    ...
    cmp rdi, rsi  ; This tests whether r9 is zero. Trust me.
    je @@Passed
    mov rsi, [r9] ; Crash here if e is not a pointer.
    mov [r9], rsi ; Crash here if e happens to be an only-allow-reading pointer.
    @@Passed:
    ...

I.e. we may still want to add the extra "mov [r9], rsi" instruction.

This helps us when the user illegally overwrites our tcache_entry's "next" field
with a value which (unluckily) happens to REVEAL_PTR an address in a
present page,
but (luckily) that page is not writable.

We can perform the "Is this page writable ?" test using:

  ((volatile tcache_entry *) e_next)->next = e_next->next;

where "e_next" is the variable I have been trying to introduce
in order to avoid repeated typing of "tcache->entries[tc_idx]":

  tcache_entry *const e_next = REVEAL_PTR (e->next);
  ...
  tcache->entries[tc_idx] = e_next;

Let us add a test which detects, while we still have the important "e",
whether the "key" field in our tcache_entry has been corrupted,
for example after such usage (on a 64-bit platform):

  size_t *const p1 = (size_t *) malloc (8 * sizeof (size_t));
  ...
  free (p1);
  p1 [1] = 0x647952657A696C45;
  ...
  char *const p7 = (char *) malloc (64);

This is useful when p1 [0] is not illegally modified, but p1 [1] is.

Normally, if we detect this corruption, we should call malloc_printerr.
But with all the stack frames resulting from that call,
our important value "e" value is lost to the debugger (at least on x86_64).
Therefore, I suggest we crash right away.

The resulting function is:

 static __always_inline void *
 tcache_get (size_t tc_idx)
 {
   tcache_entry *e = tcache->entries[tc_idx];
+
   if (__glibc_unlikely (!aligned_OK (e)))
     malloc_printerr ("malloc(): unaligned tcache chunk detected");
-  tcache->entries[tc_idx] = REVEAL_PTR (e->next);
+
+  tcache_entry *const e_next = REVEAL_PTR (e->next);
+
+  if (e_next)
+    ((volatile tcache_entry *) e_next)->next = e_next->next;
+
+  if (__glibc_unlikely (e->key != tcache))
+    * ((tcache_entry *volatile *) 0) = e;
+
+  tcache->entries[tc_idx] = e_next;
   --(tcache->counts[tc_idx]);
   e->key = NULL;
   return (void *) e;
 }

I wish to suggest that we quarrel on Comments after we agree on Code. (-:

> The (void) thing is a common idiom.

Thank you for the "(void)" thing and for the "* (volatile T *)" thing.