This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: RFD - Support for memory tagging in GLIBC

From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
To: Paul Eggert <eggert at cs dot ucla dot edu>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>
Cc: nd <nd at arm dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 10 Sep 2019 14:38:52 +0000
Subject: Re: RFD - Support for memory tagging in GLIBC
References: <8306e032-f980-a409-5239-74629e79d041@arm.com> <bb183d3c-76d3-fbb3-8588-544b5615c2a2@cs.ucla.edu>

On 06/09/2019 16:03, Paul Eggert wrote:
> Thanks for the heads-up on this. Some questions:

Richard has already left for Montreal and we will likely
discuss the design at the Cauldron, but i'll try to give
some answers.

> 
> If I build an app with the proposed glibc, can I run it with memory tagging disabled even if the hardware supports memory tagging? If so, will
> malloc etc. behave differently (e.g., return different addresses) depending on whether memory tagging is enabled? That might make debugging harder.
> 

MTE will likely be opt-in even if the hw supports it,
but the exact mechanisms are not finalized yet.

there can be two malloc implementations, a normal one and
a tagging one, with libc switching between them at process
startup before the first malloc. (they won't behave the
same way: tagging requires rounding sizes up to 16 byte
granules, and brk won't be used as it has no convenient
flags to enable/disable tagging.)
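
as a rough sketch of the rounding mentioned above (the
helper name and the overflow check are assumptions for
illustration, not the code from the patch; the check is
one way to handle the SIZE_MAX case raised in the quoted
review comments):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag granule size: 16 bytes, as described for MTE.  */
#define GRANULE_SIZE ((size_t) 16)

/* Hypothetical helper: round BYTES up to a multiple of the granule.
   Returns false (leaving *ROUNDED untouched) if the rounded size
   would not fit in size_t, so the caller can fail the allocation
   instead of silently wrapping around to a tiny size.  */
static bool
round_up_to_granule (size_t bytes, size_t *rounded)
{
  if (bytes > SIZE_MAX - (GRANULE_SIZE - 1))
    return false;
  *rounded = (bytes + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
  return true;
}

/* Small usage example: a 17 byte request occupies two granules.  */
static void
round_up_example (void)
{
  size_t n = 0;
  bool ok = round_up_to_granule (17, &n);
  assert (ok && n == 32);
}
```

a malloc built on this would treat a false return like any
other request that is too large and fail with ENOMEM.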

if the hw supports MTE we can use the tagging malloc
always and just enable/disable tagging of pages
returned by malloc (coloring instructions are still
executed but they have no effect on non-taggable
pages).

or the tagging malloc can be used with tagged memory
pages while tag checking of memory accesses is disabled
(likely this will be a per process or per thread setting
with three states: disabled, enabled with async imprecise
faults, or enabled with precise faults).

> What's the application model when a program violates the memory tagging rules? Presumably it gets a SIGSEGV; what sort of si_codes does it get?
> etc. This would need documentation, presumably.

this is part of the kernel abi that's still being worked
out, but for "precise fault" mode it will be a signal
(SIGSEGV or SIGBUS) and likely there will be an extension
to the signal context structure to hold tagging related
information if necessary.

> 
> Could you elaborate a bit on how the proposed work relates to SPARC ADI, support for which is in the Linux kernel? For example, would the
> differences between the two memory-tagging architectures be visible to the application?

i can give some overview:

aarch64 has a TBI feature: "top byte ignored" of
pointers at memory access, this is enabled by default
on linux so applications can use that byte for their
own hacks.
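
to illustrate what "using the top byte" means, here is a
minimal sketch that works on plain 64bit address values
rather than real pointers, so it runs on any host. the
helper names are made up for this example, and it assumes
user space addresses, where the untagged top byte is zero:

```c
#include <assert.h>
#include <stdint.h>

/* With TBI, bits 63:56 of an address are ignored on memory access.  */
#define TAG_SHIFT 56
#define TAG_MASK ((uint64_t) 0xff << TAG_SHIFT)

/* Hypothetical helper: store TAG in the top byte of ADDR.  */
static uint64_t
set_top_byte (uint64_t addr, uint8_t tag)
{
  return (addr & ~TAG_MASK) | ((uint64_t) tag << TAG_SHIFT);
}

/* Hypothetical helper: read the top byte back.  */
static uint8_t
get_top_byte (uint64_t addr)
{
  return (uint8_t) (addr >> TAG_SHIFT);
}

/* Hypothetical helper: strip the tag again.  This is only correct
   for user space addresses, whose untagged top byte is zero.  */
static uint64_t
clear_top_byte (uint64_t addr)
{
  return addr & ~TAG_MASK;
}

/* Small usage example: tagging and untagging round-trips.  */
static void
top_byte_example (void)
{
  uint64_t addr = UINT64_C (0x0000ffff12345670);
  uint64_t tagged = set_top_byte (addr, 0x2a);
  assert (get_top_byte (tagged) == 0x2a);
  assert (clear_top_byte (tagged) == addr);
}
```

on aarch64 linux the hw would ignore the top byte of such
a value when it is used as a load/store address, so user
code does not have to clear it before dereferencing.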

however the linux syscall abi didn't allow pointers
with non-zero top bytes to be passed into the kernel,
see linux Documentation/arm64/tagged-pointers.rst

there is a proposal for a syscall abi (opt-in) where
the kernel accepts pointers with non-zero top byte.
https://lore.kernel.org/linux-arm-kernel/20190821164730.47450-3-catalin.marinas@arm.com/
https://lore.kernel.org/linux-arm-kernel/20190821164730.47450-4-catalin.marinas@arm.com/
https://lore.kernel.org/linux-arm-kernel/20190823163717.19569-1-catalin.marinas@arm.com/
this requires significant work on the kernel side
to sanitize/untag user pointers.

HWASAN is a software implementation of the memory
tagging idea that uses a shadow map (like ASAN) to
store the color information for memory ranges and
the aarch64 TBI feature to have colors in pointers
(otherwise passing a tagged pointer to something that
is not HWASAN aware would lead to a fault). this
requires the new syscall abi so tagged pointers work
in syscalls too. (compared to ASAN it does not require
poisoned red zones around objects to detect failures.)
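
the shadow map idea can be modelled with a toy example.
this is only a sketch under simplifying assumptions (a
fixed 256 byte "heap" addressed by offset, one color byte
per 16 byte granule, invented function names); it is not
how HWASAN itself is implemented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16
#define HEAP_SIZE 256

/* Toy shadow map: one color per granule, initially all zero.  */
static uint8_t shadow[HEAP_SIZE / GRANULE];

/* Color the granules covering [offset, offset + size).  Both values
   must be granule aligned and lie inside the toy heap.  */
static void
color_range (size_t offset, size_t size, uint8_t color)
{
  assert (offset % GRANULE == 0 && size % GRANULE == 0);
  assert (offset <= HEAP_SIZE && size <= HEAP_SIZE - offset);
  for (size_t g = offset / GRANULE; g < (offset + size) / GRANULE; g++)
    shadow[g] = color;
}

/* An access is allowed only if the color carried by the pointer
   matches the color recorded for the granule it touches.  */
static bool
access_ok (size_t offset, uint8_t pointer_color)
{
  assert (offset < HEAP_SIZE);
  return shadow[offset / GRANULE] == pointer_color;
}
```

coloring a 32 byte object at offset 32 with color 5 makes
access_ok (32, 5) succeed, while access_ok (64, 5) fails
because the next granule has a different color: the
overflow is caught without a red zone. recoloring the
range on free makes a stale pointer with color 5 fail too.
detection is probabilistic, since a neighbouring or reused
granule may happen to get the same color.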

AArch64 MTE is similar but the hw stores the color
information per 16 byte granule, uses 4bit colors in
pointers and can check normal loads/stores (i.e. works
without code gen changes). this is not compatible with
existing software that relies on TBI, so may require
abi markings per dso etc. it also requires the new
syscall abi to allow tagged pointers into the kernel,
but instead of just ignoring the top byte the kernel
should verify the color e.g. when the read and write
syscalls access user memory.

SPARC ADI is similar to MTE, but the kernel syscall abi
has not yet been updated to deal with tagged pointers in
drivers etc, which means any non-trivial software would
not work with a tagged heap on linux currently. (iirc it
uses 4bit tags and memory is tagged at 64byte granularity.
it calls the tags versions, MTE calls them colors.)

ideally heap tagging should be transparent to conforming
c programs and then there is no difference between ADI and
MTE, but in practice i'd expect various subtle differences
(fault handling, kernel side checks, tags in core dumps,
tags in weird memory, reserved colors for special meaning,
enable/disable knobs,...).

i hope this helps.

> 
> I know you weren't asking for detailed code comments, but a couple things anyway:
> 
>> +  bytes = ROUND_UP_ALLOCATION_SIZE (bytes);
> 
> This misfires if bytes == SIZE_MAX.
> 
>> +  /* Quickly check that the freed pointer matches the tag for the memory.
>> +     This gives a useful double-free detection.  */
>> +  *(volatile char *)mem;
> 
> Wouldn't it be safer to check all the storage that the freed pointer addresses? Although O(size) rather than O(1), with MTE free is O(size)
> anyway so....
> 
>> +  /* If we are using memory tagging, then we need to set the tags
>> +     regardless of MORECORE_CLEARS, so we zero the whole block while
>> +     doing so.  */
> 
> Should there be a MORECORE_TAGS? That is, the morecore primitive might be able to tag for us just as it clears for us.

Follow-Ups:
- Re: RFD - Support for memory tagging in GLIBC
  - From: Paul Eggert

References:
- RFD - Support for memory tagging in GLIBC
  - From: Richard Earnshaw (lists)
- Re: RFD - Support for memory tagging in GLIBC
  - From: Paul Eggert
