[RFC] AArch64 Memory tagging support

John Baldwin jhb@FreeBSD.org
Wed Aug 21 16:34:00 GMT 2019


On 8/21/19 3:39 AM, Alan Hayward wrote:
> This is a rough design for implementing ARMv8.5 MTE support in GDB,
> detailing the UI changes and sketching out the internals.
> The Linux interfaces (ptrace, coredumps etc) are currently still under
> discussion, and so it will be quite a while before the GDB code is
> implemented, but I wanted to get a design out early to ensure that the GDB
> requirements from the Linux interfaces are known.
> 
> Any comments are welcome. At this stage I’m more concerned about the overall
> strategy being workable.

I have several thoughts on this, as I have a somewhat similar need for dealing
with memory tags, though slightly differently.  In my case, I work on a research
project called CHERI that assigns 1-bit tags to every 16 bytes (or in some cases
32 bytes) of memory as well as to certain registers.  I haven't yet really dealt
with tags in memory in my GDB patches to date, but will need to.  Also, in the
case of CHERI, we turn C and C++ pointers into 129-bit values (128 bits plus the
1-bit tag) where the extra 64 bits hold attributes like bounds and permissions of
the pointer, and the 1-bit tag determines validity.  You can read more about it
at www.chericpu.com if you are curious.  We currently provide models of it on MIPS
and are bringing it up on RISC-V (simulations and FPGA).

To the extent that we can have a somewhat generic tagging interface for GDB that
might cover SPARC ADI as well, that might be nice.

> Background
> 
> The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which allows
> 4-bit tags to be assigned to each 16 bytes of memory. Each allocation
> is referred to as an Allocated Tag (AT) in the text below. ATs are stored
> separately to the main memory. When accessing a memory location, 4 bits of
> the address are reserved for use as a tag. This is referred to as a Logical
> Tag (LT) in the text below. If the LT does not match the AT in a memory read
> or write, then the access will trap.
> 
> For more details see the MTE links here:
> https://developer.arm.com/architectures/cpu-architecture/a-profile#mte
> 
> For a very high-level overview see:
> https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/
> 
> 
> GDB UI: Memory Access
> 
> In the general use case, when using GDB to examine memory, GDB should print
> out when a memory tag failure happens. However, the operation it was doing (for
> example, reading/writing memory) should still succeed. A GDB user would not
> expect a signal to be passed upwards to the subject program.
> 
> For example, x is an int* variable in the subject application and it contains
> an address with an incorrect LT:
> 
> (gdb) print x             /* x contains an incorrect LT. */
> $1 = 0x1234007c0
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $2 = 67
> (gdb) set *x = 72
> <incorrect memory tag 0x12 for address 0x1234007c0>
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $2 = 72

I would like to have something similar eventually where attempts to access an
out-of-bounds pointer would fail, but perhaps with some kind of override flag
(like p/r for disabling pretty-printers) to permit examining out-of-bounds
memory contents.  I think having the same type of override to "dump the memory
anyway, even if the tag is wrong" might be useful for users, though I agree
the default behavior should be to warn about invalid use.
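
As a rough illustration of the check and override discussed here, a minimal C
sketch follows.  It assumes the MTE layout in which the logical tag occupies
address bits 59:56.  The function names, the result enum, and the `force` flag
are hypothetical and are not existing GDB interfaces; refusing the access
unless `force` is set is only one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MTE keeps the 4-bit logical tag in address bits 59:56. */
#define MTE_TAG_SHIFT 56
#define MTE_TAG_MASK  UINT64_C(0xf)

/* Hypothetical outcome of a debugger-initiated memory access. */
enum tag_check { TAG_MATCH, TAG_MISMATCH_WARN, TAG_MISMATCH_REFUSE };

/* Extract the logical tag carried in the pointer itself. */
static unsigned logical_tag(uint64_t addr)
{
    return (unsigned)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
}

/* Clear the tag bits so the address can be used for an untagged read. */
static uint64_t strip_tag(uint64_t addr)
{
    return addr & ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
}

/* Compare the pointer's logical tag with the allocation tag of the memory.
   On a mismatch, `force` selects "warn but dump anyway" over refusing. */
static enum tag_check check_access(uint64_t addr, unsigned alloc_tag, bool force)
{
    assert(alloc_tag <= MTE_TAG_MASK);
    if (logical_tag(addr) == alloc_tag)
        return TAG_MATCH;
    return force ? TAG_MISMATCH_WARN : TAG_MISMATCH_REFUSE;
}
```

A caller would read memory at `strip_tag(addr)` whenever the result is not
TAG_MISMATCH_REFUSE, and print the warning in the TAG_MISMATCH_WARN case.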

> When printing areas of memory (for example with the command x) this warning
> should only be printed once per dump.
> 
> (gdb) x/20xw y
> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
> <incorrect memory tag 0x12 for address 0x1234007c0>
> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
> 0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
> 0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067

One other thing that might be nice to have is some kind of view of memory that
dumps tags and bytes in parallel, so something like:

(gdb) x/20xwt y
0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000 [0x13]
0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000 [0x12]

etc.
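
To make the row layout above concrete, here is a small C sketch that formats
one 16-byte row followed by its tag.  The helper name and signature are
hypothetical; this only mirrors the example output and is not how GDB's x
command is implemented.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format one 16-byte row of an "x/xw"-style dump followed by its tag.
   Returns the snprintf result (characters needed, excluding the NUL). */
static int format_tagged_row(char *buf, size_t len, uint64_t addr,
                             const uint32_t words[4], unsigned tag)
{
    assert(buf != NULL && words != NULL);
    return snprintf(buf, len,
                    "0x%" PRIx64 ": 0x%08" PRIx32 " 0x%08" PRIx32
                    " 0x%08" PRIx32 " 0x%08" PRIx32 " [0x%x]",
                    addr, words[0], words[1], words[2], words[3], tag);
}
```

One tag per row works here because the row width matches a 16-byte tag
granule; a wider or narrower row would need more than one tag column or a
tag shared between rows.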

> However, there will be instances where the GDB user wants to either suppress
> any tag warning entirely or pass any errors upwards to the subject program as
> a signal. GDB already has similar functionality available for signals using
> the command handle. An AArch64-only command "memtag" should be added for this.
> 
> (gdb) memtag handle
> Memory tag failures will be printed
> Memory tag failures will not raise a signal
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $1 = 67
> (gdb) memtag handle noprint
> Memory tag failures will not be printed
> Memory tag failures will not raise a signal
> (gdb) print *x
> $2 = 67
> (gdb) memtag handle raise
> Memory tag failures will not be printed
> Memory tag failures will raise a signal
> (gdb) print *x
> Program terminated with signal SIGSEGV, Segmentation fault.
> The program no longer exists.
> 
> Suggested arguments to "memtag handle" are "print", "noprint", "raise",
> "noraise". This will only change the behaviour for memory tag failures
> generated by the user inside GDB (i.e. this does not affect inferior behaviour).

Given that these features are somewhat MTE-specific, I would perhaps suggest
using 'mte' instead of 'memtag' for the name.
 
> GDB UI: Examining Tags
> 
> The memtag command can also be used to read and write memory tags for a given
> memory location. Also, we want to be able to read and write tags from a given
> address.
> 
> (gdb) print x                               /* x contains an incorrect tag. */
> $1 = 0x1234007c0
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $1 = 67
> (gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the passed in pointer */
> $2 = 0x12
> (gdb) memtag showtag x        /* Show the AT for the memory address. Never returns errors if address contains the wrong LT.   */
> $3 = 0x13
> (gdb) memtag checktag x        /* Same as showtag, but also errors using the rules in "memtag handle".  */
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $4 = 0x13
> (gdb) memtag writetag x 0x12        /* Write the tag for the passed in memory address  */
> (gdb) memtag checktag x
> $5 = 0x12
> (gdb) memtag writelogicaltag x 0x14        /* Update the tag in the pointer */
> (gdb) print x                               /* x contains an incorrect tag. */
> $1 = 0x1434007c0
> (gdb) memtag checktag x
> <incorrect memory tag 0x14 for address 0x1234007c0>

I would perhaps also use 'mte' here.  'memtag showtag' might be generic to
memory tags in general, but the others are likely MTE-specific.

> Linux Ptrace
> 
> Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
> methods and /proc/<pid>/mem.
> 
> New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be added
> to read/write data tags. Peek will allow a range of tags to be read in a
> single call.

On FreeBSD (we use a variant of FreeBSD for CHERI research) I had a somewhat
similar plan which was to add a new "address space" for PT_IO that returned
packed tag bits.
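
As a sketch of what a consumer of such packed tag bits might do, the C helper
below expands a packed buffer into one byte per tag.  The function name is
hypothetical, and the LSB-first packing order is an assumption made for
illustration; no such interface or layout has been defined yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand `ntags` packed tags of `tag_bits` bits each (1, 2, 4 or 8) into
   one byte per tag.  Assumption: tags are packed LSB-first in each byte. */
static void unpack_tags(const uint8_t *packed, size_t ntags,
                        unsigned tag_bits, uint8_t *out)
{
    assert(tag_bits == 1 || tag_bits == 2 || tag_bits == 4 || tag_bits == 8);
    const unsigned per_byte = 8 / tag_bits;
    const unsigned mask = (1u << tag_bits) - 1;

    for (size_t i = 0; i < ntags; i++) {
        unsigned shift = (unsigned)(i % per_byte) * tag_bits;
        out[i] = (uint8_t)((packed[i / per_byte] >> shift) & mask);
    }
}
```

With 1-bit tags a single byte covers eight 16-byte granules (128 bytes of
memory); with 4-bit tags it covers two granules.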

> Memory accesses inside GDB
> 
> It should be enough for AArch64 to override target_xfer_partial.
> If the process is using memory tags, and the address contains a LT, then
> call PEEKDATATAG for the memory range being accessed and check if the access
> would succeed. If it doesn't then print just the first failure to the screen.
> If it does succeed then call the overridden function to access the memory.
> 
> Core Dumps
> 
> There will be extra sections inside a core dump containing the memory tags.
> The core low version of target_xfer_partial needs overriding. 
> Similar to the xfer_partial override in the previous section, add
> functionality to check tags, and report failures. Check the tags by
> accessing the MTE segments in the corefile.  Memory is stored in the core
> dump untagged, so addresses will need stripping before accessing.

I am curious how you were planning to describe tags in cores.  I don't have
concrete thoughts yet but the approach I had been leaning towards was
having something similar to PT_LOAD, but perhaps PT_TAGS or the like whose
header would include "tag size" and "tag stride" and the contents of the
segment would be packed tag bits from a starting VA in the header.  This
would permit storing both 1-bit and 4-bit tags and would also in theory
support some other memory tagging schemes I'm aware of from some other
research.
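
A rough C sketch of that idea follows.  The struct, its field names, and the
lookup helper are all hypothetical: nothing like PT_TAGS exists yet, and the
LSB-first packing and power-of-two tag sizes are assumptions made only so the
example is complete.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for a "PT_TAGS"-style core segment. */
struct tag_segment {
    uint64_t start_va;    /* first virtual address covered */
    uint64_t length;      /* bytes of memory covered */
    uint32_t tag_bits;    /* "tag size": 1 for CHERI, 4 for MTE */
    uint32_t tag_stride;  /* bytes of memory per tag, e.g. 16 */
    const uint8_t *data;  /* packed tags, LSB-first within each byte */
};

/* Return the tag covering `addr` (an untagged address), or -1 if the
   segment does not cover it. */
static int tag_segment_lookup(const struct tag_segment *seg, uint64_t addr)
{
    assert(seg->tag_stride != 0);
    assert(seg->tag_bits == 1 || seg->tag_bits == 2 ||
           seg->tag_bits == 4 || seg->tag_bits == 8);

    if (addr < seg->start_va || addr - seg->start_va >= seg->length)
        return -1;

    uint64_t index = (addr - seg->start_va) / seg->tag_stride;
    uint64_t bit = index * seg->tag_bits;
    unsigned mask = (1u << seg->tag_bits) - 1;
    return (int)((seg->data[bit / 8] >> (bit % 8)) & mask);
}
```

The same lookup serves both 1-bit and 4-bit tags because only the header
fields differ, which is the point of carrying size and stride in the header.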

One thing that I would like that you don't currently have a need for (though
perhaps the memory display mode I suggested above might need) is a way to
pass around a word of memory and its tag together, perhaps as a single
'struct value'.  In my case I would like to have the tag associated with
either a register or memory present when printing pointers.  (I have a new
gdbarch method in my patches that prints pointer attributes and right now
it ignores the tags, but it would be nice to annotate untagged pointers
which in CHERI's case are not dereferenceable.)
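
To illustrate what carrying a word and its tag together could look like, here
is a deliberately simplified C sketch.  The struct and helper are hypothetical
and unrelated to GDB's actual 'struct value'; a real CHERI capability is 128
bits wide, whereas this uses a single 64-bit word to keep the example short.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pairing of a pointer-sized word with its validity tag. */
struct tagged_word {
    uint64_t word;
    bool tag;
};

/* Print the word, annotating it when the tag is clear.
   Returns the snprintf result (characters needed, excluding the NUL). */
static int format_tagged_word(char *buf, size_t len, struct tagged_word v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%" PRIx64 "%s", v.word,
                    v.tag ? "" : " (untagged)");
}
```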

-- 
John Baldwin


