This is the mail archive of the
gdb@sources.redhat.com
mailing list for the GDB project.
Harvard proposal
- To: gdb at sources dot redhat dot com
- Subject: Harvard proposal
- From: Nick Duffek <nsd at redhat dot com>
- Date: Sat, 10 Feb 2001 15:25:55 -0500
- CC: cagney at redhat dot com, dje at transmeta dot com, taylor at cygnus dot com, kevinb at cygnus dot com, msnyder at cygnus dot com, jimb at cygnus dot com, per at bothner dot com, eliz at delorie dot com
Recently, I took a stab at converting CORE_ADDR to a struct. It turned
out to be quite difficult, because there's a deeply-embedded assumption
that CORE_ADDR is an offset in a unified byte address space.
So instead, I wrote a patch that takes GDB partway toward the struct
core_addr goal. It inserts a bunch of CORE_ADDR conversion macros that
create abstraction barriers between various CORE_ADDR generators and
consumers (user, hardware, target, object file).
My original motivation for the patch was to allow users to see and specify
real addresses without needing to know about the usual 0x1000000/0x2000000
offset and bit-shift conversions. Compensating for those conversions
during assembly debugging is tedious and potentially confusing.
In the following text, I describe the patch largely from that perspective.
However, the patch is general enough to allow architectures to make their
own choices about how to translate user-visible addresses.
The Problem
===========
GDB handles Harvard architectures by mapping instruction and data spaces
onto a single byte address space.
For example, d10v-tdep.c performs the following mapping:
    data: 0x2000000 + addr
    insn: 0x1000000 + (addr << 2)
The mapping is user-visible, which I think is problematic because:
1. The user is inconvenienced by needing to know bit shifts and arbitrary
data/instruction offsets when specifying and viewing addresses.
2. The user won't necessarily know how GDB will modify registers. $pc and
$sp are obvious candidates for modification, but other registers may have
multiple roles. For example, a link register may hold an instruction
address immediately after a subroutine call, but it may hold data
addresses or even integer values after it's been saved on the stack.
3. Generally speaking, the purpose of GDB is to provide accurate
information about software and hardware, and GDB's address translation
diminishes that accuracy. GDB should reveal, not obscure.
4. Expression evaluation breaks in any number of cases. For example, if
GDB is stopped just after the mvfc instruction in a call to the following
D10V function with n=3:
    ;; prime(n): return the nth prime number for 1 <= n <= 3.
            .text
            .global prime
    prime:
            ;; save link register
            st      r13,@-sp
            ;; scale n by jump target size, add offset
            add     r0,r0
            addi    r0,1
            ;; calculate and jump to target pc
            mvfc    r1,pc
            add     r1,r0
            jmp     r1
            ;; 1st prime number
            ldi     r0,2
            bra     .L1
            ;; 2nd prime number
            ldi     r0,3
            bra     .L1
            ;; 3rd prime number
            ldi     r0,5
            nop
    .L1:
            ;; restore link register and return
            ld      r13,@sp+
            jmp     r13
the following commands work incorrectly:
(gdb) x/i $r13
0x501d: sub r0, r0 || sub r0, r0
(gdb) x/i $r1 + $r0
0x502c: sub r0, r0 || sub r0, r0
GDB doesn't (and can't) know that $r13 and $r1 hold instruction addresses
and $r0 holds an instruction offset, so it doesn't apply the necessary
internal conversions before querying memory. To compensate, the user
needs to enter the following:
(gdb) x/i ($r13 << 2) + 0x1000000
0x1014074 <main+24>: mv r1, r0 -> mv r0, r1
(gdb) x/i ($r1 << 2) + 0x1000000 + ($r0 << 2)
0x10140b0 <prime+40>: ldi.s r0, 0x5 || nop
Similar problems occur when dereferencing data addresses in registers.
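To make the arithmetic in that workaround concrete, here is a minimal C sketch of the d10v mapping quoted above. The CORE_ADDR typedef is a stand-in for GDB's own type, and the helper names (d10v_insn_to_unified and friends) are hypothetical; they are not taken from d10v-tdep.c.

```c
#include <assert.h>

/* Stand-in for GDB's CORE_ADDR (assumption; the real type lives in GDB). */
typedef unsigned long CORE_ADDR;

/* Offsets from the d10v mapping quoted in this message. */
#define D10V_INSN_BASE 0x1000000UL
#define D10V_DATA_BASE 0x2000000UL

/* Hypothetical helper: real instruction address -> unified byte address. */
static CORE_ADDR
d10v_insn_to_unified (CORE_ADDR real)
{
  return D10V_INSN_BASE + (real << 2);
}

/* Hypothetical helper: real data address -> unified byte address. */
static CORE_ADDR
d10v_data_to_unified (CORE_ADDR real)
{
  return D10V_DATA_BASE + real;
}

/* Hypothetical inverse for the instruction space.  Under the mapping
   above, unified instruction addresses lie below the data base.  */
static CORE_ADDR
d10v_unified_to_insn (CORE_ADDR unified)
{
  assert (unified >= D10V_INSN_BASE && unified < D10V_DATA_BASE);
  return (unified - D10V_INSN_BASE) >> 2;
}
```

With $r13 = 0x501d, d10v_insn_to_unified yields 0x1014074, matching the <main+24> address in the session above. Because the shift distributes over addition, ($r1 << 2) + 0x1000000 + ($r0 << 2) is the same as converting $r1 + $r0 = 0x502c, which gives 0x10140b0.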
A Solution
==========
Change GDB to treat user-visible addresses as real hardware addresses.
As has been discussed in other threads, this approach reveals the
ambiguity inherent in Harvard architectures. For example, should "x/i 0"
disassemble the first word of the instruction space or the data space?
An obvious disambiguator is an address syntax extension that indicates the
address space. In separate threads, Doug Evans proposed a "<space>:"
prefix and Per Bothner proposed a "@<space>" suffix. E.g.:
x/i insn:0
would disassemble instruction address 0 and
x/i data:0
would disassemble data address 0.
I think that in the absence of the disambiguator, GDB should pick a
reasonable default, e.g. "x/i 0" would disassemble instruction address 0.
That worked well in the two (not-yet-public) ports that use this patch.
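As an illustration only, the "<space>:" prefix with a default space could be parsed along the following lines. This is a sketch, not the patch's parser: the struct user_addr type, the enum addr_space values and the function parse_user_addr are all hypothetical names, and real GDB would evaluate a full expression rather than a bare number.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for GDB's CORE_ADDR (assumption). */
typedef unsigned long CORE_ADDR;

enum addr_space { SPACE_INSN, SPACE_DATA };

/* A user-visible address tagged with the space it refers to. */
struct user_addr
{
  enum addr_space space;
  CORE_ADDR addr;
};

/* Parse "[insn:|data:]<number>".  With no prefix, DEFAULT_SPACE is
   used.  Returns 0 on success and -1 on malformed input.  */
static int
parse_user_addr (const char *text, enum addr_space default_space,
                 struct user_addr *out)
{
  char *end;

  assert (text != NULL && out != NULL);
  out->space = default_space;
  if (strncmp (text, "insn:", 5) == 0)
    {
      out->space = SPACE_INSN;
      text += 5;
    }
  else if (strncmp (text, "data:", 5) == 0)
    {
      out->space = SPACE_DATA;
      text += 5;
    }
  if (*text == '\0')
    return -1;
  /* Base 0 lets strtoul accept decimal, octal and 0x-prefixed hex. */
  out->addr = strtoul (text, &end, 0);
  return (end != text && *end == '\0') ? 0 : -1;
}
```

Under this sketch, "x/i 0" would pass SPACE_INSN as the default, so "0" and "insn:0" name the same location while "data:0" names a different one.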
An Implementation
=================
Conceptually partition GDB into components that might have a unique
interpretation of CORE_ADDR, e.g.:
    user          addresses displayed to and received from the GDB user
    remote        addresses specified to remote target for memory I/O
    hardware      addresses written to/read from memory or registers
    object files  symbol addresses
    internal GDB  all other occurrences of CORE_ADDR
and apply appropriate conversions when crossing boundaries between those
components. The patch does that using gdbarch macros with the following
nomenclature:
    ADDR_<direction>_<component>[_<space>]
  <direction>
    IN      moving to internal GDB from another component
    OUT     moving from internal GDB to another component
  <component>
    REAL    user-visible and hardware addresses
    OBJ     symbol and entry-point addresses in object files
    GDB     internal GDB addresses
    SEC     offset in an object file section
    REMOTE  addresses specified to remote target for memory I/O
  <space>
    INSN    instruction space
    DATA    data space
    SEC     infer space from the struct sec argument
    TYPE    infer space from the struct type argument
For example, ADDR_IN_REAL_TYPE (CORE_ADDR addr, struct type *type) returns
the real address ADDR of a TYPE object converted to an internal gdb
address. I've appended the current list of ADDR_* macros to this message.
[ADDR_IN_REAL_TYPE is identical to the existing POINTER_TO_ADDRESS; I
chose the alternative ADDR_* nomenclature because it results in shorter
names and reflects the hierarchical relationships between the macros.]
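To show what "infer space from the struct type argument" might look like, here is a rough sketch. It assumes that function types live in instruction space and everything else in data space, and it reuses the d10v offsets quoted earlier. struct mock_type is a deliberately tiny stand-in for GDB's struct type, and the lower-case function names are hypothetical, not the macros from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for GDB's CORE_ADDR (assumption). */
typedef unsigned long CORE_ADDR;

/* Tiny stand-in for GDB's struct type: only the kind of object. */
enum mock_type_code { MOCK_TYPE_INT, MOCK_TYPE_FUNC };

struct mock_type
{
  enum mock_type_code code;
};

/* d10v-style conversions, using the offsets quoted in this message. */
static CORE_ADDR
addr_in_real_insn (CORE_ADDR addr)
{
  return 0x1000000UL + (addr << 2);
}

static CORE_ADDR
addr_in_real_data (CORE_ADDR addr)
{
  return 0x2000000UL + addr;
}

/* Pick the address space from the type of the object at ADDR:
   functions are assumed to be in instruction space, everything
   else in data space.  */
static CORE_ADDR
addr_in_real_type (CORE_ADDR addr, const struct mock_type *type)
{
  assert (type != NULL);
  if (type->code == MOCK_TYPE_FUNC)
    return addr_in_real_insn (addr);
  return addr_in_real_data (addr);
}
```

The point of the _TYPE variant in this sketch is that callers which already hold type information never have to name a space explicitly.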
Architectures can use the ADDR_* macros to map multiple address spaces
into internal GDB CORE_ADDRs. For convenience, I wrote a harvard.c module
that handles simple d10v-ish bit-shift and bit-offset mappings to and from
the current internal GDB unified byte address space. The interface is:
    extern void harvard_init (struct gdbarch *gdbarch,
                              CORE_ADDR gdb_data_off, int gdb_data_shift,
                              CORE_ADDR gdb_insn_off, int gdb_insn_shift,
                              CORE_ADDR obj_data_off, int obj_data_shift,
                              CORE_ADDR obj_insn_off, int obj_insn_shift,
                              CORE_ADDR remote_data_off, int remote_data_shift,
                              CORE_ADDR remote_insn_off, int remote_insn_shift);
I might split that into multiple calls to allow for future components.
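For readers wondering how an offset/shift pair could be applied, the following sketch shows one plausible reading: internal = off + (real << shift), with the inverse subtracting and shifting back. That formula is an assumption modeled on the d10v mapping quoted earlier, not a description of harvard.c itself; struct space_map, map_in and map_out are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for GDB's CORE_ADDR (assumption). */
typedef unsigned long CORE_ADDR;

/* One offset/shift pair, e.g. the gdb_insn_off/gdb_insn_shift
   arguments to harvard_init.  */
struct space_map
{
  CORE_ADDR off;
  int shift;
};

/* Real address -> internal unified address (assumed formula). */
static CORE_ADDR
map_in (const struct space_map *m, CORE_ADDR real)
{
  assert (m->shift >= 0
          && m->shift < (int) (sizeof (CORE_ADDR) * CHAR_BIT));
  return m->off + (real << m->shift);
}

/* Internal unified address -> real address (inverse of map_in). */
static CORE_ADDR
map_out (const struct space_map *m, CORE_ADDR internal)
{
  assert (m->shift >= 0
          && m->shift < (int) (sizeof (CORE_ADDR) * CHAR_BIT));
  assert (internal >= m->off);
  return (internal - m->off) >> m->shift;
}

/* The d10v mapping expressed as offset/shift pairs. */
static const struct space_map d10v_gdb_data = { 0x2000000UL, 0 };
static const struct space_map d10v_gdb_insn = { 0x1000000UL, 2 };
```

With these pairs, map_in (&d10v_gdb_insn, 0x501d) reproduces the 0x1014074 address from the session earlier in this message. Note that map_out discards the low bits of an internal instruction address when the shift is nonzero.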
I'll post the actual patch soon, and if people like the idea, I'll try
converting d10v-tdep.c to use it.
What do you think?
Nick
[gdbarch macros follow]
ADDR_IN_REAL_DATA (CORE_ADDR addr)
    Return real data address ADDR converted to an internal gdb address.
ADDR_IN_REAL_INSN (CORE_ADDR addr)
    Return real instruction address ADDR converted to an internal gdb
    address.
ADDR_IN_REAL_SEC (CORE_ADDR addr, struct sec *sec)
    Return real address ADDR in SEC converted to an internal gdb address.
ADDR_IN_REAL_TYPE (CORE_ADDR addr, struct type *type)
    Return real address ADDR of a TYPE object converted to an internal gdb
    address.
ADDR_IN_OBJ_DATA (CORE_ADDR addr)
    Return object file data address ADDR converted to an internal gdb
    address.
ADDR_IN_OBJ_INSN (CORE_ADDR addr)
    Return object file instruction address ADDR converted to an internal
    gdb address.
ADDR_IN_OBJ_SEC (CORE_ADDR addr, struct sec *sec)
    Return object file address ADDR in SEC converted to an internal gdb
    address.
ADDR_IN_OBJ (CORE_ADDR addr)
    Return object file address ADDR converted to an internal gdb address.
ADDR_IN_OBJ_P ()
    Whether to apply ADDR_IN_OBJ* conversions.
ADDR_IN_GDB_INSN (CORE_ADDR addr)
    Return internal gdb address ADDR converted to an internal gdb
    instruction address if it isn't one already.
ADDR_OUT_REAL (CORE_ADDR addr)
    Return internal gdb address ADDR converted to a real address.
ADDR_OUT_OBJ (CORE_ADDR addr)
    Return internal gdb address ADDR converted to an object file address.
ADDR_OUT_SEC (CORE_ADDR addr)
    Return internal gdb address ADDR converted to an offset from the start
    of its section.
ADDR_OUT_REMOTE (CORE_ADDR addr)
    Return internal gdb address ADDR converted to a remote address.
ADDROFF_OUT_REAL (CORE_ADDR addr, CORE_ADDR offset)
    Return internal gdb address OFFSET from ADDR converted to a real
    address offset.