[00/03] per-aspace target_gdbarch (+local gdbarch obsoletion?)

Ulrich Weigand uweigand@de.ibm.com
Wed Jan 20 08:55:00 GMT 2010


Hi Jan,

sorry for the late reply, I'm currently on vacation with limited email access.
(I'll be back in February.)

> let's imagine address_space contains its gdbarch now (it is [01/03]).

I'm not quite sure I understand what an address space's gdbarch is supposed
to be.  Currently, there are two main sources for where a (target-side)
gdbarch can come from:

- The architecture of the target connection (target_gdbarch)
  This basically answers the question: Which architecture defines the
  registers sent across a GDB remote protocol connection (or across
  the native OS interface, as the case may be)?

- The architecture of a frame (get_frame_arch ())
  This basically answers the question: Which architecture defines the
  registers to be shown with "info registers" in that frame?

On the Cell/B.E., the target_gdbarch is *always* PowerPC, while the
frame architecture may be PowerPC or SPU, depending on where we are
currently executing.  (Note that if the frame architecture differs
from the target architecture, there needs to be a target stack layer
that implements the fetch/store_registers routine for the frame
architecture even while the lower layers' fetch/store_registers
routines use the target architecture.)

There's a third source of gdbarchs, which can be considered more on
the symbol side: get_objfile_arch returns the architecture associated
with an objfile.  This is somewhat limited in that it e.g. does not
define register properties.  Maybe at some point we should really split
this up into two different data structures ...  Note that I'd consider
the bp_location->gdbarch to be a symbol-side architecture of this type.
It is used only for symbol-side properties like which ISA to use for
the breakpoint instructions.  (On the other hand, the *breakpoint*
architecture is a target-side gdbarch, as it answers the question how
register names appearing in the breakpoint string or condition string
are supposed to be interpreted.)

So it is not clear to me what the "architecture of an address space"
is supposed to mean.   Is this a symbol-side property determined
before the inferior starts, e.g. the objfile architecture of files
loaded into the associated program space?  Or is it a target-side
property determined at run-time, e.g. the architecture defining the
ISA and register set whenever code located in the address space runs?
In either case, the question arises why that should always be a
single architecture ...

> There is now gdbarch in bp_location since:
> 	Re: [06/15] Per-breakpoint architecture support
> 	http://sourceware.org/ml/gdb-patches/2009-07/msg00075.html
> so one can have an idea replacing bp_location->gdbarch by
> bp_location->pspace->aspace->gdbarch.  Still I think it is inappropriate now
> as it would break compatibility with Cell/B.E. - its PPE and SPE code shares
> the same aspace now while it needs different gdbarch for functions like
> gdbarch_breakpoint_from_pc; Cell/B.E. GDB architecture was described at:
> [rfc] [00/18] Cell multi-arch debugger
> http://sourceware.org/ml/gdb-patches/2008-09/msg00134.html
> % Addresses with most significant bit zero are PowerPC addresses.  Addresses with
> % most significant bit one are SPU addresses, and encode the context ID in the
> % high 32-bit (except the MSB), and the local store address in the low 32-bit.
> 
> Now with multi-executable support checked-in I think SPE should be represented
> as fully separate program_space with its own separate address_space.  This
> whole frame-specific gdbarch would become obsolete together with
> get_current_gdbarch().  target_gdbarch tracking current_program_space
> (=current inferior) would be enough.

That's not quite true.  On the one hand, Cell/B.E. code should certainly be
changed to use multiple address spaces instead of encoding everything into
a single CORE_ADDR value.  I guess this would also imply using separate
program spaces instead of handling SPE binaries as "shared libraries".
On the other hand, a Cell/B.E. application would still be a single *inferior*
(identified by a single PID), which means we'd have to support inferiors
containing multiple program spaces.  We would still have target_gdbarch
always point to PowerPC while frame architectures would still vary.

> Or do you think it is worth to support multi gdbarch in single address_spaces?
> I can imagine wishes for 64bit code linking with proprietary 32bit libraries.
> For Adobe Flash (before it was 64bit) it has been resolved by different
> process/address-space via nspluginwrapper. Still technically the possibility
> is left open in Linux kernel according to Roland McGrath (2009-09-18):
> % there is a "32-bit-flavored task", but it's not really true that it has
> % 32-bit registers.  there is no 32-bit-only userland condition.  any task can
> % always ljmp to the 64-bit code segment and run 64-bit insns including
> % a 64-bit syscall

I certainly agree that within a single address space, code belonging to
multiple different architectures can be present.  Even on the Cell/B.E.,
an SPE local store memory image can be mapped into the main PowerPC
address space.  It would be nice if the disassembler could still use the
proper SPU instructions when used on that range of the main address space ...

Other hybrid platforms may use a single address space anyway.

> This patchset is required for PIE but only in biarch (64bit debugger -> 32bit
> inferior) or just for 32bit host built with --enable-64-bit-bfd mode.

It seems to me there's something else going on.  Maybe the underlying problem
your patch set is trying to solve can be handled elsewhere?

For example, considering your change to breakpoint_address_match:

 breakpoint_address_match (struct address_space *aspace1, CORE_ADDR addr1,
 			  struct address_space *aspace2, CORE_ADDR addr2)
 {
+  int addr1_bit, addr2_bit;
+  CORE_ADDR addr1_mask = CORE_ADDR_MAX;
+  CORE_ADDR addr2_mask = CORE_ADDR_MAX;
+
+  gdb_assert (aspace1 != NULL);
+  gdb_assert (aspace2 != NULL);
+
+  addr1_bit = gdbarch_addr_bit (address_space_gdbarch (aspace1));
+  if (addr1_bit < (sizeof (CORE_ADDR) * HOST_CHAR_BIT))
+    addr1_mask = ((CORE_ADDR) 1 << addr1_bit) - 1;
+  addr2_bit = gdbarch_addr_bit (address_space_gdbarch (aspace2));
+  if (addr2_bit < (sizeof (CORE_ADDR) * HOST_CHAR_BIT))
+    addr2_mask = ((CORE_ADDR) 1 << addr2_bit) - 1;
+

It seems odd to me to allow multiple different CORE_ADDR values to 
represent the same point in an address space, which requires every
compare to perform the masking.

Wouldn't it be more natural to force a canonical representation at
the time the address is *determined* in the first place?  This may
have the advantage that you may still have the required information
at this place, e.g. while relocating an objfile due to PIE, you know
the objfile's architecture and hence know which mask needs to be
applied ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


