[PATCH] gdb/python: add gdb.Architecture.format_address

Andrew Burgess aburgess@redhat.com
Mon Feb 21 17:27:21 GMT 2022


Eli Zaretskii via Gdb-patches <gdb-patches@sourceware.org> writes:

>> Date: Fri, 11 Feb 2022 16:17:21 +0000
>> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
>> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
>> 
>> diff --git a/gdb/NEWS b/gdb/NEWS
>> index e173d38c3a1..4f4f0c2af6d 100644
>> --- a/gdb/NEWS
>> +++ b/gdb/NEWS
>> @@ -187,6 +187,11 @@ GNU/Linux/LoongArch    loongarch*-*-linux*
>>  
>>  GNU/Linux/OpenRISC		or1k*-*-linux*
>>  
>> +  ** New function gdb.Architecture.format_address(ADDRESS), which
>> +     takes an address in the currently selected inferior's address
>> +     space, and returns a string representing the address.  The format
>> +     of the returned string is '0x.... <symbol+offset>'.
>> +
>>  *** Changes in GDB 11
>>  
>>  * The 'set disassembler-options' command now supports specifying options
>> diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
>> index c1a3f5f2a7e..50443f7b704 100644
>> --- a/gdb/doc/python.texi
>> +++ b/gdb/doc/python.texi
>> @@ -6016,6 +6016,25 @@
>>  @code{gdb.Architecture}.
>>  @end defun
>>  
>> +@defun Architecture.format_address (@var{address})
>> +Returns @var{address}, an address within the currently selected
>
> Our style is to say "Return", not "Returns".
>
> Also, saying "return ADDRESS" basically misses the main rationale of
> this function, I think; see below.

I've completely rewritten both the /doc/ entry, and the NEWS entry based
on your feedback.

>
>> +inferior's address space, formatted as a string.  When a suitable
>> +symbol can be found to associate with @var{address} this will be
>> +included in the returned string, formatted like this:
>> +
>> +@smallexample
>> +0x00001042 <symbol+16>
>> +@end smallexample
>> +
>> +If there is no symbol that @value{GDBN} can find to associate with
>> +@var{address} then the returned string will just contain
>> +@var{address}.
>> +
>> +If @var{address} is not accessible within the current inferior's
>> +address space, this function will still return a string containing
>> +@var{address}.
>> +@end defun
>
> More generally, I wonder whether the name "format_address" is the best
> one we could come up with.  Isn't this the equivalent of "info symbol"
> CLI command?  If so, why not call it "address_to_symbol" or somesuch?

Given we have an actual gdb.Symbol class, I'm reluctant to use
address_to_symbol because I think that might give the unrealistic
expectation that this function returns a gdb.Symbol object.

This function is really a wrapper around the internal function
'print_address', but, as the Python version doesn't print anything, I
ended up with format_address.
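
As a rough illustration of the calling pattern, here is a minimal sketch of a script-side helper. The names `annotate_addresses` and `FakeArch` are hypothetical and exist only so the snippet runs outside GDB; the only thing taken from this patch is the `format_address` method name. Inside GDB one would presumably pass `gdb.selected_inferior().architecture()`, as the testsuite changes in this patch do, instead of the stand-in.

```python
def annotate_addresses(arch, addresses):
    """Return one formatted string per address via arch.format_address."""
    return [arch.format_address(addr) for addr in addresses]


class FakeArch:
    """Hypothetical stand-in for gdb.Architecture (assumption, not GDB code)."""

    def format_address(self, address):
        # Only models the fallback form: the address as hexadecimal,
        # with no symbol lookup at all.
        return f"{address:#x}"


if __name__ == "__main__":
    for line in annotate_addresses(FakeArch(), [0x1000, 0x0]):
        print(line)
```

Because the helper only relies on the object having a `format_address` method, the same code should work unchanged with a real architecture object.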

I'd also be reluctant to go with address_to_string as that might give
the impression that it "just" converts a number to a string, which
obviously would be a really weird thing to have as a separate function
in Python.

I'm certainly not against renaming, if we can come up with a better
name... maybe 'format_address_info'?  I don't know... I still kind of
like 'format_address'...

>
> This goes back to the documentation: saying that a method takes its
> argument and returns it as a string makes the reader wonder why would
> we need such a trivial method.  So the documentation should start by
> saying that the method returns SYMBOL+OFFSET that corresponds to
> ADDRESS, and only mention that it returns ADDRESS as a string as the
> fallback, when SYMBOL cannot be found.

Thanks, I took this advice.  I'm not sure if my use of @samp{} is
acceptable in the new docs - maybe there's better formatting constructs
I could/should use.
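
To make the described behaviour concrete, here is a small plain-Python model of the 'ADDRESS <SYMBOL+OFFSET>' format with its address-only fallback. This is a sketch under stated assumptions, not how GDB implements print_address: the symbol table `SYMBOLS` and the function name `model_format_address` are made up for illustration, and the sketch does not attempt the zero padding mentioned in the documentation. Showing a zero offset as '<main>' rather than '<main+0>' follows the expectations in the testsuite part of this patch.

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, name),
# sorted by start address.  Not taken from any real program.
SYMBOLS = [
    (0x1000, 0x40, "main"),
    (0x1040, 0x20, "helper"),
]


def model_format_address(address, symbols=SYMBOLS):
    """Model the 'ADDRESS <SYMBOL+OFFSET>' format; not GDB's implementation."""
    text = f"{address:#x}"
    starts = [start for start, _size, _name in symbols]
    # Find the last symbol that starts at or before ADDRESS.
    index = bisect.bisect_right(starts, address) - 1
    if index >= 0:
        start, size, name = symbols[index]
        offset = address - start
        if offset < size:
            # The offset is printed in decimal; a zero offset is omitted.
            suffix = name if offset == 0 else f"{name}+{offset}"
            return f"{text} <{suffix}>"
    # Fallback: no symbol covers ADDRESS, so only the address is returned.
    return text
```

With this toy table, an address at the start of `main` yields the symbol alone, an address inside a symbol yields the symbol plus a decimal offset, and an uncovered address yields just the hexadecimal value.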

Thanks,
Andrew

---

commit eba54d7150c8d87b34db41594ab4f6aef95cd847
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Sat Oct 23 09:59:25 2021 +0100

    gdb/python: add gdb.Architecture.format_address
    
    Add a new method gdb.Architecture.format_address, which is a wrapper
    around GDB's print_address function.
    
    This method takes an address, and returns a string with the format:
    
      ADDRESS <SYMBOL+OFFSET>
    
    Where ADDRESS is the original address, formatted as hexadecimal, and
    padded with zeros on the left up to the width of an address in the
    current architecture.
    
    SYMBOL is a symbol whose address range covers ADDRESS, and OFFSET is
    the offset from SYMBOL to ADDRESS in decimal.
    
    If there's no SYMBOL whose address range covers ADDRESS, then the
    <SYMBOL+OFFSET> part is not included.
    
    This is useful if a user wants to write a Python script that
    pretty-prints addresses: the user no longer needs to do manual symbol
    lookup, and things like the zero padding on addresses will be
    consistent with the builtin GDB behaviour.

diff --git a/gdb/NEWS b/gdb/NEWS
index 9da74e71796..c4e31188fdb 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -185,6 +185,12 @@ GNU/Linux/LoongArch    loongarch*-*-linux*
      set styling').  When false, which is the default if the argument
      is not given, then no styling is applied to the returned string.
 
+  ** New function gdb.Architecture.format_address(ADDRESS), which takes
+     an address and returns a string formatted as:
+       ADDRESS <SYMBOL+OFFSET>
+     This is the same format that GDB uses when printing address,
+     symbol, and offset information from the disassembler.
+
 * New features in the GDB remote stub, GDBserver
 
   ** GDBserver is now supported on OpenRISC GNU/Linux.
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index c1a3f5f2a7e..a095d055807 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -6016,6 +6016,30 @@
 @code{gdb.Architecture}.
 @end defun
 
+@defun Architecture.format_address (@var{address})
+Return a string in the format @samp{ADDRESS <SYMBOL+OFFSET>}, where
+@samp{ADDRESS} is @var{address} formatted in hexadecimal,
+@samp{SYMBOL} is a symbol whose address range covers
+@var{address}, and @samp{OFFSET} is the offset from @samp{SYMBOL} to
+@var{address} in decimal.  This is the same format that @value{GDBN}
+uses when printing address, symbol, and offset information, for
+example, within disassembler output.
+
+If no @samp{SYMBOL} has an address range that covers @var{address},
+then the @samp{<SYMBOL+OFFSET>} part is not included in the returned
+string; instead, the returned string will just contain the
+@var{address} formatted as hexadecimal.
+
+In all cases, the @samp{ADDRESS} component will be padded with leading
+zeros based on the width of an address for the current architecture.
+
+An example of the returned string is:
+
+@smallexample
+0x00001042 <symbol+16>
+@end smallexample
+@end defun
+
 @node Registers In Python
 @subsubsection Registers In Python
 @cindex Registers In Python
diff --git a/gdb/python/py-arch.c b/gdb/python/py-arch.c
index 0f273b344e4..95ae931e73e 100644
--- a/gdb/python/py-arch.c
+++ b/gdb/python/py-arch.c
@@ -348,6 +348,31 @@ gdbpy_all_architecture_names (PyObject *self, PyObject *args)
  return list.release ();
 }
 
+/* Implement gdb.Architecture.format_address(ADDR).  Provide access to
+   GDB's print_address function from Python.  The returned address will
+   have the format '0x..... <symbol+offset>'.  */
+
+static PyObject *
+archpy_format_address (PyObject *self, PyObject *args, PyObject *kw)
+{
+  static const char *keywords[] = { "address", nullptr };
+  PyObject *addr_obj;
+  CORE_ADDR addr;
+  struct gdbarch *gdbarch = nullptr;
+
+  ARCHPY_REQUIRE_VALID (self, gdbarch);
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords, &addr_obj))
+    return nullptr;
+
+  if (get_addr_from_python (addr_obj, &addr) < 0)
+    return nullptr;
+
+  string_file buf;
+  print_address (gdbarch, addr, &buf);
+  return PyString_FromString (buf.c_str ());
+}
+
 void _initialize_py_arch ();
 void
 _initialize_py_arch ()
@@ -391,6 +416,12 @@ group GROUP-NAME." },
     METH_NOARGS,
     "register_groups () -> Iterator.\n\
 Return an iterator over all of the register groups in this architecture." },
+  { "format_address", (PyCFunction) archpy_format_address,
+    METH_VARARGS | METH_KEYWORDS,
+    "format_address (ADDRESS) -> String.\n\
+Format ADDRESS, an address within the currently selected inferior's\n\
+address space, as a string.  The format of the returned string is\n\
+'ADDRESS <SYMBOL+OFFSET>' without the quotes." },
   {NULL}  /* Sentinel */
 };
 
diff --git a/gdb/testsuite/gdb.python/py-arch.exp b/gdb/testsuite/gdb.python/py-arch.exp
index b55778b0b72..c4854033d8c 100644
--- a/gdb/testsuite/gdb.python/py-arch.exp
+++ b/gdb/testsuite/gdb.python/py-arch.exp
@@ -127,3 +127,18 @@ foreach a $arch_names b $py_arch_names {
     }
 }
 gdb_assert { $lists_match }
+
+# Check the gdb.Architecture.format_address method.
+set main_addr [get_hexadecimal_valueof "&main" "UNKNOWN"]
+gdb_test "python print(\"Got: \" + gdb.selected_inferior().architecture().format_address($main_addr))" \
+    "Got: $main_addr <main>" \
+    "gdb.Architecture.format_address, result should have no offset"
+set next_addr [format 0x%x [expr $main_addr + 1]]
+gdb_test "python print(\"Got: \" + gdb.selected_inferior().architecture().format_address($next_addr))" \
+    "Got: $next_addr <main\\+1>" \
+    "gdb.Architecture.format_address, result should have an offset"
+if {![is_address_zero_readable]} {
+    gdb_test "python print(\"Got: \" + gdb.selected_inferior().architecture().format_address(0))" \
+	"Got: 0x0" \
+	"gdb.Architecture.format_address for address 0"
+}


