[RFC - Python Scripting] New method gdb.Architecture.disassemble

Doug Evans dje@google.com
Wed Feb 13 20:42:00 GMT 2013


Siva Chandra writes:
 > Addressed all off Eli's and Tom's comments in the attached patch.
 > 
 > Thanks a lot Tom, for your detailed explanations.
 > 
 > 2013-02-13  Siva Chandra Reddy  <sivachandra@google.com>
 > 
 >         Add a new method 'disassemble' to gdb.Architecture class.
 >         * python/py-arch.c (archpy_disassmble): Implementation of the
 >         new method gdb.Architecture.disassemble.
 >         (arch_object_methods): Add entry for the new method.
 > 
 > doc/
 > 
 >         * gdb.texinfo (Architectures In Python): Add description about
 >         the new method gdb.Architecture.disassemble.
 > 
 > testsuite/
 > 
 >         * gdb.python/py-arch.c: New test case
 >         * gdb.python/py-arch.exp: New tests to test
 >         gdb.Architecture.disassemble
 >         * gdb.python/Makefile.in: Add py-arch to the list of
 >         EXECUTABLES.
 > diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
 > index e3f336e..8436781 100644
 > --- a/gdb/doc/gdb.texinfo
 > +++ b/gdb/doc/gdb.texinfo
 > @@ -26040,6 +26040,37 @@ A @code{gdb.Architecture} class has the following methods:
 >  Return the name (string value) of the architecture.
 >  @end defun
 >  
 > +@defun Architecture.disassemble (@var{low}, @var{high})
 > +Return a list of disassembled instructions whose start address falls in
 > +the closed memory address interval from @var{low} to @var{high}.  Each

Reading beyond high seems counterintuitive, but I can understand why it's easiest if it works that way.

Also, I can imagine a user wanting to specify a count instead of high.
How about supporting both "high" and "count", with the default being high is unspecified and count=1?

 > +element of the list is a Python @code{dict} with the following string
 > +keys:
 > +
 > +@table @code
 > +
 > +@item addr
 > +The value corresponding to this key is a Python long integer capturing
 > +the memory address of the instruction.
 > +
 > +@item asm
 > +The value corresponding to this key is a string value which represents
 > +the instruction with assembly language mnemonics.
 > +
 > +@item func
 > +The value corresponding to this key is the name of the function (string
 > +value) the instruction belongs to.
 > +
 > +@item length
 > +The value correspoding to this key is the length (integer value) of the
 > +instruction in bytes.
 > +
 > +@item offset
 > +The value corresponding to this key is the byte offset (integer value)
 > +of the instruction within the function it belongs to.
 > +
 > +@end table
 > +@end defun
 > +

Including the function name and offset in the output feels wrong.
It's trying to make the API call do too much.

Plus, the documentation needs to specify what "flavor" is used (e.g. intel vs att).
I'd just add a line of texzt saying the flavor (or choose a better word) is the one specified by the CLI variable disassembly-flavor,
and leave for another day adding an input parameter to specify the flavor.

 >  @node Python Auto-loading
 >  @subsection Python Auto-loading
 >  @cindex Python auto-loading
 > diff --git a/gdb/python/py-arch.c b/gdb/python/py-arch.c
 > index edd508f..1da7b67 100644
 > --- a/gdb/python/py-arch.c
 > +++ b/gdb/python/py-arch.c
 > @@ -20,6 +20,7 @@
 >  #include "defs.h"
 >  #include "gdbarch.h"
 >  #include "arch-utils.h"
 > +#include "disasm.h"
 >  #include "python-internal.h"
 >  
 >  typedef struct arch_object_type_object {
 > @@ -86,6 +87,102 @@ archpy_name (PyObject *self, PyObject *args)
 >    return py_name;
 >  }
 >  
 > +/* Implementation of gdb.Architecture.disassemble (self, low, high) -> List.
 > +   Returns a list of instructions in a memory address range.  Each instruction
 > +   in the list is a Python dict object.
 > +*/
 > +
 > +static PyObject *
 > +archpy_disassemble (PyObject *self, PyObject *args, PyObject *kw)
 > +{
 > +  static char *keywords[] = { "low", "high", NULL };
 > +  CORE_ADDR low, high;
 > +  CORE_ADDR pc;
 > +  PyObject *result_list;
 > +  struct gdbarch *gdbarch = arch_object_to_gdbarch (self);
 > +
 > +  if (!PyArg_ParseTupleAndKeywords (args, kw, GDB_PY_LLU_ARG GDB_PY_LLU_ARG,
 > +                                    keywords, &low, &high))
 > +    return NULL;
 > +
 > +  result_list = PyList_New (0);
 > +  if (result_list == NULL)
 > +    return NULL;
 > +
 > +  for (pc = low; pc <= high;)
 > +    {
 > +      int line = -1, unmapped, offset = -1, insn_len = 0;
 > +      char *filename = NULL, *fn = NULL, *as = NULL;
 > +      struct ui_file *memfile = mem_fileopen ();
 > +      PyObject *insn_dict = PyDict_New ();
 > +      volatile struct gdb_exception except;
 > +
 > +      if (insn_dict == NULL)
 > +        {
 > +          Py_DECREF (result_list);
 > +          ui_file_delete (memfile);
 > +
 > +          return NULL;
 > +        }
 > +      if (PyList_Append (result_list, insn_dict))
 > +        {
 > +          Py_DECREF (result_list);
 > +          Py_DECREF (insn_dict);
 > +          ui_file_delete (memfile);
 > +
 > +          return NULL;  /* PyList_Append Sets the exception.  */
 > +        }
 > +
 > +      TRY_CATCH (except, RETURN_MASK_ALL)
 > +        {
 > +          insn_len = gdb_print_insn (gdbarch, pc, memfile, NULL);
 > +          /* Even though filename, line and unmapped are passed as arguments,
 > +             they do not give us any meaningful values currently.  */
 > +          build_address_symbolic (gdbarch, pc, 0, &fn, &offset, &filename,
 > +                                  &line, &unmapped);

Why call build_address_symbol here?
To me it's trying to make this API call do too much.

I'm not against the functionality, far from it.
I'd say if it's useful, provide another API call to get it:
the user may have a pc and want this info, and not want to have to
call disassemble to get it.
What to return for offset for discontiguous functions is an open question.
If there's no real need for offset at the moment, I'd say let's punt
and just provide a routine to return the function for any given pc
(assuming it's useful).
Another open question is what to return for inlined functions,
I can imagine both answers (inline function itself and its caller) being useful.

btw, what about filename,line isn't useful?
It's easy enough to imagine the user wanting to obtain file+line for any given pc
(and in fact there is gdbpy_find_pc_line).

So I can imagine 3 API routines here:
- disassemble (pc)
- get_function_and_offset (pc) [or some such]
- get_file_and_line (pc) [assuming gdbpy_find_pc_line doesn't always do what one wants]

 > +        }
 > +      if (except.reason < 0)
 > +        {
 > +          Py_DECREF (result_list);
 > +          ui_file_delete (memfile);
 > +          xfree (fn);
 > +          xfree (filename);
 > +
 > +          return gdbpy_convert_exception (except);
 > +        }
 > +
 > +      as = ui_file_xstrdup (memfile, NULL);
 > +      if (PyDict_SetItemString (insn_dict, "addr",
 > +                                gdb_py_long_from_ulongest (pc))
 > +          || PyDict_SetItemString (insn_dict, "asm",
 > +                                   PyString_FromString (as ? as : "<unknown>"))
 > +          || PyDict_SetItemString (insn_dict, "func",
 > +                                   PyString_FromString (fn ? fn : "<unknown>"))
 > +          || PyDict_SetItemString (insn_dict, "length",
 > +                                   PyInt_FromLong (insn_len))
 > +          || PyDict_SetItemString (insn_dict, "offset",
 > +                                   PyInt_FromLong (offset)))
 > +        {
 > +          Py_DECREF (result_list);
 > +
 > +          ui_file_delete (memfile);
 > +          xfree (as);
 > +          xfree (fn);
 > +          xfree (filename);
 > +
 > +          return NULL;
 > +        }
 > +
 > +      pc += insn_len;
 > +      ui_file_delete (memfile);
 > +      xfree (as);
 > +      xfree (fn);
 > +      xfree (filename);
 > +    }
 > +
 > +  return result_list;
 > +}
 > +
 >  /* Initializes the Architecture class in the gdb module.  */
 >  
 >  void
 > @@ -105,6 +202,10 @@ static PyMethodDef arch_object_methods [] = {
 >    { "name", archpy_name, METH_NOARGS,
 >      "name () -> String.\n\
 >  Return the name of the architecture as a string value." },
 > +  { "disassemble", (PyCFunction) archpy_disassemble,
 > +    METH_VARARGS | METH_KEYWORDS,
 > +    "disassemble (low, high) -> List.\n\
 > +Return the list of disassembled instructions from LOW to HIGH." },
 >    {NULL}  /* Sentinel */
 >  };



More information about the Gdb-patches mailing list