[PATCHv4 3/5] gdb/python: implement the print_insn extension language hook

Wed May 25 10:37:06 GMT 2022

Andrew Burgess <aburgess@redhat.com> writes:

> Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:
>
>> Sorry for the late reply, I'm a bit busy with work and personal life.
>> I gave some answers below.  I don't think I'll have time to do a
>> thorough review of your v5.  But I already raised any concerns I had,
>> and I trust you for the rest.
>
> Understood.  Thanks for all the time you've already put into reviewing
> this series.
>
>>
>>>> Doesn't really matter, but this could probably modify the string in
>>>> the existing DisassemblerResult object, and then return it:
>>>>
>>>>   result.string += "\t## Comment"
>>>>   return result
>>>>
>>>> But I'm fine with what you have, if you think it's clearer for an
>>>> example.
>>> 
>>> The problem is all the properties of DisassemblerResult are read-only.
>>> Given it's pretty light-weight I didn't really see any problem just
>>> creating a new one.
>>> 
>>> I suspect that I might end up changing that in the future, but for now I
>>> don't see any great need to allow for modifications right now.  I figure
>>> extending the API to allow modifications in the future is fine if/when
>>> that becomes critical.
>>> 
>>> Let me know if that's going to be a problem and I can get the setting
>>> code added now.
>>
>> Ack, this is reasonable.
>>
>>>>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
>>>>> +{
>>>>> +  PyObject *info_obj, *memory_source_obj = nullptr;
>>>>> +  static const char *keywords[] = { "info", "memory_source", nullptr };
>>>>> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
>>>>> +					&disasm_info_object_type, &info_obj,
>>>>> +					&memory_source_obj))
>>>>
>>>> I'm wondering, why is there a separate memory_source parameter when info
>>>> already has a read_memory method, that could potentially be overriden by
>>>> the user?
>>> 
>>> Here's how the API would be used right now:
>>> 
>>>   class MyDisassembler(Disassembler):
>>>     def __init__(self, name):
>>>       super().__init__(name)
>>>       self.__info = None
>>> 
>>>     def __call__(self, info):
>>>       self.__info = info
>>>       result = gdb.disassembler.builtin_disassemble(info, self)
>>> 
>>>     def read_memory(self, length, offset):
>>>       return self.__info.read_memory(length, offset)
>>> 
>>> This is obviosly pretty pointless, the memory source just calls the
>>> standard read_memory routine so you'll get the same behaviour as if no
>>> memory source was passed at all, but it shows how the API works.
>>> 
>>> If we wanted to override the DisassembleInfo.read_memory routine we'd
>>> do something like this:
>>> 
>>>   class MyInfo(DisassembleInfo):
>>>     def __init__(self,old_info):
>>>       super().__init__(old_info)
>>> 
>>>     def read_memory(self, length, offset):
>>>       return super().read_memory(length, offset)
>>> 
>>>   class MyDisassembler(Disassembler):
>>>     def __init__(self, name):
>>>       super().__init__(name)
>>> 
>>>     def __call__(self, info):
>>>       wrapped_info = MyInfo(info)
>>>       result = gdb.disassembler.builtin_disassemble(wrapped_info)
>>> 
>>> What are your thoughts on that?  I think that would be pretty easy to
>>> implement if you feel its an improvement.
>>
>> It's been a bit too long since I've looked at this patch so I've lost
>> context.  I only remember thinking that instead of passing a
>> memory_source, you could simply pass an info with a special read_memory
>> implementation that reads from whatever you want.  So I think
>> essentially what you did in that last example.  The only remember I
>> would mention this is that it seems better to me to not have two ways of
>> doing the same thing, as it's sometimes ambiguous what happens when they
>> are both used, how they interact together.
>
> I think what happened is when I started writing the above I was planning
> to argue that my "memory_source" approach was best, but as I sketched
> out the sub-classing DisassembleInfo approach I realised that your idea
> probably was better.
>
> So I updated my patch, and then failed to reword the above text.
>
> I think we're all in alignment now, the way to intercept memory writes
> is to sub-class DisassembleInfo (like the second example above), so I
> think this issue is resolved.
>
>>
>>>> Is there a use case for other error kinds?  For example, if the
>>>> disassembler finds that the machine code does not encode a valid
>>>> instruction, what should it do?
>>>>
>>>> Maybe this is more a question for the gdb.disassembler.Disassembler
>>>> implementation side of the API.
>>> 
>>> No!  Like the comment says, everything should disassemble to something,
>>> even if it's just ".byte xxxx".  The libopcodes disassembler only
>>> supports reporting one type of error, that's memory error.  It's just
>>> unfortunate libopcodes also uses the return value to indicate that an
>>> error occurred, and in some cases disassemblers return -1 (to indicate
>>> error) without setting a memory error.  In these cases the disassembler
>>> has probably written to the output stream what went wrong, but this
>>> really is not how libopcodes is supposed to work.
>>> 
>>> So, no.  We either have a memory error, or an unknown error.  Ideally,
>>> given enough time, libopcodes will be fixed so that we _only_ ever emit
>>> memory errors.
>>
>> Ok.  Well, for an API that allows users to arbitrarily extend
>> functionality, it makes sense to always allow returning a generic error
>> of some kind.  Perhaps libopcodes can't fail, but perhaps (a bit
>> stretched example to illustrate) my disassembler makes a GET request to
>> an HTTP server to do the actual disassembling.  So it seems good to me
>> to always have a "something went wrong" escape hatch.
>
> I'll take another pass as this aspect and check how throwing arbitrary
> errors from Python code is presented to the user.

I've reworked how exceptions are handled through the whole Python
disassembler process now.  If we consider a disassembler like this:

  class ExampleDisassembler(gdb.disassembler.Disassembler):

      class InfoWrapper(gdb.disassembler.DisassembleInfo):
          def __init__(self, info):
              super().__init__(info)

          def read_memory(self, length, offset):
              buffer = super().read_memory(length, offset)
              return buffer

      def __init__(self):
          super().__init__("ExampleDisassembler")

      def __call__(self, info):
          info = self.InfoWrapper(info)
          result = gdb.disassembler.builtin_disassemble(info)
          return result

Then this will result in a call stack like this:

  gdbpy_print_insn (py-disasm.c)
    ExampleDisassembler.__call__ (user's Python code)
      disasmpy_builtin_disassemble (py-disasm.c)
        InfoWrapper.read_memory (user's Python code)

We can split the exceptions into 3 types:

  1. gdb.MemoryError, if this is raised in `read_memory` then the bultin
  disassembler might mask this exception, or it might choose to re-raise
  a gdb.MemoryError of its own.  If the MemoryError is raised then this
  will be seen (and can be caught) in the ExampleDisassembler.__call__.
  If the MemoryError is not caught then gdbpy_print_insn will handle it,
  the output looks like:

  (gdb) disassemble 0x0000000000401194,0x0000000000401198
  Dump of assembler code from 0x401194 to 0x401198:
     0x0000000000401194 <main+0>:
  Cannot access memory at address 0x401194

  2. gdb.GdbError, if an instance of this exception is raised in
  `read_memory` then it will propagate back and be catchable in
  ExampleDisassembler.__call__.  If the exception is not handled there
  then it will be caught and handled by gdbpy_print_insn.

  In keeping with how these exceptions are described in the `Exception
  Handling` section of the manual, these exceptions are not treated as
  errors in the Python code, but a mechanism for the user to report
  errors to the user.  If one of these makes it to gdbpy_print_insn,
  this is what the output looks like:

  (gdb) disassemble 0x0000000000401194,0x0000000000401198
  Dump of assembler code from 0x401194 to 0x401198:
     0x0000000000401194 <main+0>: this is a GdbError
  unknown disassembler error (error = -1)

  The whole "unknown disassembler error" is a consequence of the core
  GDB disassembler only having a mechanism to handle memory errors.
  Fixing this completely is not trivial though as I think a perfect
  solution would require updates to libopcodes.  Still, we could
  potentially present this error to the user in a slightly better way,
  but I'd like to defer changes in that area until later - it doesn't
  impact the Python API at all, so we can polish that part later with no
  backwards compatibility worries I think.

  3. Any other exception type.  If any other exception type is raised
  from `read_memory` then, as with GdbError, the exception is propagated
  back, and can be caught in ExampleDisassembler.__call__.  If the
  exception is not caught there then it is handled in gdbpy_print_insn,
  the output looks like this:

  (gdb) disassemble 0x0000000000401194,0x0000000000401198
  Dump of assembler code from 0x401194 to 0x401198:
     0x0000000000401194 <main+0>: Python Exception <class 'RuntimeError'>: this is a RuntimeError

  unknown disassembler error (error = -1)

  Other exception types are handled as errors in the user code, and so
  its possible to get a stack-trace using 'set python print-stack full'.

Other significant changes I've made:

  * After gdbpy_print_insn has called ExampleDisassembler.__call__, any
    errors found after this point will be reported as errors and stop
    the disassembler.  Previously, the error would be reported, but the
    default disassembler would then be used.  While working on the
    latest changes I decided that behaviour was not helpful, so removed
    it.

  * There are a few places where the libopcodes disassembler will return
    -1 (indicating an error), without first calling the memory_error
    callback.  As far as I'm concerned, this is a bug in libopcodes.  An
    example of this can be seen by looking for the string '64-bit address
    is disabled' in i386-dis.c.

    Previously, these cases were converted to gdb.MemoryError
    exceptions, however, this meant that there was a difference in
    behaviour between having no Python disassemblers, and a "default",
    pass-through disassembler written in Python.

    I now convert these cases into gdb.GdbError exceptions, with the
    payload of the exception being any text the disassembler has emitted
    so far.

    Combined with the updated handling of GdbError (described above),
    this means that a "default", pass-through disassembler, written in
    Python, gives the exact same output as GDB when no Python
    disassemblers are being used.

I'll follow up shortly with an updated version of this series that
includes all the fixes I describe above.  I know you said you're likely
too busy to review any further versions of this series, which is fine,
I'll give the new version a little time for others to look at, and then
I'll go ahead and merge it as I think we probably align on most of this
stuff now.

Thanks,
Andrew