[patch][rfc] Implement Python lazy strings (PR 10705)

Mon Dec 21 20:06:00 GMT 2009

>>>>> "Phil" == Phil Muldoon <pmuldoon@redhat.com> writes:

Phil> I wanted to send this patch out before the holidays for initial
Phil> review.  This patch implements Python lazy strings, and also
Phil> alters the Python pretty-printing process to handle them
Phil> accordingly.

This looks quite nice.

I think you should submit the next revision of this patch upstream.  In
general I think that fixes and modifications to Python code should go
upstream if the underlying functionality is already upstream.

Phil> +If the optional @var{encoding} argument is given, it must be a string
Phil> +naming the encoding of the string in the @code{gdb.LazyString}, such as
Phil> +@code{"ascii"}, @code{"iso-8859-6"} or @code{"utf-8"}.  When a lazy
Phil> +string is printed, the @value{GDBN} codec machinery is used to convert
Phil> +the string during printing.  If @var{encoding} is not given, or if
Phil> +@var{encoding} is an empty string, then @var{encoding} is set to
Phil> +NULL.  When @value{GDBN} attempts to print a @code{gdb.LazyString}
Phil> +where @var{encoding} is set to NULL, @value{GDBN} will automatically
Phil> +select the encoding most suitable for the string type.

I think this paragraph should be reorganized a little.  The "When a lazy
string is printed" sentence should probably be in its own paragraph.

Phil> +If the optional @var{length} argument is given, the string will be
Phil> +fetched and encoded to the given length.

This should mention whether the length is in chars or bytes, and what
happens if no length is given.

Phil> +@defivar LazyString length
Phil> +This attribute holds the length of the string.  This attribute is not
Phil> +writable.

Likewise.

Phil> +@defivar LazyString encoding
Phil> +This attribute holds the encoding that will be applied to the string
Phil> +when the string is printed by @value{GDBN}.  This attribute is not
Phil> +writable.

This should be None if there is no encoding.

Phil> +@defivar LazyString type
Phil> +This attribute holds the type that is associated with the string.

This should mention if it is the string's type or the underlying
character type.

Phil> +typedef struct {
Phil> +  PyObject_HEAD
Phil> +  CORE_ADDR address;
Phil> +  char *encoding;
Phil> +  long length;
Phil> +  struct type *type;
Phil> +} lazy_string_object;

I try to put comments on each local field (not HEAD) indicating the
meaning of the field.  This could mirror the docs, more or less.

Phil> +static PyObject *
Phil> +stpy_get_encoding (PyObject *self, void *closure)
Phil> +{
Phil> +  lazy_string_object *self_string = (lazy_string_object *) self;
Phil> +
Phil> +  /* An encoding can be set to NULL by the user, so check before
Phil> +     attempting a Python FromString call.  */
Phil> +  if (self_string->encoding)
Phil> +    return PyString_FromString (self_string->encoding);
Phil> +
Phil> +  return NULL;

I think this should return None if there is no encoding.

Phil> +PyObject *
Phil> +stpy_get_type (PyObject *self, void *closure)
Phil> +{
Phil> +  lazy_string_object *obj = (lazy_string_object *) self;
Phil> +  PyObject *type;
Phil> +
Phil> +  if (obj->type)
Phil> +    type = type_to_type_object (obj->type);
Phil> +  else
Phil> +    return NULL;

I think the lack of a type should be detected and rejected when the
object is constructed.

BTW it isn't generally valid to just return NULL in a Python method.
You must set the error first.

Phil> +PyObject *
Phil> +create_lazy_string_object (CORE_ADDR address, long length,
Phil> +			   const char *encoding, struct type *type)

I think we usually use some py_ name for exported APIs.

Phil> +  str_obj = PyObject_New (lazy_string_object, &lazy_string_object_type);
Phil> +  if (!str_obj)
Phil> +    {
Phil> +      PyErr_SetString (PyExc_MemoryError,
Phil> +		       "Could not allocate lazy string object.");
Phil> +      return NULL;
Phil> +    }

I think if PyObject_New returns NULL, then it has probably already set
the error, and you can simply return NULL without setting it again.

Phil> +  if (address == 0)
Phil> +    {
Phil> +      PyErr_SetString (PyExc_MemoryError,
Phil> +		       "Cannot create a lazy string from a GDB-side string.");
Phil> +      return NULL;
Phil> +    }

Do this check first, before making a new object.
Otherwise, this leaks the new object.

Phil> +  str_obj->address = address;
Phil> +  str_obj->length = length;
Phil> +  if (encoding == NULL || strcmp (encoding, ""))

You probably meant !strcmp here.

Phil> +  type_incref (type);

Upstream doesn't have this stuff FWIW.
Hopefully we'll fix it up next year...

Phil> +  val = value_at_lazy (self_string->type, self_string->address);

I guess this means the lazy string holds the string type, not the
character type.

Phil> +  xfree (((lazy_string_object *) self)->encoding);
Phil> +  type_decref (((lazy_string_object *) self)->type);

Introduce a temporary to avoid duplicating the ugly cast.

Phil> +/* Determine whether the printer object pointed to by OBJ is a
Phil> +   Python lazy string.  */
Phil> +int is_lazy_string (PyObject *result)

Newline after "int".

Probably should have a py_ name for this and for extract_lazy_string.

Phil> +gdb_byte *
Phil> +extract_lazy_string (PyObject *string, struct type
Phil> +		     **str_type, long *length,

Don't line break before "**str_type", that looks weird.

Phil> +  errcode = read_string (value_address (output), *length, width,
Phil> +			 *length, byte_order, &buffer,
Phil> +			 &bytes_read);

This seems fishy.  Doesn't this mean we end up reading the full contents
of the string?

I looked a little and it seems this is what gdb already does.
I guess I am confused about something here.

I suppose this does solve the decoding problem nicely, just not the full
laziness problem.

Also, note that read_string will allocate 'buffer' even on error (not on
exception though).  So this should free buffer if read_string returns an
error.

And, read_string can throw, so you need a try-catch here.

Phil> +	  if (is_lazy_string (py_v))
Phil> +	    {
Phil> +	      output = extract_lazy_string (py_v, &type, &length, &encoding);
Phil> +	      xfree (encoding);
Phil> +	    }
[...]
Phil> +	      fputs_filtered (output, stream);

It isn't ok to print bytes from the target like this.
They have to be converted to the host encoding.

Phil> +static PyObject *
Phil> +valpy_lazy_string (PyObject *self, PyObject *args, PyObject *kw)
[...]
Phil> +  str_obj = create_lazy_string_object (value_address (value), length,
Phil> +				       user_encoding, value_type (value));
Phil> +  if (str_obj)
Phil> +    Py_INCREF (str_obj);

I think create_lazy_string_object returns a new reference.
That means this incref is wrong.

Tom