[PATCH 4/4] gdb/python: handle non utf-8 characters when source highlighting

Andrew Burgess aburgess@redhat.com
Tue Jan 11 13:10:45 GMT 2022


* Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> [2022-01-10 10:32:02 -0500]:

> > Unfortunately it's not as simple as bytes in bytes out.  See:
> > 
> >   https://pygments.org/docs/unicode/?highlight=encoding
> > 
> > In summary, Pygments uses unicode internally, but has some logic for
> > guessing the encoding of the incoming bytes.  This logic is better (I
> > claim) than GDB's hard-coded use of UTF-8.  The link above outlines how
> > the guess is done in more detail.
> > 
> > Pygments always returns a unicode object, which is one of the reasons
> > I have GDB handle both bytes and unicode being returned from the
> > colorize API.  We could always make the API more restricted, and insist
> > on a bytes object being returned, this would just require us to
> > convert the output of Pygments to bytes before returning to GDB.
> 
> Ok, so when does "colorize" return bytes?

  (1) Python 2 (for now), and
  (2) Never, unless a user overrides gdb.colorize.
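
For illustration, the stricter bytes-only variant discussed in the
quoted text could look roughly like the sketch below.  This is not
code from the patch: the helper name colorize_result_to_bytes is
hypothetical, the UTF-8 default is an assumption, and treating None as
"no highlighting available" is also an assumption.

```python
def colorize_result_to_bytes(result, encoding="utf-8"):
    """Normalise a colorize-style result to bytes.

    Hypothetical helper, not part of GDB's API.  `result` is whatever a
    colorize hook returned: bytes, a unicode string, or None.
    """
    if result is None:
        # Assumption: None means no highlighting was produced.
        return None
    if isinstance(result, bytes):
        # Already bytes (for example from a user override): pass through.
        return result
    # Unicode string (what Pygments produces): encode before handing back.
    # backslashreplace keeps the call from raising on unencodable input.
    return result.encode(encoding, errors="backslashreplace")
```

With something like this applied to the hook's return value, the
consumer would only ever need to handle bytes or None.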

Thanks,
Andrew


