[PATCH 4/4] gdb/python: handle non utf-8 characters when source highlighting
Andrew Burgess
aburgess@redhat.com
Tue Jan 11 13:10:45 GMT 2022
* Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> [2022-01-10 10:32:02 -0500]:
> > Unfortunately it's not as simple as bytes in bytes out. See:
> >
> > https://pygments.org/docs/unicode/?highlight=encoding
> >
> > In summary, Pygments uses unicode internally, but has some logic for
> > guessing the encoding of the incoming bytes. This logic is better (I
> > claim) than GDB's hard-coded use of UTF-8. The link above outlines how
> > the guess is done in more detail.
> >
> > Pygments always returns a unicode object, which is one of the reasons
> > I have GDB handle both bytes and unicode being returned from the
> > colorize API. We could always make the API more restricted, and insist
> > on a bytes object being returned, this would just require us to
> > convert the output of Pygments to bytes before returning to GDB.
>
> Ok, so when does "colorize" return bytes?
(1) Python 2 (for now), and
(2) Never, unless a user overrides gdb.colorize.
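
To make the bytes-only option concrete, here is a rough sketch of the
conversion that would be needed before handing a result back to GDB.
Both helper names are hypothetical and are not part of GDB or Pygments.
decode_guess is only a simplified stand-in for the kind of fallback
guessing described at the Pygments link (try UTF-8, then fall back to
Latin-1), not Pygments' actual logic. Treating None as "no
highlighting" is also an assumption made for illustration.

```python
def decode_guess(raw):
    """Decode source bytes: try UTF-8 first, then fall back to Latin-1.

    Simplified illustration of encoding guessing; Latin-1 accepts any
    byte sequence, so the fallback never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def colorize_result_to_bytes(result, encoding="utf-8"):
    """Normalize a highlighter result (str, bytes, or None) to bytes.

    Hypothetical helper: a str (what Pygments returns) is encoded,
    bytes are passed through unchanged, and None is passed through to
    signal that no highlighting was done (an assumption).
    """
    if result is None:
        return None
    if isinstance(result, bytes):
        return result
    return result.encode(encoding)
```

With something like this in place, a highlighter that returns a
unicode string would still hand GDB a bytes object, at the cost of an
extra encode step on every call.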
Thanks,
Andrew