[RFC] libopcodes based disassembler syntax highlighting
Andrew Burgess
aburgess@redhat.com
Mon Dec 6 14:28:25 GMT 2021
* Jan Beulich via Binutils <binutils@sourceware.org> [2021-12-06 10:02:21 +0100]:
> On 03.12.2021 17:03, Andrew Burgess via Binutils wrote:
> > ***** WARNING *****
> >
> > This commit is not in anyway supposed to be merge ready code. This
> > commit is being shared just to get feedback on the acceptability of
> > the general approach. If the feedback is positive I'll completely
> > rework this commit and repost a more polished set of commits.
> >
> > ***** END OF WARNING *****
> >
> > This commit is a very rough, first draft, of adding disassembly syntax
> > highlighting via libopcodes.
> >
> > The idea is to add a new function to the disassemble_info structure
> > for performing styled disassembler output. There's a new enum defined
> > which lists the various stylistic parts of the disassembler output,
> > some that I currently have are 'text' (used as the default for any
> > unstyled text), then we have 'mnemonic', 'register', 'immediate', and
> > 'symbol' for all the different parts of a disassembled instruction.
> >
> > We then update the users of the disassembler, objdump and gdb, to
> > provide both the old fprintf_func and the new fprintf_styled_func.
> >
> > Now we can update the disassemblers, one at a time, to switch over to
> > using fprintf_styled_func. This switch over can be done target at a
> > time, and I'm certainly not promising to convert all targets.
> >
> > As an example, in this commit, I've converted the risc-v disassembler,
> > which is pretty easy to change, and the x86 disassembler which is much
> > harder. In fact, as it currently stands, I would not be happy to
> > upstream the x86 disassembler parts, I've had to hack in a function
> > that applies the styling by analysing the text of the instruction
> > operands, which isn't really the idea at all. That said, it works
> > well enough, and I figured most folk will want to see what the results
> > look like on x86.
> >
> > I've also updated GDB to handle this new scheme, so we get syntax
> > highlighting there too for x86 and risc-v.
> >
> > Right now the styling is always on in objdump, which is probably not
> > what we'd want, so that would need to change.
>
> Personally I find such coloring more distracting than helpful. Hence yes,
> I would much appreciate if this mode wouldn't become the default (let
> alone being enabled unconditionally). I also think that in assembly
> dialects like x86'es AT&T syntax, register and immediate prefixes are
> already sufficient to identify what is what.
Thank you for your feedback. It doesn't sound like you'd object to
the change though, so long as you didn't have to see its output :)
>
> > @@ -49,6 +47,18 @@ enum dis_insn_type
> > dis_dref2 /* Two data references in instruction. */
> > };
> >
> > +enum disassembler_style
> > +{
> > + dis_style_text,
> > + dis_style_mnemonic,
> > + dis_style_register,
> > + dis_style_immediate,
> > + dis_style_symbol,
> > +};
>
> With x86 not using dis_style_symbol at all, what is this supposed to
> cover? How would an addend count? As part of the symbol, as an immediate,
> or as text?
But it is, just indirectly. The x86 disassembler calls
disassemble_info::print_address_func, which can end up in
objdump_print_addr_with_sym, which makes use of dis_style_symbol.
My intention was that `foo` would be a symbol, `0x100` would be an
immediate.
>
> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> > @@ -9445,6 +9445,38 @@ get_sib (disassemble_info *info, int sizeflag)
> > }
> > }
> >
> > +/* This is a total hack for now, just to get something working on x86-64.
> > + The arguments are currently built (in some cases at least) into a single
> > + string, and then printed. If we want to support styled printing then
> > + this would need to change, but that doesn't feel like an impossible
> > + task.
> > +
> > + And it doesn't have to be all done in one go, we can always just fall
> > + back to printing some text with the "default" styling (i.e. no styling)
> > + if a particular bit of the disassembler is too hard to update in the
> > + short term. */
> > +
> > +static void
> > +print_operand_with_styling (disassemble_info *info, const char *txt)
> > +{
> > + const char *tmp;
> > + enum disassembler_style style = dis_style_text;
> > +
> > + for (tmp = txt; *tmp != '\0'; ++tmp)
> > + {
> > + if (*tmp == '*' || *tmp == '(' || *tmp == ')' || *tmp == ',')
> > + style = dis_style_text;
> > + else if (tmp[0] == '0' && tmp[1] == 'x')
> > + style = dis_style_immediate;
> > + else if (tmp[0] == '%')
> > + style = dis_style_register;
> > + else if (style == dis_style_text && ISDIGIT(tmp[0]))
> > + style = dis_style_immediate;
> > +
> > + (*info->fprintf_styled_func) (info->stream, style, "%c", tmp[0]);
> > + }
> > +}
>
> While you say it's a hack and you don't want to upstream it in this
> shape, I'd still like to point out (see above) that in AT&T syntax
> immediates can be recognized from a leading $. There's an ambiguity
> of course with symbols whose names start with $. (And obviously
> there's a similar ambiguity between registers and symbols whose
> names start with %.) Like you do for %, I think that leading $ ought
> to be part of what gets emitted with dis_style_immediate.
Thank you for this useful information. I'll try to keep this in mind
if/when I ever prepare this code for a real upstream submission.
>
> Also, as to the distinction between immediate and symbol: How would
> an immediate referencing a symbol count?
I think this was covered above, numbers would be immediates, names
would be symbols.
Thanks,
Andrew
More information about the Gdb-patches
mailing list