[PATCH 2/2] libopcodes: extend the styling within the i386 disassembler

Vladimir Mezentsev vladimir.mezentsev@oracle.com
Fri Apr 29 18:16:40 GMT 2022



On 4/29/22 06:42, Andrew Burgess via Binutils wrote:
> The i386 disassembler is pretty complex.  Most disassembly is done
> indirectly; operands are built into buffers within a struct instr_info
> instance, before finally being printed later in the disassembly
> process.
>
> Sometimes the operand buffers are built in a different order to the
> order in which they will eventually be printed.
>
> Each operand can contain multiple components, e.g. multiple registers,
> immediates, other textual elements (commas, brackets, etc).
>
> When looking for how to apply styling I guess the ideal solution would
> be to move away from the operands being a single string that is built
> up, and instead have each operand be a list of "parts", where each
> part is some text and a style.  Then, when we eventually print the
> operand we would loop over the parts and print each part with the
> correct style.
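For comparison, the "list of parts" representation described above might look roughly like this minimal sketch. Every name in it is hypothetical and does not exist in i386-dis.c: `struct operand_part`, `struct operand`, `operand_init`, `operand_add`, `operand_render` and `enum part_style`. The render step writes a `<N>` style prefix into a buffer only so that the output can be checked. A real printer would instead hand each part to its styled print function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of styles; real code would use
   enum disassembler_style from dis-asm.h.  */
enum part_style
{
  PART_TEXT = 0,
  PART_REGISTER = 1,
  PART_IMMEDIATE = 2
};

/* One piece of an operand: some text and the style to print it in.  */
struct operand_part
{
  const char *text;
  enum part_style style;
};

#define MAX_PARTS 8

/* An operand held as a list of parts instead of one flat string.  */
struct operand
{
  struct operand_part parts[MAX_PARTS];
  int count;
};

static void
operand_init (struct operand *op)
{
  memset (op, 0, sizeof *op);
}

/* Append a part.  Returns 1 on success, 0 if the operand is full.  */
static int
operand_add (struct operand *op, const char *text, enum part_style style)
{
  if (op->count >= MAX_PARTS)
    return 0;
  op->parts[op->count].text = text;
  op->parts[op->count].style = style;
  op->count++;
  return 1;
}

/* Loop over the parts, emitting each one with its own style.  Here the
   style is written as a "<N>" prefix into OUT (SIZE must be non-zero);
   a disassembler would call its styled print function instead.  */
static void
operand_render (const struct operand *op, char *out, size_t size)
{
  size_t used = 0;

  out[0] = '\0';
  for (int i = 0; i < op->count; i++)
    {
      int n = snprintf (out + used, size - used, "<%d>%s",
                        (int) op->parts[i].style, op->parts[i].text);
      if (n < 0 || (size_t) n >= size - used)
        break;  /* Output was truncated; stop appending.  */
      used += (size_t) n;
    }
}
```

As an example, build the AT&T operand `(%rax)` from the three parts `(`, `%rax` and `)`. Rendering it gives `<0>(<1>%rax<0>)`.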
>
> But it feels like a huge amount of work to move from where we are
> now to that potentially ideal solution.  Plus, the above solution
> would be pretty complex.
>
> So, instead I propose a .... different solution here, one that works
> with the existing infrastructure.
>
> As each operand is built up, piece by piece, we pass through style
> information.  This style information is then encoded into the operand
> buffer (see below for details).  After this the code can continue to
> operate as it does right now in order to manage the set of operand
> buffers.
>
> Then, as each operand is printed we can split the operand buffer into
> chunks at the style marker boundaries, with each chunk being printed
> in the correct style.
>
> For encoding the style information I use the format "~%x~".  As far as
> I can tell the '~' is not otherwise used in the i386 disassembler, so
> this should serve as a unique marker.  To speed up writing and then
> reading the style markers, I take advantage of the fact that there are
> fewer than 16 styles so I know the '%x' will only ever be a single hex
> character.
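The marker scheme can be shown with a small, self-contained sketch. One function appends text preceded by a `~%x~` marker. Another walks the buffer, and each time it meets a marker it flushes the text seen so far and switches to the marker's style. The names `append_styled`, `print_styled`, `chunk_fn`, `collect` and `enum sketch_style` are hypothetical; they are not the functions in the patch. The sketch relies on the same two assumptions as the description: '~' followed by a hex digit and another '~' never occurs in real operand text, and every style value fits in one hex digit.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of styles.  All values are below 16, so each one
   encodes as a single hex digit.  */
enum sketch_style
{
  STYLE_TEXT = 0,
  STYLE_MNEMONIC = 1,
  STYLE_REGISTER = 2,
  STYLE_IMMEDIATE = 3
};

/* Append TEXT to the NUL-terminated BUF (SIZE bytes in total), prefixed
   with a three-character "~%x~" marker recording STYLE.  */
static void
append_styled (char *buf, size_t size, const char *text,
               enum sketch_style style)
{
  size_t len = strlen (buf);

  if (len < size)
    snprintf (buf + len, size - len, "~%x~%s", (unsigned int) style, text);
}

/* Value of a lowercase hex digit, or -1 if C is not one.  */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

/* Callback that receives one chunk (not NUL-terminated) and its style.  */
typedef void (*chunk_fn) (void *data, enum sketch_style style,
                          const char *text, size_t len);

/* Split BUF at style markers.  Text before the first marker uses STYLE;
   the markers themselves are never passed to EMIT.  */
static void
print_styled (const char *buf, enum sketch_style style, chunk_fn emit,
              void *data)
{
  const char *start = buf;
  const char *p = buf;

  while (*p != '\0')
    {
      if (p[0] == '~' && hex_value (p[1]) >= 0 && p[2] == '~')
        {
          if (p > start)
            emit (data, style, start, (size_t) (p - start));
          style = (enum sketch_style) hex_value (p[1]);
          p += 3;
          start = p;
        }
      else
        p++;
    }
  if (p > start)
    emit (data, style, start, (size_t) (p - start));
}

/* Example callback: records each chunk as "<style>text" in a string.  */
struct collector
{
  char out[128];
};

static void
collect (void *data, enum sketch_style style, const char *text, size_t len)
{
  struct collector *c = data;
  size_t used = strlen (c->out);

  snprintf (c->out + used, sizeof c->out - used, "<%d>%.*s",
            (int) style, (int) len, text);
}
```

Appending `(`, `%rax` and `)` with the text, register and text styles produces the buffer `~0~(~2~%rax~0~)`. Splitting that buffer with `collect` yields `<0>(<2>%rax<0>)`. The buffer is still a plain string, so code that copies or reorders operand buffers does not need to know about the markers.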
>
> In some (not very scientific) benchmarking on my machine,
> disassembling a reasonably large (142M) shared library, I'm not seeing
> any significant slow down in disassembler speed with this change.
>
> Most instructions are now being fully syntax highlighted when I
> disassemble using the --disassembler-color=extended-color option.  I'm
> sure that there are probably still a few corner cases that need fixing
> up, but we can come back to them later I think.
>
> When disassembler syntax highlighting is not being used, then there
> should be no user visible changes after this commit.
> ---
>   opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>   1 file changed, 332 insertions(+), 239 deletions(-)
>
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index 1e3266329c1..c94d316a03f 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -42,12 +42,14 @@
>   #include <setjmp.h>
>   typedef struct instr_info instr_info;
>   
> +#define STYLE_BUFFER_SIZE 10
> +
>   static int print_insn (bfd_vma, instr_info *);
>   static void dofloat (instr_info *, int);
>   static void OP_ST (instr_info *, int, int);
>   static void OP_STi (instr_info *, int, int);
>   static int putop (instr_info *, const char *, int);
> -static void oappend (instr_info *, const char *);
> +static void oappend (instr_info *, const char *, enum disassembler_style);
>   static void append_seg (instr_info *);
>   static void OP_indirE (instr_info *, int, int);
>   static void print_operand_value (instr_info *, char *, int, bfd_vma);
> @@ -166,6 +168,8 @@ struct instr_info
>     char *obufp;
>     char *mnemonicendp;
>     char scratchbuf[100];
> +  char style_buffer[STYLE_BUFFER_SIZE];

I don't see where style_buffer is used.
It looks like neither style_buffer nor STYLE_BUFFER_SIZE is needed.

> +  char staging_area[100];

staging_area is used only in i386_dis_printf().
Why is it not a local array inside i386_dis_printf()?


>     unsigned char *start_codep;
>     unsigned char *insn_codep;
>     unsigned char *codep;
> @@ -248,6 +252,8 @@ struct instr_info
>   
>     enum x86_64_isa isa64;
>   
> +  int (*printf) (instr_info *ins, enum disassembler_style style,
> +		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>   };
>   
>   /* Mark parts used in the REX prefix.  When we are testing for
> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>   /* Like oappend (below), but S is a string starting with '%'.
>      In Intel syntax, the '%' is elided.  */
>   static void
> -oappend_maybe_intel (instr_info *ins, const char *s)
> +oappend_maybe_intel (instr_info *ins, const char *s,
> +		     enum disassembler_style style)
>   {
> -  oappend (ins, s + ins->intel_syntax);
> +  oappend (ins, s + ins->intel_syntax, style);
> +}
> +
> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
> +   STYLE is the default style to use in the fprintf_styled_func calls,
> +   however, FMT might include embedded style markers (see oappend_style),
> +   these embedded markers are not printed, but instead change the style
> +   used in the next fprintf_styled_func call.
> +
> +   Return non-zero to indicate the print call was a success.  */
> +
> +static int ATTRIBUTE_PRINTF_3
> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
> +		 const char *fmt, ...)
> +{
> +  va_list ap;
> +  enum disassembler_style curr_style = style;
> +  char *start, *curr;
> +
> +  va_start (ap, fmt);
> +  vsnprintf (ins->staging_area, 100, fmt, ap);

Perhaps sizeof (ins->staging_area) would be better than the literal 100.

As I wrote above, staging_area could be declared inside i386_dis_printf.
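The two suggestions can be sketched together as a simplified, hypothetical analogue of i386_dis_printf: the staging buffer is a local array, and its size comes from sizeof. `format_staged` is not a binutils function. To keep the example self-contained, it copies the formatted text into OUT. According to its comment, the patched function instead honours the embedded style markers and calls fprintf_styled_func.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format FMT into a local staging buffer, then copy the result to OUT
   (OUT_SIZE must be non-zero).  Returns non-zero on success.  */
static int
format_staged (char *out, size_t out_size, const char *fmt, ...)
{
  char staging_area[100];   /* Local: no need to keep it in instr_info.  */
  va_list ap;
  int n;
  size_t len;

  va_start (ap, fmt);
  /* sizeof keeps the bound in step with the declaration.  */
  n = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
  va_end (ap);

  if (n < 0)
    return 0;

  /* The real function would split staging_area at style markers here;
     this sketch just copies it, truncating to fit OUT.  */
  len = strlen (staging_area);
  if (len >= out_size)
    len = out_size - 1;
  memcpy (out, staging_area, len);
  out[len] = '\0';
  return 1;
}
```

For example, `format_staged (out, sizeof out, "%s,%d", "%eax", 4)` leaves `%eax,4` in `out`. If the staging buffer is ever resized, only its declaration has to change.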


-Vladimir



